Article

A Short-Term Building Load Prediction Method Based on Modal Decomposition and Deep Learning

1 School of Environmental Science and Engineering, Tianjin University, Tianjin 300350, China
2 Tianjin Key Laboratory of Built Environment and Energy Application, Tianjin University, Tianjin 300350, China
3 School of Mechanical Engineering, Tianjin University of Commerce, Tianjin 300134, China
4 Center of Science and Technology & Industrialization Development, Ministry of Housing and Urban-Rural Development, Beijing 100835, China
5 Department of Living Environment Design, Graduate School of Human Life and Ecology, Osaka Metropolitan University, Osaka 558-8585, Japan
6 School of Energy and Environmental Engineering, Hebei University of Technology, Tianjin 300401, China
* Author to whom correspondence should be addressed.
Buildings 2025, 15(24), 4455; https://doi.org/10.3390/buildings15244455
Submission received: 13 November 2025 / Revised: 30 November 2025 / Accepted: 6 December 2025 / Published: 10 December 2025

Abstract

Accurate cooling load prediction is essential for energy-efficient HVAC system operation. However, the stochastic and nonlinear nature of load data challenges conventional neural networks, causing prediction delays and errors. To address this, a novel hybrid model is developed. The approach first applies a two-stage decomposition (ICEEMDAN with sample entropy-based K-means clustering, followed by VMD) to process complex cooling load data. Then, a CNN-BiLSTM network optimized by the Crested Porcupine Optimizer and integrated with an attention mechanism is constructed for prediction. Experimental results demonstrate the model’s high performance, achieving a 96.75% prediction accuracy with a MAPE of 3.25% and an R2 of 0.9929. The proposed model shows strong robustness and generalization, providing a reliable reference for intelligent building energy management.

1. Introduction

As a core contributor to global energy consumption and carbon emissions, the building sector has rendered energy conservation and carbon reduction in related fields a key issue for sustainable development [1] while simultaneously undergoing a transformative shift from being merely energy-intensive to becoming smart and responsive [2,3]. This paradigm shift is fueled by advancements in data processing, storage, and transmission, which now enable the design and deployment of sophisticated control systems [4]. Among these, predictive control stands out by optimizing Heating, Ventilating and Air-conditioning (HVAC) system operation based on forecasts of future conditions. It is this critical dependence on accurate and anticipatory energy demand forecasts that elevates building load prediction from a supportive tool to a cornerstone technology for improving energy system performance and achieving significant energy efficiency [5].
Broadly, building load prediction methods fall into two categories: physics-based modeling and data-driven modeling [6]. Physics-based approaches incorporate detailed building envelope and system parameters to simulate building loads. Although these methods can represent physical processes with high fidelity, they require extensive, labor-intensive data collection, and inaccuracies in input parameters can lead to significant discrepancies between simulated and actual performance. Data-driven approaches avoid such dependencies by learning the mapping between operational and environmental variables and load values directly from historical data, using algorithms such as Artificial Neural Networks (ANN), Extreme Learning Machines (ELM), Regression Trees (RT), and Support Vector Machines (SVM). Representative work includes Hu et al. [7], who applied ANN to predict the thermal load of a zonally divided office building in Tianjin, enabling accurate room-level temperature estimation, and Guo et al. [8], who compared Back Propagation Neural Network (BPNN), SVM, and ELM for short-term load forecasting, showing superior performance for ELM when incorporating thermal response time as input.
Compared to shallow architectures, deep learning models capture more complex, non-linear relationships between features, enabling end-to-end learning without manual feature engineering [9]. LSTM networks, owing to their ability to handle long-sequence data, are especially valuable in load forecasting. Wang et al. [10] used LSTM to estimate internal heat gains in U.S. office buildings, identifying miscellaneous electrical loads as the most influential variable, and later compared shallow learning with deep learning, concluding that LSTM excels in short-term forecasting, whereas Extreme Gradient Boosting (XGBoost) performs better for long-term tasks.
Recent efforts increasingly combine complementary neural network architectures into hybrid frameworks. An et al. [11] employed a hybrid modeling framework that incorporated data processing and model optimization procedures with BP and ELMAN networks, resulting in accurate short-term residential load predictions. Li et al. [12] integrated Convolutional Neural Networks (CNN) with Gated Recurrent Units (GRU), adopting transfer learning to address small dataset scenarios. Bui et al. [13] proposed an electromagnetism-inspired Firefly Algorithm–ANN hybrid for thermal load prediction, assisting energy-efficient building design.
Alongside these developments, a wave of newer studies adopt more sophisticated feature engineering, decomposition techniques, and metaheuristic optimization. Feng et al. [14] demonstrated that CNN-based spatial feature extraction, BiLSTM temporal modeling, and multi-head self-attention, optimized via an improved Rime algorithm, substantially enhanced cooling load accuracy in public buildings. Kong et al. [15] addressed non-stationary loads from cascaded phase-change material buildings using SSA–VMD–PCA to preprocess data, thereby reducing prediction error by nearly 60%. Guo et al. [16] improved Artificial Rabbit Optimization with adaptive crossover to fine-tune LSTM hyperparameters, yielding performance gains in dynamically varying heat load scenarios. Song et al. [17] integrated gated multi-head temporal convolution with improved BiLSTM to robustly capture multi-scale temporal patterns, lowering prediction error over diverse forecasting horizons. Lu et al. [18] further showed how attention mechanisms coupled with time representation learning and XGBoost correction can improve both accuracy and robustness under varying occupancy schedules.
Other contributions emphasize interpretability, such as Zhang et al. [19], who combined clustering decision trees with adaptive multiple linear regression, matching black-box model accuracy while revealing that historical cooling loads and outdoor temperature are the most critical predictors. Salami et al. [20] leveraged Bayesian metaheuristic optimization in explainable tree-based models, achieving high alignment between design parameters and load outcomes. Additionally, Fouladfar et al. [21] developed adaptive ANN models retrained daily for residential thermal load prediction, enhancing flexibility and responsiveness. However, intelligent optimization algorithms often suffer from limitations such as slow convergence, premature convergence to local optima, and a poor balance between global and local search, all of which degrade prediction accuracy. Integrating more advanced optimization algorithms with neural networks is therefore essential to improve model accuracy and training speed. In the context of building energy consumption, load forecasting is generally classified into short-term and long-term horizons. Short-term load forecasting, typically spanning minutes to several days, is particularly valuable for supporting real-time control of HVAC systems.
Accurate prediction of complex, dynamic building cooling loads remains a critical challenge in building energy efficiency. While data-driven models have advanced development, they still struggle with the stochastic, highly nonlinear, and non-smooth nature of cooling load data, leading to inadequate accuracy and robustness. To address these limitations, this study develops a high-accuracy hybrid model tailored to non-stationary and complex load characteristics. The proposed model provides more reliable support for intelligent control, real-time monitoring, and efficient management of HVAC systems. The shortcomings of existing research, along with the key contributions and innovations of this work, are summarized below:
(1) Limitations of Existing Research:
(a) Inadequate handling of non-stationary and complex cooling load data: Conventional single neural network models often struggle with the stochastic fluctuations, nonlinear variations, and non-stationary characteristics of building cooling load data, leading to compromised prediction accuracy and limited adaptability to complex real-world operational conditions.
(b) Room for improvement in data decomposition methods: Existing modal decomposition techniques may suffer from incomplete decomposition or insufficient processing of high-frequency components. As a result, data preprocessing may fail to adequately reduce data complexity, thereby constraining further improvements in prediction model performance.
(c) Suboptimal efficiency and accuracy in model parameter optimization: Parameter configuration in traditional neural networks often relies on empirical or trial-and-error approaches, which are inefficient and unlikely to yield optimal parameter sets. Meanwhile, intelligent optimization algorithms may exhibit slow convergence or a tendency to become trapped in local optima during the parameter search process.
(d) Lack of comprehensive validation of model robustness and generalizability: Many existing studies focus predominantly on accuracy metrics while paying insufficient attention to validating model robustness and generalization capability under varying operational and data conditions. This undermines the reliability of such models in practical engineering applications.
(2) Contributions and Innovations:
(a) Developed an innovative ICEEMDAN-Kmeans-VMD secondary decomposition framework that sequentially applies ICEEMDAN for initial decomposition, sample entropy-based K-means clustering for component aggregation, and VMD for refining high-frequency components. This multi-stage approach overcomes limitations of single decomposition methods, achieving more thorough data denoising and providing higher-quality inputs for prediction.
(b) Constructed a CPO-CNN-BiLSTM-Attention hybrid model that effectively combines CNN’s local feature extraction, BiLSTM’s long-term dependency capture, and Attention’s focus on critical temporal information. This integration enhances the model’s capability to learn complex cooling load patterns.
(c) Introduced the Crested Porcupine Optimizer for automated hyperparameter tuning, addressing the challenge of manual parameter adjustment in deep hybrid models. CPO demonstrates faster convergence, superior optimization results, and higher efficiency compared to traditional methods, significantly improving model performance.
(d) Established comprehensive validation through case studies using real building data with multiple metrics and benchmark comparisons. Extensive analysis across decomposition effectiveness, feature engineering impact, optimization performance, and model robustness confirms the superiority of the proposed approach in accuracy, robustness, and generalization capability.

2. Methodology

The proposed short-term building load prediction method integrating modal decomposition and deep learning is presented in Figure 1.

2.1. Data Processing

2.1.1. Primary Decomposition of Data

To initially reduce the nonlinearity and non-stationarity of the original load time series, the study employs the Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN) algorithm to decompose the raw load data. As an enhanced Empirical Mode Decomposition (EMD) technique, ICEEMDAN effectively addresses mode mixing and residual noise issues present in conventional EMD and Ensemble EMD (EEMD) methods by incorporating adaptive noise during each decomposition stage and utilizing local mean values for signal processing. This enables more accurate decomposition of complex signals into a series of Intrinsic Mode Functions (IMFs) and one residual component. The ICEEMDAN procedure can be outlined as follows:
(1) Initialization: Let the residual signal be r0(t) = X(t), where X(t) is the original load time series. Set the iteration count i = 1.
(2) First-stage decomposition: Add zero-mean white noise n(j)(t), scaled by the amplitude coefficient ϵi, to the residual signal ri−1(t), obtaining the noise-added signal:

$$X^{(j)}(t) = r_{i-1}(t) + \epsilon_i\, n^{(j)}(t), \quad j = 1, 2, \ldots, N$$

where N is the number of noise realizations and ϵi specifies the noise amplitude for the i-th stage.
Perform EMD decomposition on each noise-added signal X(j)(t) and extract its first IMF component. The average of the first IMF components obtained from all noise-added signal decompositions serves as the first IMF component IMF1(t) of the ICEEMDAN algorithm:

$$IMF_1(t) = \frac{1}{N} \sum_{j=1}^{N} \widetilde{IMF}_1^{(j)}(t)$$

Compute the first residual signal r1(t):

$$r_1(t) = r_0(t) - IMF_1(t)$$
(3) Subsequent-stage decomposition: Use ri−1(t) as the input, repeat the preceding step to obtain IMFi(t) and ri(t). Continue until rm(t) becomes monotonic or satisfies the termination criteria.
(4) Final decomposition result: The original series X(t) is expressed as

$$X(t) = \sum_{i=1}^{m} IMF_i(t) + R(t)$$

where m is the number of IMFs produced, and R(t) is the residual term representing the long-term trend or low-frequency content of the load signal.
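To make the procedure concrete, the following is a minimal sketch of the primary decomposition in Python, assuming the EMD-signal (PyEMD) package. PyEMD ships CEEMDAN, which is used here as a stand-in for the ICEEMDAN variant described above; the noise amplitude and ensemble size mirror the settings reported in Section 4.1, and the file name is illustrative.

```python
import numpy as np
from PyEMD import CEEMDAN  # pip install EMD-signal

# Hypothetical hourly cooling load series (one column of values)
load = np.loadtxt("cooling_load.csv")

# trials: number of noise realizations; epsilon: noise amplitude (Section 4.1)
decomposer = CEEMDAN(trials=50, epsilon=0.2)
imfs = decomposer(load)               # array of shape (n_imfs, len(load))
residual = load - imfs.sum(axis=0)    # long-term trend component R(t)
```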

2.1.2. Data Component Aggregation

To streamline the decomposition results and extract load subsequences of varying frequencies, the IMF components from ICEEMDAN decomposition are aggregated using K-means clustering. This unsupervised algorithm partitions components into K clusters by minimizing within-cluster variance. Each IMF is characterized by its Sample Entropy (SampEn), where higher values indicate more complex, high-frequency components while lower values correspond to more predictable, low-frequency patterns.
(1) Sample entropy calculation
For a time series {x(t)}, t = 1, ..., L, of length L, the procedure is as follows:
a. Parameters: Set the embedding dimension m and the tolerance r.
b. Vector formation: For t = 1, 2, ..., L − m + 1, form the m-dimensional vectors Xm(t) = [x(t), x(t + 1), ..., x(t + m − 1)].
c. Distance calculation: Compute the Chebyshev distance d[Xm(i), Xm(j)] between any two vectors Xm(i) and Xm(j):

$$d[X_m(i), X_m(j)] = \max_{k=0,1,\ldots,m-1} \left| x(i+k) - x(j+k) \right|$$

d. Similarity count: For each i, count the number of vectors within tolerance r, denoted Bi. Define Bm(r) as the average of Bi/(L − m):

$$B^m(r) = \frac{1}{L-m} \sum_{i=1}^{L-m} \frac{B_i}{L-m}$$

e. Dimensional increase: Repeat steps b–d with dimension m + 1 to obtain Am+1(r).
f. Sample entropy: Calculate as

$$SampEn(m, r, L) = -\ln \frac{A^{m+1}(r)}{B^m(r)}$$
(2) K-means clustering procedure
a. Compute SampEn: Calculate the Sample Entropy of each IMF.
b. Cluster specification: Set cluster number K = 3 (high-, medium-, and low-frequency).
c. Center initialization: Randomly select three SampEn values as initial centroids.
d. Cluster assignment: Assign each IMF to the nearest centroid based on SampEn distance.
e. Centroid update: Recalculate centroids as the mean SampEn of assigned IMFs.
f. Iteration: Repeat steps d–e until convergence or maximum iterations.
g. Component aggregation: Sum IMFs within each cluster to form three subsequences.
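As an illustration of steps a–g, the sketch below computes the sample entropy of each IMF with plain NumPy and clusters the resulting values with scikit-learn’s KMeans. The m and r choices follow common SampEn practice (m = 2, r = 0.2·std), K = 3 follows the text, and the imfs array is assumed to come from the primary decomposition above.

```python
import numpy as np
from sklearn.cluster import KMeans

def sample_entropy(x, m=2, r_factor=0.2):
    """SampEn(m, r, L) = -ln(A^(m+1)(r) / B^m(r)), with r = r_factor * std(x)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()

    def match_count(dim):
        n = len(x) - dim + 1
        templates = np.array([x[i:i + dim] for i in range(n)])
        total = 0
        for i in range(n):
            # Chebyshev distance between template i and all templates
            d = np.abs(templates - templates[i]).max(axis=1)
            total += np.sum(d <= r) - 1   # exclude the self-match
        return total

    return -np.log(match_count(m + 1) / match_count(m))

# imfs: (n_imfs, T) array from ICEEMDAN; cluster the entropies into K = 3 groups
entropies = np.array([sample_entropy(imf) for imf in imfs]).reshape(-1, 1)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(entropies)

# Aggregate the IMFs of each cluster into one Co-IMF subsequence
co_imfs = [imfs[labels == k].sum(axis=0) for k in range(3)]
```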

2.1.3. Secondary Decomposition of Data

To mitigate the complexity inherent in the high-frequency IMF subsequences, a secondary analysis was carried out through Variational Mode Decomposition (VMD). VMD is a variational principle-based approach capable of adaptively separating a signal into a fixed number of modes, each characterized by a distinct center frequency and finite bandwidth. In this framework, the goal is to partition the original signal into K sub-components (VMD-IMFs), with each mode localized around its own central frequency. This partitioning process is formulated as the following constrained variational optimization problem:
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 \right\}$$

$$\mathrm{s.t.} \quad \sum_{k=1}^{K} u_k(t) = IMF_{HF}(t)$$

where {uk} = {u1, u2, ..., uK} denotes the extracted VMD-IMFs, {ωk} = {ω1, ω2, ..., ωK} represents the central frequencies of the modes, δ(t) is the Dirac delta function, ∗ stands for convolution, ∂t denotes differentiation with respect to time t, IMFHF(t) is the high-frequency subsequence being decomposed, and ‖·‖₂² is the squared L2 norm. When the problem is solved via its augmented Lagrangian, a penalty factor α balances data fidelity against mode bandwidth.
Based on this methodology, an ICEEMDAN-Kmeans-VMD secondary decomposition framework was developed, as depicted in Figure 2. The process begins with initial noise reduction through ICEEMDAN decomposition, which generates multi-frequency IMF components. It then employs sample entropy-based dynamic clustering to identify high-frequency subsequences, and finally applies secondary VMD decomposition to these components to reduce their complexity. This dual-stage decomposition approach with dynamic clustering aims at smoothing input uncertainty while enhancing high-frequency processing capability, thereby significantly optimizing computational efficiency.
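A minimal sketch of the secondary decomposition, assuming the vmdpy package; K = 3 follows the sensitivity analysis in Section 4.3.5, while the penalty factor alpha = 2000 is an illustrative default rather than a value reported here. co_imf_hf denotes the high-frequency Co-IMF selected by the clustering step.

```python
import numpy as np
from vmdpy import VMD  # pip install vmdpy

# alpha: penalty factor; tau: noise tolerance; K: number of modes;
# DC = 0: no DC mode imposed; init = 1: uniform initialization of center frequencies
alpha, tau, K, DC, init, tol = 2000, 0.0, 3, 0, 1, 1e-7
u, u_hat, omega = VMD(co_imf_hf, alpha, tau, K, DC, init, tol)

# u holds the K VMD-IMFs; recombine them with the medium/low-frequency Co-IMFs
new_imf_set = list(u) + [co_imf_medium, co_imf_low]
```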

2.2. CPO-CNN-BiLSTM-Attention Model Establishment

2.2.1. Convolutional Neural Network

A Convolutional Neural Network (CNN) is a deep feedforward architecture designed to process grid-structured data through convolution-based feature extraction, as shown in Figure 3. CNNs form the cornerstone of modern computer vision and underpin numerous advanced frameworks, such as Generative Adversarial Networks (GANs). Typically, a CNN comprises three main components: convolutional layers for hierarchical feature representation, pooling layers for dimensionality reduction and spatial abstraction, and fully connected layers that convert extracted features into task-specific outputs. This hierarchical design effectively suppresses redundancy while preserving robust generalization under complex and large-scale data environments [22]. Although CNNs have been applied most extensively in computer vision, their local feature extraction capability is equally applicable to one-dimensional time series such as building cooling loads [23]. Short-term cooling load variations often contain localized, high-frequency patterns induced by rapid outdoor environmental changes, adjustments in occupant schedules, or sudden increases in equipment usage. By sliding convolution kernels along the temporal axis, CNNs can efficiently identify these local temporal-dependency features, enabling the model to capture and respond to transient peaks, troughs, and short-term fluctuations more effectively than purely recurrent architectures [24].
During the forward propagation, information flows sequentially from the input to the output through a chain of weighted linear operations followed by non-linear activations. The convolutional process for the i-th layer can be expressed as [25]
$$a_i = f(z_i) = f(W_i * a_{i-1} + b_i)$$
where Wi denotes the convolution kernel (weight tensor), * indicates the convolution operator, ai−1 is the output from the (i − 1)-th layer, bi is the bias term, zi denotes the pre-activation linear output, f is the nonlinear activation function, and ai represents the activated output of the i-th layer.
After the forward pass, backpropagation computes gradients to adjust network parameters by minimizing the difference between model predictions and actual labels. The residual (error term) at the output layer equals the partial derivative of the loss function L with respect to the output zi:
$$\delta_i = \frac{\partial L}{\partial z_i} = \frac{\partial L}{\partial a_i} \cdot \frac{\partial a_i}{\partial z_i} = \frac{\partial L}{\partial a_i} \odot f'(z_i)$$
where L quantifies the discrepancy between predicted outputs and ground truth.
Given the residual δi of one convolutional layer, the residual δi−1 propagated to the preceding hidden layer (which may be convolutional or pooling) is expressed as
$$\delta_{i-1} = \delta_i * \mathrm{rot180}(W_i) \odot f'(z_{i-1})$$

where rot180(·) indicates a 180° rotation of the convolution kernel and ⊙ denotes the Hadamard (element-wise) product.
Once the residual δi is obtained for any convolutional layer, the corresponding weights and biases are updated through the gradient descent mechanism as follows:
$$W_i = W_i - \alpha \sum_{n=1}^{m} \delta_i * \mathrm{rot180}(a_{i-1})$$

$$b_i = b_i - \alpha \sum_{n=1}^{m} \sum_{u,v} (\delta_i)_{u,v}$$
where u and v represent the spatial dimensions of the residual matrix, α is the learning rate, and m denotes the number of training samples.
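The forward pass of a single one-dimensional convolutional layer, as given by the equation for ai above, can be sketched in a few lines of NumPy; the kernel, bias, and toy input are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def conv1d_forward(a_prev, W, b):
    """One layer of a_i = f(W * a_{i-1} + b) with valid padding and ReLU."""
    k = len(W)
    z = np.array([np.dot(W, a_prev[t:t + k]) + b
                  for t in range(len(a_prev) - k + 1)])
    return relu(z)

# Toy check on a short normalized load window with a 3-tap kernel
a0 = np.array([0.2, 0.5, 0.9, 0.7, 0.4, 0.3])
a1 = conv1d_forward(a0, W=np.array([0.5, 1.0, 0.5]), b=-0.1)
```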

2.2.2. Bidirectional Long Short-Term Memory Network

The Bidirectional Long Short-Term Memory (BiLSTM) network extends the capabilities of the conventional Long Short-Term Memory (LSTM) architecture to improve modeling of long-range temporal dependencies. Unlike a standard unidirectional LSTM, BiLSTM processes sequence data in both forward and backward directions, enabling the network to incorporate context from preceding and subsequent time steps simultaneously.
An LSTM unit comprises a memory cell and a set of gating mechanisms, namely the forget gate, input gate, and output gate [26], as depicted in Figure 4. These gates regulate the flow of information entering, leaving, or being preserved within the cell state, allowing selective retention of relevant features while discarding unnecessary ones. This design addresses limitations found in traditional RNNs—most notably the vanishing gradient issue—by maintaining a stable pathway for long-term information propagation. The cell state functions as a persistent memory vector that is updated at each time step through the combined action of the three gates. The forward computation within an LSTM cell is defined by the following set of equations [27]:
$$f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)$$

$$i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)$$

$$\tilde{C}_t = \tanh\left( W_c \cdot [h_{t-1}, x_t] + b_c \right)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

$$o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right)$$

$$h_t = o_t * \tanh(C_t)$$
where it, ft, and ot represent the input, forget, and output gates, respectively, σ denotes the sigmoid (logistic) activation function, tanh is the hyperbolic tangent activation function, Ct−1 and Ct are the internal states of the cell at time t − 1 and t, ℎt is the hidden state at time t, Wi, Wf, Wo, and Wc are weight matrices for the input gate, forget gate, output gate, and cell state, respectively, bi, bf, bo, and bc are bias vectors for the corresponding gates and cell state, * indicates element-wise multiplication.
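A single LSTM time step implementing the six gate equations can be sketched as follows (plain NumPy, with illustrative weight dictionaries); a BiLSTM runs one such recurrence forward over the sequence and a second one backward, then concatenates the two hidden states at each step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One time step of the LSTM equations above.
    W and b are dicts of matrices/vectors acting on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                  # hidden state
    return h_t, c_t
```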

2.2.3. Attention Mechanism

The Attention Mechanism is an essential concept in deep learning that enables neural networks to dynamically emphasize more relevant information within complex input sequences, similar to the way humans concentrate on salient details when perceiving large amounts of data. Rather than treating each input element equally, the attention framework assigns learnable coefficients—often termed attention weights or alignment scores—to quantify the relative importance of individual components in the input. Through this adaptive weighting, the model selectively enhances informative features while diminishing redundant or irrelevant signals. A general formulation of the attention operation can be represented in three main stages:
(1) Score generation: The relationship between a query and a collection of keys is assessed using a similarity or relevance function as
$$e_t = \tanh(W_h h_t + b_h)$$
(2) Weight normalization: The raw scores are transformed into a probability distribution through a softmax activation, yielding normalized attention coefficients for every key:
$$\alpha_t = \frac{\exp(e_t)}{\sum_{\tau=1}^{T} \exp(e_\tau)}$$
(3) Context aggregation: The attention output is obtained by performing a weighted combination of the corresponding value vectors, resulting in the final context representation:
$$c = \sum_{t=1}^{T} \alpha_t h_t$$
where et denotes the attention score, αt is the normalized attention weight, c is the aggregated context vector.
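The three stages map directly onto a short NumPy routine; here W_h is a learned weight vector and b_h a scalar bias, both illustrative, and H stacks the BiLSTM hidden states over the window.

```python
import numpy as np

def additive_attention(H, W_h, b_h):
    """H: (T, d) hidden states. Returns the context vector c and weights alpha."""
    e = np.tanh(H @ W_h + b_h)          # (1) raw scores e_t, shape (T,)
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                # (2) softmax normalization
    c = alpha @ H                       # (3) weighted context aggregation
    return c, alpha
```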
While CNNs are powerful for extracting regional features and local patterns from time-series inputs, their capability to capture extended temporal dependencies remains limited. In contrast, the BiLSTM network effectively models sequential relationships by incorporating both past and future contextual information through forward and backward propagation. To capitalize on their complementary advantages, a CNN–BiLSTM hybrid architecture is employed, where CNN layers handle spatial or short-term feature extraction, and BiLSTM layers refine temporal correlation learning. However, in scenarios involving high-dimensional or large-volume input sequences, the hybrid CNN–BiLSTM may underrepresent crucial information at certain moments, thereby weakening its learning efficiency and predictive accuracy. Integrating the Attention Mechanism into this framework resolves this limitation by adaptively quantifying the contribution of different time steps to the final prediction. Consequently, the model enhances interpretability and improves overall forecasting performance for complex time-dependent data.
As shown in Figure 5, the CNN-BiLSTM-Attention model processes input features through a structured pipeline: the CNN layer extracts and condenses local patterns via convolutional and pooling operations; the BiLSTM layer then captures long-term temporal dependencies using forward and backward LSTM units; subsequently, the Attention layer dynamically weights the significance of different time steps; and finally, the output layer synthesizes these refined representations to generate accurate predictions.

2.2.4. Crested Porcupine Optimizer

The Crested Porcupine Optimizer (CPO) is a recently proposed metaheuristic method that abstracts characteristic behaviors of crested porcupines, including adaptive foraging, defensive strategies, and social cooperation, into computational search rules. Within this framework, the optimization problem’s search space is analogized to the porcupines’ activity environment, and each candidate solution corresponds to an individual agent. In a D-dimensional search domain, the position of the i-th individual at iteration t is expressed as Xit = (xi1t, xi2t, ···, xiDt).
(1) Exploration phase
During the exploration of food resources, porcupines perform random long-distance movements to expand the search range. Position updating in this stage follows:
$$X_i^{t+1} = X_i^t + \alpha \times \beta \times \left( X_{best}^t - X_i^t \right)$$

where α is a parameter controlling the step size, typically within (0, 1); β is a random variable sampled from the normal distribution N(0, 1); and X_best^t represents the global best solution found up to the t-th iteration.
(2) Exploitation phase
When a porcupine identifies a potentially resource-rich area, it performs a local search to locate the food more precisely. The position update formula during the exploitation phase is
$$X_i^{t+1} = X_{best}^t + \gamma \times \left( X_j^t - X_k^t \right)$$

where γ is a random number within (0, 1), and X_j^t and X_k^t are the positions of two randomly chosen distinct agents from the current population.
(3) Collective cooperation mechanism
In CPO, a collective cooperation mechanism is introduced to enhance search capability. When a porcupine discovers a better position Xnew, it signals nearby companions, prompting them to move toward this position. The position update formula for other porcupine individuals is
$$X_i^{t+1} = (1 - \delta) X_{best}^t + \delta X_{new}$$
where δ is a weight coefficient within (0, 1), balancing the influence of the individual’s current position and the newly discovered position.
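The three update rules can be condensed into one illustrative iteration of a minimization loop; the 0.5 phase-switch probability and the greedy acceptance of the cooperation step are sketch-level assumptions, not details specified by the CPO formulation above.

```python
import numpy as np

rng = np.random.default_rng(0)

def cpo_iteration(X, X_best, fitness, alpha=0.5):
    """One CPO iteration over population X of shape (n, d), minimizing fitness."""
    n, d = X.shape
    for i in range(n):
        if rng.random() < 0.5:                          # exploration phase
            beta = rng.standard_normal(d)
            X[i] = X[i] + alpha * beta * (X_best - X[i])
        else:                                           # exploitation phase
            j, k = rng.choice(n, size=2, replace=False)
            X[i] = X_best + rng.random() * (X[j] - X[k])
    scores = np.array([fitness(x) for x in X])
    if scores.min() < fitness(X_best):                  # cooperation: the pack
        X_new = X[scores.argmin()].copy()               # moves toward the newly
        delta = rng.random((n, 1))                      # discovered position
        X = (1 - delta) * X_best + delta * X_new
        X_best = X_new
    return X, X_best
```

In this study the decision vector would hold the L2 regularization coefficient, initial learning rate, and number of hidden units, with the validation error of the CNN-BiLSTM sub-model serving as the fitness function.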

2.2.5. Hybrid Prediction Model Establishment

Building on these strengths—CNN’s efficiency in local feature extraction, BiLSTM’s ability to model long-term dependencies, and Attention’s focus on influential time steps—this study integrates CPO as an efficient metaheuristic optimizer for hyperparameter tuning. The resulting CPO-CNN-BiLSTM-Attention hybrid model combines multi-stage decomposition to reduce time series complexity with adaptive hyperparameter optimization, enabling accurate prediction of complex temporal patterns. The implementation process, outlined in Figure 6, proceeds as follows:
Step 1: Primary modal decomposition
Apply ICEEMDAN to decompose the input time series into multiple IMF components, effectively handling nonlinear and non-stationary characteristics to reduce initial complexity.
Step 2: IMF grouping via K-means clustering
Calculate the sample entropy of each IMF to quantify complexity and randomness. Cluster all IMFs into high-, medium-, and low-frequency subsequences using K-means for targeted model construction.
Step 3: Secondary modal decomposition
Perform VMD on high-frequency IMFs to refine complex features. Combine the resulting new IMFs with existing medium- and low-frequency components for comprehensive decomposition.
Step 4: Hyperparameter optimization with CPO
Input each IMF into the CPO algorithm to adaptively optimize three key hyperparameters of the CNN-BiLSTM model: L2 regularization coefficient, initial learning rate, and number of hidden units, enhancing prediction performance per component.
Step 5: Sub-model construction
Build individual CNN-BiLSTM-Attention sub-models for each IMF using CPO-optimized hyperparameters. Each sub-model employs CNN for local feature extraction and BiLSTM for long-term dependency capture.
Step 6: Integrated prediction
Generate predictions for each IMF using its corresponding sub-model, then aggregate all results through weighted summation to produce the final prediction of the original time series.
Step 7: Performance evaluation
Conduct comprehensive assessment using metrics such as MAE and CV-RMSE, comparing the hybrid model against benchmark methods to validate its effectiveness and superiority.
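Steps 1–6 can be summarized in a compact orchestration sketch that reuses the helpers introduced earlier in this section; cpo_tune, build_submodel, make_features, and test_window are hypothetical wrappers around the CPO loop, the CNN-BiLSTM-Attention network, and the dataset, shown only to make the data flow explicit.

```python
import numpy as np
from PyEMD import CEEMDAN
from sklearn.cluster import KMeans
from vmdpy import VMD

imfs = CEEMDAN(trials=50, epsilon=0.2)(load)                     # Step 1
entropies = np.array([sample_entropy(imf) for imf in imfs])      # Step 2
labels = KMeans(n_clusters=3, n_init=10).fit_predict(entropies.reshape(-1, 1))
co = {k: imfs[labels == k].sum(axis=0) for k in range(3)}
hf = max(co, key=lambda k: sample_entropy(co[k]))                # high-frequency group

vmd_imfs, _, _ = VMD(co[hf], 2000, 0.0, 3, 0, 1, 1e-7)           # Step 3
components = list(vmd_imfs) + [co[k] for k in co if k != hf]

prediction = 0.0
for comp in components:                                          # Steps 4-6
    params = cpo_tune(comp)        # L2 coefficient, learning rate, hidden units
    model = build_submodel(params).fit(*make_features(comp))
    prediction += model.predict(test_window)                     # summed forecast
```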

3. Case Study

3.1. Case Building

A three-story office building in Tianjin, China, shown in Figure 7a, serves as the case study. The building measures 43 m × 35 m × 12 m (length × width × height), with key parameters listed in Table 1. Weather inputs, including ambient dry-bulb temperature and incident solar radiation, were sourced from the Chinese Standard Weather Data (CSWD).
The operational schedules and occupant behavior settings are critical behavioral characteristics of typical office buildings [28]. Internal heat gains from occupants, lighting, and equipment were determined based on operational schedules prescribed in the Chinese Design Standard for Energy Efficiency of Public Buildings (GB50189-2015 [29]) for standard office building operation patterns. For typical working days (Monday to Friday), the building operates between 08:00 and 18:00, with reduced lighting and equipment power during the lunch break. On weekends and public holidays, the building remains in a non-operational or low-load state. The designed occupant density is approximately 0.05–0.06 persons/m², with activity levels corresponding to light office work as defined by ASHRAE (metabolic rate ~1.2 met). Recommended equipment and lighting power densities are 10–15 W/m² and 8–12 W/m², respectively. These operational schedules directly affect occupant presence during working and non-working days, equipment usage frequency, and variations in internal heat gains; they were converted into building operation calendars for EnergyPlus simulation inputs. Cooling load data from June 1 to September 1 were simulated at a one-hour time step to build a dataset for developing and validating the CPO-CNN-BiLSTM-Attention prediction model. The dataset contains a total of 2208 records, with the building cooling load shown in Figure 7b, solar radiation variation in Figure 7c, and outdoor temperature together with relative humidity variation in Figure 7d.

3.2. Data Preprocessing

The cooling load dataset encompasses several categories of explanatory variables, including temporal descriptors, outdoor environmental factors, and the cooling load from the previous time step. In building load prediction, the current load is influenced not only by instantaneous meteorological and operational parameters but also by the pronounced thermal inertia of the building envelope and indoor air. Particularly over short prediction horizons, cooling load exhibits strong autocorrelation, while meteorological variables—such as solar radiation—demonstrate notable temporal continuity. Incorporating lagged features of cooling load and solar radiation from the previous time step into the model allows for more effective capture of these temporal dependencies, thereby enhancing prediction accuracy.
During feature engineering, categorical variables such as time type and weekday type were encoded into numerical values to facilitate model training. Specifically, the time type was sequentially assigned integers from 0 to 23 representing each hour of the day; the weekday type was assigned integers from 1 (Monday) to 7 (Sunday), and public holidays were uniformly encoded as 0 to distinguish their operational patterns from regular weekdays. Outdoor environmental parameters (temperature, humidity, and solar irradiance) retained their original numeric form, but missing entries were removed and gaps filled via linear interpolation. Lagged features for cooling load and solar radiation were constructed by shifting the dataset backward by one hour, appending these lagged values as independent predictors.
A Pearson correlation analysis was then performed on all collected feature parameters, with the results shown in Figure 8. Based on the correlation analysis, a threshold of ±0.3 was adopted for feature selection. The parameters ultimately chosen as model inputs include time type, weekday type, outdoor temperature, outdoor humidity, cooling load at the previous time step, and solar radiation at the previous time step, as detailed in Table 2.
To address the adverse effects of different feature scales on training speed, convergence stability, and predictive accuracy, all numerical features were first checked to ensure the absence of anomalies before normalization. Subsequently, a unified min–max scaling was applied to map values to the range [0, 1], as calculated by Equation (27).
$$x_j' = \frac{x_j - x_{\min}}{x_{\max} - x_{\min}}$$
where xj and xj′ represent the original and normalized values, respectively, while xmax and xmin refer to the highest and lowest values within the dataset, respectively.
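The encoding, lagging, correlation screening, and scaling described above can be sketched with pandas; the file and column names are hypothetical stand-ins for the simulation output.

```python
import pandas as pd

df = pd.read_csv("simulation_output.csv", parse_dates=["time"])

# Temporal descriptors: hour encoded 0-23; weekday 1 (Monday) to 7 (Sunday),
# with public holidays uniformly encoded as 0
df["hour"] = df["time"].dt.hour
df["day_type"] = df["time"].dt.dayofweek + 1
df.loc[df["is_holiday"], "day_type"] = 0

# One-step lag features for cooling load and solar radiation
df["load_lag1"] = df["cooling_load"].shift(1)
df["solar_lag1"] = df["solar_radiation"].shift(1)
df = df.interpolate(method="linear").dropna()

# Keep features whose Pearson correlation with the load exceeds the ±0.3 threshold
corr = df.corr(numeric_only=True)["cooling_load"]
features = corr[corr.abs() > 0.3].index.drop("cooling_load")

# Min-max scaling to [0, 1] as in Equation (27)
X = df[features]
X_scaled = (X - X.min()) / (X.max() - X.min())
```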

3.3. Evaluation Metrics

Four metrics were employed to evaluate prediction accuracy. The calculations are defined as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - x_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - y_i \right|$$

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_i - x_i}{y_i} \right|$$

$$\mathrm{CV\text{-}RMSE} = \frac{n}{\sum_{i=1}^{n} y_i} \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n}}$$
where n represents the sample size, xi denotes the i-th simulated value (kW), yi denotes the i-th measured value (kW), and x̄ and ȳ represent the means of the simulated and measured values (kW), respectively.
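For reference, the four metrics can be computed in a few lines of NumPy from the measured series y and the predicted series x:

```python
import numpy as np

def evaluate(y, x):
    """y: measured values (kW); x: predicted values (kW)."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    err = y - x
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y))
    cv_rmse = 100.0 * np.sqrt(np.mean(err ** 2)) / y.mean()
    return {"R2": r2, "MAE": mae, "MAPE_%": mape, "CV-RMSE_%": cv_rmse}
```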

4. Results and Discussion

4.1. Data Decomposition Results

The ICEEMDAN method was employed to process the load profile of the case-study building. The decomposition was performed with a white noise amplitude of 0.2, 50 realizations of noise addition, and 100 iterations, yielding multiple IMF components. As shown in Figure 9, the IMFs exhibit a distinct stratification in both frequency and amplitude. From IMF 1 to IMF 10, the oscillation frequency gradually decreases: IMF 1 to IMF 3 represent high-frequency oscillations, characterized by dense waveforms and large amplitude fluctuations. IMF 4 to IMF 6 show a marked reduction in frequency, with the waveforms transitioning from dense oscillations to relatively smooth fluctuations and a narrower amplitude range. IMF 7 to IMF 9 exhibit even lower-frequency, trend-like oscillations with further extended oscillation periods. Finally, IMF 10 reflects a slowly varying trend component with a relatively stable amplitude range and no apparent oscillations. This hierarchical decomposition demonstrates that the ICEEMDAN algorithm effectively separates the original cooling load signal into components with distinct frequency and amplitude characteristics, thereby laying the groundwork for precise prediction of each individual component in subsequent analysis.
To address the model complexity from excessive ICEEMDAN-generated IMFs, sample entropy analysis with K-means clustering was implemented. This grouped components into three frequency-based Co-IMFs (high, medium, low), effectively reducing the components from 10 to 3 as shown in Figure 10 and significantly cutting training costs.
The high-frequency time series obtained from K-means clustering undergoes secondary modal decomposition via VMD to reduce complexity. As shown in Figure 11, the Co-IMF components are decomposed into three VMD-IMFs. These are combined with the medium- and low-frequency components from the initial ICEEMDAN decomposition to form a new IMF set. Based on the classification criteria, VMD-IMF1 and VMD-IMF3 are categorized as high-frequency components, VMD-IMF4 as medium-frequency, and VMD-IMF5 as low-frequency. From the perspective of HVAC process dynamics, each frequency band corresponds to distinct physical phenomena influencing cooling load variations [30]. High-frequency components are primarily driven by rapid changes in outdoor meteorological conditions, such as solar radiation spikes and wind disturbances, as well as instantaneous internal gains from occupant activities or equipment cycling. Medium-frequency components reflect daily operational cycles dictated by the building schedule and moderate thermal inertia of the envelope. Low-frequency components capture long-term trends arising from substantial envelope thermal inertia, indoor air heat storage, and seasonal climate variations. Subsequent to decomposition, each VMD-IMF is combined with the existing feature variables to construct new component-specific sample datasets, which are later used for feature selection and prediction testing.

4.2. Prediction Results

After undergoing secondary modal decomposition and feature selection, the processed cooling load dataset was partitioned into training (80%) and testing (20%) subsets. Simulations were conducted in MATLAB R2023a to implement and evaluate four models: CPO-CNN, CPO-BiLSTM, CPO-CNN-BiLSTM, and CPO-CNN-BiLSTM-Attention. The prediction performance of the CPO-CNN-BiLSTM-Attention model was thoroughly validated and analyzed. Through extensive testing and adjustment, an optimal parameter set was determined for the case study, as detailed in Table 3. The results in Figure 12 indicate that the CPO-CNN-BiLSTM-Attention model produces forecasts that align closely with the actual load profiles, demonstrating strong predictive accuracy.
Table 4 provides a comparative overview of the evaluation results for the five predictive models. The CPO-CNN-BiLSTM-Attention approach attained a MAPE of 3.2462% (96.7538% accuracy), meeting practical requirements. This configuration consistently surpasses all alternatives in the assessed criteria, yielding MAE, MAPE, and CV-RMSE values of 31.0252, 3.2462%, and 6.7212%, respectively—representing reductions of 7.47–40.39, 2.07–6.82%, and 1.43–7.72% over other models. Its R2 value of 0.9929 also exceeds others by 0.0033–0.0256. The attention mechanism contributes a 2.07% accuracy gain, while CPO-optimized CNN and BiLSTM models show marked improvement over their base versions, confirming CPO’s effectiveness in parameter tuning.

4.3. Prediction Performance Analysis

4.3.1. Performance of Modal Decomposition

Figure 13 compares the predictive performance of secondary modal decomposition, primary modal decomposition, and non-decomposition methods using MAPE and CV-RMSE as evaluation metrics. Experimental results demonstrate the significant advantages of the proposed secondary modal decomposition method in both prediction accuracy and stability. Specifically, its MAPE (3.246%) and CV-RMSE (6.721%) are reduced by 6.542% and 6.81%, respectively, compared to primary decomposition, and by 10.879% and 10.751% compared to the non-decomposition approach.
To address the limitations of traditional single-stage decomposition—particularly its inadequate handling of high-frequency components leading to suboptimal prediction accuracy—this study constructs a hierarchical optimization framework of “decomposition-clustering-re-decomposition”. The process begins with initial noise reduction via ICEEMDAN decomposition, generating multi-frequency IMF components. Based on sample entropy dynamic clustering (high-frequency components: entropy >1.5; low-frequency components: entropy <0.8), high-frequency subsequences are selected. These high-frequency components then undergo secondary decomposition using VMD, where adaptive bandwidth adjustment restricts mode mixing error below 0.8%, effectively reducing high-frequency complexity. Experimental evidence confirms that this method, through its dual-stage decomposition and dynamic clustering mechanism, effectively mitigates input uncertainty while enhancing high-frequency processing capability, thereby optimizing computational efficiency.
To quantitatively evaluate the contribution of modal decomposition and its smoothing effect to prediction performance, three experimental schemes were designed using the CNN model as the baseline. Scheme 1: a CNN model without any decomposition; Scheme 2: a CPO–CNN–BiLSTM–Attention model without decomposition; Scheme 3: a CNN model incorporating dual-stage modal decomposition. The results are presented in Table 5. Compared with the baseline CNN model, the deep model without decomposition reduced the MAPE by approximately 7.1% and the CV-RMSE by about 20.0%. In contrast, introducing dual-stage modal decomposition into the simple CNN framework reduced the MAPE by approximately 36.6% and the CV-RMSE by around 55.8%. These findings indicate that dual-stage modal decomposition plays a critical role in reducing the high-frequency complexity of the input signal and enhancing its predictability. Moreover, the improvement is not solely attributed to the smoothing effect; the decomposition also provides more stable and informative input features to the predictive model, thereby further improving forecasting accuracy.

4.3.2. Impact of Feature Engineering and Attention Mechanism on Model Prediction Accuracy

Feature engineering plays a crucial role in improving model prediction accuracy. To analyze its specific impact on load forecasting, a comparative study was designed with two model groups: an experimental group using input features processed by feature engineering—including fine-grained time types (e.g., hour, period), day type (weekday/weekend/holiday), outdoor temperature, outdoor humidity, cooling load at the previous timestep, and solar radiation at the previous timestep—and a control group using raw features without such processing. As shown in Figure 14, the engineered features were filtered and transformed to reduce redundancy and noise. In contrast, the raw feature set, while more comprehensive—including current solar radiation, wind direction, and wind speed in addition to the same temporal and environmental variables—may suffer from multicollinearity and scale variation, potentially impairing model training and prediction.
Meanwhile, the attention mechanism in the CNN-BiLSTM-Attention model also plays a vital role by adaptively highlighting the importance of different timesteps in the sequential data. When handling long and complex time series, it learns varying weights for each step, amplifying relevant information while suppressing less useful inputs. This helps the model capture truly salient temporal patterns and further improves predictive accuracy. Further analysis of the attention weight distribution reveals that several timesteps corresponding to peaks in solar radiation received notably higher weights. This indicates that the model successfully identified the strong coupling between solar gains and building cooling demand [31]. Solar radiation increases the heat load on the building envelope, directly affecting indoor thermal conditions and substantially raising cooling requirements during clear-sky periods. In this study, the close alignment between high-weight timesteps and solar radiation peaks further confirms that solar radiation is a critical physical driver of building load variations.
As summarized in Figure 14, the contributions of feature engineering and attention mechanism to prediction accuracy were quantified. Models trained with feature-engineered inputs showed significant improvement over those using raw features, with accuracy gains ranging from 7.17% to 10.74% and averaging 9.29%. This substantially exceeded the gains from the attention mechanism, which ranged from 2.06% to 6.81% with feature engineering and 2.33% to 3.67% without. These results demonstrate that for building cooling load forecasting, improving input quality through feature engineering—by removing redundancy, reducing noise, and constructing informative features—is essential. It substantially enhances the model’s ability to capture dynamic load patterns and leads to markedly more accurate predictions.

4.3.3. Performance of Different Optimization Algorithms

To evaluate the optimization capability of the Crested Porcupine Optimizer (CPO) algorithm on three key parameters—L2 regularization coefficient, initial learning rate, and number of hidden units—of the CNN and BiLSTM models, convergence iteration count, error metrics, and runtime were used as performance metrics. For benchmarking, conventional optimization approaches including Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Bayesian Optimization (BO) were implemented under identical experimental conditions. Specifically, the search space for the L2 regularization coefficient was set to [1 × 10−5, 1 × 10−2], the initial learning rate to [1 × 10−4, 1 × 10−2], and the number of hidden units to [32, 256]. The maximum number of iterations was fixed at 100, the population size was kept constant across algorithms, and the same random seed was applied for initialization to ensure reproducibility and fairness of comparison. As summarized in Table 6, CPO achieved superior optimization results with the shortest runtime and faster convergence speed compared to GA, PSO, and BO, demonstrating its effectiveness in tuning parameters for both CNN and BiLSTM. This approach significantly enhances model accuracy and accelerates the training process.

4.3.4. Robustness of the Models

The robustness assessment, illustrated in Figure 15, displays box plots of relative errors across six prediction approaches. Among these, the CPO-CNN-BiLSTM-Attention configuration produces the most compact error spread, characterized by the smallest interquartile distance and minimal presence of outliers. This outcome implies that the model consistently confines prediction deviations within a limited range, underscoring its strong resilience against variability.
To assess the stability of the proposed method under different data-splitting conditions, multiple random splits of the dataset were performed within the original framework of 80% training and 20% testing sets. A total of six independent experiments were conducted. As shown in Table 7, the R2 values remained consistently high, ranging from 0.9911 to 0.9929, with an average of 0.9920, indicating strong fitting capability across different splits. The MAE and MAPE metrics were kept within low ranges of 30.6647–34.4826 and 4.2462–5.3559%, with averages of 32.9378 and 4.7431%, respectively. The average CV-RMSE was 7.1494%, with fluctuations across runs not exceeding 0.83%. These results demonstrate that the proposed modal decomposition and deep learning-based short-term building load forecasting method maintains stable and high predictive performance under varying random data splits, exhibiting strong generalization ability and robustness of conclusions.

4.3.5. Configuration of Hyperparameters

The hyperparameters for the CNN and BiLSTM components in the CPO-CNN-BiLSTM-Attention model were optimized following a predefined structure [32]. A sequential tuning approach was applied to four key parameters: number of convolutional filters (16, 32, 64, 128), kernel number (2, 4, 8, 16), pooling window size (2, 4, 8, 16), and dropout rate (0.05, 0.1, 0.2, 0.3). Initial tests identified a kernel number of 16 and a dropout rate of 0.1 as optimal, and these were subsequently fixed. The model was trained and evaluated on each IMF component, with results visualized in Figure 16. This systematic optimization established the final parameter set, enhancing the model’s cooling load prediction capability.
To verify the rationality of parameter selection and the stability of decomposition results, a sensitivity analysis was conducted on three key parameters in the decomposition framework: the white noise amplitude in ICEEMDAN, the number of clusters in K-means, and the preset number of modes in VMD. The relative prediction error of the proposed model was used as the evaluation metric, and the results are presented in Figure 17. In the ICEEMDAN stage, the white noise amplitude was tested at 0.1, 0.2, and 0.3. As shown in Figure 17a, an amplitude of 0.2 yielded a lower median prediction error and reduced dispersion, effectively suppressing mode mixing while preserving the dominant characteristics of the original signal. For K-means clustering, the cluster number K was set to 2, 3, and 4. As illustrated in Figure 17b, K = 3 provided a stable error range and a lower maximum relative error, while reasonably distinguishing high-, medium-, and low-frequency IMF groups, thereby facilitating targeted secondary decomposition and modeling for each frequency band. For the secondary VMD decomposition applied to the high-frequency IMF sequence, the preset number of modes K was varied between 3, 4, and 5. As shown in Figure 17c, K = 3 achieved lower errors and stable decomposition performance, effectively separating high-frequency components without excessively increasing computational complexity. Based on comparative analysis, the optimal parameter configuration in this study was determined as follows: the white noise standard deviation coefficient in ICEEMDAN was set to 0.2; the cluster number in K-means was fixed at K = 3, corresponding to high-, medium-, and low-frequency IMF groups; and the number of modes in the secondary VMD decomposition of high-frequency IMF sequences was set to K = 3, resulting in three VMD-IMF components.

4.3.6. Generalization Analysis of the Model

To evaluate the generalization capability of the proposed prediction model, the dataset was split into three non-overlapping subsets corresponding to 1–30 June, 1–31 July, and 1–31 August. Each subset was divided into training and testing sets at a ratio of 8:2, and simulations were conducted using the proposed hybrid prediction framework. As shown in Table 8, consistently high predictive performance was achieved across all periods. Overall, the model shows strong and stable generalization ability under diverse monthly and climatic conditions.

4.4. Limitations and Future Work

Although the proposed hybrid prediction framework demonstrated strong performance on simulated datasets, its practical deployment in real-world scenarios still faces several limitations and challenges. To advance the engineering implementation and broaden the applicability of the method, future studies will focus on aspects including data acquisition, model optimization, adaptation to multiple prediction horizons, robustness under complex scenarios, and deep integration with intelligent control systems. The key directions are outlined as follows:
(1) At present, model validation has been conducted primarily on simulated data, without leveraging real measured data from buildings. As a result, the model may not fully capture the complexities of real-world scenarios, such as random noise, load fluctuations driven by occupant behavior, and control deficiencies of HVAC systems. Future work will prioritize real-data-driven optimization by collecting hourly load profiles, meteorological parameters, occupant behavior patterns, and HVAC operation records from representative buildings, thereby constructing a multi-scenario real energy consumption dataset. Model input processing and training strategies will be tailored to the statistical properties of measured data, and a fault diagnosis module will be incorporated to improve robustness and practicality in engineering applications. Computational efficiency will also be systematically evaluated across different hardware platforms, accompanied by the development of lightweight or pruned versions of the model and exploration of simplified decomposition strategies within acceptable accuracy trade-offs. Finally, deployment tests will be conducted in actual building management systems to assess scalability, latency, and operational stability in real-world environments.
(2) Incorporating behavioral and psychological correction factors could further enhance the dynamic responsiveness and realism of the model [33,34]. Future studies will integrate building behavior monitoring data—such as variations in occupant density and equipment usage triggered by meetings, holidays, or unexpected events—along with psychological state indicators obtained through surveys or sensors, and investigate their integration into deep learning-based prediction models.
(3) Performance evaluation under varying temporal resolutions is a prerequisite for effective integration of the model with control strategies. In addition to the default one-hour prediction horizon, this study also assessed a one-day time step, showing that the proposed hybrid prediction framework maintained high accuracy for long-horizon forecasts (R2 = 0.9857, MAE = 472.45, MAPE = 2.63%, CV-RMSE = 3.51%). Future research will systematically investigate performance across minute-, hour-, and day-level forecasts under diverse building load scenarios and climate zones. This will enable comprehensive assessment of model robustness and adaptability, thereby ensuring its capability to meet heterogeneous operational and planning needs in real-world applications.
(4) Complex or unexpected scenarios in building load forecasting can challenge the adaptability of prediction models. For instance, abrupt changes in occupant behavior—such as unscheduled meetings, gatherings, or extended absences—can lead to non-periodic load spikes that are difficult to capture in real time; equipment faults or control system failures may disrupt established load patterns; extreme weather events or other external disturbances could introduce unseen data distributions, undermining prediction reliability. Future work will address these challenges by incorporating real-time monitoring data, developing anomaly detection modules, and adopting online learning methods capable of rapidly adapting to new operational conditions, thereby enhancing adaptability and robustness in dynamic, non-stationary environments.
(5) The proposed hybrid prediction model is currently evaluated in an offline setting, and its forecasts have not yet been directly integrated into active HVAC control strategies. Future research will focus on deep integration of the prediction model with intelligent control systems, using real-time forecasts as inputs for load balancing, indoor temperature–humidity optimization, and peak shaving operations. Prediction-driven control experiments will be designed to quantitatively assess the actual benefits in terms of energy savings, peak load reduction, and occupant comfort improvement. In addition, applicability across various building types and climate conditions will be explored to ensure multi-scenario robustness and scalability of the approach.

5. Conclusions

This study proposes a novel ICEEMDAN-Kmeans-VMD-CPO-CNN-BiLSTM-Attention model for predicting building cooling loads. The model demonstrates outstanding predictive accuracy, along with excellent robustness and generalization capabilities. The main conclusions are as follows:
(1) The proposed secondary modal decomposition method significantly outperforms both primary modal decomposition and non-decomposition methods in terms of predictive accuracy and stability. Its MAPE (3.2462%) and CV-RMSE (6.7212%) are reduced by 6.542% and 6.81%, respectively, compared to primary modal decomposition, and by 10.879% and 10.751% compared to the non-decomposition approach. The proposed dual-stage decomposition and dynamic clustering strategy effectively smooths and reduces the uncertainty in the model inputs.
(2) The CPO-CNN-BiLSTM-Attention model achieved excellent predictive results for building load data across different periods, with R2, MAE, MAPE, and CV-RMSE reaching 0.9929, 31.0252, 3.2462%, and 6.7212%, respectively. This indicates that the proposed model fully leverages the complementary strengths of the CNN and the BiLSTM (see the architecture sketch after this list).
(3) Feature engineering contributed more significantly to improving predictive accuracy than the attention mechanism. Feature engineering yielded an average accuracy improvement of 9.29%, while the attention mechanism provided a maximum improvement of 6.81%. Nevertheless, both components enhanced the model’s predictive performance.
(4) Both the CPO-CNN and CPO-BiLSTM models outperformed their unoptimized single-network counterparts across all performance metrics, and their box plots of relative errors showed narrower interquartile ranges and fewer outliers, indicating that the CPO algorithm reliably tunes the parameters of both CNN and BiLSTM models. Compared with classical approaches such as GA, PSO, and BO, the CPO framework converged in fewer iterations with lower computational overhead and runtime, demonstrating its suitability for fine-tuning deep learning architectures of this type (see the tuning-loop sketch after this list).
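To make the two-stage decomposition of conclusion (1) concrete, the sketch below chains the steps under stated assumptions: PyEMD's CEEMDAN stands in for ICEEMDAN, IMFs are grouped by K-means on their sample entropy (consistent with the SampEn abbreviation, though the exact clustering feature is an assumption), and k = 3 clusters, K = 5 VMD modes, and the file name are illustrative. This is a sketch, not the authors' implementation.

```python
import numpy as np
from PyEMD import CEEMDAN            # stand-in for ICEEMDAN (assumption)
from sklearn.cluster import KMeans
from vmdpy import VMD

def sampen(x, m=2, r=None):
    """Compact sample entropy: -log of the ratio of (m+1)-point to
    m-point template matches within tolerance r (Chebyshev distance)."""
    x = np.asarray(x, dtype=float)
    r = 0.2 * x.std() if r is None else r
    n = len(x) - m                       # same template count for both lengths
    def matches(length):
        t = np.array([x[i:i + length] for i in range(n)])
        d = np.max(np.abs(t[:, None] - t[None, :]), axis=2)
        return np.sum(d <= r) - n        # exclude self-matches
    a, b = matches(m + 1), matches(m)
    return -np.log(a / b) if a > 0 else np.inf

load = np.loadtxt("cooling_load.csv")    # hypothetical hourly load series

# Stage 1: decompose the raw load into intrinsic mode functions (IMFs)
imfs = CEEMDAN().ceemdan(load)

# Group the IMFs into co-IMFs with K-means on sample entropy (k = 3 assumed)
feats = np.array([[sampen(imf)] for imf in imfs])
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(feats)
co_imfs = [imfs[labels == k].sum(axis=0) for k in range(3)]

# Stage 2: refine the most irregular co-IMF with VMD (K = 5 modes assumed)
noisy = max(range(3), key=lambda k: feats[labels == k].mean())
modes, _, _ = VMD(co_imfs[noisy], alpha=2000, tau=0.0, K=5, DC=0, init=1, tol=1e-7)
# Each VMD mode and each remaining co-IMF is then predicted separately,
# and the sub-predictions are summed to reconstruct the total load.
```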
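Conclusion (2) rests on the CNN-BiLSTM-Attention architecture. The Keras sketch below assembles such a network using Table 3's settings where they map directly (16 convolution kernels of size 3, pooling window 2, dropout 0.1, Adam with a gradient clipping threshold of 1, 30 epochs); treating the 128 filters as BiLSTM units and using a generic additive attention are assumptions, since the exact layer wiring appears only in Figure 5.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

def build_cnn_bilstm_attention(n_steps, n_features):
    inp = layers.Input(shape=(n_steps, n_features))
    # Convolutional feature extraction: 16 kernels of size 3 (Table 3)
    x = layers.Conv1D(16, kernel_size=3, padding="same", activation="relu")(inp)
    x = layers.MaxPooling1D(pool_size=2)(x)            # pooling window 2 (Table 3)
    x = layers.Dropout(0.1)(x)                         # dropout 0.1 (Table 3)
    # Bidirectional temporal modelling; 128 units is an assumption
    x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
    # Generic additive attention over time steps (assumption)
    score = layers.Dense(1, activation="tanh")(x)      # (batch, T, 1)
    weights = layers.Softmax(axis=1)(score)            # attention weights over T
    context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])
    out = layers.Dense(1)(context)                     # one-step-ahead load
    model = models.Model(inp, out)
    # Adam with a gradient clipping threshold of 1 (Table 3)
    model.compile(optimizer=optimizers.Adam(clipnorm=1.0), loss="mse")
    return model

# Example: 24-step lookback over the six inputs of Table 2, encoded numerically
# model = build_cnn_bilstm_attention(n_steps=24, n_features=6)
# model.fit(X_train, y_train, epochs=30, validation_data=(X_val, y_val))
```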
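Conclusion (4) concerns metaheuristic hyperparameter tuning. The full CPO update rules are beyond a short sketch, so the runnable fragment below only illustrates the structure of such a search: a population of candidate hyperparameter vectors is iteratively pulled toward the best solution found, with the validation objective replaced here by a synthetic stand-in centered on Table 3's configuration. The bounds, the update rule, and the toy objective are all assumptions.

```python
import numpy as np

# Illustrative bounds for three tuned hyperparameters:
# convolution kernels, BiLSTM units, dropout probability
BOUNDS = np.array([[8.0, 64.0], [32.0, 256.0], [0.0, 0.5]])

def fitness(x):
    """Synthetic stand-in for the real objective. In practice this would
    train the CNN-BiLSTM-Attention model with hyperparameters x and
    return its validation CV-RMSE; here the optimum is placed at
    Table 3's reported configuration (16 kernels, 128 units, 0.1)."""
    target = np.array([16.0, 128.0, 0.1])
    return float(np.sum(((x - target) / BOUNDS[:, 1]) ** 2))

def population_search(n_agents=20, n_iter=60, seed=0):
    """Generic population-based loop in the spirit of CPO/GA/PSO:
    agents drift toward the best solution with decaying random jitter.
    A real CPO replaces this update with its defense-mechanism rules."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(BOUNDS[:, 0], BOUNDS[:, 1], size=(n_agents, 3))
    best = min(pop, key=fitness).copy()
    for t in range(n_iter):
        step = 1.0 - t / n_iter                 # shrink exploration over time
        pop = pop + step * rng.normal(size=pop.shape) * (best - pop)
        pop = np.clip(pop, BOUNDS[:, 0], BOUNDS[:, 1])
        cand = min(pop, key=fitness)
        if fitness(cand) < fitness(best):
            best = cand.copy()
    return best, fitness(best)

# best_params, best_score = population_search()
```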
This study will evolve along two primary pathways: validating the model's generalizability across a wider range of building types and multi-energy loads (cooling, heating, electricity, and gas), and exploring advanced data-driven architectures for further accuracy gains. These efforts aim to turn high-fidelity load prediction into a practical tool for optimizing building energy systems, thereby directly contributing to energy efficiency improvement and carbon neutrality.

Author Contributions

Conceptualization, S.L. and Y.D.; methodology, S.L.; software, D.Y.; validation, J.Y. and C.L.; formal analysis, S.L.; investigation, Y.L.; resources, C.L.; data curation, W.C.; writing—original draft preparation, S.L.; writing—review and editing, Y.D., W.C., J.Y., Z.T. and Y.L.; visualization, S.L. and D.Y.; project administration, Z.T.; funding acquisition, Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Key R&D Program of Tianjin (20YFZCGX00950).

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
ANN: artificial neural network
BO: Bayesian optimization
BPNN: back propagation neural network
CALA-STL: Cauchy learning algorithm with seasonal-trend decomposition using loess
CNN: convolutional neural network
Co-IMF: clustered intrinsic mode function
CPO: crested porcupine optimizer
CSWD: Chinese standard weather data
CV-RMSE: coefficient of variation of the root mean square error
ELM: extreme learning machine
FA: firefly algorithm
GA: genetic algorithm
GRU: gated recurrent unit
HVAC: heating, ventilating and air-conditioning
ICEEMDAN: improved complete ensemble empirical mode decomposition with adaptive noise
IMF: intrinsic mode function
LSTM: long short-term memory
MAE: mean absolute error
MAPE: mean absolute percentage error
minSSE: minimum sum of squared errors
PCA: principal component analysis
PSO: particle swarm optimization
R2: coefficient of determination
RT: regression tree
SampEn: sample entropy
SSA: singular spectrum analysis
SVM: support vector machine
VMD: variational mode decomposition
XGBoost: extreme gradient boosting

References

  1. Ding, Y.; Hu, L.; Wang, Q.; Bai, Y.; Tian, Z.; Yang, C. Model predictive control for ice-storage air conditioning systems with time delay compensation integration. Energy 2025, 320, 135336. [Google Scholar] [CrossRef]
  2. Chen, Z.; Xiao, F.; Guo, F.; Yan, J. Interpretable machine learning for building energy management: A state-of-the-art review. Adv. Appl. Energy 2023, 9, 100123. [Google Scholar] [CrossRef]
  3. Wu, C.; Pan, H.; Luo, Z.; Liu, C.; Huang, H. Multi-objective optimization of residential building energy consumption, daylighting, and thermal comfort based on BO-XGBoost-NSGA-II. Build. Environ. 2024, 254, 111386. [Google Scholar] [CrossRef]
  4. Huang, H.; Hughes, B.R. Review of HVAC forecasting and control strategies for improved building performance. Build. Environ. 2026, 287, 113797. [Google Scholar] [CrossRef]
  5. Lu, S.; Zhou, S.; Ding, Y.; Kim, M.K.; Yang, B.; Tian, Z.; Liu, J. Exploring the comprehensive integration of artificial intelligence in optimizing HVAC system operations: A review and future outlook. Results Eng. 2025, 25, 103765. [Google Scholar] [CrossRef]
  6. Fan, C.; Sun, Y.; Zhao, Y.; Song, M.; Wang, J. Deep learning-based feature engineering methods for improved building energy prediction. Appl. Energy 2019, 240, 35–45. [Google Scholar] [CrossRef]
  7. Hu, J.; Zheng, W.; Zhang, S.; Li, H.; Liu, Z.; Zhang, G.; Yang, X. Thermal load prediction and operation optimization of office building with a zone-level artificial neural network and rule-based control. Appl. Energy 2021, 300, 117429. [Google Scholar] [CrossRef]
  8. Guo, Y.; Wang, J.; Chen, H.; Li, G.; Liu, J.; Xu, C.; Huang, R.; Huang, Y. Machine learning-based thermal response time ahead energy demand prediction for building heating systems. Appl. Energy 2018, 221, 16–27. [Google Scholar] [CrossRef]
  9. Khalil, M.; McGough, A.S.; Pourmirza, Z.; Pazhoohesh, M.; Walker, S. Machine Learning, Deep Learning and Statistical Analysis for forecasting building energy consumption—A systematic review. Eng. Appl. Artif. Intell. 2022, 115, 105287. [Google Scholar] [CrossRef]
  10. Wang, Z.; Hong, T.; Piette, M.A. Data fusion in predicting internal heat gains for office buildings through a deep learning approach. Appl. Energy 2019, 240, 386–398. [Google Scholar] [CrossRef]
  11. An, W.; Zhu, X.; Yang, K.; Kim, M.K.; Liu, J. Hourly Heat Load Prediction for Residential Buildings Based on Multiple Combination Models: A Comparative Study. Buildings 2023, 13, 2340. [Google Scholar] [CrossRef]
  12. Li, C.; Li, G.; Wang, K.; Han, B. A multi-energy load forecasting method based on parallel architecture CNN-GRU and transfer learning for data deficient integrated energy systems. Energy 2022, 259, 124967. [Google Scholar] [CrossRef]
  13. Bui, D.-K.; Nguyen, T.N.; Ngo, T.D.; Nguyen-Xuan, H. An artificial neural network (ANN) expert system enhanced with the electromagnetism-based firefly algorithm (EFA) for predicting the energy consumption in buildings. Energy 2020, 190, 116370. [Google Scholar] [CrossRef]
  14. Feng, Z.; Zhang, X.; Quan, W.; Liu, X.; An, J.; Wang, C.; Ji, X.; Kang, L. A hybrid deep learning model based on Rime optimization and multi-head attention for cooling load prediction in public buildings. Energy 2025, 339, 139100. [Google Scholar] [CrossRef]
  15. Kong, X.; Zhao, C.; Dai, H.; Sun, Y.; Yuan, J. Research on dynamic cooling load prediction method of cascaded-CPCMs building based on machine learning. Appl. Therm. Eng. 2025, 274, 126644. [Google Scholar] [CrossRef]
  16. Guo, Y.; Jia, M.; Su, C.; Darkwa, J.; Hou, S.; Pan, F.; Wang, H.; Liu, P. A novel CALA-STL algorithm for optimizing prediction of building energy heat load. Energy Build. 2025, 328, 115207. [Google Scholar] [CrossRef]
  17. Song, C.; Yang, H.; Meng, X.-B.; Yang, P.; Cai, J.; Bao, H.; Xu, K. A novel deep-learning framework for short-term prediction of cooling load in public buildings. J. Clean. Prod. 2024, 434, 139796. [Google Scholar] [CrossRef]
  18. Lu, C.; Gu, J.; Lu, W. An improved attention-based deep learning approach for robust cooling load prediction: Public building cases under diverse occupancy schedules. Sustain. Cities Soc. 2023, 96, 104679. [Google Scholar] [CrossRef]
  19. Zhang, C.; Hoes, P.-J.; Wang, S.; Zhao, Y. Intrinsically interpretable machine learning-based building energy load prediction method with high accuracy and strong interpretability. Energy Built Environ. 2024, in press. [Google Scholar] [CrossRef]
  20. Salami, B.A.; Abba, S.I.; Adewumi, A.A.; Dodo, U.A.; Otukogbe, G.K.; Oyedele, L.O. Building energy loads prediction using bayesian-based metaheuristic optimized-explainable tree-based model. Case Stud. Constr. Mater. 2023, 19, e02676. [Google Scholar] [CrossRef]
  21. Fouladfar, M.H.; Soppelsa, A.; Nagpal, H.; Fedrizzi, R.; Franchini, G. Adaptive thermal load prediction in residential buildings using artificial neural networks. J. Build. Eng. 2023, 77, 107464. [Google Scholar] [CrossRef]
  22. Zhou, L.; Yan, P.; Li, X.; Liu, T.; Liu, Z.; Jia, W. Research on prediction model of high geothermal tunnels temperature based on CNN-SVM. Energy Build. 2025, 347, 116285. [Google Scholar] [CrossRef]
  23. Chiu, M.-C.; Hsu, H.-W.; Chen, K.-S.; Wen, C.-Y. A hybrid CNN-GRU based probabilistic model for load forecasting from individual household to commercial building. Energy Rep. 2023, 9, 94–105. [Google Scholar] [CrossRef]
  24. Kim, D.; Lee, D.; Nam, H.; Joo, S.-K. Short-Term Load Forecasting for Commercial Building Using Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) Network with Similar Day Selection Model. J. Electr. Eng. Technol. 2023, 18, 4001–4009. [Google Scholar] [CrossRef]
  25. Huang, Y.; Wu, H.; Fu, J.; Zhang, H.; Li, H. Convolutional neural network-based wind pressure prediction on low-rise buildings. Eng. Struct. 2024, 309, 118078. [Google Scholar] [CrossRef]
  26. Han, J.; Zeng, P. Residual BiLSTM based hybrid model for short-term load forecasting in buildings. J. Build. Eng. 2025, 99, 111593. [Google Scholar] [CrossRef]
  27. da Silva, D.G.; Meneses, A.A.d.M. Comparing Long Short-Term Memory (LSTM) and bidirectional LSTM deep neural networks for power consumption prediction. Energy Rep. 2023, 10, 3315–3334. [Google Scholar] [CrossRef]
  28. Özbey, M.F.; Turhan, C. A novel comfort temperature determination model based on psychology of the participants for educational buildings in a temperate climate zone. J. Build. Eng. 2023, 76, 107415. [Google Scholar] [CrossRef]
  29. GB 50189-2015; Chinese Design Standard for Energy Efficiency of Public Buildings. China Architecture & Building Press: Beijing, China, 2015.
  30. Tian, Z.; Song, W.; Lu, Y.; Lin, X.; Niu, J. Load extraction from actual operation data for data-driven ultra-short-term room air-conditioning load prediction. Energy Build. 2023, 296, 113348. [Google Scholar] [CrossRef]
  31. Hwang, R.-L.; Chen, W.-A. Identifying relative importance of solar design determinants on office building façade for cooling loads and thermal comfort in hot-humid climates. Build. Environ. 2022, 226, 109684. [Google Scholar] [CrossRef]
  32. Qiao, D.; Li, P.; Ma, G.; Qi, X.; Yan, J.; Ning, D.; Li, B. Realtime prediction of dynamic mooring lines responses with LSTM neural network model. Ocean. Eng. 2021, 219, 108368. [Google Scholar] [CrossRef]
  33. Özbey, M.F.; Turhan, C.; Alkan, N.; Akkurt, G.G. Latent Psychological Pathways in Thermal Comfort Perception: The Mediating Role of Cognitive Uncertainty on Depression and Vigour. Buildings 2025, 15, 2538. [Google Scholar] [CrossRef]
  34. Turhan, C.; Carpino, C. Integrating Personalized Thermal Comfort Devices for Energy-Efficient and Occupant-Centric Buildings. Buildings 2025, 15, 1470. [Google Scholar] [CrossRef]
Figure 1. Technical roadmap.
Figure 2. Secondary modal decomposition flowchart.
Figure 3. CNN hierarchical structure.
Figure 4. BiLSTM structure.
Figure 5. Structure of the CNN-BiLSTM-Attention model.
Figure 6. Flowchart of the CPO-CNN-BiLSTM-Attention prediction model.
Figure 7. Overview of the case building simulation and key environmental variables: (a) building simulation model; (b) cooling load data; (c) solar radiation; (d) outdoor temperature and relative humidity.
Figure 8. Correlation analysis of input parameters.
Figure 9. Results of the initial ICEEMDAN modal decomposition.
Figure 10. K-means clustering results.
Figure 11. Results of VMD secondary modal decomposition.
Figure 12. Results from five prediction models.
Figure 13. Validation of modal decomposition effectiveness: (a) secondary modal decomposition results, (b) primary modal decomposition results, and (c) non-decomposition results, evaluated by MAPE and CV-RMSE metrics.
Figure 14. Impact of feature engineering and the attention mechanism on model prediction accuracy: (a) prediction accuracy of different model architectures with and without feature engineering, and (b) prediction accuracy of models with and without the attention mechanism, quantifying its contribution.
Figure 15. Plot of relative errors in prediction results: (a) box plots of relative errors for six prediction models on the test set, and (b) box plots of relative errors for six prediction models on the training set.
Figure 16. Comparison of hyperparameter configurations: (a) number of kernels, (b) dropout rate, (c) pooling window size, and (d) number of filters.
Figure 17. Sensitivity analysis results for parameters in ICEEMDAN, K-means, and VMD: (a) white noise amplitude, (b) number of clusters K, and (c) number of modes K.
Table 1. Building model parameter settings.
Parameter | Value | Parameter | Value
Length | 43 m | External wall heat transfer coefficient | 0.54 W/(m2·K)
Width | 35 m | Window heat transfer coefficient | 1.8 W/(m2·K)
Height | 12 m | Occupant density | 5 persons/100 m2
Window-to-wall ratio | 0.7 | Lighting power density | 9.5 W/m2
Roof heat transfer coefficient | 0.33 W/(m2·K) | Equipment power density | 75 W/m2
Table 2. Input parameters.
Variable Name | Unit/Encoding
Time type | 0, 1, …, 23
Weekday type | Monday, Tuesday, …, Sunday
Outdoor temperature | °C
Outdoor humidity | %
Cooling load at previous time step | W/m2
Solar radiation at previous time step | W/m2
Table 3. Model hyperparameter settings.
Hyperparameter | Value | Hyperparameter | Value
Convolution kernel size | [3, 1] | Dropout probability | 0.1
Number of convolution kernels | 16 | Optimizer type | Adam
Number of filters | 128 | Maximum training epochs | 30
Stride | [1, 1] | Gradient clipping threshold | 1
Pooling window size | 2 | |
Table 4. Prediction performance of the hybrid models.
Model | R2 | MAE | MAPE (%) | CV-RMSE (%)
CPO-CNN-BiLSTM-Attention | 0.9929 | 31.0252 | 3.2462 | 6.7212
CPO-CNN-BiLSTM | 0.9896 | 38.4950 | 5.3146 | 8.1546
CPO-BiLSTM | 0.9812 | 56.1133 | 8.4137 | 10.9487
CPO-CNN | 0.9721 | 71.4175 | 10.0618 | 13.3426
BiLSTM | 0.9713 | 68.5972 | 9.7881 | 13.5312
CNN | 0.9673 | 69.9111 | 9.6380 | 9.6380
Table 5. Performance comparison under different decomposition schemes.
No. | Modal Decomposition | Model Type | MAPE (%) | CV-RMSE (%)
Scheme 1 | No | CNN | 15.2051 | 21.8377
Scheme 2 | No | CPO-CNN-BiLSTM-Attention | 14.1250 | 17.4720
Scheme 3 | Yes | CNN | 9.6380 | 9.6380
Table 6. Performance comparison results of different optimization algorithms.
Optimization Algorithm | Convergence Iterations | CV-RMSE (%) | Runtime (s)
GA | 81 | 10.698 | 35.2
PSO | 78 | 8.0597 | 33.7
BO | 73 | 7.5762 | 26.8
CPO | 63 | 6.7212 | 23.9
Table 7. Stability verification results of model prediction performance under different random data splits.
Run | R2 | MAE | MAPE (%) | CV-RMSE (%)
Run 1 | 0.9929 | 31.0252 | 4.2462 | 6.7212
Run 2 | 0.9913 | 34.4574 | 4.7350 | 7.4467
Run 3 | 0.9925 | 30.6647 | 4.7616 | 6.9265
Run 4 | 0.9911 | 34.4826 | 5.3559 | 7.5470
Run 5 | 0.9923 | 33.6359 | 4.8135 | 7.0018
Run 6 | 0.9918 | 33.3607 | 4.5466 | 7.2530
Mean | 0.9920 | 32.9378 | 4.7431 | 7.1494
Table 8. Generalization verification results of model prediction performance under different monthly datasets.
Time Period | R2 | MAE | MAPE (%) | CV-RMSE (%)
1–30 June | 0.9920 | 26.3135 | 4.1532 | 7.0188
1–31 July | 0.9948 | 29.4909 | 3.7700 | 5.7339
1–31 August | 0.9897 | 42.0436 | 5.7937 | 7.9132
