1. Introduction
Against the backdrop of rising global energy consumption and increasingly severe environmental problems, improving the energy efficiency of the building sector has become crucial to achieving energy conservation and emission-reduction targets. Studies have shown that the building sector accounts for approximately 40% of global energy use, while its associated carbon dioxide emissions account for about 36% of global emissions [1]. This large share of energy consumption implies that the sector also holds considerable potential for energy savings. Consequently, intelligent energy-saving strategies that integrate big data analytics and AI techniques have become an important research frontier [2]. Among various technical approaches, accurate building load forecasting forms the basis for fine-grained energy management and demand-side response. In recent years, deep learning methods such as Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs), and their hybrid models have demonstrated outstanding performance in load-forecasting tasks and have significantly improved prediction accuracy, owing to their strong capability for automatic feature extraction and temporal-dependency modeling [3].
Because actual building load data typically require long acquisition periods, incur high costs for sensor deployment, operation, and maintenance, and are often affected by missing and noisy records, an increasing number of studies have employed building energy simulation tools to generate load samples. Pan et al. [4] noted that Building Performance Simulation (BPS), supported by Building Energy Modeling (BEM), is one of the most advanced and important carbon-reduction technologies in the building sector, and that it plays an increasingly important role in the energy-efficient design, operation, and retrofitting of buildings. Dong et al. [5] used EnergyPlus to develop a detailed building model and conducted a case study to investigate how changes in thermostat setpoints affect building energy consumption, thereby demonstrating the importance of optimizing design factors (such as thermostat-setpoint schedules) for reducing energy use. Vermette et al. [6] employed EnergyPlus to simulate and analyze the energy performance of a solar-integrated community. Although EnergyPlus can provide detailed and reliable simulations of building energy consumption, it primarily relies on text-based Input Data Files (IDFs) for user interaction, making model development and configuration relatively cumbersome and placing high demands on users. To reduce the modeling effort and improve the efficiency of model modification and parametric analysis, the open-source modeling platform OpenStudio was developed on top of the EnergyPlus simulation engine. OpenStudio preserves the computational accuracy of EnergyPlus while providing a graphical modeling interface. For example, Kobeyev et al. [7] used OpenStudio together with EnergyPlus to perform energy simulations and analyzed a set of building-envelope parameters (such as insulation thickness, window performance, and building orientation) to identify their impact on energy consumption.
Although physics-based simulation methods provide a rigorous theoretical foundation for the analysis of building cooling loads, their high computational complexity and long processing times make it difficult for them to satisfy the requirements for rapid response and timeliness that are often imposed on predictive models in practical engineering applications. Consequently, data-driven methods that can directly learn patterns from historical data have emerged as powerful alternatives and have rapidly become the mainstream research paradigm in the field of building energy-consumption prediction [8,9]. In the early stages of the development of data-driven approaches, traditional machine-learning models were widely applied. For instance, Boukelia et al. [10] used Artificial Neural Network (ANN) and regression models to perform annual assessments of the cooling performance of power plants. Similarly, Ciulla et al. [11] demonstrated that a Multivariate Linear Regression (MLR) model exhibits high reliability in predicting annual heating and cooling energy demands. These studies provided early evidence for the feasibility and potential of machine-learning methods in building-energy-related prediction tasks.
A Recurrent Neural Network (RNN) constitutes a family of deep learning architectures specifically developed to model temporal relationships within sequential data. Owing to this property, RNNs are highly effective in modeling time-series data that exhibit temporal dependence and periodic behavior. However, RNNs often suffer from exploding or vanishing gradients during training, which hinders their ability to preserve long-term dependencies in sequential data and consequently limits their prediction accuracy in time-series tasks. By adding memory cells and gating mechanisms to the RNN design, Long Short-Term Memory networks overcome these drawbacks and improve the modeling of long-term dependencies [12]. In a number of fields, such as natural language processing [13], machine translation [14], and video recognition [15], LSTM has demonstrated impressive results. Nevertheless, the predictive accuracy of a single model is still limited; thus, many researchers have sought to integrate the advantages of several models, for example, through ensemble or hybrid approaches, in order to reduce forecasting errors. Xu et al. [16] demonstrated that their attention-enhanced temporal depthwise separable convolutional neural network for electric load forecasting significantly improved forecasting accuracy and produced more reliable forecasts for power system operation and dispatch. Lan et al. [17] developed an attention-based CNN-LSTM model for building heating load prediction, in which the attention mechanism's ability to capture global features further improved prediction accuracy. Yang et al. [18] demonstrated that CNN-LSTM models can predict short-term electric load with high accuracy, particularly when the load data exhibit nonlinear and nonstationary characteristics.
Although deep learning algorithms have shown excellent accuracy in load prediction in existing studies, their performance is highly sensitive to hyperparameter settings. Manual hyperparameter tuning is often empirical and subject to considerable uncertainty. Consequently, an increasing number of researchers have begun to adopt metaheuristic optimization algorithms to automatically search for optimal or near-optimal hyperparameter configurations in a global search space, thereby enhancing model stability and generalization capability. Heuristic algorithms commonly used to solve such optimization problems include Particle Swarm Optimization (PSO) [19], Simulated Annealing Algorithms (SAAs) [20], and Genetic Algorithms (GAs) [21]. Sekhar et al. [22] proposed a PSO-based hybrid CNN method for power load forecasting, in which PSO optimizes the parameters of the CNN and thereby improves prediction accuracy; this indicates that PSO has potential for time-series prediction problems. Zhou et al. [23] combined the CNN-LSTM model and the Sparrow Search Algorithm (SSA) for short-term multi-load forecasting of cooling, heating, and power systems, enhancing the accuracy and stability of the predictions. Although algorithms such as PSO, GA, and SSA have demonstrated potential in optimizing CNN-LSTM architectures, they still exhibit certain limitations. For instance, PSO suffers from slow convergence and relatively low efficiency [24]. GA is capable of achieving global optima but typically requires a larger number of function evaluations, and its encoding, along with operators such as crossover and mutation, is often complex to design [25]. While SSA offers advantages in terms of fast convergence and high precision, it shows weaker global search capability and a limited ability to escape local optima [24]. In contrast, the GWO algorithm, inspired by the social hierarchy and hunting mechanisms of grey wolf packs, achieves a more effective balance between global exploration and local exploitation. It features straightforward parameter settings and exhibits strong global search performance [26]. Cai et al. [27] constructed an adaptive prediction model based on the Grey Wolf Optimizer and the Long Short-Term Memory network for medium- and long-term wind power load forecasting, and comparisons with existing studies showed that the proposed method achieves better performance. Such hybrid optimization strategies can fully exploit the advantages of different algorithms, further enhancing a model's ability to capture spatiotemporal dynamics and thus achieving better results in complex prediction tasks.
In this paper, we use SketchUp to model a building in Beijing and use EnergyPlus and OpenStudio to simulate the building's cooling load, thereby obtaining the load data. To the best of our knowledge, this study is the first to apply the GWO to hyperparameter tuning of a CNN-LSTM hybrid model for building cooling load prediction, and we empirically demonstrate that, for this class of forecasting tasks, GWO outperforms conventional optimizers (e.g., PSO and GA) in terms of prediction accuracy. These findings suggest a more efficient and effective solution for cooling load forecasting in buildings.
2. Materials and Methods
2.1. Simulation of Building Energy Loads
SketchUp is a 3D modeling software package developed by Trimble Inc. (headquartered in Westminster, CO, USA). A key advantage of SketchUp lies in its compatibility with the OpenStudio plugin, through which energy-related attributes such as building materials, thermal zones, and orientations can be added to the geometric models constructed in SketchUp.
OpenStudio, created by the National Renewable Energy Laboratory (NREL) in the United States, is an open-source software development kit (SDK) and application suite that supports the specification of input parameters required by the EnergyPlus simulation engine via both a graphical user interface and a data-modeling framework. Based on the geometric model created in SketchUp, envelope properties (including material layers and thermal characteristics), internal loads (such as the power density of equipment, lighting, and occupants), and operational schedules (including occupancy patterns, lighting and equipment usage profiles, temperature setpoints, and ventilation schedules) are assigned to each space or thermal zone. In addition, the configuration of the HVAC system must be specified, encompassing the system type (e.g., variable-air-volume systems, split-type air-conditioning units, radiant heating systems), control strategies, and key equipment performance parameters. Acting as middleware, OpenStudio organizes model objects in a structured manner and automatically generates the input files required by EnergyPlus. This workflow minimizes manual coding errors and ensures consistency among the geometric model, load definitions, operational schedules, and system configurations.
EnergyPlus is a building energy simulation engine developed by the US Department of Energy. It calculates quantities such as building energy consumption and loads based on heat-transfer equations, airflow models, and system simulations. Integration with OpenStudio greatly simplifies its parameter specification.
This study employed SketchUp 2023 for building geometry modeling, OpenStudio version 3.7.0 for energy model configuration, and EnergyPlus version 23.2.0 for building energy performance simulation.
2.2. Convolutional Neural Network
CNNs are a type of feedforward neural network whose core consists of convolutional, pooling, and fully connected layers. Through mechanisms such as local connections, weight sharing, and pooling operations, the complexity of the network model is reduced and the number of parameters is decreased. This structure enables CNNs to effectively extract translation-invariant features from the input data and gives them strong hierarchical feature-learning capabilities. As a result, they have been widely applied and have achieved breakthroughs in fields such as image recognition and object detection. A schematic of a typical CNN architecture is illustrated in Figure 1.
2.3. LSTM Neural Network
LSTMs are a special class of RNN specifically designed to mitigate the vanishing- and exploding-gradient problems encountered during the training of conventional RNNs. By combining a cell state with a gating mechanism consisting of an input gate, a forget gate, and an output gate, an LSTM can selectively retain or discard information, enabling effective learning of long-term dependencies in sequential data.
Figure 2 shows the structure of an LSTM unit.
At each time step $t$, the LSTM unit first computes the candidate cell state ($\tilde{C}_t$), the forget gate ($f_t$), and the input gate ($i_t$):

$$\tilde{C}_t = \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right)$$

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right)$$

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right)$$
Here, $\sigma$ denotes the sigmoid activation function, which regulates each gate's degree of openness and has an output range of (0, 1). $[h_{t-1}, x_t]$ represents the concatenation of the hidden state from the preceding time step and the current input. The relevant weight matrices and bias terms are denoted by $W$ and $b$, respectively.
The cell state ($C_t$) at the present time step is then updated as follows:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

This operation determines how much information from the prior cell state ($C_{t-1}$) is forgotten, as controlled by the forget gate ($f_t$), and how much new candidate information is added, as regulated by the input gate ($i_t$); $\odot$ denotes element-wise multiplication.
Finally, the output gate ($o_t$) and the current hidden state ($h_t$) are computed as follows:

$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right)$$

$$h_t = o_t \odot \tanh(C_t)$$
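To make the gate computations concrete, the following minimal NumPy sketch implements one LSTM time step exactly as defined by the equations above; the per-gate weight dictionary and the function names are illustrative conventions, not part of the original model description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step implementing the gate equations above.
    W and b hold one weight matrix / bias vector per branch (f, i, c, o),
    each acting on the concatenated vector z = [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    C_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    C_t = f_t * C_prev + i_t * C_tilde       # cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    h_t = o_t * np.tanh(C_t)                 # hidden state
    return h_t, C_t
```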
In building cooling load forecasting, the cooling load time series is affected by a number of variables, including building usage patterns and outdoor weather, and thus exhibits pronounced non-stationarity and complex temporal dynamics. Because of their strong ability to retain long-term dependencies, LSTM networks can accurately capture periodic patterns in load data, such as daily and weekly cycles, as well as long-range effects driven by weather trends. This property provides a strong rationale for employing LSTM-based models to develop high-accuracy cooling load forecasting methods.
2.4. Grey Wolf Optimizer
Mirjalili et al. devised the population-based metaheuristic optimization technique known as GWO [28]. Its fundamental mechanism is modeled on the social hierarchy and hunting strategy of grey wolves. Owing to its simplicity, ease of implementation, and small number of control parameters, GWO has been applied extensively in many fields. In GWO, the wolves are classified into four hierarchical levels in descending order: α, β, δ, and ω. Lower-level wolves strictly follow the leadership of higher-level wolves. During the optimization process, the grey wolf pack locates the prey, estimates the distance, and iteratively updates the positions of the wolves in the search space until the prey is successfully captured. The algorithm comprises two main stages: encircling the prey and attacking the prey. The specific steps of the GWO algorithm are detailed in Appendix A.
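For illustration, a minimal NumPy sketch of the standard GWO update from Mirjalili et al. is given below (encircling coefficients A and C, a linearly decreasing control parameter a, and position updates averaged over the α, β, and δ wolves); the function and parameter names are ours, not those of any particular library.

```python
import numpy as np

def gwo(fitness, dim, bounds, n_wolves=15, max_iter=40, seed=0):
    """Minimal Grey Wolf Optimizer sketch.
    fitness: callable mapping a position vector to a scalar to minimize.
    bounds: (low, high) arrays (or scalars) bounding each dimension."""
    rng = np.random.default_rng(seed)
    low, high = bounds
    # Initialize the pack uniformly inside the search space.
    X = rng.uniform(low, high, size=(n_wolves, dim))
    scores = np.array([fitness(x) for x in X])
    for t in range(max_iter):
        # alpha, beta, delta = the three best wolves found so far.
        order = np.argsort(scores)
        alpha, beta, delta = X[order[0]], X[order[1]], X[order[2]]
        a = 2 - 2 * t / max_iter          # 'a' decreases linearly from 2 to 0
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2 * a * r1 - a         # encircling coefficient
                C = 2 * r2
                D = np.abs(C * leader - X[i])
                new_pos += (leader - A * D) / 3.0  # average of the three pulls
            X[i] = np.clip(new_pos, low, high)
            scores[i] = fitness(X[i])
    best = np.argmin(scores)
    return X[best], scores[best]
```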
3. Results and Discussion
3.1. Load Simulation
SketchUp was used to produce a three-dimensional model of the structure. In OpenStudio, the building envelope (material layers and thermal properties), internal loads, and operational schedules were specified as inputs for the load simulations. The simulation location was specified at 39.8° N and 116.47° E.
A small office building in Beijing is used as the simulation case, as offices constitute a major component of the city's public building stock, and their cooling-load behavior is of broad research relevance. The modeled building has a gross floor area of 123.68 m² across two stories, with 34.5 m² of HVAC-conditioned space. The interior is partitioned into representative thermal zones (office area, storage, and restroom) to capture heterogeneous internal gains. Occupancy density, equipment operation, and HVAC operating schedules are specified according to typical weekday patterns for Beijing. The simulation period was set to run from 1 April to 30 September. The HVAC system in this study was modeled as a variable air volume (VAV) system with reheat, comprising a chiller, cooling coil, heating coil, variable-speed pumps, variable-speed fans, a boiler, and a cooling tower.
Figure 3 and Figure 4 present the EnergyPlus building model developed in SketchUp, and Table 1 summarizes the model parameters.
The dataset used in this study was generated using building energy simulation software with an hourly sampling interval. All input features and the target variable were normalized using Z-score standardization. As the data were simulation-based, no missing values were present; nevertheless, potential outliers were identified using the 3σ criterion and subsequently corrected by replacing them with values obtained via linear interpolation. The results of the building cooling load simulation are shown in Figure 5.
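As an illustration of the preprocessing described above, the following pandas sketch applies the 3σ outlier rule, linear interpolation, and Z-score standardization; the DataFrame layout and column handling are assumptions, not the paper's actual pipeline.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """3-sigma outlier correction followed by Z-score standardization,
    mirroring the preprocessing described above."""
    out = df.copy()
    for col in out.columns:
        mu, sigma = out[col].mean(), out[col].std()
        # Flag values outside mu +/- 3*sigma as outliers ...
        mask = (out[col] - mu).abs() > 3 * sigma
        out.loc[mask, col] = np.nan
        # ... and replace them by linear interpolation over time.
        out[col] = out[col].interpolate(method="linear")
        # Z-score standardization of the cleaned series.
        out[col] = (out[col] - out[col].mean()) / out[col].std()
    return out
```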
In the subsequent cooling load forecasting, data from June, July, and August, which correspond to the summer season in the study area, were used to develop and evaluate the forecasting models.
3.2. Variable Filtering
In developing a data-driven cooling load forecasting model, the appropriate selection of input variables is crucial to improving both prediction accuracy and generalization capability. In this study, the Pearson correlation coefficient was employed as a quantitative criterion to identify candidate input variables that exhibit strong correlation with the building cooling load. This coefficient measures the strength and direction of the linear relationship between two continuous variables and takes values in the range [−1, 1]. The closer its absolute value is to 1, the stronger the linear relationship; positive values denote positive correlations, whereas negative values indicate negative correlations.
Based on the correlation analysis, the seven variables with the strongest correlation coefficients with respect to the cooling load were chosen as model inputs: cooling load in the previous hour (0.921), cooling load two hours earlier (0.816), indoor air temperature (0.666), global horizontal irradiance (0.628), outdoor dry-bulb temperature (0.613), diffuse horizontal irradiance (0.558), and outdoor dry-bulb temperature in the previous hour (0.534). These correlation results are illustrated in Figure 6.
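A minimal sketch of this screening step is given below. It assumes hypothetical column names (cooling_load, drybulb_temp) and constructs the 1 h and 2 h lagged features before ranking candidates by absolute Pearson correlation, as the selected inputs above include lagged variables.

```python
import pandas as pd

def rank_features(df: pd.DataFrame, target: str = "cooling_load") -> pd.Series:
    """Rank candidate inputs by |Pearson r| with the cooling load."""
    df = df.copy()
    df["load_lag1"] = df[target].shift(1)          # cooling load, previous hour
    df["load_lag2"] = df[target].shift(2)          # cooling load, two hours earlier
    df["drybulb_lag1"] = df["drybulb_temp"].shift(1)
    corr = df.dropna().corr(method="pearson")[target].drop(target)
    return corr.abs().sort_values(ascending=False)

# Example usage: keep the seven strongest candidates.
# top7 = rank_features(df).head(7).index.tolist()
```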
Following feature selection based on Pearson correlation analysis, the cooling load data and associated input variables from June, July, and August were partitioned into a training set (80%) and a test set (20%) for subsequent load forecasting.
3.3. Load Forecasting Based on GWO-CNN-LSTM
To mitigate overfitting and enhance generalization, a comprehensive strategy encompassing data partitioning, model selection, and training control was implemented. The data were strictly partitioned in chronological order, with the first 80% allocated for training and the remaining 20% reserved for testing, thereby preventing data leakage. The proposed CNN-LSTM architecture begins with a one-dimensional convolutional layer for temporal feature extraction, followed by a ReLU activation function to introduce nonlinearity and a Dropout layer for regularization. This output is fed sequentially into two LSTM layers to capture temporal dependencies, with an additional Dropout layer applied after the first LSTM layer. A final fully connected layer then maps the features to a single prediction value. All critical network hyperparameters, including the learning rate, batch size (constrained to powers of two), Dropout rate, and L2 regularization coefficient, were automatically optimized using the GWO. During this optimization phase, the performance of each candidate model was evaluated via 5-fold cross-validation. Furthermore, an early-stopping mechanism was employed during training. These combined measures effectively controlled overfitting at multiple levels, from data handling and model configuration to the training process itself. The parameter configurations are displayed in Table 2.
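The described layer stack can be sketched in Keras as follows. This is an illustrative reconstruction under the stated architecture (the paper does not specify its training framework); the default hyperparameter values are placeholders that, in the actual study, are chosen by GWO.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_cnn_lstm(n_steps, n_features, num_filters=32, filter_size=5,
                   hidden1=64, hidden2=32, dropout=0.2, l2=1e-4, lr=1e-3):
    """CNN-LSTM sketch following the layer order described above:
    Conv1D -> ReLU -> Dropout -> LSTM -> Dropout -> LSTM -> Dense."""
    model = keras.Sequential([
        layers.Input(shape=(n_steps, n_features)),
        layers.Conv1D(num_filters, filter_size, padding="same",
                      activation="relu",
                      kernel_regularizer=regularizers.l2(l2)),
        layers.Dropout(dropout),
        layers.LSTM(hidden1, return_sequences=True,
                    kernel_regularizer=regularizers.l2(l2)),
        layers.Dropout(dropout),
        layers.LSTM(hidden2),
        layers.Dense(1),                    # single-step load prediction
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="mse")
    return model
```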
In the GWO algorithm, the eight hyperparameters are encoded as an 8-dimensional position vector, with each dimension corresponding to one parameter to be optimized. All initial values are normalized to the range [0, 1]. Specifically, linear scaling is applied to the number of convolutional kernels, the size of convolutional kernels, the number of hidden units in the two LSTM layers, and the Dropout rate. Logarithmic scaling is used for the learning rate and the L2 regularization coefficient to facilitate balanced exploration across multiple orders of magnitude. For the batch size, exponential scaling is adopted: the normalized value determines an integer exponent k, and the final batch size is computed as 2ᵏ to ensure computational efficiency.
In Table 2, the number of convolutional filters (numFilters) was set within the range of 8–64 to ensure sufficient feature extraction capability while preventing excessive model complexity, which could lead to overfitting and increased computational cost. The convolutional kernel size (filterSize) was constrained to 3–11 to capture local features at different temporal scales. A two-layer LSTM architecture was adopted to enhance temporal modeling capacity: the first LSTM layer employed 32–128 hidden units (hidden1) to learn high-dimensional temporal representations, while the second layer used 16–128 hidden units (hidden2). During optimization, an empirical constraint of hidden2 ≤ hidden1 was enforced to maintain structural coherence and avoid redundant parameters. To improve training efficiency and take advantage of GPU memory alignment, the batch size (batchSize) was restricted to powers of two, corresponding to values between 16 and 256. The dropout rate was set within [0, 0.5], and the L2 regularization coefficient was selected on a logarithmic scale in the range of 10⁻⁶ to 10⁻², thereby enhancing model generalization and mitigating overfitting under limited training data conditions.
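The encoding and scaling rules above can be summarized in a small decode function. This sketch uses the ranges from Table 2 as described in the text; the learning-rate bounds shown here (10⁻⁴ to 10⁻²) are an assumption, since the text specifies only that a logarithmic scale is used.

```python
def decode(u):
    """Map a normalized GWO position u in [0, 1]^8 to concrete
    hyperparameters, using the scalings described above."""
    lin = lambda x, lo, hi: lo + x * (hi - lo)
    num_filters = int(round(lin(u[0], 8, 64)))     # linear scaling
    filter_size = int(round(lin(u[1], 3, 11)))
    hidden1 = int(round(lin(u[2], 32, 128)))
    hidden2 = int(round(lin(u[3], 16, 128)))
    hidden2 = min(hidden2, hidden1)                # enforce hidden2 <= hidden1
    dropout = lin(u[4], 0.0, 0.5)
    # Logarithmic scaling: uniform in the exponent.
    lr = 10 ** lin(u[5], -4, -2)                   # learning-rate range assumed
    l2 = 10 ** lin(u[6], -6, -2)
    # Exponential scaling: 2^k with k in {4, ..., 8}, i.e. 16 to 256.
    batch_size = 2 ** int(round(lin(u[7], 4, 8)))
    return dict(num_filters=num_filters, filter_size=filter_size,
                hidden1=hidden1, hidden2=hidden2, dropout=dropout,
                lr=lr, l2=l2, batch_size=batch_size)
```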
The objective of the GWO optimization process is to minimize the mean squared error (MSE) of the model on the validation set, averaged over the cross-validation folds:

$$F(\theta) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{MSE}_k, \qquad \mathrm{MSE}_k = \frac{1}{N_k} \sum_{i=1}^{N_k} \left(y_i - \hat{y}_i\right)^2$$

Here, $\theta$ represents a solution vector consisting of the 8 hyperparameters, $K = 5$ is the number of cross-validation folds, $\mathrm{MSE}_k$ denotes the mean squared error on the $k$-th validation fold, $N_k$ is the number of samples in that fold, $y_i$ is the true normalized load value, and $\hat{y}_i$ is the model-predicted value.
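Putting the pieces together, the objective can be sketched as a cross-validated fitness function for the GWO; it relies on the hypothetical decode() and build_cnn_lstm() sketches above and uses contiguous folds for simplicity.

```python
import numpy as np

def fitness(u, X, y, n_folds=5):
    """GWO objective: mean validation MSE over K = 5 folds."""
    params = decode(u)
    n = len(X)
    fold = n // n_folds
    mses = []
    for k in range(n_folds):
        val = np.arange(k * fold, (k + 1) * fold)       # validation fold k
        tr = np.setdiff1d(np.arange(n), val)            # remaining folds
        model = build_cnn_lstm(X.shape[1], X.shape[2],
                               num_filters=params["num_filters"],
                               filter_size=params["filter_size"],
                               hidden1=params["hidden1"],
                               hidden2=params["hidden2"],
                               dropout=params["dropout"],
                               l2=params["l2"], lr=params["lr"])
        # Short training budget during fitness evaluation (20 epochs),
        # as described in Section 3.4.2.
        model.fit(X[tr], y[tr], epochs=20,
                  batch_size=params["batch_size"], verbose=0)
        pred = model.predict(X[val], verbose=0).ravel()
        mses.append(float(np.mean((y[val] - pred) ** 2)))
    return float(np.mean(mses))
```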
The GWO convergence curve is shown in Figure 7, and the training and validation loss curves are shown in Figure 8.
The optimal hyperparameters obtained from the optimization process are presented in Table 3.
The final prediction results are shown in Figure 9.
The Mean Absolute Error (MAE), Root-Mean-Square Error (RMSE), and coefficient of determination (R²) were used as evaluation metrics to assess model performance. Lower values of MAE and RMSE, together with higher values of R², indicate better model fit and higher prediction accuracy. The evaluation results for the GWO-CNN-LSTM model are summarized in Table 4.
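The three metrics can be computed directly from the predictions, as in the following sketch.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """MAE, RMSE, and coefficient of determination R^2."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae = np.mean(np.abs(y_true - y_pred))
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2
```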
3.4. Comparison of Different Models
3.4.1. Comparison with Single-Algorithm Models
To ensure a fair comparison, the training conditions for all models (CNN, LSTM, and the proposed GWO-CNN-LSTM) were unified. All models were trained using the Adam optimizer to minimize the Mean Squared Error (MSE) loss function. To prevent overfitting, an early stopping mechanism was implemented: the training process terminated if the validation loss did not decrease for six consecutive epochs. The maximum number of epochs was set to 80 for all models.
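In Keras terms, the stopping rule described above corresponds to an EarlyStopping callback such as the following sketch; the variable names are hypothetical, and the study's actual training framework is not specified.

```python
from tensorflow import keras

# Early stopping as described: halt if the validation loss fails to
# decrease for six consecutive epochs, with at most 80 epochs in total.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=6, restore_best_weights=True)

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=80, callbacks=[early_stop], verbose=0)
```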
The prediction results of the GWO-CNN-LSTM model were compared with those of the CNN and LSTM models, as illustrated in Figure 10.
As illustrated in Figure 10, the CNN model demonstrates the poorest fit: it not only fails to capture peak loads but also exhibits unstable oscillations across the trend, making it difficult to reflect the actual patterns of load variation. Although the LSTM model can, to some extent, track the periodic variations in load, it shows significant shortcomings in predicting extreme values, along with a pronounced lag. In comparison, the prediction curve of the GWO-CNN-LSTM model aligns most closely with the actual cooling load changes, effectively capturing both short-term fluctuations and long-term trends in load.
Table 5 presents the evaluation metrics for different algorithms.
The evaluation metrics are plotted as a histogram in Figure 11.
As shown in Table 5, the proposed GWO-CNN-LSTM model demonstrates significant superiority over the standalone CNN and LSTM models across all evaluation metrics. Specifically, the MAE and RMSE of GWO-CNN-LSTM are approximately 61.1% and 55.8% lower than those of the CNN model and about 35.0% and 27.0% lower than those of the LSTM model, respectively. Moreover, the R² value increases to 0.9266, which is 0.3021 and 0.0643 higher than the R² values of the CNN and LSTM, respectively. These findings indicate that the standalone CNN model has limited capability in capturing long-term dependencies in time series, while the LSTM, although effective in processing temporal information, still allows room for improved prediction accuracy through structural enhancements. By integrating the local feature extraction ability of CNN with the temporal modeling strength of LSTM, and further incorporating the Grey Wolf Optimizer for hyperparameter tuning, the GWO-CNN-LSTM model substantially enhances its capacity to capture both long-term trends and short-term fluctuations in cooling load sequences, thereby achieving more accurate prediction performance.
3.4.2. Comparison with Other Hybrid Algorithms
PSO is a swarm intelligence algorithm inspired by the foraging behavior of bird flocks. In PSO, a set of "particles" update their positions and velocities within the search space, guided by the swarm's overall best position as well as each individual's personal best position. Through iterative updates, the algorithm converges rapidly toward the optimal solution. PSO is known for requiring few parameters, being easy to implement, and exhibiting relatively fast convergence, making it widely used in continuous optimization problems (a minimal sketch of this update rule is given after this paragraph). GA is a global optimization technique based on natural selection and genetic principles. It applies encoding, selection, crossover, and mutation operations to preserve high-quality individuals and generate new solutions during the evolutionary process, progressively improving the population within the search space. GA is particularly effective for multi-modal, nonlinear, and complex search spaces, offering strong global search capabilities. GA-CNN-LSTM and PSO-CNN-LSTM models were therefore established for cooling load forecasting. To ensure the fairness of the comparison, the training conditions for the proposed GWO-CNN-LSTM and the benchmark models (PSO-CNN-LSTM, GA-CNN-LSTM) were unified. The Adam optimizer was employed for network training. For the optimization algorithms, the population size was set to 15, and the maximum number of iterations was set to 40. During the fitness evaluation (cross-validation) phase, the maximum training epochs were set to 20 with a validation patience of 3 to prevent overfitting and reduce computational time. For the final model training using the optimal hyperparameters, the maximum epochs were increased to 80 with a validation patience of 6. The predicted results were compared with those of the GWO-CNN-LSTM method, as shown in Figure 12.
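For reference, the PSO update rule described above can be sketched as follows; the inertia and acceleration coefficients shown are typical textbook values, not those used in this study.

```python
import numpy as np

def pso_step(X, V, pbest, gbest, w=0.7, c1=1.5, c2=1.5,
             rng=np.random.default_rng()):
    """One PSO iteration: each particle's velocity is pulled toward its
    personal best (pbest) and the swarm's global best (gbest), then the
    positions are advanced by the new velocities."""
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
    return X + V, V
```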
As shown in the comparative figures, there are notable differences among the three models in terms of peak magnitude, trough location, and fitting accuracy during rapid fluctuation phases. Compared with the GA-CNN-LSTM and PSO-CNN-LSTM models, the prediction curve of GWO-CNN-LSTM aligns more closely with the actual cooling load values across most time intervals. Particularly during peak load periods, it captures the peak magnitude more accurately with smaller deviations. In low-load and sudden-change intervals, GWO-CNN-LSTM also demonstrates better stability and resistance to fluctuations, reducing over-smoothing or lagging phenomena. These results indicate that the GWO algorithm can more effectively balance the spatial feature extraction capability of CNN and the temporal modeling ability of LSTM during the hyperparameter optimization process, thereby enhancing the model’s overall capacity to characterize both short-term fluctuations and long-term variation patterns. Overall, the prediction performance of GWO-CNN-LSTM surpasses that of the comparison models optimized by GA and PSO.
Table 6 presents the evaluation metrics for the different algorithms. The evaluation metrics are plotted as a histogram in Figure 13.
From the comparison of hybrid models driven by different optimization algorithms in Table 6, it can be observed that GWO-CNN-LSTM outperforms both GA-CNN-LSTM and PSO-CNN-LSTM in terms of MAE, RMSE, and R². Specifically, GWO-CNN-LSTM achieves the lowest MAE and RMSE. Compared with GA-CNN-LSTM, MAE and RMSE are reduced by 20.8% and 16.3%, respectively. In comparison with PSO-CNN-LSTM, the reductions reach 22.3% in MAE and 16.1% in RMSE. The R² value of 0.9266 further confirms that the Grey Wolf Optimizer possesses stronger global search capability and convergence efficiency during hyperparameter optimization, enabling it to identify superior network configurations and thereby improve the prediction accuracy of the model for building cooling load forecasting.