3. Result & Discussion
Figure 3 presents a complete evaluation of six machine learning models (Artificial Neural Network, Random Forest, XGBoost, Radial Basis Function, Autoencoder, and Decision Tree) on predicting a continuous target variable. Each subplot shows a scatter plot of predicted versus actual values, with each point marked by a blue circle and a red regression line indicating the trend. Both axes are scaled from 0 to 30 for predicted and actual values to allow direct visual comparison of the models. This arrangement gives an effective visual summary of each model's predictive accuracy and resulting error distribution.
The regression lines and the distribution of points across the models illustrate different levels of performance. Artificial Neural Network, Random Forest, and XGBoost show relatively tight distributions of points around the regression line, suggesting better predictive performance and fewer outliers. The Radial Basis Function and Autoencoder show greater scatter, particularly at higher magnitudes, which indicates variance in predictive accuracy as well as potential overfitting or underfitting. The Decision Tree model exhibits moderate performance, with clear deviation from the ideal y = x line indicating lower predictive accuracy.
In Figure 4, ensemble methods, such as voting, stacking, and blending, are applied to combine multiple diverse models and improve predictive performance beyond what a single model can provide. Voting ensembles combine predictions by majority vote or averaging, offering a robust and straightforward way to aggregate predictions. The more advanced ensemble techniques, stacking and blending, add a meta-learner: another model trained on the predictions of the base models. Stacking typically uses cross-validated predictions for this training, while blending uses a simple holdout set; in both cases, the meta-learner is trained to intelligently weigh the base models' predictions and obtain a superior combined prediction.
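The voting and stacking strategies described above can be sketched with scikit-learn. This is a minimal illustration on a synthetic dataset; the base models, hyperparameters, and data are placeholders, not those used in the study.

```python
# Voting vs. stacking on a synthetic regression problem (illustrative only).
from sklearn.datasets import make_regression
from sklearn.ensemble import (RandomForestRegressor, StackingRegressor,
                              VotingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("tree", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ("ann", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)),
]

# Voting: simply averages the base models' predictions.
voter = VotingRegressor(estimators=base).fit(X_train, y_train)

# Stacking: a Ridge meta-learner is trained on cross-validated
# out-of-fold predictions of the base models.
stacker = StackingRegressor(estimators=base, final_estimator=Ridge(), cv=5)
stacker.fit(X_train, y_train)

print(voter.score(X_test, y_test), stacker.score(X_test, y_test))
```

The key difference is visible in the API: `VotingRegressor` has no second-level model, while `StackingRegressor` takes a `final_estimator` that learns how to weigh the base outputs.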
Hybrid ensemble voting is demonstrated in Figure 4 as a case where multiple predictive models cooperate to produce a more accurate outcome. More precisely, the method combines the predictions of different models, letting each impart its strengths in solving the general problem. By using a voting system, where every model's prediction counts toward the outcome, this technique reduces the biases of individual models and increases the reliability of predictions. The figure conveys the impression of various prediction paths meeting in the middle, confirming that disparate model outputs can collaboratively lead to a single, enhanced prediction.
The figure also shows stacking, a process by which multiple base models make predictions that are then combined by a meta-learner. The meta-learner is trained to leverage the signals from the base models, which can be diverse, to create the best combination of their outputs. The figure depicts the relationship between the base models and the meta-learner, showing how the meta-learner uses base predictions to generate a final output. Stacking is a useful method for capturing complex patterns in the data, as well as interdependencies among the base predictions, leading to a more informed decision than any single model could make.
Blending, the third technique depicted, merges the predictions of different models by averaging or weighting them according to their accuracies on a holdout set, rather than training its aggregator on cross-validated predictions as stacking does. This simple mixing of results makes the technique easy to implement and fast. Although blending can produce results that are up to standard, its reliance on a single holdout set for validation is the key trade-off: it makes blending a feasible option for many use cases, but it may forgo some of the intricate performance that stacking can deliver in complicated situations.
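A minimal blending sketch follows. Base models are fit on a training split, and their holdout-set predictions are combined with accuracy-based weights; the inverse-MSE weighting used here is one illustrative choice, not the study's exact scheme.

```python
# Blending via accuracy-weighted averaging on a holdout set (illustrative).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=1)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=1)

models = [RandomForestRegressor(n_estimators=50, random_state=1), Ridge()]
for m in models:
    m.fit(X_train, y_train)

# Each model's holdout error determines its blending weight.
hold_preds = [m.predict(X_hold) for m in models]
errors = np.array([mean_squared_error(y_hold, p) for p in hold_preds])
weights = (1.0 / errors) / (1.0 / errors).sum()  # inverse-MSE weights, sum to 1

# Final blended prediction: weighted average of the base predictions.
blended = sum(w * p for w, p in zip(weights, hold_preds))
```

The more accurate a base model is on the holdout set, the larger its share in the blended output, which is the "weighting according to their accuracies" idea in code form.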
The diagram shows the performance of the three ensemble techniques: hybrid ensemble voting, stacking, and blending, each reaching a different level of accuracy. Hybrid ensemble voting attains 85% accuracy by combining the predictions of individual models to minimize bias and variance. Stacking improves on this, reaching 90% as its meta-learner effectively leverages the differences in outputs among the base models. Blending achieves the highest accuracy at 97%, demonstrating the strongest predictive power of the three. These figures highlight the potential of ensemble techniques to enhance predictive performance across different scenarios.
Based on the data provided in Figure 5, which analyzes a traditional building's energy demand, this comparative study evaluates the performance of nine machine learning models (ANN, RF, XGBoost, RBF, Autoencoder, Decision Tree, and the voting, stacking, and blending approaches) in predicting energy consumption metrics. The left panel demonstrates that simpler models such as the Decision Tree and ANN achieved lower RMSE and MAE values (below 2) for heating demand prediction, indicating better accuracy in forecasting overall energy consumption. The right panel reveals that ensemble methods such as voting and stacking excelled on the statistical performance metrics (KGE, NSE, and R2, approaching 0.9) for bimodal demand patterns, suggesting these advanced techniques better capture the complex, dual-mode energy-usage characteristics typically found in traditional buildings with combined heating and operational energy requirements.
The superior performance of the blended ensemble model (97% accuracy and 0.9999 correlation) can be attributed to its intrinsic ability to mitigate the individual weaknesses of the base learners while capitalizing on their strengths. The blending technique, particularly its optimal weight allocation to base-model predictions, effectively dampens variance and corrects the systematic bias that persists in single models, thereby producing a smoother, more generalized prediction surface capable of capturing the most complex, non-linear dependencies in the energy and CO2 data. Conversely, the fundamentally poor performance of the Radial Basis Function (RBF) model (0.2772 correlation) is likely exacerbated by the limited dataset size of only 100 data points. We hypothesize that the failure stems from challenges in optimal kernel width selection, which is critical for defining the neighborhood of influence, or from high sensitivity to feature rescaling, preventing the RBF kernel from accurately mapping the complex, high-dimensional input features to the output space. This is a common issue with small datasets, where the kernel cannot establish robust local relationships, and it highlights the RBF's limitation in modeling the heterogeneous nature of building energy consumption compared with the robust, hierarchical learning of the tree-based and deep neural network models.
The correlation heatmap of the model predictions, shown in Figure 6, reflects that nearly all machine learning models have a very strong, positive correlation with the actual target values. Models such as ANN, Autoencoder, Decision Tree, and the ensemble methods including voting and blending have correlation coefficients above 0.99, indicating that their predictions are nearly on target with the true values. This suggests that these models fit the data well and provided close predictions; their consistently strong performance indicates that they successfully modeled the predictive task at hand. The Radial Basis Function (RBF) model, on the other hand, demonstrates a very weak correlation with both the true values and the other models' predictions. Its correlation with the true values is as low as 0.2772, and its correlations with the other models are all below 0.6, marking RBF as a clear outlier. This low correlation suggests either that RBF is a very poor model or that it is capturing features of the data largely unrelated to the true target variable.
The ensemble methods, voting and stacking, exhibited correlation patterns of interest. Voting maintained very high correlations (for instance, 0.9764 with the actual values), while stacking showed a moderately strong but comparatively lower correlation (0.6852 with the actual values). This discrepancy implies that the stacking algorithm may be integrating model predictions in a manner that introduces some variance, or relying more heavily on the weaker base models, such as RBF, while the voting method likely derives its strength from the strong consensus of the top-performing models.
In addition, the inter-correlations among the top models (ANN, XGB, RF, Autoencoder, and Tree) were so high that they hinted at the possibility of the models making similar errors or capturing redundant data. This redundancy might weaken the advantage of ensemble methods that rely on model diversity. The outstanding performance of the blending ensemble, which is almost perfectly correlated (0.9999) with the actual values, reflects its capability to use the strengths of the individual models to generate accurate predictions, possibly through optimally assigning weights to their contributions.
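A prediction-correlation matrix of this kind is straightforward to assemble. The sketch below uses made-up columns (one strong model, one weak one) purely to show the mechanics; the column names and values are illustrative, not the study's data.

```python
# Building a correlation matrix over model predictions (illustrative data).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
actual = rng.normal(size=100)

preds = pd.DataFrame({
    "Actual": actual,
    "ANN": actual + rng.normal(scale=0.05, size=100),  # tracks the target closely
    "RBF": rng.normal(size=100),                       # nearly uncorrelated noise
})

corr = preds.corr()               # Pearson correlations between all columns
print(corr.loc["ANN", "Actual"])  # close to 1 for the strong model
```

High inter-correlations among strong models in such a matrix are exactly the redundancy signal discussed above: models that agree with the target also tend to agree with each other, limiting the diversity an ensemble can exploit.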
Using the Taylor diagram provided in Figure 7, which compares the different machine learning models, several major conclusions about their performance characteristics can be drawn. The diagram maps model performance using three metrics: the radial distance from the origin indicates the spread (standard deviation) of the model predictions, the angular position specifies the correlation with the actual values, and the distance from the reference point (the "Perfect Model") indicates the centered root-mean-square difference. The ideal model would be positioned at the reference point, marked by a red cross at approximately 315 degrees.
The models form distinct, easily separated clusters. Artificial Neural Network (ANN), Random Forest (RF), XGBoost (XGB), and Autoencoder (AutoEnc) cluster in the upper-right quadrant, showing overall strong performance with higher correlation coefficients and only moderate standard deviations. In contrast, the RBF and Tree models are positioned quite differently from the others, most likely indicating different performance characteristics. The voting and stacking ensemble methods, as well as the blended model, show some of the most varied positioning. Intriguingly, the blended model was one of the closest to the reference point; most other algorithms were not as near the perfect-model coordinates.
The Taylor diagram illustrates that the ensemble methods operate distinctly: stacking and blending sit closer to the ideal reference point, demonstrating that these two methods better replicate the observed data's pattern and variability. They achieve this through their meta-learning schemes. Stacking employs a second-level model to optimally weight the base learners' predictions, while blending uses a holdout (validation) set to train the aggregator. In this way, stacking and blending pull together the best elements of the various models more intelligently than simple weighting, considering consensus based on learned accuracy. The voting method is certainly reliable and within a decent tier of ability, but because it combines model outputs by simple averaging or a majority-rule vote, it does not incorporate the nuanced weighting that stacking and blending use, so those methods generally achieve higher correlations and lower errors.
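The three statistics a Taylor diagram encodes can be computed directly, and they are tied together by a law-of-cosines identity that is what makes the diagram geometrically consistent. The series below are made up for illustration.

```python
# The three Taylor-diagram statistics and the identity linking them.
import numpy as np

def taylor_stats(obs, sim):
    sigma_o, sigma_s = obs.std(), sim.std()      # spreads (radial distance)
    r = np.corrcoef(obs, sim)[0, 1]              # correlation (angular position)
    # Centered RMS difference: means are removed before differencing,
    # so this measures pattern error independent of overall bias.
    crmsd = np.sqrt(np.mean(((sim - sim.mean()) - (obs - obs.mean())) ** 2))
    return sigma_o, sigma_s, r, crmsd

obs = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
sim = np.array([1.2, 2.5, 2.2, 4.6, 4.4])
sigma_o, sigma_s, r, crmsd = taylor_stats(obs, sim)

# Law of cosines: crmsd^2 = sigma_o^2 + sigma_s^2 - 2*sigma_o*sigma_s*r
assert abs(crmsd**2 - (sigma_o**2 + sigma_s**2 - 2*sigma_o*sigma_s*r)) < 1e-9
```

This identity is why a single point on the diagram simultaneously shows all three quantities: fixing any two determines the third.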
Heating in traditional buildings releases CO2, which adds greatly to the overall greenhouse gases produced worldwide, especially where these buildings use natural gas for their heating systems. The factor of 0.025 kg CO2 per energy unit consumed is vital for establishing the carbon footprint of a building. Total annual emissions are usually considerable, since energy used for heating, both space and water, can exceed half of a building's total energy consumption. The actual emissions data, as illustrated in the graph, show a wide variation from about 0 to 7 kg, probably the total emissions computed for different buildings or periods, indicating that gas-fired heating systems have variable but consistently present environmental impacts, as shown in Figure 8.
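Applying the 0.025 kg CO2 per energy-unit factor quoted above is a single multiplication; the consumption value below is an illustrative example chosen to land at the upper end of the 0 to 7 kg range shown in the figure.

```python
# Converting heating energy consumption to CO2 emissions.
EMISSION_FACTOR = 0.025  # kg CO2 per unit of energy consumed (from the text)

def heating_emissions(energy_units: float) -> float:
    """Return the CO2 emissions (kg) for a given heating energy consumption."""
    return energy_units * EMISSION_FACTOR

# An illustrative 280 energy units of gas heating maps to 7.0 kg CO2,
# the top of the 0-7 kg range in the figure.
print(heating_emissions(280))  # 7.0
```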
Examining the performance of the voting, stacking, and blended machine learning models against the same emissions indicates the potential of AI to enhance energy management and emissions forecasting. The “Actual” data column is labeled as the ground truth and shows the real-world variability that the models will have to reproduce. The predictions from the “Voting” model (blue) seem to follow the general shape of the actual data but noticeably deviate from the actual values for several data points. This suggests that the “Voting” model is averaging the predictions of its base learners with moderate success but does not possess the accuracy to meet all cases, especially those with higher emissions.
Conversely, the predictions made by the “Stacking” ensemble (green) exhibit a somewhat different behavior. The predictions generally appear to cluster together more tightly in the mid-range, while significantly underpredicting some of the higher actual values of emissions. This suggests that while the meta-learner from the stacking model is able to better synthesize the patterns than just a simple vote for many of the instances, it may have more difficulty capturing outliers or the upper extreme of the emissions spectrum, which could be a function of smoothing out some of the more extreme peaks of energy use and emissions.
The “Blended” model (red) gives the predictions most visually similar to the actual data across the whole range of emissions. The proximity of so many red dots to the yellow “Actual” points shows that the blending technique has captured the close relationship between the input features and the CO2 output. This implies a more durable and adaptable model that can accurately predict the carbon footprint of gas-heated buildings in most cases, which is important for targeted reduction strategies. Ultimately, precise prediction of CO2 emissions, as demonstrated by the strong performance of the blended model, is the primary requirement for decarbonizing the building sector. Reliable emission forecasts let stakeholders identify inefficient buildings, improve the performance of their heating systems, and verify whether retrofitting or behavioral changes have been effective. The transition from natural gas, with its 0.025 kg CO2 per unit emission factor, to electric heat pumps or renewable energy sources is the only long-term solution; meanwhile, sophisticated forecasting models like these are the main tools for managing and considerably reducing the environmental impact of the current building stock.
The ensemble machine learning models tested here vary in how efficiently they estimate environmental impact from the natural-gas emission analysis of existing buildings. The blended model is the most impressive, coming closest to aligning its predictions with the actual emissions across the ranges examined. Notably, the emission factor employed (0.025 kg CO2 per unit of energy input) demonstrates the carbon-intensive nature of gas heating, which is ultimately what each model seeks to quantify. The voting ensemble, a reasonable benchmark that creates a baseline by averaging its predictions, and the stacking model, which incorporates a meta-learned estimator, are both robust. However, both struggle to fit extreme values reliably, and capturing those values efficiently may require more advanced methods. Each model's predictions, especially the blended model's, advance the capacity to identify and remediate inefficiencies in the existing building stock and to implement a decarbonization strategy, providing significant foresight into a building's environmental impact before a full conversion to renewable energy is adopted.
The visualization contains two line graphs projecting energy consumption for heating from 2021 to 2050 with an LSTM (Long Short-Term Memory) model, as shown in Figure 9. The top graph, entitled “Projected Energy Consumption 2021–2050”, presents an absolute forecast that starts from a 2020 baseline of 100% and shows a clear downward trend, with a dramatic reduction in energy consumption over 30 years. The bottom graph shows the annual percentage change in consumption, indicating the volatility and rate of change year after year, with significant deviation before settling.
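The bottom panel's year-over-year percentage change is derivable from the top panel's absolute forecast. The trajectory below is a made-up smooth decline indexed to a 100% 2020 baseline, used only to show the computation; the study's LSTM output would replace it.

```python
# Deriving annual percent change from an absolute forecast series.
import numpy as np

years = np.arange(2021, 2051)
# Hypothetical declining forecast: ~98% of baseline in 2021, decaying smoothly.
forecast = 100.0 * np.exp(-0.017 * (years - 2020))

# Annual percent change: (x_t - x_{t-1}) / x_{t-1} * 100, one value per
# year-over-year transition (so 29 values for 30 forecast years).
pct_change = np.diff(forecast) / forecast[:-1] * 100.0
```

In the real figure this derived series is what exposes the early volatility: large swings in `pct_change` at the start of the horizon, settling toward a steady rate later.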
One important aspect of such forecasts is the use of a color gradient to portray the passage of time. The line starts at blue for the immediate future (e.g., 2023) and shifts gradually through purple to red for the far-off projections (e.g., 2050). This color scale is very effective in conveying the uncertainty characteristic of forecasting: the longer the time horizon, the less certain the predictions. On the right-hand side, a detailed legend displays each year with its corresponding color, so that the forecast for any particular year can be accurately interpreted.
The story illustrated by these charts is one of a meaningful energy transition. The steady decline in the top chart suggests energy efficiency measures have been implemented successfully, a transition to more efficient heating systems has taken place, or energy sources with better conversion efficiencies have been utilized. The considerable volatility in the annual change chart, particularly at the beginning of this process, suggests a time of rapid adoption of technology and market volatility before the new energy revolution reaches a more stable state of gradual gain towards the 2050 vision.
In complete contrast to the projected decline, a scenario where energy use for heating rose to 135% of the 2020 baseline by 2050 would represent a significant and alarming departure from current trends. A 35% increase above 2020 levels would reflect a complete failure to retrofit buildings for energy efficiency and to decarbonize the heating sector. Possible catalysts for this trajectory include a rapid rise in global energy demand, increasing dependence on fossil fuels, slow retrofitting of the building stock, and the effects of extreme weather, which may raise heating load expectations. In short, such an escalation in energy consumption would fundamentally jeopardize climate goals, energy security, and consumer energy expenditure, underscoring the profound importance of the aggressive efficiency commitments and clean energy transition that the original LSTM modeling outlook demonstrates.