Optimizing Energy Forecasting Using ANN and RF Models for HVAC and Heating Predictions

Salem, Khaled M.; Rey-Hernández, Javier M.; Elgharib, A. O.; Rey-Martínez, Francisco J.

doi:10.3390/app15126806


by Khaled M. Salem (1,2,3), Javier M. Rey-Hernández (1,4,5), A. O. Elgharib (1,2) and Francisco J. Rey-Martínez (1,3,5,*)

¹ GIRTER Research Group, Consolidated Research Unit (UIC053) of Castile and Leon, 47002 Valladolid, Spain

² Department of Basic and Applied Science Engineering, Arab Academy for Science, Technology and Maritime Transport (Smart Village Campus), Smart Village, Giza 12577, Egypt

³ Department of Energy and Fluid Mechanics, Engineering School (EII), University of Valladolid (UVa), 47002 Valladolid, Spain

⁴ Department of Mechanical Engineering, Fluid Mechanics and Thermal Engines, Engineering School, University of Malaga (UMa), 29016 Málaga, Spain

⁵ Institute of Advanced Production Technologies (ITAP), University of Valladolid (UVa), 47002 Valladolid, Spain

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(12), 6806; https://doi.org/10.3390/app15126806

Submission received: 20 May 2025 / Revised: 10 June 2025 / Accepted: 12 June 2025 / Published: 17 June 2025

(This article belongs to the Special Issue Infrastructure Resilience Analysis)


Abstract

Industry 5.0 is transforming energy demand by integrating sustainability into energy planning, ensuring market stability while minimizing environmental impact for future generations. Within the Industry 5.0 framework, energy consumption can be forecast at daily, monthly, or annual resolution by integrating artificial intelligence approaches, particularly Artificial Neural Networks (ANNs) and Random Forests (RFs). This study employs machine learning techniques to analyze energy consumption data from two distinct buildings in Spain: the LUCIA facility in Valladolid and the FUHEM Building in Madrid. The implementation was conducted using custom MATLAB code developed in-house. Our approach systematically evaluates and compares the predictive performance of ANNs and RFs for energy demand forecasting, leveraging each algorithm’s characteristics to assess its suitability for this application. The performance of both models is assessed using the Root Mean Square Percentage Error (RMSPE), Root Mean Square Relative Percentage Error (RMSRPE), Mean Absolute Percentage Error (MAPE), Mean Absolute Relative Percentage Error (MARPE), Kling–Gupta Efficiency (KGE), and the coefficient of determination (R²). Training times were also compared: for LUCIA, the RF model trained in 2.8 s and the ANN in 40 s; for FUHEM, the RF trained in 0.3 s and the ANN in 1.1 s. The performance of each model is described in detail to show its effectiveness.

Keywords:

industry 5.0; sustainability; nZEB; AI integration; optimization; ANN; random forest; smart grids; energy forecasting

1. Introduction

Industry 5.0 is an approach in which production and manufacturing shift toward more sustainable and human-centric practices. Advanced technologies in AI, robotics, and IoT will increasingly shape the power demands of Industry 5.0-related buildings. This framework promotes overall efficiency and sustainability, which is also evident in smart energy management applications aimed at the optimal use of all forms of energy [1,2,3]. Buildings harness renewable energy through energy-efficient design and real-time monitoring to minimize their carbon footprints. Furthermore, such collaboration between humans and machines will require adaptive energy solutions so that energy supply matches the dynamic demands of intelligent manufacturing processes. In Industry 5.0, energy demand will not only address operational needs but will also play an integral role in creating a circular economy that protects the environment [4,5,6,7].

Traditional energy demand forecasting models can suffer from severe shortcomings: data sparsity, where the absence of adequate historical data degrades the quality and reliability of forecasts; seasonality, where demand patterns change with the time of year, making consistent trends hard to establish; and computational complexity, since traditional methods may require very high computational time and effort to achieve reliable results [8]. AI is transforming energy management in buildings by enabling smarter, data-driven decisions. Through advanced analytics and machine learning algorithms, AI can forecast energy consumption trends, optimize resource distribution, and improve operational efficiency. It also allows the real-time monitoring and control of energy systems, enabling quick adjustments based on factors such as occupancy, weather, and energy prices. As a result, buildings become more responsive and adaptable, significantly reducing waste and enhancing sustainability [9,10]. Moreover, AI-driven predictive maintenance can identify potential issues before they escalate, further improving energy efficiency and lowering operational costs. This proactive approach enhances both performance and sustainability [11,12,13,14,15,16]. Using Artificial Neural Networks (ANNs) and Random Forest (RF) models together leverages the complementary strengths of each approach: ANNs detect nonlinear relationships and learn intricate patterns from large datasets, while RF contributes robustness and interpretability through its ensemble method, which mitigates overfitting and boosts predictive ability [17,18,19].

1.1. Literature Review

Ferlito et al. [20] investigated the energy requirements of buildings, focusing on factors such as weather, structure, lighting, and HVAC systems. Their study analyzed data from a building in Eboli, Italy, collected between 2011 and 2013, using an Artificial Neural Network (ANN) with one hidden layer of six neurons trained with the Levenberg–Marquardt method. They presented regression charts comparing the predicted energy (NAR output) with the actual energy demand (NAR target). The Root Mean Square Percentage Error (RMSPE) of the energy consumption estimates ranged from 15.7% to 17.97%, varying slightly with prediction horizons of three, six, and twelve months. In contrast, Li et al. [21] used an ANN to explore energy savings in buildings, achieving accuracy within ±10% for overall consumption and for heating/cooling variances. Their MSHD approach yielded a lighting consumption accuracy of ±10% and a heating/cooling accuracy of ±2%, keeping the overall energy variance below 10%. Ekici et al. [22] studied how insulation thickness and orientation affect building energy needs using finite difference numerical methods. They examined three building samples with insulation thicknesses from 0 to 15 cm and orientation angles from 0 to 80 degrees, using the ANN toolbox in MATLAB (Version R2018a, MathWorks) and FORTRAN 77 software. Their findings indicated a prediction accuracy of 94.8% to 98.5%, with a variance of 3.43%. Verma et al. [23] employed predictive techniques, including regression and ANN, to estimate hourly energy consumption for four buildings in the United States, with the aim of enhancing energy conservation in existing structures and improving energy efficiency in construction design. Orosa et al. [24], through the International Energy Agency, developed an approach combining artificial intelligence and neural networks to predict indoor environments in near-zero-energy buildings (nZEB). This strategy aims to further research on indoor conditions and building construction while improving the understanding of permeable material behavior. Arida et al. [25] explored nonlinear autoregressive Artificial Neural Networks for accurate predictions of building energy consumption, optimizing the models with genetic algorithms and evaluating performance using statistical metrics [26,27]. Cáceres et al. [28] introduced an intelligent optimization framework integrating AI models to achieve near-zero-energy-consumption buildings (nZEB). This framework demonstrated effective multi-objective optimization and high prediction accuracy, resulting in 21.25% energy savings for a teaching building in Wuhan. They recommended using Random Forest algorithms within a Big Data framework to accurately forecast household energy demand, leveraging socioeconomic data and advanced scaling techniques to enhance energy management efficiency.

Additionally, a Random Forest (RF) ensemble approach was proposed for predicting hourly energy consumption in two educational buildings in North Central Florida [29]. Ahmad et al. [30] found that RF was less sensitive to variations in its parameters and outperformed regression tree (RT) and support vector regression (SVR) models in prediction accuracy, with improvements of 14–25% and 5–5.5%, respectively. They also noted that the most significant factors for energy prediction varied by semester, emphasizing the importance of accounting for shifts in energy behavior throughout the academic year [31,32,33,34]. Table 1 summarizes the literature review, listing the mathematical models, time periods, and accuracy reported by each study alongside the approach proposed in this paper. Although numerous studies have employed artificial intelligence techniques such as Artificial Neural Networks (ANN) and Random Forest (RF) models to predict building energy consumption and optimize energy management, a significant research gap remains. Most existing research has focused on specific building types, regions, or limited prediction horizons, often with error levels of approximately 10% to 20%, which may not be sufficient for precise energy planning and demand management. Additionally, many studies concentrate on single factors such as weather, insulation, or lighting without integrating the complex interplay of multiple variables that influence energy demand simultaneously. Furthermore, while some research has compared the performance of ANNs and RFs, advanced ensemble techniques or hybrid models that could enhance prediction accuracy and stability across building types and operational conditions remain largely unexplored. There is also a lack of comprehensive analysis of model robustness under varying socioeconomic and environmental scenarios, especially in different geographical contexts. Therefore, future research should aim to develop more integrated, scalable, and adaptive prediction frameworks that combine multiple data sources, incorporate advanced machine learning algorithms, and evaluate their performance over extended periods and diverse building typologies to support more effective energy efficiency strategies.

1.2. Contributions

While advances in energy demand forecasting are evident, few studies directly compare the performance of Artificial Neural Network (ANN) and Random Forest (RF) models for HVAC systems in near-zero-energy buildings (nZEB) and for heating applications across regions in Spain. The goal of this paper is to implement a MATLAB-based framework to accurately evaluate and contrast the ability of ANN and RF algorithms in different contexts. This study evaluates the performance of both machine learning methods in terms of accuracy and computational cost. It examines regional climates and building characteristics, their effect on energy demand predictions, and how this information can inform and improve energy management tasks in Spanish households. The main contributions of this paper can be summarized as follows:

It addresses the impact of Industry 5.0 on energy demand, specifically within the context of sustainability and meeting current market needs while minimizing environmental impact.
It fills a gap in energy consumption forecasting by integrating artificial intelligence approaches, specifically Artificial Neural Networks (ANNs) and Random Forests (RFs), into the Industry 5.0 framework.
It implements and compares ANN and RF models using real-world energy consumption data from two specific locations in Spain (LUCIA and FUHEM), using custom MATLAB code developed in-house.
It evaluates and compares the performance of ANN and RF models in predicting energy demand using a comprehensive set of metrics, including Root Mean Square Percentage Error (RMSPE), Root Mean Square Relative Percentage Error (RMSRPE), Mean Absolute Percentage Error (MAPE), Mean Absolute Relative Percentage Error (MARPE), Kling–Gupta Efficiency (KGE), and the coefficient of determination (R²).

This paper is organized as follows: Section 1 offers an overview, including background information, a review of the relevant literature, and the objectives of the study. Section 2 details the methodology, covering data collection, normalization techniques, mathematical modeling with Artificial Neural Networks (ANNs) and Random Forests (RFs), as well as the optimization processes. Section 3 presents the results along with their discussion. Finally, Section 4 provides the conclusions and suggests directions for future research.

2. Methodology

2.1. Data Collection

This research used data collected from two prominent buildings in Spain, LUCIA in Valladolid and FUHEM in Madrid. For the LUCIA building, historical energy use profiles from 2020 were used to extract a complete dataset of 8760 entries, one for every hour of the year. These detailed hourly data were vital for analyzing changes in energy demand across the year, capturing both daily and seasonal differences. As shown in Figure 1a, the LUCIA dataset encompassed various factors, including weather parameters and structural geometric characteristics. Key weather variables, such as temperature, humidity, wind speed, and solar radiation, were critical for assessing how external conditions influenced energy consumption. Additionally, geometric data described the buildings’ physical attributes, such as surface area and volume, further supporting the analysis [35].

As an additional step to evaluate our predictive models, we drew a random sample of 100 data points from the FUHEM Building in Madrid, shown in Figure 1b. The FUHEM Building was selected for this test because its building typology and operational characteristics differ from those of the Lucía Building in Valladolid, and because the two sites differ geographically and climatically. These climatic differences between Madrid and Valladolid are also relevant for identifying how geographical variation influences energy demand.

By combining the extensive dataset of hourly consumption (together with monitored variables such as occupancy and outdoor temperature) with the limited FUHEM dataset, we are able to build forecasting models for energy demand. Models built on readily obtainable datasets are often useful for understanding the general influence of the input variables. Although more data generally improve modeling, combining these sources adds a more ‘generalized’ dataset that makes our forecasting models more relevant to differing building types and conditions.

Using data from 2020 to train the models may limit the relevance of the findings, particularly because building usage patterns can evolve over time. While this dataset provides valuable insights into energy consumption trends and behaviors during that period, it is important to consider how shifts in occupancy, usage patterns, and external factors such as economic conditions and climate change can impact energy demand. Additionally, the challenges associated with collecting more recent data, such as resource availability and varying monitoring systems across different buildings, can complicate efforts to obtain up-to-date information. These factors highlight the need for a comprehensive approach to data collection that accommodates the dynamic nature of building energy usage.

Figure 2 presents the Energy Demand Prediction Workflow, which structures the prediction steps around two machine learning models: ANN and Random Forest. First, the data are imported into MATLAB and preprocessed, which includes data checking and normalization. Both models are then trained and tested, and their performance is compared using evaluation metrics, namely RMSE, MAE, KGE, and Nash–Sutcliffe Efficiency. Finally, the models’ energy demand predictions are assessed to ensure a robust evaluation of their effectiveness.

2.2. Data Preprocessing

Data preprocessing is a critical step in analyzing the seasonal data for the two buildings, LUCIA and FUHEM. Here, preprocessing focused on maintaining the integrity of the dataset. No outliers were removed, as all values were considered essential for reflecting the true operational conditions of the buildings; this gives a more accurate representation of the data, particularly for identifying any unique trends or patterns. Furthermore, the dataset was found to be complete, with no missing values, so the analysis is not skewed by gaps in the data and seasonal variations in energy usage or occupancy can be examined thoroughly. By retaining all original data points, the analysis can show how each building’s performance fluctuates throughout the seasons, ultimately supporting better decision making and resource management [36]. Table 2 shows a sample of the dataset in each zone.

Data Normalizing

Prior to analysis, the dataset was cleaned by removing outliers and correcting erroneous entries to prevent biased results. Missing values were addressed using interpolation techniques, ensuring data continuity and reliability, which are critical factors for accurate time-series forecasting. In addition to the primary dataset from the Lucía building, a randomized subset of data from the FUHEM Building in Madrid was incorporated. This secondary dataset was specifically curated to evaluate the robustness of our predictive model across varying conditions, including fluctuations in occupancy and weather patterns. By employing random sampling, we assessed the model’s ability to generalize and perform under diverse scenarios, thereby strengthening the validity of our findings. Following data cleaning and preprocessing, feature engineering was conducted to improve predictive accuracy. New variables were introduced to capture temporal trends—such as time of day and day of the week—which are instrumental in understanding energy consumption patterns. The data were then normalized to ensure uniform feature scaling, facilitating the optimal training of our Artificial Neural Network and Random Forest models. These preprocessing steps laid the groundwork for developing a robust and generalizable predictive model capable of capturing energy demand dynamics across both buildings [37].
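The normalization and temporal feature steps described above can be sketched as follows. This is a minimal Python illustration, not the authors’ MATLAB code: it assumes min-max scaling to [0, 1] (the section does not state which normalization was used) and an hourly index that starts at hour 0 of the first day. All function names are hypothetical.

```python
import random


def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def time_features(hour_index, first_weekday=0):
    """Return (hour_of_day, day_of_week) for an hourly sample index.

    first_weekday is the weekday of sample 0 (0 = Monday); it is an assumption
    that the caller must set to match the real calendar of the dataset.
    """
    hour_of_day = hour_index % 24
    day_of_week = (hour_index // 24 + first_weekday) % 7
    return hour_of_day, day_of_week


def random_subset(rows, k, seed=0):
    """Draw k rows without replacement, e.g. a 100-point evaluation subset."""
    rng = random.Random(seed)
    return rng.sample(rows, k)
```

In practice, the minimum and maximum would be computed on the training portion only and reused for the test data, so that no information from the test period leaks into training.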

In this research, we used Artificial Neural Networks (ANNs) and Random Forests (RFs) because their characteristics suit the LUCIA and FUHEM datasets. The LUCIA dataset includes 8760 hourly data points for a near-zero-energy building (nZEB), a substantial amount of data with complex nonlinear associations. An ANN captures such associations well because its layered, flexible architecture allows it to learn complicated relationships in the data. This architecture is particularly effective for forecasting time-series data, which is essential for predicting energy consumption patterns over longer time intervals. Additionally, an ANN can accept multiple input features and is amenable to different data preprocessing and feature engineering methods. The FUHEM dataset includes only 100 data points from a conventional building and therefore benefits from the robustness of RF and its resistance to overfitting. Because each of its trees is trained on a random sample, RF produces predictions from an ensemble of decision trees, minimizing the risk of overfitting while promoting better generalization. This is especially important for our work, as the limited size of the FUHEM dataset restricts modeling and we need reliable predictions while maximizing interpretability.

2.3. Model Development (ANN)

Artificial Neural Networks (ANNs), whose design is inspired by biological neural networks, have rapidly become a widely used machine learning method. ANNs are information processing systems that organize artificial neurons in a layered structure, usually consisting of three main types of layers (input, hidden, and output). Each neuron computes a weighted sum of its inputs and passes it through a nonlinear activation function, which allows ANNs to recognize complex patterns or relationships within the data. Due to their inherent flexibility, ANNs are used in a variety of fields such as computer vision, speech recognition, natural language processing, and financial modeling. They excel at recognition tasks because they can model complicated nonlinear relationships in the data and generalize well. Most ANNs are trained with backpropagation, which adjusts the network weights on the training data to reduce the prediction error. This incremental weight optimization minimizes a cost function, so the network’s performance improves as it learns from the training dataset [38,39,40].
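To make the neuron computation and the backpropagation idea concrete, the following Python sketch (our own illustration, not the authors’ MATLAB code) computes the output of a single ReLU neuron and applies one gradient-descent update to its weights for a squared-error cost. The function names and the learning rate are hypothetical choices.

```python
def relu(z):
    """Rectified linear unit."""
    return z if z > 0.0 else 0.0


def relu_grad(z):
    """Derivative of ReLU (taken as 0 at z = 0)."""
    return 1.0 if z > 0.0 else 0.0


def neuron_forward(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through ReLU.

    Returns (pre-activation z, output y).
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return z, relu(z)


def sgd_step(weights, bias, inputs, target, lr=0.01):
    """One gradient-descent update of the cost E = 0.5 * (y - target)^2.

    Backpropagation (the chain rule) gives dE/dw_i = (y - target) * relu'(z) * x_i.
    Returns the updated weights, updated bias and the cost before the update.
    """
    z, y = neuron_forward(weights, bias, inputs)
    delta = (y - target) * relu_grad(z)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, 0.5 * (y - target) ** 2
```

In a multi-layer network, the same chain rule propagates each layer’s `delta` backwards through the weights of the following layer.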

Mathematical Model (ANN)

The ANN model developed for the Lucía Building in Valladolid incorporates 11 input variables, while the model for the FUHEM Building in Madrid accounts for similar energy demand drivers. These inputs include historical energy consumption data, weather parameters, and occupancy-related indicators, all of which directly influence heating demand, a particularly relevant factor given Valladolid’s colder climate. These variables were selected carefully because they correlate directly with heating requirements and energy consumption patterns. The model is designed to provide precise energy demand forecasts, enabling building management systems to optimize energy use and improve sustainability through data-driven decisions [41].

The ANN architecture consists of two hidden layers with distinct purposes: the first extracts key patterns from the input data, while the second enhances the model’s ability to generalize these patterns to unseen data. Both layers use ReLU (Rectified Linear Unit) activation functions to model complex nonlinear relationships in the data. The output layer uses a linear activation function to generate continuous energy demand forecasts, making the model suitable for real-time energy management applications. The network was trained with the Levenberg–Marquardt algorithm, a robust approach for nonlinear least squares optimization that combines gradient descent and the Gauss–Newton method, offering both rapid convergence and training efficiency. During training, the connection weights between neurons are iteratively adjusted to minimize the discrepancy between predicted and actual energy demand values. The model was trained on 2020 data, with separate training and validation subsets used to monitor performance and prevent overfitting [42].
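The Levenberg–Marquardt update can be illustrated on a toy problem. The Python sketch below is a minimal illustration under assumptions: it fits the two-parameter model y = a·exp(b·x) instead of network weights, and the damping schedule (divide mu by 10 after a successful step, multiply it by 10 after a failed one) is a common convention assumed here, not taken from the authors’ code.

```python
import math


def lm_fit(xs, ys, a, b, mu=1e-3, mu_max=1e10, grad_tol=1e-10, max_iter=200):
    """Levenberg-Marquardt fit of the toy model y = a * exp(b * x).

    Each iteration solves (J^T J + mu * I) delta = J^T r, where r = y - f(x)
    and J holds the partial derivatives of f with respect to (a, b).
    """

    def sse(a_, b_):
        return sum((y - a_ * math.exp(b_ * x)) ** 2 for x, y in zip(xs, ys))

    cost = sse(a, b)
    for _ in range(max_iter):
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            j = (e, a * x * e)  # (df/da, df/db)
            for p in range(2):
                jtr[p] += j[p] * r
                for q in range(2):
                    jtj[p][q] += j[p] * j[q]
        if max(abs(g) for g in jtr) < grad_tol:
            break  # gradient of the cost is (numerically) zero
        # Damped normal equations, solved for the 2x2 case with Cramer's rule.
        m00, m01 = jtj[0][0] + mu, jtj[0][1]
        m10, m11 = jtj[1][0], jtj[1][1] + mu
        det = m00 * m11 - m01 * m10
        da = (jtr[0] * m11 - m01 * jtr[1]) / det
        db = (m00 * jtr[1] - m10 * jtr[0]) / det
        new_cost = sse(a + da, b + db)
        if new_cost < cost:
            a, b, cost = a + da, b + db, new_cost
            mu *= 0.1  # good step: behave more like Gauss-Newton
        else:
            mu *= 10.0  # bad step: behave more like gradient descent
            if mu > mu_max:
                break  # no acceptable step can be found
    return a, b, cost
```

A small mu makes the step close to Gauss–Newton, while a large mu shrinks it toward a short gradient-descent step; for a neural network the same damped system is solved over all weights, which is why LM is best suited to small and medium-sized networks.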

The ANN model developed for the Lucía building addresses Europe’s growing energy demands, particularly in colder climate regions. By analyzing historical data through neural networks, the model provides actionable insights to facility managers, enabling more efficient energy use and cost optimization while supporting sustainable energy practices. An Artificial Neural Network with two hidden layers can be described mathematically as in Table 3 [43].
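A generic way to write such a network, assuming 11 inputs $\mathbf{x}$, two ReLU hidden layers of five neurons each (the size reported in Section 2.6), and a linear output, is shown below. The notation is ours and may differ from that of Table 3; the last line is the damped Levenberg–Marquardt update of the weight vector $\boldsymbol{\theta}$.

$$
\begin{aligned}
\mathbf{h}^{(1)} &= \max\!\left(\mathbf{0},\, \mathbf{W}^{(1)}\mathbf{x} + \mathbf{b}^{(1)}\right), \qquad \mathbf{W}^{(1)} \in \mathbb{R}^{5\times 11},\\
\mathbf{h}^{(2)} &= \max\!\left(\mathbf{0},\, \mathbf{W}^{(2)}\mathbf{h}^{(1)} + \mathbf{b}^{(2)}\right), \qquad \mathbf{W}^{(2)} \in \mathbb{R}^{5\times 5},\\
\hat{y} &= \mathbf{w}^{(3)\top}\mathbf{h}^{(2)} + b^{(3)}, \qquad \mathbf{w}^{(3)} \in \mathbb{R}^{5},\\
E(\boldsymbol{\theta}) &= \sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2,\\
\Delta\boldsymbol{\theta} &= \left(\mathbf{J}^{\top}\mathbf{J} + \mu\mathbf{I}\right)^{-1}\mathbf{J}^{\top}\left(\mathbf{y} - \hat{\mathbf{y}}\right), \qquad J_{kl} = \frac{\partial \hat{y}_k}{\partial \theta_l}.
\end{aligned}
$$

Under these assumptions the network has (55 + 5) + (25 + 5) + (5 + 1) = 96 trainable parameters, so $\mathbf{J}^{\top}\mathbf{J}$ is only a 96 × 96 matrix, which is one reason LM remains practical for a network of this size.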

2.4. Model Development (RF)

Random Forests (RFs) have emerged as a powerful and effective machine learning method for both classification and regression problems. The method is an ensemble approach that combines the predictions of many decision trees, each trained on a different random subset of the data and/or feature space. Aggregating many trees allows Random Forests to make better predictions, be less prone to overfitting, and provide more stable predictions than a single decision tree. Random Forests and their underlying decision trees also offer useful interpretability through built-in feature importance metrics, so practitioners can easily identify which variables in their model are most important. They are also flexible, handling both numerical and categorical features with relatively little preprocessing. Because of this flexibility, Random Forest performs consistently well across different problem domains. Its flexibility and strong overall performance have led to a proliferation of applications, making it popular for real-world use, particularly in industries such as financial services and healthcare, where it supports critical operational decisions such as fraud detection, customer segmentation, and medical diagnosis, and where practitioners require an acceptable trade-off between model complexity, interpretability, and predictive performance [44,45].

Mathematical Model (RF)

The Random Forest algorithm is an effective ensemble learning technique that is particularly suited for classification and regression tasks in various fields, including architecture and urban planning. By leveraging historical data, RF can predict outcomes, identify key success factors, and enhance decision-making processes for projects like the Lucia Building and FUHEM. The combination of multiple decision trees minimizes the risk of overfitting and increases the accuracy of RF models, making it especially useful for analyzing complex datasets typical in urban development projects.

The data collection process for the Lucia and FUHEM Buildings involves gathering both file-based and geometric data. At this stage, thorough data preparation is essential to ensure accurate interpretation by the model. This includes handling missing values, encoding categorical variables, and normalizing numerical features to maintain consistency. Additionally, feature selection methods, such as iterative extraction or ranking, help identify the most influential factors affecting project results. This approach both improves the model’s predictive performance and clarifies which variables are significant. Once the dataset is prepared, it can be used to train, validate, and measure the performance of the Random Forest (RF) model. During training, a number of decision trees are grown, each on a bootstrap sample of the data; each tree makes its own predictions, which are then aggregated. We use performance metrics such as accuracy, precision, recall, and F1-score to assess the reliability of the model under repeated conditions. Cross-validation can also be applied to check the model’s ability to generalize to unseen data. Once fully trained, the model can predict outcomes for both the FUHEM and Lucia Buildings, supporting stakeholders in making informed, data-driven decisions, as shown in Table 4 [46,47].
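Cross-validation, mentioned above, can be sketched with a small index generator. The Python example below assumes shuffled k-fold splitting with k = 5 by default; it is an illustration, not the validation procedure used in the study, and the function name is hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Every sample appears in exactly one test fold; fold sizes differ by at most one.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n_samples // k + (1 if f < n_samples % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size
```

For hourly time series such as the LUCIA data, shuffled folds can leak information between neighboring hours; contiguous (blocked) folds or forward-chaining splits are often preferred for forecasting tasks.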

Tree Construction:

For each tree $t$:
Draw a bootstrap sample $D_{t}$ by randomly selecting $N_{t}$ instances, with replacement, from the training dataset $D$.
At each node within the tree, randomly select a subset of $m_{\text{try}}$ features ($m_{\text{try}} \le m$) from the total set of $m$ features.
Determine the optimal split $j$ at the node using a specific criterion, such as Gini impurity for classification tasks or Mean Squared Error for regression problems.
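The tree-construction steps above can be illustrated with a compact, from-scratch Python sketch of a regression Random Forest: bootstrap sampling with replacement, a random feature subset at each node (parameter `k`), splits chosen by minimizing the squared error, and averaging across trees. It is a teaching sketch under stated assumptions (depth limit of 6, minimum leaf size of 2, and a default subset size of about one third of the features); it is not MATLAB’s fitrensemble and is far slower than production implementations. All names are hypothetical.

```python
import random
import statistics


def _sse(ys):
    """Sum of squared deviations from the mean (n times the node MSE)."""
    m = statistics.fmean(ys)
    return sum((v - m) ** 2 for v in ys)


def _partition(X, y, f, thr):
    """Split (row, target) pairs on feature f at threshold thr."""
    left = [(r, t) for r, t in zip(X, y) if r[f] <= thr]
    right = [(r, t) for r, t in zip(X, y) if r[f] > thr]
    return left, right


def build_tree(X, y, k, rng, depth=6, min_leaf=2):
    """Grow a regression tree; each node searches only k randomly chosen features."""
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)  # leaf: mean of the targets
    best = None  # (score, feature, threshold)
    for f in rng.sample(range(len(X[0])), k):
        for thr in sorted({r[f] for r in X})[:-1]:
            left, right = _partition(X, y, f, thr)
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Minimizing total SSE is equivalent to minimizing the weighted child MSE.
            score = _sse([t for _, t in left]) + _sse([t for _, t in right])
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return statistics.fmean(y)
    _, f, thr = best
    children = [
        build_tree([r for r, _ in part], [t for _, t in part], k, rng, depth - 1, min_leaf)
        for part in _partition(X, y, f, thr)
    ]
    return (f, thr, children[0], children[1])


def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right)."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=100, k=None, seed=0):
    """Bagging: each tree is grown on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    k = k or max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], k, rng))
    return forest


def predict_forest(forest, x):
    """Regression forests average the predictions of their trees."""
    return statistics.fmean(predict_tree(t, x) for t in forest)
```

The default of 100 trees matches the value discussed in Section 2.6; smaller forests (for example 20 trees) are enough to try the sketch on toy data.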

2.5. Evaluation Metrics

A variety of metrics are used in predictive modeling and regression analysis to show how well a model fits the analyzed system. Commonly reported metrics include the Root Mean Square Percentage Error (RMSPE), Root Mean Square Relative Percentage Error (RMSRPE), Mean Absolute Percentage Error (MAPE), Mean Absolute Relative Percentage Error (MARPE), Kling–Gupta Efficiency (KGE), and Nash–Sutcliffe Efficiency (NSE). Each metric emphasizes different aspects of model generalization and predictive ability, allowing researchers and practitioners to evaluate how well their models achieve their intended outcomes. Overall, developing a robust model relies significantly on understanding these metrics to provide reliable forecasts [48,49,50].

RMSPE and MAPE are commonly used to assess the error between predicted and actual values. RMSPE employs a quadratic scoring rule: it squares the percentage errors and therefore places greater emphasis on large errors, making it sensitive to outliers. It is particularly useful when avoiding significant errors is crucial, as it effectively captures substantial discrepancies. In contrast, MAPE averages the absolute percentage errors linearly and therefore tends to be less influenced by outliers.
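Using common textbook definitions (which may differ in detail from those in Table 5), RMSPE and MAPE can be computed as in the Python sketch below; the relative variants RMSRPE and MARPE are not reproduced here. Both functions assume that no actual value is zero.

```python
import math


def rmspe(actual, predicted):
    """Root mean square percentage error, in percent (actual values must be non-zero)."""
    terms = [((a - p) / a) ** 2 for a, p in zip(actual, predicted)]
    return 100.0 * math.sqrt(sum(terms) / len(terms))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(terms) / len(terms)
```

For actual values [100, 100, 100, 100], the predictions [90, 110, 100, 100] and [100, 100, 100, 80] both give a MAPE of 5%, but concentrating the error in a single point raises RMSPE from about 7.07% to 10%, which illustrates the outlier sensitivity described above.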

KGE (Kling–Gupta Efficiency) and NSE (Nash–Sutcliffe Efficiency) are key performance metrics, especially in hydrological and environmental modeling. KGE is a composite index that combines the correlation, bias, and variability between predicted and observed data, allowing a comprehensive assessment; values close to 1 indicate better performance. NSE compares the model’s prediction errors with the variance of the observations, indicating the proportion of variance explained by the model; it ranges from −∞ to 1, where 1 indicates a perfect fit and negative values indicate that the model predicts worse than the mean of the observations. Lastly, R² measures how well the predicted values account for the variance in the actual data, with values approaching 1 signifying strong explanatory power. Together, as shown in Table 5, these metrics create a solid framework for evaluating and improving model performance, supporting better understanding and more effective decision making in the modeling process [51,52].
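A minimal Python sketch of NSE, KGE, and R² follows. KGE is written in its original 2009 form, 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), with r the Pearson correlation, α the ratio of standard deviations, and β the ratio of means; R² is taken here as the squared Pearson correlation. Both are assumptions about conventions, since the exact formulas used in the study are given in Table 5.

```python
import math
import statistics


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect; below 0 is worse than predicting the mean."""
    mean_o = statistics.fmean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - num / den


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observations and simulations."""
    mo, ms = statistics.fmean(obs), statistics.fmean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form) from correlation, variability and bias ratios."""
    r = pearson_r(obs, sim)
    alpha = statistics.pstdev(sim) / statistics.pstdev(obs)  # variability ratio
    beta = statistics.fmean(sim) / statistics.fmean(obs)  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def r_squared(obs, sim):
    """R^2 taken as the squared Pearson correlation (one common convention)."""
    return pearson_r(obs, sim) ** 2
```

For observations [1, 2, 3, 4] and a simulation shifted up by one unit, R² stays at 1 while NSE drops to 0.2 and KGE to 0.6, which shows why bias-aware metrics complement R². If R² is instead defined as 1 − SS_res/SS_tot, it coincides with NSE.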

2.6. Optimization Procedures

Two models, an Artificial Neural Network (ANN) and a Random Forest (RF), are used to predict the energy demand of a building and support sustainable energy outcomes. The equations are implemented in code, with convergence defined as the point at which the maximum residual sum for all components of the solved variables (mu, gradient) falls below 0.1% based on the dataset [53,54,55]. The equations are solved with a custom-developed MATLAB program. We chose MATLAB for its extensive libraries and toolboxes for advanced statistical and machine learning analyses, which are crucial for effectively implementing and comparing our ANN and RF models. MATLAB’s user-friendly environment facilitates efficient coding and visualization, allowing seamless data manipulation and model testing. Its robust support for parallel computing enhances computational efficiency, which is particularly important given the size and complexity of our datasets. The platform’s strong community and documentation further streamline development, letting us focus on refining the models and interpreting results, and ultimately provide a reliable framework for evaluating energy demand predictions in near-zero-energy buildings (nZEB) and traditional buildings. Comparing the two models helps determine which one performs better for each building’s dataset. Various factors influencing energy demand are examined, emphasizing the interconnections among economic growth, population dynamics, and technological advancements [56,57].
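The convergence rule above is typically complemented by per-epoch stopping checks. The Python sketch below is an illustration under assumptions: the thresholds are the ANN settings reported in this section (3000 epochs, goal 1 × 10⁻⁶, minimum gradient 1 × 10⁻³, 4000 validation failures), while the cap on mu (1 × 10¹⁰) and all names are hypothetical, modeled loosely on the kind of training parameters MATLAB exposes for Levenberg–Marquardt training.

```python
from dataclasses import dataclass


@dataclass
class StopConfig:
    """Stopping thresholds; the first four mirror the ANN settings reported in the text."""

    epochs: int = 3000  # maximum number of training epochs
    goal: float = 1e-6  # target training error
    min_grad: float = 1e-3  # minimum gradient magnitude
    max_fail: int = 4000  # epochs in a row allowed without validation improvement
    mu_max: float = 1e10  # assumed cap on the LM damping parameter (not from the source)


def stop_reason(cfg, epoch, train_error, grad_norm, val_failures, mu):
    """Return why training should stop, or None to keep training."""
    if epoch >= cfg.epochs:
        return "maximum epochs reached"
    if train_error <= cfg.goal:
        return "performance goal met"
    if grad_norm < cfg.min_grad:
        return "gradient below minimum"
    if val_failures >= cfg.max_fail:
        return "validation failures exceeded"
    if mu >= cfg.mu_max:
        return "damping parameter mu at maximum"
    return None
```

Checking these conditions once per epoch, in a fixed order, makes the reason for termination explicit, which helps when comparing training runs.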

In the Artificial Neural Network model, two hidden layers with five neurons each balance the capacity to capture nonlinear relationships against excessive complexity, which could lead to overfitting. The training parameters are as follows:
- a high number of epochs (3000) to allow thorough learning;
- a low learning rate (0.01) for gradual and stable weight updates;
- a stringent training goal (1 × 10⁻⁶) for the target accuracy level.

Allowing up to 4000 validation failures gives the network ample opportunity to improve before training stops. A minimum performance gradient of 1 × 10⁻³ stops training once the gradient falls below this threshold, that is, once further gains become negligible.

In the Random Forest model, 100 trees balance predictive performance against training cost. Averaging over this many trees generally reduces variance without significantly increasing bias, giving good generalization to unseen data. The default impurity criterion for regression, typically Mean Squared Error (MSE), selects splits that minimize prediction variance, which makes it effective for continuous targets. The 'Bag' method in fitrensemble implements bootstrap aggregating. This improves model stability by averaging predictions from trees built on different bootstrap samples of the data, and it further reduces the risk of overfitting. Figure 3 shows the block diagram of how these elements interact and jointly shape energy demand trends.

The Levenberg–Marquardt (LM) algorithm is a popular optimization technique for training Artificial Neural Networks (ANNs) because it minimizes the error function efficiently. It combines the advantages of gradient descent and the Gauss–Newton method, which makes it particularly effective for nonlinear least squares problems. LM adapts its damping parameter (μ) dynamically, moving between gradient-descent-like and Gauss–Newton-like steps. This accelerates convergence and is especially useful for small-to-medium-sized datasets, where LM can outperform other methods in speed and accuracy.

In contrast, Random Forest combined with Bayesian inference improves model robustness by leveraging the strengths of ensemble learning and probabilistic modeling. This approach integrates uncertainty into predictions, making it well suited to complex datasets with high dimensionality and noise. The Bayesian framework quantifies uncertainty and improves the interpretability of model predictions, which is crucial in applications requiring reliable decision making. The steps of the optimization procedure are given below [58,59]:
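The role of the damping parameter μ is easiest to see on a small problem. The sketch below applies the LM update (JᵀJ + μI)Δp = Jᵀe to a two-parameter curve fit rather than to network weights.

Everything in it is illustrative rather than taken from the authors' MATLAB code:
- the model y = a·exp(b·x);
- the starting values;
- the μ schedule (start at 0.001, multiply by 0.1 after an accepted step, by 10 after a rejected one).

```python
import math

def lm_fit_exponential(xs, ys, a=1.0, b=0.0, mu=1e-3, mu_dec=0.1, mu_inc=10.0,
                       max_iter=200, tol=1e-15):
    """Fit y = a*exp(b*x) with Levenberg-Marquardt, adapting the damping parameter mu."""

    def sse(a_, b_):
        # Sum of squared errors for parameters (a_, b_).
        return sum((y - a_ * math.exp(b_ * x)) ** 2 for x, y in zip(xs, ys))

    current = sse(a, b)
    for _ in range(max_iter):
        # Accumulate J^T J (2x2) and J^T e (2-vector); row i of J is (df/da, df/db) at x_i.
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jte = [0.0, 0.0]
        for x, y in zip(xs, ys):
            ex = math.exp(b * x)
            grad = (ex, a * x * ex)
            err = y - a * ex
            for i in range(2):
                jte[i] += grad[i] * err
                for j in range(2):
                    jtj[i][j] += grad[i] * grad[j]
        while True:
            # Solve (J^T J + mu*I) delta = J^T e for the 2x2 case with Cramer's rule.
            m00, m01 = jtj[0][0] + mu, jtj[0][1]
            m10, m11 = jtj[1][0], jtj[1][1] + mu
            det = m00 * m11 - m01 * m10
            da = (jte[0] * m11 - m01 * jte[1]) / det
            db = (m00 * jte[1] - m10 * jte[0]) / det
            trial = sse(a + da, b + db)
            if trial < current:
                improvement = current - trial
                a, b, current = a + da, b + db, trial
                mu *= mu_dec      # step accepted: behave more like Gauss-Newton
                break
            mu *= mu_inc          # step rejected: take smaller, gradient-descent-like steps
            if mu > 1e10:
                return a, b, current
        if improvement < tol:
            break
    return a, b, current

# Noise-free synthetic data generated with a = 2, b = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b, err = lm_fit_exponential(xs, ys)
print(f"a={a:.4f}, b={b:.4f}, SSE={err:.2e}")
```

A small μ makes the step close to a Gauss–Newton step. A large μ shrinks it toward a short step along the gradient, which is how LM stays stable far from the optimum while still converging quickly near it.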

1. Split the dataset into three parts: 70% for training, 15% for validation, and 15% for testing.
2. Choose the network architecture and set the training parameters.
3. Train the model with the training dataset.
4. Validate the model's performance using the validation dataset.
5. Iterate steps 2 to 4, experimenting with various architectures and training settings.
6. Identify the optimal network architecture based on validation results.
7. Evaluate the selected final model using the test dataset to assess its performance.
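The seven steps above follow the usual hold-out model-selection pattern. The sketch below shows that pattern under two assumptions:
- The split is a seeded random shuffle.
- A hypothetical k-nearest-neighbour regressor (`knn_fit`) stands in for the ANN or RF, with k playing the role of the "architecture" being tuned.

The key point is that the test set is touched only once, after selection.

```python
import random

def split_70_15_15(samples, seed=0):
    """Shuffle and split into 70% training, 15% validation and 15% test data (step 1)."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = int(round(0.70 * len(idx)))
    n_val = int(round(0.15 * len(idx)))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

def knn_fit(train, k):
    """Hypothetical stand-in model: k-nearest-neighbour regression on a 1-D input."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def select_and_test(samples, candidates, seed=0):
    train, val, test = split_70_15_15(samples, seed)
    # Steps 2-6: fit each candidate on the training set and score it on validation data.
    scored = [(mse(knn_fit(train, k), val), k) for k in candidates]
    best_val, best_k = min(scored)
    # Step 7: evaluate only the selected configuration on the held-out test set.
    return best_k, best_val, mse(knn_fit(train, best_k), test)

samples = [(float(x), 2.0 * x) for x in range(100)]
print(select_and_test(samples, candidates=[1, 3, 5]))
```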

3. Results and Discussions

This study develops advanced machine learning approaches—Artificial Neural Networks (ANNs) and Random Forests (RFs)—to enhance energy efficiency in HVAC systems by predicting energy demands for two distinct buildings: the Lucía building in Valladolid and the FUHEM facility in Madrid, Spain. Our research aims to validate these models’ performance while analyzing their effectiveness in addressing different thermal requirements: comprehensive HVAC energy demands for Lucía versus specific heating needs for FUHEM. The methodology involved systematic data collection and preprocessing from both sites, followed by rigorous validation procedures including cross-validation to ensure prediction reliability. The Lucía dataset incorporated multiple HVAC demand drivers including temperature fluctuations, humidity levels, occupancy patterns, and insulation characteristics. In contrast, the FUHEM dataset focused primarily on heating demands influenced by external temperature variations and temporal factors (daily and seasonal cycles). Comparative analysis revealed notable performance differences between the models, reflecting the distinct energy consumption profiles of each building. These findings contribute to developing integrated energy management strategies adaptable to varying climatic conditions and operational requirements.

3.1. Actual and Predicted Energy Demand

The findings are anticipated to support the optimization of HVAC system operations and energy management strategies. By leveraging predictions from the ANN and RF models, facility managers can forecast peak demand times for HVAC services and take proactive steps to ensure efficient resource utilization while reducing energy consumption. Furthermore, the data-driven methodology described can be continuously refined and adapted to changing environmental and operational conditions, ultimately improving the overall efficiency and robustness of the HVAC system.

Figure 4a presents a comparative visualization of the actual energy demand (blue line) alongside predictions from two machine learning models: the Random Forest (RF) model (green line) and the Artificial Neural Network (ANN) model (red line). The plot provides a comprehensive comparison between measured HVAC demand and model forecasts. The analysis reveals that both models successfully captured the overall trend of cooling demand. However, the ANN model demonstrates superior performance in predicting both the timing and magnitude of peak demand periods, closely following the actual consumption curve. In contrast, the RF model shows greater deviation from actual values during high-demand periods, indicating reduced accuracy in forecasting peak HVAC loads.

The ANN model demonstrates strong agreement with actual HVAC demand, especially during peak usage periods. For example, at sample index 1100, the ANN accurately predicts a demand spike of 270 kW, closely matching the observed peak of 290 kW. In contrast, while the RF model captures the overall demand trend, it systematically underestimates consumption during high-demand periods, predicting only 200 kW at the same sample point, significantly below the actual 290 kW requirement.
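The size of the peak errors in this example can be checked directly from the quoted values (observed 290 kW, ANN 270 kW, RF 200 kW at sample index 1100). The ANN prediction is about 6.9% below the observed peak and the RF prediction about 31.0% below it.

```python
def percent_error(actual, predicted):
    """Signed relative error in percent; positive means the prediction is too low."""
    return 100.0 * (actual - predicted) / actual

actual_peak = 290.0  # kW, observed peak at sample index 1100
print(f"ANN: {percent_error(actual_peak, 270.0):.1f}% below actual")  # ANN: 6.9% below actual
print(f"RF:  {percent_error(actual_peak, 200.0):.1f}% below actual")  # RF:  31.0% below actual
```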

For the FUHEM Building’s heating demand analysis, shown in Figure 4b, we evaluated 100 randomly sampled data points across one year. The actual demand shows considerable temporal variability, posing a challenge for accurate prediction. Both models attempt to replicate this fluctuation, with their performance depending on how well they track the dynamic changes in the actual demand curve.

Figure 5a presents a scatter plot of actual HVAC demand against the values predicted by the Random Forest (RF) and Artificial Neural Network (ANN) models. Such plots are valuable for evaluating the accuracy of prediction models for HVAC systems. The plot shows that HVAC demand is dynamic and varied, with actual values (blue dots) ranging from approximately −250 kW to 350 kW.

The red dots, depicting the ANN model's predictions, cluster closely along the 45-degree line, indicating strong agreement with the actual values. In contrast, the green dots representing the RF model's predictions are more dispersed and deviate further from the 45-degree line. The RF model still shows a positive correlation with actual values; however, the wide scatter suggests that it struggles to capture the full complexity of HVAC demand. This performance gap suggests that neural network approaches may offer advantages for accurately predicting HVAC system needs.

Figure 5b compares actual heating demand values with forecasts from both models: Random Forest and Artificial Neural Network. The actual demand ranges from 0 to about 35 kWh. The ANN predictions, primarily shown in blue, closely align with actual values, particularly for lower demands. Conversely, the RF forecasts (in green) are less accurate, especially for larger demand figures, indicating that, while RF identifies certain trends, it may not fully capture peak demand.

Figure 6a shows a box plot comparing actual HVAC (Heating, Ventilation, and Air Conditioning) demand with the ANN and RF model predictions. A box plot shows the entire distribution of the data, which gives insight into how each model performed when forecasting HVAC demand. The median of the ANN predictions was very close to that of the actual data, so the model captured the central tendency of HVAC demand. The interquartile range (IQR) of the ANN predictions was narrower than that of the actual data, meaning that the predicted values are less spread out than the measurements. Together, the close median and the compact spread indicate that the ANN predictions showed reduced variability while remaining centered on the actual HVAC demand.

Figure 6b compares the distribution of actual heating demand with the heating demand predicted by the ANN and RF models. Actual heating demand ranges from 0 to just over 35 kWh, with a median (50th percentile) of about 5 kWh, and it exhibits high variability.

The ANN predictions are more concentrated, with a smaller interquartile range (IQR), and their median sits close to the median of the actual demand. This suggests that the ANN model captured heating demand patterns well. The RF predictions resemble the ANN predictions but show more variability, a larger IQR, and more extreme values. Although the RF median is also around 5 kWh, its predictions are less concentrated than those of the ANN.

This comparison helps to assess model performance and shows where the ANN and RF outputs can inform improvements in predicting the heating demand of the FUHEM Building.
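The box-plot quantities discussed for Figure 6 (median, quartiles, IQR and extreme values) can be computed with the Python standard library. Two assumptions apply:
- The quartiles use the "inclusive" interpolation method. Plotting tools, including MATLAB, may interpolate quartiles slightly differently.
- Outliers follow the common 1.5 × IQR whisker convention.

```python
from statistics import median, quantiles

def box_stats(values):
    """Median, quartiles, IQR and 1.5*IQR outliers summarising one box in a box plot."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = sorted(v for v in values if v < lo_fence or v > hi_fence)
    return {"median": median(values), "q1": q1, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical heating-demand samples (kWh); a wider IQR means a more spread-out box.
print(box_stats([0.0, 2.5, 4.0, 5.0, 5.5, 7.0, 9.0, 35.0]))
```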

The cumulative distribution function (CDF) analysis reveals strong agreement between actual HVAC demand and predictions from both ANN and RF models in Figure 7. The characteristic S-shaped curves demonstrate that HVAC demand predominantly falls within moderate ranges, with relatively few extreme values observed. Both machine learning models successfully capture this distribution pattern, with their prediction curves closely following the empirical data. This close alignment indicates robust predictive capability for HVAC demand across both approaches. While the models show comparable overall performance, subtle differences in their CDF curves emerge, particularly in how they handle the upper and lower tails of the distribution. These variations likely stem from fundamental differences in how each algorithm processes noise and nonlinear relationships within the dataset. The ANN model appears slightly more responsive to demand fluctuations at distribution extremes, while the RF maintains a more consistent performance across the central range of values. Such nuanced performance characteristics may inform model selection depending on specific application requirements.
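The visual agreement between CDF curves can also be summarized by a single number. The sketch below builds empirical CDFs and computes the largest vertical gap between two of them, the two-sample Kolmogorov–Smirnov statistic. This statistic is not used in the study; it is offered only as one hedged way to quantify the curve alignment described for Figure 7.

```python
from bisect import bisect_right

def ecdf(values):
    """Empirical CDF: returns F(x) = fraction of values <= x."""
    data = sorted(values)
    n = len(data)
    return lambda x: bisect_right(data, x) / n

def ks_distance(actual, predicted):
    """Largest vertical gap between the two empirical CDFs (two-sample KS statistic)."""
    f_a, f_p = ecdf(actual), ecdf(predicted)
    # Both ECDFs are step functions, so the maximum gap occurs at one of the sample points.
    points = sorted(set(actual) | set(predicted))
    return max(abs(f_a(x) - f_p(x)) for x in points)

# Hypothetical HVAC demand values (kW); 0 means identical distributions, 1 means fully separated.
print(ks_distance([10, 40, 55, 60, 90], [12, 38, 57, 61, 80]))
```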

Residuals play a vital role in assessing how well the models predict HVAC demand. As shown in Figure 8, most residuals are close to zero, as indicated by a prominent peak at zero on the horizontal axis of each histogram. Since the predicted values generally fall within a narrow range of the actual values, both models forecast HVAC demand effectively. The histogram for the ANN model (left) has a taller and narrower peak than that of the RF model (right), indicating that the ANN residuals are smaller and less varied overall. These findings suggest that the ANN model is better suited for high-precision applications, such as real-time HVAC control or energy management.
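A residual histogram such as Figure 8 can be reduced to a few numbers: the mean residual (bias), its spread, and the counts per bin. A narrower, taller peak at zero corresponds to a smaller spread. In the sketch below, the residual definition (actual minus predicted) and the bin width are assumptions for illustration.

```python
import math
from collections import Counter
from statistics import fmean, pstdev

def residual_summary(actual, predicted, bin_width):
    """Return mean residual, residual spread, and histogram counts keyed by bin index."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    # Bin k covers [(k - 0.5) * bin_width, (k + 0.5) * bin_width), so bin 0 is centred on zero.
    bins = Counter(math.floor(r / bin_width + 0.5) for r in residuals)
    return fmean(residuals), pstdev(residuals), dict(sorted(bins.items()))

# Hypothetical actual and predicted HVAC demand (kW).
print(residual_summary([10, 20, 30, 40], [10, 21, 29, 40], bin_width=1))
```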

Conversely, the ensemble (bagging) approach of the RF model may have contributed to its broader residual distribution. Averaging over many trees can accommodate noise or variations in demand patterns, which can be advantageous in unexpected situations. Both models perform well, with residuals remaining close to zero.

The higher precision of the ANN could lead to improved energy efficiency and cost savings, although it also suggests that the ANN model may be more sensitive to fluctuations in its input data. The RF model, with its different performance profile, may be better at handling unpredictable demand spikes. Finally, model selection should also consider computational complexity, as the ANN is relatively more complex to train and query than the RF model.

3.2. Evaluation of Performance Metrics

The essential performance metrics (RMSPE, MAPE, correlation heatmaps, KGE, NSE, and the coefficient of determination, R²) collectively provide a framework for evaluating these machine learning models. RMSPE and MAPE are fundamental metrics for assessing the difference between observed and predicted values. RMSPE squares the relative errors before averaging, so it is sensitive to larger errors and highlights significant deviations. MAPE offers a more straightforward interpretation of average performance by averaging the magnitude of the relative errors without regard to their direction.
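The difference between the two percentage metrics shows up in a small sketch. It uses the standard definitions MAPE = (100/n) Σ|(aᵢ − pᵢ)/aᵢ| and RMSPE = 100 · √((1/n) Σ((aᵢ − pᵢ)/aᵢ)²).

Both divide by the actual value, so they are undefined wherever actual demand is zero, and the heating data include zero values. This section does not say how such points were handled, so the sketch simply assumes nonzero actual values.

```python
import math

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (assumes no actual value is zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmspe(actual, predicted):
    """Root Mean Square Percentage Error in percent; squaring weights large relative errors more."""
    return 100.0 * math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical values: relative errors of 10%, 5% and 0%.
actual, predicted = [100.0, 200.0, 400.0], [110.0, 190.0, 400.0]
print(f"MAPE={mape(actual, predicted):.2f}%  RMSPE={rmspe(actual, predicted):.2f}%")
```

Because squaring amplifies the single 10% error, RMSPE (about 6.45%) exceeds MAPE (5%) here. The same pattern appears in the reported ANN and RF results, where RMSPE is larger than MAPE for both models.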

3.2.1. Correlation Heatmaps

The close correspondence between actual HVAC demand and both sets of predicted values implies that the models capture the major determinants of energy use. As shown in Figure 9a, both the Neural Network and Random Forest models predicted HVAC (Heating, Ventilation, and Air Conditioning) demand effectively. In the correlation heatmap, each variable has a self-correlation of 1.0, represented by the entries on the main diagonal. The off-diagonal entries show how the variables relate to each other, with the correlation coefficient describing the strength and direction of their linear relationship.

Actual HVAC demand correlated very strongly with the Neural Network predictions (0.9627). This shows that the model tracked actual demand well and reflects its ability to replicate real-world dynamics. Actual demand also correlated with the Random Forest predictions (0.9445), slightly less strongly than with the Neural Network but still indicating a solid relationship.
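Each cell of a correlation heatmap like Figure 9 is a Pearson coefficient between two columns. The minimal sketch below builds the full matrix for three hypothetical series named actual, ann and rf. The values are invented for illustration and are not the study's data.

```python
import math
from statistics import fmean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(series):
    """All pairwise correlations keyed by (row, column); diagonal entries equal 1."""
    names = list(series)
    return {(r, c): pearson(series[r], series[c]) for r in names for c in names}

# Hypothetical values standing in for the actual, ANN and RF columns (kW).
series = {
    "actual": [120.0, 180.0, 90.0, 260.0, 150.0],
    "ann": [118.0, 176.0, 95.0, 250.0, 152.0],
    "rf": [130.0, 160.0, 110.0, 215.0, 149.0],
}
matrix = correlation_matrix(series)
for r in series:
    print(r.ljust(7), "  ".join(f"{matrix[(r, c)]:.4f}" for c in series))
```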

Figure 9b presents the correlation heatmap for the FUHEM Madrid dataset, showing the relationships among actual demand, the demand predicted by the Neural Network, and the demand predicted by the Random Forest. The correlation between actual demand and the Neural-Network-predicted demand is 1.0 at the displayed precision. This indicates an essentially perfect linear relationship in which increases in actual demand are matched by increases in predicted demand. The correlation between actual demand and the Random-Forest-predicted demand is also very close to one (0.9948), indicating a strong positive association.

Finally, the correlation between the two sets of predictions is approximately 0.9951, which suggests that the models produced very similar predictions for the FUHEM Building. Values this close to one indicate very strong relationships, further affirming the performance and reliability of both predictive models.

3.2.2. Evaluation Metrics (MAPE, RMSPE, KGE, NSE, R²)

The comparative performance evaluation of the Artificial Neural Network (ANN) and Random Forest (RF) models is presented through two distinct bar graphs in Figure 10. The first graph focuses on error metrics, specifically Mean Absolute Percentage Error (MAPE) and Root Mean Square Percentage Error (RMSPE), which are crucial for assessing prediction accuracy. The second graph examines efficiency and reliability using Kling–Gupta Efficiency (KGE), Nash–Sutcliffe Efficiency (NSE), and R-squared (R²) metrics, all of which are fundamental for evaluating model performance in predictive applications.

An analysis of the error metrics reveals that the Neural Network performs marginally better than the Random Forest model. The ANN shows a MAPE of 3.6727% compared to RF's 4.3733%, and an RMSPE of 11.4334% versus RF's 14.1956%. These differences, though relatively small, suggest that the ANN has a slight advantage in minimizing prediction errors for this application. The efficiency metrics reinforce this result: the Neural Network clearly outperforms the Random Forest, achieving near-perfect scores of approximately one for all three measures (KGE, NSE, and R²). This indicates superior overall model efficiency and a stronger correlation with the observed data.

Further validation using the FUHEM Madrid dataset confirms the ANN's superior predictive capability. The ANN achieves an exceptionally low MAPE of 0.0272% compared to RF's 0.5426%, and it attains a perfect NSE score of 1, while RF scores 0.9. These consistent results across multiple evaluation metrics strongly suggest that the Neural Network is the more effective choice for HVAC and heating demand prediction in this context, considering both accuracy and model efficiency. This comparison provides useful guidance for model selection based on specific performance requirements and application needs.

3.2.3. Regression for (ANN)

Figure 11a presents scatter plots comparing the predicted outputs to the actual target values for the training, validation, and test datasets, as well as for the combined data from the LUCIA building in Valladolid, Spain. Each plot includes a best-fit line of predicted versus actual values, with the correlation coefficient R indicating the strength and direction of the linear relationship.

For the training dataset, R = 0.9744, indicating a strong correlation and good model performance. The validation and test datasets have R values of 0.9712 and 0.9698, respectively, reflecting a slight decrease in prediction accuracy while maintaining high performance. This minor decline may indicate mild overfitting, where the model fits the training data very well but does not generalize perfectly to unseen data. The combined dataset yields an overall R of 0.9733, confirming that the model performs reliably across the different data subsets and supporting its practical applicability.
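The best-fit line in each regression panel is an ordinary least-squares fit of the predicted outputs against the targets, reported together with R. The sketch below assumes the form Output ≈ slope · Target + intercept. A perfect model gives slope 1, intercept 0 and R = 1, which is the identity line Y = T.

```python
import math
from statistics import fmean

def regression_fit(targets, outputs):
    """Least-squares line outputs ~ slope * targets + intercept, plus Pearson R."""
    mt, mo = fmean(targets), fmean(outputs)
    stt = sum((t - mt) ** 2 for t in targets)
    soo = sum((o - mo) ** 2 for o in outputs)
    sto = sum((t - mt) * (o - mo) for t, o in zip(targets, outputs))
    slope = sto / stt
    intercept = mo - slope * mt
    r = sto / math.sqrt(stt * soo)
    return slope, intercept, r

# Hypothetical targets and network outputs for one data subset (e.g. the test set).
targets = [50.0, 120.0, 180.0, 240.0, 300.0]
outputs = [58.0, 112.0, 185.0, 230.0, 296.0]
slope, intercept, r = regression_fit(targets, outputs)
print(f"Output ~ {slope:.3f} * Target + {intercept:.2f}, R = {r:.4f}")
```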

Figure 11b illustrates the performance of the predictive model across different FUHEM datasets: training, validation, test, and the combined set (all data). The training plot shows a regression line aligned with the identity line (Y = T), with an R² of 1, indicating perfect agreement between predicted and actual values. The validation plot, shown in green, also has an R² of 1, demonstrating the model’s high accuracy on validation data. Similarly, the test plot, depicted in red, exhibits an R² of 1, confirming that the model can accurately predict outcomes on unseen data. Overall, these plots underscore the model’s exceptional predictive capability across all datasets.

3.2.4. Training State for (ANN)

Figure 12a shows the training state of a run that ended at epoch 4250 and illustrates the characteristics of the learning process.
- The gradient values in the first subplot show considerable initial variation, exceeding 0.1 before stabilizing around 0.01. This variation suggests that the learning rate may have been set somewhat high initially; however, the eventual stability indicates that training converged successfully.
- The second subplot (μ) shows that the model parameters reached convergence, with very little updating occurring by the end of training.
- The validation-check subplot shows the number of validation failures rising to the limit of approximately 4000 by epoch 4250. Training therefore stopped because the validation error had not improved for about 4000 epochs, which is consistent with the best validation performance at epoch 250 reported in Figure 13a. Validation-based stopping thus ended training once generalization to unseen data stopped improving.

By contrast, the second training run, shown in Figure 12b, lasted only 238 epochs. It is a much shorter run, but its stability patterns are equally informative. The gradient values trend smoothly downwards and then stabilize without oscillating across epochs, indicating that the model converged stably to a solution. The parameter μ remains remarkably consistent throughout training and appears to have reached a steady state, although this may correspond to a local minimum.

This behavior suggests that the training parameters were well chosen: the updates were neither so large that they overshot nor so small that training converged prematurely. Overall, the two training trajectories provide reasonable benchmarks for how the two model instances optimized and converged under distinct training characteristics.

3.2.5. Performance (ANN)

Figure 13a shows how the Mean Squared Error (MSE) evolves over epochs for the training, validation, and test datasets. The best validation performance (approx. 118.81) occurs at epoch 250. The error curves for all three datasets remain close together in later epochs, and the gap between training and validation errors is small. This indicates that the model learned effectively from the training data and generalizes well to unseen data. The flattening of the error after epoch 250 suggests that the model had plateaued and that additional training would yield only marginal improvements.
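The link between the best validation epoch and the final epoch can be illustrated with a validation-check stopping rule. The sketch below is a generic early-stopping rule written under an assumption: training halts once the validation error has failed to improve for `max_fail` consecutive epochs. It is not the authors' code. With a best epoch of 250 and a limit of 4000 failures, the rule stops at epoch 4250.

```python
def early_stopping(val_errors, max_fail):
    """Return (best_epoch, stop_epoch) for validation-based early stopping.

    val_errors[e] is the validation MSE after epoch e (epoch 0 is the untrained network).
    Training stops once the error has not improved for max_fail consecutive epochs.
    """
    best_epoch, best_err, fails = 0, val_errors[0], 0
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] < best_err:
            best_epoch, best_err, fails = epoch, val_errors[epoch], 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1   # ran out of epochs before the limit

# Synthetic curve: improves until epoch 250, then stays flat.
curve = [1.0 / (e + 1) for e in range(251)] + [1.0] * 5000
print(early_stopping(curve, max_fail=4000))
```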

Figure 13b shows a more dynamic learning phase, with the best validation performance achieved at epoch 188. Here, the MSE on all datasets declined rapidly in the early epochs, and the three curves tracked each other closely. This sharp decrease in error across datasets suggests that the learning rate was appropriate. Most notably, a large share of the error was eliminated within the first few epochs, indicating that the model learned efficiently early on, which gave it a strong foundation for the rest of training. Overall, both figures show that the model learns and adapts over time, but Figure 13b demonstrates a more effective early learning stage.

3.3. Sensitivity Analysis, Feature Importance, and Computational Complexity

Figure 14 compares the training times of two machine learning models, the Artificial Neural Network (ANN) and the Random Forest (RF), for two types of buildings: the LUCIA Building (nZEB), with a focus on HVAC systems, and the FUHEM Building, with a focus on heating.

For the LUCIA Building (a), the ANN requires significantly more training time than the RF. Neural network training involves many epochs of backpropagation and the optimization of numerous weights and biases, which increases computational demands. The RF model builds its decision trees independently and can therefore train them in parallel, so it completes training much faster.

The FUHEM Building (b) shows the same trend with a smaller time difference. This suggests that the relative efficiency of the models depends on the characteristics of the data or the complexity of the problem, while the ANN still delivers suitable results compared to the RF model [32].
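Bootstrap aggregating explains why RF training parallelizes well: every tree is fitted to its own resample of the data without reference to the other trees. The sketch below is simplified in three ways:
- It uses depth-1 regression trees (stumps) on a single input.
- It uses an MSE split criterion.
- It omits random feature subsampling.

It therefore illustrates bagging itself, not MATLAB's fitrensemble or a full Random Forest.

```python
import random
from statistics import fmean

def fit_stump(data):
    """Depth-1 regression tree on a 1-D input: pick the threshold that minimises squared error."""
    xs = sorted(set(x for x, _ in data))
    best = None
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in data if x <= thr]
        right = [y for x, y in data if x > thr]
        ml, mr = fmean(left), fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, ml, mr)
    if best is None:                      # all inputs identical: predict the mean
        m = fmean([y for _, y in data])
        return lambda x: m
    _, thr, ml, mr = best
    return lambda x: ml if x <= thr else mr

def fit_bagged(data, n_trees=100, seed=0):
    """Bagging: each tree is fitted independently to a bootstrap resample of the data."""
    rng = random.Random(seed)
    trees = [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]
    # The ensemble prediction is the average of the individual tree predictions.
    return lambda x: fmean([t(x) for t in trees])

# Hypothetical step-shaped demand: low below x = 10, high from x = 10 upwards.
data = [(float(x), 0.0) for x in range(10)] + [(float(x), 10.0) for x in range(10, 20)]
model = fit_bagged(data, n_trees=50)
print(model(2.0), model(17.0))
```

Because each call to `fit_stump` depends only on its own resample, the trees could be fitted concurrently, whereas each ANN epoch depends on the weights from the previous one.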

Figure 15 illustrates the sensitivity analysis results for the Artificial Neural Network (ANN) and the feature importance for the Random Forest (RF) model across two building types: the LUCIA Building (nZEB) and the FUHEM Building (Heating). In both cases, the sensitivity analysis for the ANN indicates that certain features have a significant impact on the model’s predictions; for instance, operative temperature and air temperature are critical for LUCIA, and heat load/floor area and humidity for FUHEM, as their perturbations lead to substantial changes in the output. Similarly, the RF model’s feature importance highlights temperature and nominal capacity as pivotal factors, consistently showing high importance scores. This alignment between the two models underscores the critical role of these features in influencing HVAC and heating demands, suggesting that they should be prioritized in building energy management strategies [32].
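A perturbation-based sensitivity analysis like the one described for the ANN can be sketched as a one-at-a-time test. Each input feature is scaled by a small relative step, and the resulting average change in the model output is recorded. The predict function, feature names, data and 5% step size below are hypothetical stand-ins, not the study's model or inputs.

```python
def perturbation_sensitivity(predict, rows, names, rel_step=0.05):
    """Mean absolute output change when each feature is scaled by (1 + rel_step), one at a time."""
    scores = {}
    for j, name in enumerate(names):
        total = 0.0
        for row in rows:
            bumped = list(row)
            bumped[j] *= 1.0 + rel_step
            total += abs(predict(bumped) - predict(row))
        scores[name] = total / len(rows)
    # Sort so that the most influential feature comes first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Hypothetical model in which air temperature dominates humidity.
toy_model = lambda r: 3.0 * r[0] + 0.1 * r[1]
rows = [[20.0, 40.0], [25.0, 60.0], [30.0, 55.0]]
print(perturbation_sensitivity(toy_model, rows, ["air_temperature", "humidity"]))
```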

3.4. Summary of Results

Table 6 provides a detailed comparative analysis of the ANN and RF models’ performance in forecasting HVAC and heating demand for both the Lucía and FUHEM Buildings. The evaluation incorporates three key metrics: Mean Absolute Percentage Error (MAPE), Root Mean Square Percentage Error (RMSPE), and correlation coefficients, which collectively assess prediction accuracy and model reliability.

3.5. Practical Applications

Based on these results, integrating insights from machine learning models such as the ANN and RF is important for building a smart grid. Knowledge of training time and feature importance can inform the design of real-time energy optimization strategies, and the model outputs can be used to build dynamic, data-driven models. When computational resources or response time are limited, the RF could be used for rapid real-time predictions, while the system can switch to the ANN when high accuracy is required. Using both models in this way enables data-driven optimization, in which the optimization algorithm sends commands to the grid to adjust power generation and distribution.

The scalability of the forecasting model across diverse building types and energy demand profiles is a key consideration. For traditional buildings, the model can be adapted using historical energy consumption data and features such as occupancy patterns and weather conditions. For near-zero-energy buildings (nZEB), however, the model needs additional parameters such as renewable energy generation, building envelope characteristics, and advanced HVAC systems. The dynamic nature of nZEBs requires these richer feature sets, including renewable energy production and storage, along with granular energy consumption data, to ensure accurate predictions and optimal performance.

4. Conclusions

This study addressed the research gap by comparing the performance of Artificial Neural Network (ANN) and Random Forest (RF) models in predicting HVAC and heating demand in near-zero-energy buildings (nZEB) and traditional buildings across different regional climates in Spain. The ANN consistently outperformed the RF in predicting nZEB energy demand and heating demand, whereas the RF consistently underestimated peak demand. The ANN's predictions were more consistent with real demand, showing a high correlation with it (R² ≈ 1.0) and low error metrics such as MAPE and RMSPE, which demonstrates its superior accuracy.

However, this comes at the cost of longer training times: the Lucia ANN required 40 s compared to the Lucia RF's 2.8 s, and the FUHEM ANN took 1.1 s versus the FUHEM RF's 0.3 s. Despite this computational trade-off, the ANN's precision makes it a more reliable choice for high-stakes forecasting in energy management. This advance supports a framework for using AI-driven predictions in Industry 5.0 and complements sustainability goals focused on improving energy performance and efficiency in residential buildings.

By implementing a MATLAB-based framework with real-world data from Spanish case studies (LUCIA, FUHEM), this work fills a critical gap in the energy forecasting literature. This study not only validates the ANN’s robustness across diverse climates and building characteristics, but also highlights its adaptability for practical energy management applications. The comprehensive evaluation, including metrics such as KGE, NSE, and R², reinforces the ANN’s reliability in capturing complex nonlinear demand patterns. In contrast, the RF, while competent, shows limitations in peak demand prediction and slight overfitting during validation, making the ANN the preferred choice for high-accuracy forecasting tasks. The results have significant implications for energy management in Spanish households, particularly in optimizing HVAC systems for nZEBs. The ANN’s ability to model real demand behavior with minimal error makes it a valuable tool for policymakers and energy managers seeking to enhance efficiency and reduce environmental impact.

Using datasets for AI modeling brings notable challenges, such as data quality, normalization choices that can introduce distortion, computational complexity, and limited explainability. More advanced models, such as XGBoost, autoencoders, or radial basis function networks (RBFNs), could offer further benefits. However, we chose Artificial Neural Networks (ANNs) and Random Forests (RFs) because prior work indicates that they provide computational efficiency and adequate performance.

Future work should focus on improving the precision and transparency of forecasting models by:
- enhancing data preprocessing;
- exploring hybrid artificial intelligence (AI) methods and use cases;
- employing explainable AI models;
- studying AI-enabled optimization for dynamic electricity pricing with demand-side management models.

Evaluating the feasibility of the modeling across a variety of scenarios will strengthen understanding of, and confidence in, energy performance predictions for nZEBs.

Author Contributions

Conceptualization, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; methodology, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; software, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; validation, K.M.S.; formal analysis, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; investigation, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; resources, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; data curation, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; writing—original draft preparation, K.M.S.; writing—review and editing, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; visualization, K.M.S., A.O.E., F.J.R.-M. and J.M.R.-H.; supervision, A.O.E. and J.M.R.-H.; project administration, F.J.R.-M. and J.M.R.-H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

We would like to acknowledge the support received from the following:
- the “LIFE23-CET-Re-Energize” European Project by the University of Málaga, Spain;
- the “EUSUVa4.0” Project by the University of Valladolid (Spain);
- the “Lime4Health” National Project by the Technical University of Madrid (UPM) (Spain);
- the RED-“TRAPECIO” Ibero-American Project by CYTED (Ibero-American Program of Science and Technology for Development);
- the ITAP Research Institute at the University of Valladolid.

We would like to acknowledge the use of MATLAB (Version R2018a, MathWorks, https://www.mathworks.com) for data analysis and visualization in this study. Additionally, the images included in this document were created by the authors and are original works.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

List of symbols
Variable	Description
$x = [x_{1}, x_{2}, \dots, x_{n}]$	Input vector: the input features to the neural network, where $n$ is the number of input parameters.
$z_{j}^{(1)}$	The input to the $j$-th neuron in the first hidden layer, calculated as a weighted sum of the inputs plus a bias.
$w_{ij}$	The weight of the connection from the $i$-th input to the $j$-th neuron.
$a_{j}^{(1)}$	The output of the $j$-th neuron in the first hidden layer after applying the activation function.
$k$	Index of neurons in the second hidden layer, which receive the outputs of the first hidden layer.
$z_{k}^{(2)}$	The input to the $k$-th neuron in the second hidden layer, calculated as in the first layer.
$a_{k}^{(2)}$	The output of the $k$-th neuron in the second hidden layer after applying the activation function.
$\hat{y}$	The predicted energy demand produced by the output layer of the neural network.
$b_{j}$	The bias term for the $j$-th neuron in the first hidden layer.
$b_{k}$	The bias term for the $k$-th neuron in the second hidden layer.
$η$	The learning rate used in the backpropagation algorithm to update weights and biases.
$m$	The total number of neurons in the first hidden layer.
$n$	The total number of input features.
$T$	The total number of trees in the Random Forest.
$N_{t}$	The number of training instances drawn to create the bootstrap sample $D_{t}$.
$n$	The node being split; the split is chosen by a criterion (e.g., Gini impurity for classification, Mean Squared Error for regression).
$p_{i}$	The proportion of class $i$ in the node.
$N$	The total number of instances at the node.
$\hat{y}$	The predicted output for regression tasks, calculated as the average of the predictions of the individual trees.
$y$	The actual output value for regression tasks.
${\hat{y}}_{i}$	The predicted class in classification tasks, determined by majority vote among the trees.
OOB	Out-of-bag observations: instances not included in a tree’s bootstrap sample, used for performance estimation.
$I$	An indicator function that equals 1 if the predicted class ${\hat{y}}_{i}$ does not match the actual class $y_{i}$, and 0 otherwise.
List of abbreviations
ANN	Artificial Neural Network
RF	Random Forest
RMSPE	Root Mean Square Percentage Error
MAPE	Mean Absolute Percentage Error
MARPE	Mean Absolute Relative Percentage Error
RMSRPE	Root Mean Square Relative Percentage Error
RMSRE	Root Mean Squared Relative Error
KGE	Kling–Gupta Efficiency
NSE	Nash–Sutcliffe Efficiency
nZEB	Near-Zero-Energy Building
AI	Artificial Intelligence
IoT	Internet of Things
HVAC	Heating, Ventilation, and Air Conditioning
NAR	Nonlinear Autoregressive
MSHD	Method of Spatial Homogenization Decomposition
SVR	Support Vector Regression
ReLU	Rectified Linear Unit
OOB	Out-of-Bag
CDF	Cumulative Distribution Function
LM	Levenberg–Marquardt

References

1. Rane, N. ChatGPT and Similar Generative Artificial Intelligence (AI) for Building and Construction Industry: Contribution, Opportunities and Challenges of Large Language Models for Industry 4.0, Industry 5.0, and Society 5.0. Oppor. Chall. Large Lang. Models Ind. 2023, 4.
2. Musarat, M.A.; Irfan, M.; Alaloul, W.S.; Maqsoom, A.; Ghufran, M. A Review on the Way Forward in Construction through Industrial Revolution 5.0. Sustainability 2023, 15, 13862.
3. Ghobakhloo, M.; Iranmanesh, M.; Tseng, M.-L.; Grybauskas, A.; Stefanini, A.; Amran, A. Behind the Definition of Industry 5.0: A Systematic Review of Technologies, Principles, Components, and Values. J. Ind. Prod. Eng. 2023, 40, 432–447.
4. Rožanec, J.M.; Novalija, I.; Zajec, P.; Kenda, K.; Tavakoli Ghinani, H.; Suh, S.; Veliou, E.; Papamartzivanos, D.; Giannetsos, T.; Menesidou, S.A. Human-Centric Artificial Intelligence Architecture for Industry 5.0 Applications. Int. J. Prod. Res. 2023, 61, 6847–6872.
5. Korodi, A.; Nițulescu, I.-V.; Fülöp, A.-A.; Vesa, V.-C.; Demian, P.; Braneci, R.-A.; Popescu, D. Integration of Legacy Industrial Equipment in a Building-Management System Industry 5.0 Scenario. Electronics 2024, 13, 3229.
6. Ghobakhloo, M.; Iranmanesh, M.; Mubarak, M.F.; Mubarik, M.; Rejeb, A.; Nilashi, M. Identifying Industry 5.0 Contributions to Sustainable Development: A Strategy Roadmap for Delivering Sustainability Values. Sustain. Prod. Consum. 2022, 33, 716–737.
7. Ikudayisi, A.E.; Chan, A.P.C.; Darko, A.; Adedeji, Y.M.D. Integrated Practices in the Architecture, Engineering, and Construction Industry: Current Scope and Pathway towards Industry 5.0. J. Build. Eng. 2023, 73, 106788.
8. Suganthi, L.; Samuel, A.A. Energy Models for Demand Forecasting—A Review. Renew. Sustain. Energy Rev. 2012, 16, 1223–1240.
9. Radwan, M.; Alhussan, A.A.; Ibrahim, A.; Tawfeek, S.M. Potato Leaf Disease Classification Using Optimized Machine Learning Models and Feature Selection Techniques. Potato Res. 2024, 513, 1–25.
10. Eed, M.; Alhussan, A.A.; Qenawy, A.-S.T.; Osman, A.M.; Elshewey, A.M.; Arnous, R. Potato Consumption Forecasting Based on a Hybrid Stacked Deep Learning Model. Potato Res. 2024, 68, 809–833.
11. Wang, Z.; Srinivasan, R.S. A Review of Artificial Intelligence Based Building Energy Use Prediction: Contrasting the Capabilities of Single and Ensemble Prediction Models. Renew. Sustain. Energy Rev. 2017, 75, 796–808.
12. Ciulla, G.; D’amico, A.; Brano, V.L.; Traverso, M. Application of Optimized Artificial Intelligence Algorithm to Evaluate the Heating Energy Demand of Non-Residential Buildings at European Level. Energy 2019, 176, 380–391.
13. Vieri, A.; Gambarotta, A.; Morini, M.; Saletti, C. An Integrated Artificial Intelligence Approach for Building Energy Demand Forecasting. Energies 2024, 17, 4920.
14. Antonopoulos, I.; Robu, V.; Couraud, B.; Kirli, D.; Norbu, S.; Kiprakis, A.; Flynn, D.; Elizondo-Gonzalez, S.; Wattam, S. Artificial Intelligence and Machine Learning Approaches to Energy Demand-Side Response: A Systematic Review. Renew. Sustain. Energy Rev. 2020, 130, 109899.
15. Raza, M.Q.; Khosravi, A. A Review on Artificial Intelligence Based Load Demand Forecasting Techniques for Smart Grid and Buildings. Renew. Sustain. Energy Rev. 2015, 50, 1352–1372.
16. Salem, K.M.; Rady, M.; Aly, H.; Elshimy, H. Design and Implementation of a Six-Degrees-of-Freedom Underwater Remotely Operated Vehicle. Appl. Sci. 2023, 13, 6870.
17. El-kenawy, E.-S.M.; Khodadadi, N.; Mirjalili, S.; Abdelhamid, A.A.; Eid, M.M.; Ibrahim, A. Greylag Goose Optimization: Nature-Inspired Optimization Algorithm. Expert Syst. Appl. 2024, 238, 122147.
18. Yassen, M.A.; Abdel-Fattah, M.G.A.; Ismail, I.; El Kenawy, E.-S.M.; Moustafa, H.E.-D. An AI-Based System for Predicting Renewable Energy Power Output Using Advanced Optimization Algorithms. J. Artif. Intell. Metaheuristics 2024, 8, 1–8.
19. El-Sayed, E.; Eid, M.M.; Abualigah, L. Machine Learning in Public Health Forecasting and Monitoring the Zika Virus. Metaheuristic Optim. Rev. 2024, 1, 1–11.
20. Ferlito, S.; Atrigna, M.; Graditi, G.; De Vito, S.; Salvato, M.; Buonanno, A.; Di Francia, G. Predictive Models for Building’s Energy Consumption: An Artificial Neural Network (ANN) Approach. In Proceedings of the 2015 XVIII Aisem Annual Conference, Trento, Italy, 3–5 February 2015; pp. 1–4.
21. Li, Z.; Dai, J.; Chen, H.; Lin, B. An ANN-Based Fast Building Energy Consumption Prediction Method for Complex Architectural Form at the Early Design Stage. Build. Simul. 2019, 12, 665–681.
22. Ekici, B.B.; Aksoy, U.T. Prediction of Building Energy Consumption by Using Artificial Neural Networks. Adv. Eng. Softw. 2009, 40, 356–362.
23. Verma, A.; Prakash, S.; Kumar, A. ANN-based Energy Consumption Prediction Model up to 2050 for a Residential Building: Towards Sustainable Decision Making. Environ. Prog. Sustain. Energy 2021, 40, e13544.
24. Orosa, J.A.; Vergara, D.; Costa, Á.M.; Bouzón, R. A Novel Method for NZEB Internal Coverings Design Based on Neural Networks. Coatings 2019, 9, 288.
25. Arida, M.; Nassif, N.; Talib, R.; Abu-Lebdeh, T. Building Energy Modeling Using Artificial Neural Networks. Energy Res. J. 2017, 7, 24–34.
26. Rodrigues, F.; Cardeira, C.; Calado, J.M.F. The Daily and Hourly Energy Consumption and Load Forecasting Using Artificial Neural Network Method: A Case Study Using a Set of 93 Households in Portugal. Energy Procedia 2014, 62, 220–229.
27. Chae, Y.T.; Horesh, R.; Hwang, Y.; Lee, Y.M. Artificial Neural Network Model for Forecasting Sub-Hourly Electricity Usage in Commercial Buildings. Energy Build. 2016, 111, 184–194.
28. Cáceres, L.; Merino, J.I.; Díaz-Díaz, N. A Computational Intelligence Approach to Predict Energy Demand Using Random Forest in a Cloudera Cluster. Appl. Sci. 2021, 11, 8635.
29. Wang, Z.; Wang, Y.; Zeng, R.; Srinivasan, R.S.; Ahrentzen, S. Random Forest Based Hourly Building Energy Prediction. Energy Build. 2018, 171, 11–25.
30. Ahmad, T.; Chen, H. Nonlinear Autoregressive and Random Forest Approaches to Forecasting Electricity Load for Utility Energy Management Systems. Sustain. Cities Soc. 2019, 45, 460–473.
31. Chen, Y.-T.; Piedad, E., Jr.; Kuo, C.-C. Energy Consumption Load Forecasting Using a Level-Based Random Forest Classifier. Symmetry 2019, 11, 956.
32. Liu, Y.; Chen, H.; Zhang, L.; Feng, Z. Enhancing Building Energy Efficiency Using a Random Forest Model: A Hybrid Prediction Approach. Energy Rep. 2021, 7, 5003–5012.
33. Yagli, G.M.; Yang, D.; Srinivasan, D. Automatic Hourly Solar Forecasting Using Machine Learning Models. Renew. Sustain. Energy Rev. 2019, 105, 487–498.
34. Wang, Z.; Hong, Y.; Huang, L.; Zheng, M.; Yuan, H.; Zeng, R. A Comprehensive Review and Future Research Directions of Ensemble Learning Models for Predicting Building Energy Consumption. Energy Build. 2025, 115589.
35. Chen, Y.-H.; Li, Y.-Z.; Jiang, H.; Huang, Z. Research on Household Energy Demand Patterns, Data Acquisition and Influencing Factors: A Review. Sustain. Cities Soc. 2023, 99, 104916.
36. Salem, K.M.; Rey-Hernández, J.M.; Rey-Martínez, F.J.; Elgharib, A.O. Assessing the Accuracy of AI Approaches for CO₂ Emission Predictions in Buildings. J. Clean. Prod. 2025, 513, 145692.
37. Kim, Y.-S.; Kim, M.K.; Fu, N.; Liu, J.; Wang, J.; Srebric, J. Investigating the Impact of Data Normalization Methods on Predicting Electricity Consumption in a Building Using Different Artificial Neural Network Models. Sustain. Cities Soc. 2024, 118, 105570.
38. Fast, M.; Assadi, M.; De, S. Development and Multi-Utility of an ANN Model for an Industrial Gas Turbine. Appl. Energy 2009, 86, 9–17.
39. Agatonovic-Kustrin, S.; Beresford, R. Basic Concepts of Artificial Neural Network (ANN) Modeling and Its Application in Pharmaceutical Research. J. Pharm. Biomed. Anal. 2000, 22, 717–727.
40. Wu, W.; Dandy, G.C.; Maier, H.R. Protocol for Developing ANN Models and Its Application to the Assessment of the Quality of the ANN Model Development Process in Drinking Water Quality Modelling. Environ. Model. Softw. 2014, 54, 108–127.
41. Elkatatny, S.; Tariq, Z.; Mahmoud, M. Real Time Prediction of Drilling Fluid Rheological Properties Using Artificial Neural Networks Visible Mathematical Model (White Box). J. Pet. Sci. Eng. 2016, 146, 1202–1210.
42. Betiku, E.; Omilakin, O.R.; Ajala, S.O.; Okeleye, A.A.; Taiwo, A.E.; Solomon, B.O. Mathematical Modeling and Process Parameters Optimization Studies by Artificial Neural Network and Response Surface Methodology: A Case of Non-Edible Neem (Azadirachta Indica) Seed Oil Biodiesel Synthesis. Energy 2014, 72, 266–273.
43. Pavlenko, I.; Trojanowska, J.; Ivanov, V.; Liaposhchenko, O. Scientific and Methodological Approach for the Identification of Mathematical Models of Mechanical Systems by Using Artificial Neural Networks. In Innovation, Engineering and Entrepreneurship; Springer: Berlin/Heidelberg, Germany, 2019; pp. 299–306.
44. Farnaaz, N.; Jabbar, M.A. Random Forest Modeling for Network Intrusion Detection System. Procedia Comput. Sci. 2016, 89, 213–217.
45. Naghibi, S.A.; Ahmadi, K.; Daneshi, A. Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping. Water Resour. Manag. 2017, 31, 2761–2775.
46. Kamusoko, C.; Gamba, J. Simulating Urban Growth Using a Random Forest-Cellular Automata (RF-CA) Model. ISPRS Int. J. Geoinf. 2015, 4, 447–470.
47. Fawagreh, K.; Gaber, M.M.; Elyan, E. Random Forests: From Early Developments to Recent Advancements. Syst. Sci. Control Eng. Open Access J. 2014, 2, 602–609.
48. Paterakis, N.G.; Mocanu, E.; Gibescu, M.; Stappers, B.; van Alst, W. Deep Learning versus Traditional Machine Learning Methods for Aggregated Energy Demand Prediction. In Proceedings of the 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Torino, Italy, 26–29 September 2017; pp. 1–6.
49. Khalil, M.; McGough, A.S.; Pourmirza, Z.; Pazhoohesh, M.; Walker, S. Machine Learning, Deep Learning and Statistical Analysis for Forecasting Building Energy Consumption—A Systematic Review. Eng. Appl. Artif. Intell. 2022, 115, 105287.
50. Forootan, M.M.; Larki, I.; Zahedi, R.; Ahmadi, A. Machine Learning and Deep Learning in Energy Systems: A Review. Sustainability 2022, 14, 4832.
51. Elreafay, A.M.; Salem, K.M.; Abumandour, R.M.; Dawood, A.S.; Al Nuaimi, S. Effect of Particle Diameter and Void Fraction on Gas–Solid Two-Phase Flow: A Numerical Investigation Using the Eulerian–Eulerian Approach. Comput. Part. Mech. 2024, 12, 289–311.
52. Robinson, C.; Dilkina, B.; Hubbs, J.; Zhang, W.; Guhathakurta, S.; Brown, M.A.; Pendyala, R.M. Machine Learning Approaches for Estimating Commercial Building Energy Consumption. Appl. Energy 2017, 208, 889–904.
53. Yokoyama, R.; Wakui, T.; Satake, R. Prediction of Energy Demands Using Neural Network with Model Identification by Global Optimization. Energy Convers. Manag. 2009, 50, 319–327.
54. Ikeda, S.; Ooka, R. A New Optimization Strategy for the Operating Schedule of Energy Systems under Uncertainty of Renewable Energy Sources and Demand Changes. Energy Build. 2016, 125, 75–85.
55. Salem, K.M.; Elreafay, A.M.; Abumandour, R.M.; Dawood, A.S. Modeling Two-Phase Gas-Solid Flow in Axisymmetric Diffusers Using Cut Cell Technique: An Eulerian-Eulerian Approach. Bound. Value Probl. 2024, 2024, 150.
56. Abumandour, R.M.; El-Reafay, A.M.; Salem, K.M.; Dawood, A.S. Numerical Investigation by Cut-Cell Approach for Turbulent Flow through an Expanded Wall Channel. Axioms 2023, 12, 442.
57. Nagai, T. Optimization Method for Minimizing Annual Energy, Peak Energy Demand, and Annual Energy Cost through Use of Building Thermal Storage/Discussion. ASHRAE Trans. 2002, 108, 43.
58. Bahlawan, H.; Morini, M.; Pinelli, M.; Poganietz, W.-R.; Spina, P.R.; Venturini, M. Optimization of a Hybrid Energy Plant by Integrating the Cumulative Energy Demand. Appl. Energy 2019, 253, 113484.
59. Ferrara, M.; Rolfo, A.; Prunotto, F.; Fabrizio, E. EDeSSOpt–Energy Demand and Supply Simultaneous Optimization for Cost-Optimized Design: Application to a Multi-Family Building. Appl. Energy 2019, 236, 1231–1248.

Figure 1. An architectural overview of the buildings: (a) LUCIA Building, (b) FUHEM Building.

Figure 2. Workflow for energy demand.

Figure 3. Block diagram for analyzing energy demand: ANN vs. Random Forest.

Figure 4. Actual vs. predicted demand: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 5. Scatter plot of actual vs. predicted demand: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 6. Box plot: actual vs. predicted demand: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 7. CDF for LUCIA Building, nZEB-HVAC.

Figure 8. Histogram of residuals for LUCIA Building, nZEB-HVAC.

Figure 9. Correlation heatmaps: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 10. Performance of two machine learning models: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 11. Regression for ANN: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 12. Training state for ANN: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 13. Performance for ANN: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 14. Training time for ANN and RF: (a) (LUCIA Building, nZEB-HVAC), (b) (FUHEM Building-Heating).

Figure 15. Sensitivity analysis and feature importance for two buildings: (a) LUCIA Building ANN and RF, (b) FUHEM Building (Heating) ANN and RF.

Table 1. Summarizing the literature review.

Reference Number	Mathematical Model	Variables	Proposed	Time Period	Accuracy
Ferlito et al. [20]	ANN	-Weather conditions -Building structure and characteristics -Energy consumption of components (like lighting and HVAC systems) -Level of occupancy	Forecasting building energy demand	2011–2013	RMSPE 15.7% to 17.97%
Li et al. [21]	ANN	-Building energy consumption	Fast energy consumption prediction for complex architecture	2015–2016	Cooling/Heating: ±10% (MCD); Total: ±10% (MSHD)
Ekici et al. [22]	ANN	-Orientation of the building -Insulation thickness (ranging from 0 to 15 cm) -Transparency ratio (15%, 20%, and 25%)	Predicting building energy needs using orientation, insulation, and transparency ratio	2006–2007	Deviation: 3.43%; Prediction rate: 94.8–98.5%
Verma et al. [23]	ANN	-Predicted temperature and relative humidity -Building characteristics -Energy consumption components	Energy consumption prediction for a 2BHK multizone building (two bedrooms, one living room, one kitchen, and two toilets)	2001–2017	95% coefficient bounds
Orosa et al. [24]	ANN	-Weather conditions -Real vapor permeability of internal coverings -Behavioral groups of indoor ambiences	Modeling permeable coverings for nZEB indoor conditions	2019–2020	Mean Absolute Error (MAE) or Mean Squared Error (MSE) values
Arida et al. [25]	ANN	-Geometrical characteristics -Shape factor -Areas of building	Forecasting total building energy consumption	2014–2016	CV or RMSE: 1.7–7.7%
Cáceres et al. [28]	RF	-Apartment area -Number of occupants -Electrical appliance consumption	Household energy demand forecasting using Big Data	2011–2014	RMSPE, MAPE, R²
Wang et al. [29]	RF	-Parameter settings of the Random Forest (RF) -Influential features	Hourly building energy prediction for educational buildings	2014–2015	RF outperformed RT by 14–25% and SVR by 5–5.5%
Ahmad and Chen [30]	NARM, LMSR, RF	-Weather changes -Medium-term (MT) and long-term (LT) predictions	Medium-term and long-term energy prediction for utilities and industrial customers	2009	CV of LSBoost: 5.019% (summer), 3.159% (autumn), 3.292% (winter), 3.184% (spring)
Chen et al. [31]	RF	-Historical data of energy consumption -Classification levels -Prediction methods	Predicts actual numerical energy values, then classifies into levels (low, average, high)	2016	Execution time across 3, 5, and 7 energy level cases
Liu et al. [32]	RF-ANN	-Heat transfer coefficients -Solar radiation absorption coefficients -Window–wall ratio	Predicting building energy consumption based on building envelope design parameters	2015	Predicted RMSE is 0.0115, and R² is 0.933, which reflect the high accuracy of the RF prediction model

Table 2. Sample of the dataset.

Zone	Rated Capacity (kW)	Rated Flow (m³/s)	Total Refrigeration Load (kW)	Sensible Load (kW)	Latent Load (kW)	Air Temperature (°C)	Humidity (%)	Max. Refrigeration Hour	Max. Operating Temperature (°C)
Zone 1	5.95	0.407	5.17	4.8	0.37	24	51.2	Jul 11:00	29.81
Zone 3	11.13	0.896	9.68	9.68	0	23	49.7	Aug 14:00	31.2
Zone 10	8.2	0.57	7.13	6.72	0.42	24	54.3	Jul 15:00	31.12
Zone 9	7.25	0.509	6.31	6.01	0.3	24	55.2	Jul 15:00	29.03
Zone 8	7.2	0.506	6.26	5.97	0.29	24	55.3	Jul 15:00	28.85
Zone 7	7.19	0.506	6.25	5.96	0.29	24	55.3	Jul 15:00	28.83
Zone 6	7.18	0.505	6.25	5.96	0.29	24	55.3	Jul 15:00	28.82
Zone 5	7.19	0.505	6.25	5.96	0.29	24	55.3	Jul 15:00	28.83
Zone 2	3.54	0.241	3.08	3.08	0	25	44.4	Aug 14:00	29.87
Zone 4	0.5	0.04	0.43	0.43	0	23.01	43.8	Aug 07:00	28.55
Zone 24	5.1	0.41	4.44	4.44	0	23	49.7	Aug 14:00	29.13
Zone 22	0.4	0.027	0.35	0.35	0	25	44.4	Aug 14:00	28.56
Zone 23	0.35	0.024	0.31	0.31	0	25	44.4	Aug 14:00	29.6
Zone 26	4.16	0.287	3.62	3.39	0.23	24	53.7	Jul 15:00	30.6
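The zone values in Table 2 span very different ranges (for example, total refrigeration loads from 0.31 kW to 9.68 kW), which is why features are normally rescaled before training, and why normalization choices can distort results, as noted in the conclusions. The following is a minimal Python sketch of min-max scaling to [0, 1] applied to the Total Refrigeration Load column of Table 2. Min-max scaling is an assumption here and may differ from the normalization the authors actually applied; the helper names `min_max_scale` and `inverse_min_max` are hypothetical.

```python
def min_max_scale(values):
    """Linearly rescale a list of numbers to [0, 1]; returns (scaled, lo, hi)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values], lo, hi
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def inverse_min_max(scaled, lo, hi):
    """Map scaled values (e.g., model predictions) back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Total Refrigeration Load (kW) of the 14 zones in Table 2, in table order.
total_load_kw = [5.17, 9.68, 7.13, 6.31, 6.26, 6.25, 6.25, 6.25,
                 3.08, 0.43, 4.44, 0.35, 0.31, 3.62]

scaled, lo, hi = min_max_scale(total_load_kw)
for kw, s in zip(total_load_kw, scaled):
    print(f"{kw:5.2f} kW -> {s:.3f}")
```

Predictions produced in the scaled space must be mapped back with the same minimum and maximum before errors are reported in kW; keeping `lo` and `hi` from the training data (not the test data) avoids leaking test information into training.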

Table 3. Mathematical model for ANN.

Component	Equation	Eq. No.
Input Layer: The network takes input features	$x = [x_{1}, x_{2}, \dots, x_{n}]$	(1)
First Hidden Layer: Each neuron j in the first hidden layer computes a weighted sum of its inputs	$z_{j}^{(1)} = \sum_{i = 1}^{n} w_{i j}^{(1)} x_{i} + b_{j}^{(1)}$	(2)
This sum is then passed through an activation function f	$a_{j}^{(1)} = f (z_{j}^{(1)})$	(3)
Second Hidden Layer: Each neuron k in the second hidden layer takes the outputs from the first hidden layer as inputs	$z_{k}^{(2)} = \sum_{j = 1}^{m} w_{j k}^{(2)} a_{j}^{(1)} + b_{k}^{(2)}$	(4)
This sum is then passed through an activation function f	$a_{k}^{(2)} = f (z_{k}^{(2)})$	(5)
Output Layer: The output layer produces the final prediction for energy demand	$\hat{y} = f (\sum_{k = 1}^{p} w_{k}^{(3)} a_{k}^{(2)} + b^{(3)})$	(6)
Backpropagation Algorithm	$\begin{matrix} w_{i j}^{(l)} \leftarrow w_{i j}^{(l)} - η \frac{\partial L}{\partial w_{i j}^{(l)}} \\ b_{j}^{(l)} \leftarrow b_{j}^{(l)} - η \frac{\partial L}{\partial b_{j}^{(l)}} \end{matrix}$	(7)
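Equations (1)–(7) describe a feed-forward network with two hidden layers whose weights are updated by gradient descent. The following minimal, dependency-free Python sketch implements that forward pass and the update of Equation (7) for a squared-error loss $L = \frac{1}{2}(\hat{y} - y)^{2}$. Assumptions not stated in Table 3: a tanh activation $f$ in the hidden layers, an identity output activation for regression, and hypothetical toy data. The LM entry in the abbreviation list suggests the authors' MATLAB models may have used Levenberg–Marquardt training, which this sketch does not implement; it is not the authors' code.

```python
import math
import random


def forward(x, params):
    """Forward pass of Eqs. (1)-(6): two tanh hidden layers, identity output."""
    W1, b1, W2, b2, w3, b3 = params
    # Eqs. (2)-(3): W1[i][j] is the weight from input i to first-layer neuron j.
    z1 = [sum(W1[i][j] * x[i] for i in range(len(x))) + b1[j] for j in range(len(b1))]
    a1 = [math.tanh(z) for z in z1]
    # Eqs. (4)-(5): W2[j][k] is the weight from neuron j to second-layer neuron k.
    z2 = [sum(W2[j][k] * a1[j] for j in range(len(a1))) + b2[k] for k in range(len(b2))]
    a2 = [math.tanh(z) for z in z2]
    # Eq. (6) with an identity output activation (assumed for regression).
    y_hat = sum(w3[k] * a2[k] for k in range(len(a2))) + b3
    return a1, a2, y_hat


def gradient_step(x, y, params, eta):
    """One Eq. (7) update for L = 0.5 * (y_hat - y) ** 2; returns new params and the loss."""
    W1, b1, W2, b2, w3, b3 = params
    a1, a2, y_hat = forward(x, params)
    d_out = y_hat - y  # dL/dy_hat
    # Error signals dL/dz per layer, using tanh'(z) = 1 - tanh(z) ** 2.
    d2 = [d_out * w3[k] * (1 - a2[k] ** 2) for k in range(len(a2))]
    d1 = [sum(d2[k] * W2[j][k] for k in range(len(d2))) * (1 - a1[j] ** 2)
          for j in range(len(a1))]
    # Updates w <- w - eta * dL/dw; every gradient uses the old weights.
    new_params = (
        [[W1[i][j] - eta * d1[j] * x[i] for j in range(len(d1))] for i in range(len(x))],
        [b1[j] - eta * d1[j] for j in range(len(b1))],
        [[W2[j][k] - eta * d2[k] * a1[j] for k in range(len(d2))] for j in range(len(a1))],
        [b2[k] - eta * d2[k] for k in range(len(b2))],
        [w3[k] - eta * d_out * a2[k] for k in range(len(w3))],
        b3 - eta * d_out,
    )
    return new_params, 0.5 * d_out ** 2


def init_params(n, m, p, seed=0):
    """Small random weights for n inputs, m and p hidden neurons; zero biases."""
    rng = random.Random(seed)

    def r():
        return rng.uniform(-0.5, 0.5)

    W1 = [[r() for _ in range(m)] for _ in range(n)]
    W2 = [[r() for _ in range(p)] for _ in range(m)]
    return (W1, [0.0] * m, W2, [0.0] * p, [r() for _ in range(p)], 0.0)


# Hypothetical toy data (not from the paper): two normalized inputs, smooth target.
data_rng = random.Random(1)
data = []
for _ in range(50):
    x1, x2 = data_rng.random(), data_rng.random()
    data.append(([x1, x2], 0.5 * x1 - 0.3 * x2))


def mean_loss(params):
    return sum(0.5 * (forward(x, params)[2] - t) ** 2 for x, t in data) / len(data)


params = init_params(n=2, m=4, p=3)
initial = mean_loss(params)
for _ in range(200):
    for x, t in data:
        params, _loss = gradient_step(x, t, params, eta=0.05)
final = mean_loss(params)
print(f"mean training loss: {initial:.5f} -> {final:.5f}")
```

Plain per-sample gradient descent is used only because it is the literal form of Equation (7); second-order methods such as Levenberg–Marquardt usually converge in far fewer epochs on small networks like this one.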

Table 4. Mathematical model for RF.

Component	Equation	Eq. No.
Split criterion (impurity) for a node $n$	$\mathrm{Impurity}(n) = \frac{1}{N} \sum_{i = 1}^{k} p_{i} (1 - p_{i})$	(8)
Prediction (regression): average of the individual tree predictions	$\hat{y} = \frac{1}{T} \sum_{t = 1}^{T} \hat{y}_{t}$	(9)
Prediction (classification): majority vote among all trees	$\hat{y} = \operatorname{mode}(\hat{y}_{1}, \hat{y}_{2}, \dots, \hat{y}_{T})$	(10)
Out-of-bag error	$\text{OOB Error} = \frac{1}{N} \sum_{i = 1}^{N} I(\hat{y}_{i} \neq y_{i})$	(11)
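Equations (8)–(11) summarize how a Random Forest grows trees on bootstrap samples, combines their outputs, and estimates its error on out-of-bag rows. The following stdlib-only Python sketch illustrates these steps for regression under stated assumptions: trees are split by the sum of squared errors (the Mean Squared Error criterion mentioned in the symbol list), one randomly chosen feature is considered per split, the OOB error is a mean squared error rather than the misclassification rate of Equation (11), and the data are hypothetical. The `gini` helper computes the standard Gini impurity $\sum_{i} p_{i}(1 - p_{i})$, which omits the $1/N$ factor printed in Equation (8). This is an illustration, not the authors' MATLAB implementation.

```python
import random
from statistics import mean


def gini(labels):
    """Standard Gini impurity sum_i p_i * (1 - p_i) of the class labels at a node."""
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    n = len(labels)
    return sum((k / n) * (1 - k / n) for k in counts.values())


def sse(values):
    """Sum of squared deviations from the mean (the MSE criterion up to a 1/N factor)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_features, rng, min_leaf=2):
    """Grow a regression tree; internal nodes are (feature, threshold, left, right)."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return mean(y)  # leaf: mean target of the samples that reach it
    best = None
    for f in rng.sample(range(len(X[0])), max_features):  # random feature subset
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return mean(y)
    _, f, thr, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx], depth - 1,
                          max_features, rng, min_leaf)

    return (f, thr, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, T=25, depth=4, max_features=1, seed=0):
    """Fit T trees on bootstrap samples D_t and remember each tree's out-of-bag rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(T):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        tree = build_tree([X[i] for i in idx], [y[i] for i in idx], depth, max_features, rng)
        forest.append((tree, set(range(len(X))) - set(idx)))
    return forest


def predict_forest(forest, x):
    """Eq. (9): average of the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree, _ in forest)


def oob_mse(forest, X, y):
    """Regression analogue of Eq. (11): score each row only with trees that never saw it."""
    errors = []
    for i, (row, target) in enumerate(zip(X, y)):
        preds = [predict_tree(tree, row) for tree, oob in forest if i in oob]
        if preds:
            errors.append((mean(preds) - target) ** 2)
    return mean(errors)


# Hypothetical data (not from the paper): the target depends mostly on feature 0.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(40)]
y = [3.0 * a + 0.5 * b for a, b in X]
forest = fit_forest(X, y)
print("prediction at [0.5, 0.5]:", round(predict_forest(forest, [0.5, 0.5]), 3))
print("OOB MSE:", round(oob_mse(forest, X, y), 4))
print("Gini of ['low', 'low', 'high', 'average']:", gini(["low", "low", "high", "average"]))
```

Because every row is left out of roughly a third of the bootstrap samples, the OOB error gives a validation-style estimate without a separate hold-out set, which is one reason RF training in Table 6 needs no validation split.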

Table 5. Evaluation metrics equation.

Component	Equation	Eq. No.
Root Mean Square Percentage Error (RMSPE)	$\mathrm{RMSPE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}} \times 100$	(12)
Root Mean Square Relative Percentage Error (RMSRPE)	$\mathrm{RMSRPE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} \left(\frac{y_{i} - \hat{y}_{i}}{y_{i}}\right)^{2}} \times 100$	(13)
Mean Absolute Percentage Error (MAPE)	$\mathrm{MAPE} = \frac{1}{N} \sum_{i = 1}^{N} \lvert y_{i} - \hat{y}_{i} \rvert \times 100$	(14)
Mean Absolute Relative Percentage Error (MARPE)	$\mathrm{MARPE} = \frac{1}{N} \sum_{i = 1}^{N} \left\lvert \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right\rvert \times 100$	(15)
Kling–Gupta Efficiency (KGE)	$\mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + \left(\frac{\sigma_{\mathrm{model}}}{\sigma_{\mathrm{obs}}} - 1\right)^{2} + \left(\frac{\mu_{\mathrm{model}}}{\mu_{\mathrm{obs}}} - 1\right)^{2}}$	(16)
Nash–Sutcliffe Efficiency (NSE)	$\mathrm{NSE} = 1 - \frac{\sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}}$	(17)
Coefficient of Determination (R²)	$R^{2} = 1 - \frac{\sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}}$	(18)
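The metrics in Table 5 can be computed directly from paired observations and predictions. The following Python sketch follows Equations (12)–(18) as printed. Note that Equations (12) and (14) do not divide by $y_{i}$, so RMSPE and MAPE are scaled absolute errors that read as percentages only when the data are normalized, whereas RMSRPE and MARPE are relative errors. The sketch assumes $r$ is the Pearson correlation coefficient and σ the population standard deviation (the ratio in Equation (16) is the same with the sample standard deviation); these choices and the example values are assumptions, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observations and predictions."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def evaluate(y, y_hat):
    """Metrics as written in Eqs. (12)-(18); y are non-zero observations, y_hat predictions."""
    n = len(y)
    y_bar = mean(y)
    err = [a - b for a, b in zip(y, y_hat)]  # y_i - y_hat_i
    rel = [e / a for e, a in zip(err, y)]    # (y_i - y_hat_i) / y_i
    ss_res = sum(e ** 2 for e in err)
    ss_tot = sum((a - y_bar) ** 2 for a in y)
    r = pearson_r(y, y_hat)
    kge = 1 - math.sqrt((r - 1) ** 2
                        + (pstdev(y_hat) / pstdev(y) - 1) ** 2
                        + (mean(y_hat) / y_bar - 1) ** 2)
    return {
        "RMSPE": math.sqrt(ss_res / n) * 100,                     # Eq. (12)
        "RMSRPE": math.sqrt(sum(q ** 2 for q in rel) / n) * 100,  # Eq. (13)
        "MAPE": sum(abs(e) for e in err) / n * 100,               # Eq. (14)
        "MARPE": sum(abs(q) for q in rel) / n * 100,              # Eq. (15)
        "KGE": kge,                                               # Eq. (16)
        "NSE": 1 - ss_res / ss_tot,                               # Eq. (17)
        "R2": 1 - ss_res / ss_tot,                                # Eq. (18): same formula as NSE
    }


# Hypothetical normalized observations and predictions (not data from the paper).
observed = [0.2, 0.4, 0.6, 0.8]
predicted = [0.25, 0.35, 0.65, 0.75]
for name, value in evaluate(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

Because Equations (17) and (18) are written with the same expression, NSE and R² always coincide, which matches the identical NSE and R² rows in Table 6.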

Table 6. Evaluation of ANN and Random Forest models in energy demand forecasting.

Metric	Lucia ANN	Lucia RF	FUHEM ANN	FUHEM RF
Mean Absolute Percentage Error (MAPE)	3.6727%	4.3733%	0.0207%	0.5426%
Root Mean Square Percentage Error (RMSPE)	11.4334%	14.1956%	0.1004%	2.0903%
Mean Absolute Relative Percentage Error (MARPE)	3.562%	6.1%	0.0033%	0.0468%
Root Mean Squared Relative Error (RMSRE)	20.12%	24.56%	0.0055%	0.00796%
Correlation with Actual Demand	0.9627	0.9445	1	0.9948
KGE	0.9373	0.7501	0.9872	0.7401
NSE	0.9267	0.8870	0.9998	0.9316
R-Squared (R²)	0.9267	0.8870	0.9998	0.9316
Training Set R-Value	0.9774	-	1	-
Validation Set R-Value	0.97171	-	1	-
Test Set R-Value	0.96979	-	1	-
Training Time	40 s	2.8 s	1.1 s	0.3 s
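The speed/accuracy trade-off in Table 6 can be quantified with simple arithmetic. The short Python sketch below uses only values copied from Table 6 to compute how many times faster the RF trains than the ANN and how much higher the ANN's KGE is, for each building; the names `TABLE6` and `trade_off` are hypothetical.

```python
# Values copied from Table 6 (training time in seconds; KGE is dimensionless).
TABLE6 = {
    "LUCIA": {"ANN": {"time_s": 40.0, "kge": 0.9373}, "RF": {"time_s": 2.8, "kge": 0.7501}},
    "FUHEM": {"ANN": {"time_s": 1.1, "kge": 0.9872}, "RF": {"time_s": 0.3, "kge": 0.7401}},
}


def trade_off(table):
    """Return, per building, the RF training speed-up and the ANN-minus-RF KGE gap."""
    summary = {}
    for building, models in table.items():
        ann, rf = models["ANN"], models["RF"]
        summary[building] = {
            "speedup": ann["time_s"] / rf["time_s"],
            "kge_gap": ann["kge"] - rf["kge"],
        }
    return summary


for building, s in trade_off(TABLE6).items():
    print(f"{building}: RF trains {s['speedup']:.1f}x faster; ANN KGE higher by {s['kge_gap']:.4f}")
```

For LUCIA the RF trains roughly 14 times faster while the ANN's KGE is about 0.19 higher; for FUHEM the speed-up is roughly 3.7 times, with a KGE gap of about 0.25.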

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Salem, K.M.; Rey-Hernández, J.M.; Elgharib, A.O.; Rey-Martínez, F.J. Optimizing Energy Forecasting Using ANN and RF Models for HVAC and Heating Predictions. Appl. Sci. 2025, 15, 6806. https://doi.org/10.3390/app15126806

