A Comprehensive Review of Data-Driven and Physics-Based Models for Energy Performance in Non-Domestic Buildings

Phiri, Lukumba; Olwal, Thomas O.; Mathonsi, Topside E.

doi:10.3390/en18246481

Open Access Review


Lukumba Phiri ¹,*, Thomas O. Olwal ¹ and Topside E. Mathonsi ²

¹ Department of Electrical Engineering/F’SATI, Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa

² Department of Information Technology, Tshwane University of Technology, Private Bag X680, Pretoria 0001, South Africa

* Author to whom correspondence should be addressed.

Energies 2025, 18(24), 6481; https://doi.org/10.3390/en18246481

Submission received: 28 September 2025 / Revised: 19 November 2025 / Accepted: 2 December 2025 / Published: 10 December 2025


Abstract

The building sector accounts for a significant portion of global energy consumption and carbon dioxide (CO₂) emissions, making it a critical area for improving energy efficiency. In Africa, rapidly growing energy demand and rising costs have further emphasized the urgency of developing effective solutions for reducing building energy use. This paper presents a comprehensive review of data-driven and physics-based modeling approaches for forecasting and optimizing energy performance in non-domestic buildings. The review highlights the evolution of statistical models, classical machine learning methods, deep learning, and hybrid approaches across various application scenarios. Emphasis is placed on the role of data pre-processing techniques, including data fusion and transfer learning, as strategies to address data limitations and improve model generalization. Furthermore, the study evaluates the strengths and limitations of different modeling methods in terms of accuracy, scalability, and applicability in real-world contexts. By integrating insights from recent literature, this paper identifies key research gaps, such as the need for standard datasets, physics-informed hybrid modeling, and policy-oriented frameworks. The findings aim to guide building managers, policymakers, and researchers toward adopting robust data-driven solutions that enhance energy resilience, reduce operational costs, and support environmental sustainability in the built environment. The review also justifies the importance of these models for practical applications such as energy benchmarking, retrofit planning, and CO₂ reduction, providing a clear link between research and industry implementation.

Keywords:

building energy efficiency; data-driven models; physics-based models; machine learning; energy forecasting; non-domestic buildings

1. Introduction

The global building sector is a major contributor to energy consumption and greenhouse gas emissions, with non-domestic buildings presenting a unique challenge due to their complex operations [1,2,3]. The global imperative to combat climate change, driven by the rapid rise in average global temperature, makes reducing energy consumption across all economic sectors crucial [4]. The building sector, encompassing residential and non-residential structures, is a primary target for energy efficiency measures, given its substantial share of global energy demand [5,6].

Energy efficiency refers to using the least amount of energy necessary to maintain comfortable conditions in a building for lighting, equipment, heating, and cooling [7]. Increased electricity consumption is one of the major drivers of greenhouse gas emissions, which cause global warming [8]. Carbon dioxide (CO₂) is the most prominent greenhouse gas causing global warming, and its emissions have risen steadily over time [9,10]. For instance, the global average atmospheric CO₂ concentration set a new record high of 422.7 parts per million (ppm) in 2024 [11]. Moreover, a doubling of CO₂ would result in a 3.8 °C rise in global temperatures [12].

Buildings, transportation, and industry are the three main energy-consuming sectors, and the building sector consumes a considerable share of energy among them [13,14]. Building operations account for 30% of the world’s final energy consumption and 26% of its energy-related emissions (8% from direct emissions in buildings and 18% from indirect emissions from the generation of electricity and heat used in buildings) [15]. In the United States, the building sector accounts for 41% of total energy consumption, whereas industry and transportation account for 29% and 30%, respectively [16]. Non-domestic buildings, such as commercial, industrial, and institutional structures, present a unique challenge due to their complex operational dynamics, diverse occupancy patterns, and varied equipment usage. Accurate energy performance modeling is essential for industrial implementation, enabling key actions such as benchmarking energy use against standards, reducing future consumption through informed retrofits, and optimizing Building Energy Management Systems (BEMS) for substantial cost and carbon reductions [17]. In addition, the number of floors, the thermal characteristics of building materials, climatic conditions (dry, cold, and seasonal weather), and the dry-bulb temperature all affect how much energy a building uses [18,19,20]. Furthermore, building energy consumption is largely determined by building characteristics, and appropriate design techniques can lower building energy requirements [21].

The non-domestic building sector accounts for substantial energy demand, yet its energy performance has not been thoroughly investigated. Two primary modeling approaches have emerged: data-driven and physics-based models. Physics-based models, such as those built with simulation software like EnergyPlus 25.2.0 and IDA ICE 5 [22], rely on detailed physical principles to simulate energy flows. They are highly accurate but require extensive input data and are computationally intensive. Conversely, data-driven models, which include statistical, machine learning (ML), and deep learning (DL) techniques, learn patterns directly from historical energy consumption data. They are more flexible and require less detailed building information, but may lack generalizability and interpretability.

These models are not merely academic exercises; they are essential tools for achieving tangible energy savings, cost reduction, and regulatory compliance. Data-driven models are crucial for leveraging the vast amounts of data from smart meters and Internet of Things (IoT) sensors to provide real-time insights, predictive maintenance, and dynamic benchmarking. Physics-based models remain indispensable for accurate design-phase simulations, retrofit planning, and understanding the fundamental physical processes governing energy flows. Together, they form the backbone of modern BEMS.

This paper presents a comprehensive review that synthesizes and evaluates the strengths and limitations of both data-driven and physics-based modeling approaches for non-domestic building energy performance. The review focuses on how these models are applied across various tasks, including forecasting, energy benchmarking, and optimization. We provide a systematic framework that aligns different tasks with appropriate metrics, data requirements, and methods. By doing so, this study aims to provide researchers and practitioners with a clear roadmap for selecting and applying the most suitable modeling approach for their specific needs.

The literature search for this review was conducted across several key academic databases, including IEEE Xplore, Google Scholar, ScienceDirect, and MDPI. The primary keywords used were “building energy modeling,” “data-driven models,” “physics-based models,” “machine learning for energy,” “deep learning in buildings,” “energy performance forecasting,” and “non-domestic buildings.” The search was updated to include recent studies from 2023 to 2025 to ensure the review reflects the state of the art.

While Google Scholar was a primary search engine, we acknowledge its algorithmic bias. To mitigate this, we employed a forward and backward citation analysis on key papers to ensure we captured seminal and highly cited works that might not have appeared in initial searches. Papers were included if they focused on modeling energy performance in non-domestic buildings using data-driven or physics-based approaches. Papers focusing on residential buildings or lacking a clear methodological description were excluded.

The article offers the following benefits:

This paper examines how energy is used in non-domestic buildings, which are responsible for about 20% of the world’s energy usage. It is relevant to energy policymakers, building managers, and sustainability researchers. The study reviews how technologies such as artificial intelligence and machine learning are used to forecast energy use in buildings, which helps identify opportunities to improve building operations and reduce both energy waste and costs. Moreover, the study supports efforts to lower carbon emissions. By considering data from IoT devices and smart meters, it shows how big data can be used effectively for better energy management, offering scalable solutions for various building types.

The study also helps inform better rules and regulations for energy use, aiding governments in improving their energy policies. For academia, it introduces new methods for studying energy usage, while industry practitioners gain tools for faster and more precise energy assessments. Overall, the work aims to achieve cost savings, deliver environmental benefits, and inspire technological advancements, helping move toward smarter, greener building practices. The research methodology is summarized in Figure 1 below.

The unique contributions of the paper are summarized below. This comprehensive review distinguishes itself in several key ways within the field of building energy performance, particularly for non-domestic buildings, a sector that has received comparatively limited attention in the existing literature:

1. Prioritizing Energy Performance of Non-Domestic Buildings: Unlike most studies, which address energy modeling of dwellings or of buildings in general, this review focuses specifically on non-domestic buildings such as commercial, industrial, and institutional buildings. In doing so, it addresses a critical knowledge gap regarding the specific parameters that drive energy consumption in these large, complex buildings.
2. Comparative Study of Data-Driven and Physics-Based Models: The review formally compares the performance, usability, and limitations of physics-based and data-driven (statistical, ML, and hybrid) modeling methods.
3. Emphasis on Data Limitations and Strategies for Practical Model Improvement: The review emphasizes data limitations and offers practical strategies for improving models. Part of its novelty lies in highlighting data fusion and transfer learning as solutions to the common problems of data scarcity and data heterogeneity. These approaches improve model generalizability and robustness, making models more applicable in real-world settings where limited data is a stringent constraint.
4. Integration of New Data Sources for Improved Prediction: The review examines how integrating operational data, weather data, and building attributes improves prediction accuracy. It details how big data, IoT sensors, and smart meters can be leveraged to improve the accuracy and scalability of energy management systems.
5. Research Horizons and Literature Gaps: Beyond surveying current work, the review identifies gaps, e.g., the formulation of hybrid physics-informed machine learning models, standard datasets, and policy-based frameworks. These point to directions for future research toward more efficient and sustainable energy use in non-domestic buildings.
6. Regional and Temporal Analysis: The review analyzes the geographic distribution (Americas, Europe, Asia, Africa) and temporal trends of research in this field, providing insights into global research focus and gaps.

The rest of the article is organized as follows. Section 1 introduces energy efficiency in non-domestic buildings and building energy performance modeling, presenting the motivation and background of the research, the research problem, and the proposed solutions for tackling the challenges of energy efficiency planning in non-domestic buildings. Section 2 reviews building energy benchmarking, gives an overview of existing models, and supports the need for rapid assessments of energy consumption and efficiency. Section 3 presents an overview of methods for data presentation, describing the rationale behind data collection, data preprocessing, and data transfer. Section 4 consolidates the lessons learned and challenges. Section 5 discusses current trends and open research areas. Section 6 presents the conclusion.

2. Survey of Papers Related to Data-Driven and Physics-Based Models for Energy Performance in Non-Domestic Buildings

This section reviews the state of the art in building energy efficiency benchmarking and then examines data-driven techniques for predicting the energy consumption and performance of different building types. It thoroughly reviews methods for analyzing, categorizing, comparing, grading, and assessing energy performance in non-domestic buildings.

Methodologies are divided into five areas: engineering calculations, simulation, statistical approaches, machine learning, and other methods. The main uses of building evaluation techniques are discussed, along with their drawbacks and limitations.

In recent years, several review papers have been written on the topic of building energy performance prediction using a data-driven method, since this technique has gained increasing attention. Table 1 synthesizes and compares the heterogeneous tasks for modeling energy in non-domestic buildings.

Existing reviews cover popular models such as Linear Regression (LR), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Regression Tree (RT), Support Vector Machine (SVM), Artificial Neural Network (ANN), and ensemble models with boosting and bagging, among others. Building on these reviews, Table 1 presents a systematic framework for building energy modeling tasks.

2.1. Barriers and Enabling Mechanisms for Improving Energy Performance in Non-Domestic Buildings

A major obstacle to meeting the 2030 global energy efficiency targets is the low energy performance of the built environment [34,35]. One of the best approaches to addressing low building energy efficiency is to implement energy renovation solutions at a large scale and reasonable cost [36]. Nevertheless, several obstacles still limit investment in this area and prevent the adoption of practices and technologies that increase energy efficiency. According to Tuominen et al. [37], the primary obstacles for privately owned residential buildings are the low priority given to energy performance improvements, the minimal impact of renovations on property prices, and a lack of reliable information. In contrast, Mandel et al. [38] found that the primary barriers to decision-making for commercial office building conversion were information asymmetry between project partners, uncertainty about predicted savings, and a lack of experience with energy technologies. In the latter case, it has also been emphasized that case-study-oriented methodologies have often made these problems worse by lacking in-depth data and thorough pre- and post-deployment evaluations of load profiles after energy efficiency measures (EEMs) are implemented [39]. Moreover, scaling up such approaches is technically difficult because retrofitting actions depend on a wide range of factors, which limits any evaluation method.

It is estimated that nearly 50% of CO₂ emissions, which are recognized as the main contributor to climate change, are related to fuel burning for the construction and energy use of non-domestic buildings [40]. Reducing greenhouse gas emissions requires more environmentally friendly products, a considerable change in human energy-use behavior, and the identification and mitigation of the sources of these gases [10]. Therefore, improving methods for designing more energy-efficient buildings and increasing the energy performance of existing structures appear to be important steps toward reducing the threat of global warming.

By applying a few basic strategies during the design and construction phases, China achieved a 30% reduction in energy consumption in the building construction industry [41]. In Europe, the prioritization of building energy efficiency led to the Energy Performance of Buildings Directive (EPBD). In response to the EPBD, EU member states were asked to set energy efficiency standards and launch an energy performance obligation program for both new and existing buildings [42]. Data envelopment analysis (DEA) studies are scarce in Africa, despite the method’s prominence in energy efficiency research, and there is a broader dearth of research on enhancing energy efficiency in Africa [43,44,45,46]. The African Union (AU) is actively promoting energy efficiency (EE) in buildings through the African Energy Efficiency Strategy (AfEES), aiming to enhance energy production across the continent by 50% by 2050 and 70% by 2053. This strategy promotes energy-efficient practices in various sectors, including buildings, and aims to close the energy access gap and drive a clean energy transition [47]. Several countries in the Sub-Saharan region have adopted national policies aligned with the continental EE strategy. For instance, the Zambian Ministry of Energy has collaborated with the European Union (EU) to develop a plan to meet its target of reducing energy use by 2% annually between 2018 and 2030 (roughly 223 GWh from a 2015–2016 base). Although several EE and demand-side management (DSM) pilot projects have been started, more work is needed to fully implement EE and DSM in Zambia [48].

Regulations introduced as part of the global push to minimize greenhouse gas emissions compel investors to improve the energy efficiency of their properties. This enforcement indicates that improving BEMS and upgrading buildings are the most crucial ways to achieve the necessary reduction in carbon emissions [49]. Although designers consider sustainability in new buildings to satisfy regulations, 99% of the building stock already exists, and approximately 70% of these buildings will still be in use in 2050 [50].

2.2. Building Energy Performance Assessment

Data-driven approaches are gaining popularity because they can use the vast amounts of available data to assess implemented energy retrofitting solutions and forecast the potential energy savings of new EEMs [32]. In addition, conventional data-free deterministic approaches face a significant scalability problem, as their outcomes are typically restricted to the particular building being studied. This implies that developing large-scale retrofitting schemes with these techniques can be challenging [51]. Data-driven methods are already widely applied in the field of building energy efficiency, and many applications are beginning to take shape. Figure 2 illustrates several applications of data-driven methods in the building sector, including optimizing demand response control, improving HVAC system efficiency, and operating different types of buildings energy-efficiently [52].

Spudys et al. [51] classified energy performance assessment approaches based on engineering procedures into top-down and bottom-up methodologies. A top-down approach treats the system as a whole without using data from its subsystems and computes integrated energy or emission rates while taking various common construction materials into account; this can be achieved with either simple or complex statistical techniques. In contrast, the bottom-up approach uses energy modeling to gather information on building systems at the building level, and a more accurate summary is then produced by comparing these data with the actual building performance [52].

In their study, Borgstein et al. [53] examined model-based and empirical benchmarking, outlined leading tactics, and provided a detailed explanation of how benchmarks are used to build rating and categorization systems. Four primary kinds of building energy assessment were identified in this study: engineering calculations, simulation model-based benchmarking, statistical modeling, and machine learning.

2.2.1. Physics-Based Engineering Calculations

Engineering methods apply physical laws to derive building energy consumption at the whole-building or subsystem level. The most precise methods use complex mathematics or building dynamics to derive accurate energy use for all components, taking internal and external details as inputs (e.g., climate information, construction fabric, HVAC system). Because gathering input data for engineering calculations is challenging, this method requires a great deal of time and effort [54]. Figure 3 illustrates an example of end-use energy analysis performed using engineering calculations to determine total energy consumption.

Several streamlined models for calculating building energy efficiency have been developed to speed up the computations [33,55,56,57]. Although these models are primarily intended for rapid optimization during the design phase, they can also be useful for predicting the impact of conservation measures and evaluating energy efficiency (e.g., in energy audits) [58]. These models have several advantages over simulation modeling, including faster computation, clear relationships with physical factors, and easier application [59].

These calculations usually rely on mathematical equations, and the techniques in this class generally use steady-state models that average variables over a period (e.g., a year) during which other building characteristics are assumed constant. Quasi-steady-state (QSS) methods perform monthly heat gain and loss calculations and also account for the influence of transient, weather-related parameters [60]. QSS estimates building energy performance by linking energy use to these input features. Programs based on such simplified models typically do not capture all of a building’s intricate interactions, so they do not fully represent its energy behavior. These tools are commonly referred to as calculation tools [61].
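To make the QSS idea concrete, the following Python sketch computes a monthly heating need as heat losses minus utilized heat gains, Q_H,nd = Q_ht − η·Q_gn, in the style of the ISO 13790 monthly method. It is an illustration under stated assumptions, not the procedure of any particular calculation tool: the function names, the 730 h month, the clamping of losses at zero, and the fixed dynamic parameter a = 2.0 are assumptions (ISO 13790 derives a from the building’s time constant), and all input values are hypothetical.

```python
# Monthly quasi-steady-state (QSS) heating-need sketch, loosely following the
# structure of the ISO 13790 monthly method: Q_H,nd = Q_ht - eta * Q_gn.
# All numbers are hypothetical; the dynamic parameter a is fixed by assumption.

HOURS_PER_MONTH = 730.0  # assumption: average month length in hours


def gain_utilisation(gamma: float, a: float = 2.0) -> float:
    """Utilisation factor eta for the gain/loss ratio gamma (gamma >= 0 here)."""
    if abs(gamma - 1.0) < 1e-9:
        return a / (a + 1.0)  # limit of the general formula at gamma = 1
    return (1.0 - gamma**a) / (1.0 - gamma ** (a + 1.0))


def monthly_heating_need_kwh(h_tr: float, h_ve: float, t_int: float, t_ext: float,
                             gains_w: float, hours: float = HOURS_PER_MONTH) -> float:
    """Heating need (kWh) for one month.

    h_tr, h_ve: transmission / ventilation heat transfer coefficients (W/K)
    t_int, t_ext: heating set-point and monthly mean outdoor temperature (degC)
    gains_w: monthly mean internal plus solar heat gains (W)
    """
    q_ht = (h_tr + h_ve) * max(t_int - t_ext, 0.0) * hours / 1000.0  # heat losses, kWh
    q_gn = gains_w * hours / 1000.0                                   # heat gains, kWh
    if q_ht == 0.0:
        return 0.0  # no heating demand when it is not colder outside
    eta = gain_utilisation(q_gn / q_ht)
    return max(q_ht - eta * q_gn, 0.0)


# Hypothetical office zone over three winter months.
for month, t_ext in [("Jun", 11.0), ("Jul", 10.0), ("Aug", 13.0)]:
    print(month, round(monthly_heating_need_kwh(300.0, 120.0, 21.0, t_ext, 4000.0), 1), "kWh")
```

The utilisation factor is what lets a monthly, steady-state balance reflect the fact that not all gains offset heating demand, which is the "transient" correction the QSS approach adds.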

The International Organization for Standardization (ISO) describes a calculation procedure in which heating and cooling loads are computed first and then serve as the basis for a simplified whole-building energy estimate [62]. In the simplified calculation, a building’s total energy use can be estimated as the sum of the energy used by all of its systems [63,64]. This aggregation model has been described as the most precise approach for estimating energy efficiency, and it can support system-level benchmarks [64]. For benchmarking existing buildings, these computations are compared with baseline buildings and provide useful detail for energy assessment [65]. Figure 3 illustrates the principle of end-use energy calculations. Although aggregated calculation methods are undoubtedly effective tools for energy evaluation and energy-saving estimation, they have specific limitations for whole-building calculations. First, each HVAC heating and cooling load must be calculated separately. Second, these computations are most useful when combined with system-level benchmarking [66].
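The aggregation principle can be sketched in a few lines of Python: total energy is the sum of the end uses, and each end use is compared with a baseline building to show where the gap lies. The function names, end-use categories, consumption figures, and floor area below are hypothetical, and the ratio-to-baseline metric is only one simple choice among many.

```python
# End-use aggregation sketch: whole-building energy as the sum of system end uses,
# compared end use by end use with a baseline building. All figures are hypothetical.


def aggregate_end_uses(end_uses_kwh: dict[str, float]) -> float:
    """Total annual energy (kWh) as the sum of all system end uses."""
    return sum(end_uses_kwh.values())


def compare_to_baseline(actual: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Per-end-use ratio of actual to baseline consumption (> 1 means worse than baseline)."""
    return {use: actual[use] / baseline[use] for use in baseline if use in actual}


actual = {"heating": 120_000.0, "cooling": 80_000.0, "lighting": 45_000.0, "equipment": 60_000.0}
baseline = {"heating": 100_000.0, "cooling": 80_000.0, "lighting": 30_000.0, "equipment": 60_000.0}
floor_area_m2 = 5_000.0

total = aggregate_end_uses(actual)
print(f"Total: {total:.0f} kWh, EUI: {total / floor_area_m2:.1f} kWh/m2")
print(compare_to_baseline(actual, baseline))
```

In this hypothetical case the system-level comparison points at lighting and heating, which is the kind of detail a single whole-building figure would hide.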

Limitations and applications

The calculation methods discussed in this section are less accurate and may not predict building performance reliably because they do not reproduce the dynamic processes captured by full simulation methods [67]. Applying them to performance evaluation often requires identifying “typical” parameters or performance levels, which calls for published benchmarks or supplementary data collection [68].

However, these methods are quicker and simpler to use and have served as the foundation for numerous national calculation systems. They may also be employed in energy auditing, where quick computations and approximate performance estimates are often adequate for assessing performance and identifying areas for improvement [69]. When carried out correctly, an energy audit offers a thorough assessment of energy performance by determining end-use energy consumption and contrasting a building’s actual performance with its potential performance after energy efficiency measures have been implemented [31].

2.2.2. Simulation Method for Energy Performance in Non-Domestic Buildings

Building energy simulations use software and computer models to simulate performance under predetermined conditions [51]. Computer simulation has several uses, including HVAC system design and lighting. A thorough approach uses detailed inputs and a first-principles model to calculate energy consumption [70].
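As a toy illustration of the first-principles idea (and nothing like the detailed heat-balance solution used by tools such as EnergyPlus), a single-zone lumped resistance-capacitance (1R1C) model can be stepped hour by hour. The model form, the ideal heater with a capacity limit, the explicit Euler integration, the function name, and all parameter values are assumptions made for illustration.

```python
# Toy first-principles dynamic simulation: a single-zone 1R1C lumped thermal model,
#   C * dT_in/dt = (T_out - T_in) / R + Q_hvac + Q_gain,
# integrated with explicit Euler. R, C and all inputs are hypothetical.


def simulate_1r1c(t_out, q_gain_w, r_k_per_w, c_j_per_k, t_init, setpoint, q_max_w, dt_s=3600.0):
    """Return (indoor temperatures per step, heating energy in kWh)."""
    t_in = t_init
    temps, heat_j = [], 0.0
    for t_o, q_g in zip(t_out, q_gain_w):
        # Ideal heater: supply just enough heat (capped at q_max_w) to reach the set-point.
        q_needed = c_j_per_k * (setpoint - t_in) / dt_s - (t_o - t_in) / r_k_per_w - q_g
        q_hvac = min(max(q_needed, 0.0), q_max_w)
        # Explicit Euler step of the zone heat balance.
        t_in += dt_s / c_j_per_k * ((t_o - t_in) / r_k_per_w + q_hvac + q_g)
        heat_j += q_hvac * dt_s
        temps.append(t_in)
    return temps, heat_j / 3.6e6  # J -> kWh


# Hypothetical winter day: 0 degC outside, no gains, UA = 100 W/K (R = 0.01 K/W).
temps, kwh = simulate_1r1c([0.0] * 24, [0.0] * 24, r_k_per_w=0.01, c_j_per_k=1e7,
                           t_init=20.0, setpoint=20.0, q_max_w=10_000.0)
print(f"Heating energy: {kwh:.1f} kWh; final indoor temperature: {temps[-1]:.2f} degC")
```

Unlike the monthly calculation, the state (indoor temperature) carries over from step to step, so thermal mass and equipment limits shape the result.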

Simulation software has become a practical tool for low-energy building design. Various studies have reported on the use of simulation tools for optimizing HVAC and other building components [27,70,71]. For compliance assessment, building simulation tools are typically used for new construction. The application of simulation models to the energy assessment of new and existing buildings is shown in Figure 4 and Figure 5, respectively [72].

Numerous modeling tools, including EnergyPlus [73], DOE-2 [74], and ESP-r [75], have been developed over the past three decades for energy performance assessment. Three primary tools frequently used in building energy optimization are EnergyPlus, DOE-2, and TRNSYS [22]. The first two were developed under the US Department of Energy, one for hourly energy use forecasting and the other for evaluating building energy performance.

Limitations and applications

As long as sufficient information about a building’s attributes and usage profiles is available, building energy simulation is a very useful tool for modeling specific buildings, whether they already exist or are still in the design stage [76]. Although the field is developing quickly, a better understanding of the critical factors that reduce accuracy and confidence is needed to use simulation efficiently at scale. Moreover, while tools are becoming more accessible, modelers’ expertise remains crucial for problem definition, input selection, and result interpretation [60].

2.2.3. Statistical Models for Energy Performance in Non-Domestic Buildings

Statistical models serve as foundational tools for energy benchmarking and as surrogate models for more complex simulations, thus bridging the gap between simple and complex approaches. One of the conventional statistical methods for examining the relationship between one or more independent variables (predictors or input features) and a dependent variable (output or response) is linear regression. Equation (1) displays it in its generic form.

\hat{y} = m_{0} + m x

(1)

where \hat{y} is the predicted output, m_{0} is the bias term, and m is the weight matrix applied to the features x.

Note that the generic form can only capture linear relationships between features and output. To broaden the applicability of linear regression, the input variables can be transformed with various transformation functions, such as a polynomial (Equation (2)) or a natural logarithm (Equation (3)).

\hat{y} = m_{0} + m x^{n}

(2)

where n is the polynomial degree.

\hat{y} = m_{0} + m \log(x)

(3)
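Equations (1) to (3) differ only in how the feature is transformed before an ordinary least-squares fit. The sketch below, using NumPy, fits all three forms to hypothetical data generated from a logarithmic relationship, so the log-transformed model should recover the generating coefficients while the linear and quadratic forms leave a residual error. The helper name `fit` and the data are illustrative assumptions.

```python
import numpy as np


def fit(x: np.ndarray, y: np.ndarray, transform=lambda v: v) -> tuple[float, float]:
    """OLS fit of y_hat = m0 + m * transform(x), i.e. Eq. (1) with an optional feature transform."""
    design = np.column_stack([np.ones_like(x), transform(x)])
    (m0, m), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(m0), float(m)


# Hypothetical data: annual consumption grows with the logarithm of floor area.
x = np.linspace(100.0, 10_000.0, 50)   # floor area (m2)
y = 5.0 + 2.0 * np.log(x)              # consumption (arbitrary units)

transforms = {
    "linear, Eq. (1)": lambda v: v,
    "quadratic, Eq. (2) with n = 2": lambda v: v**2,
    "logarithmic, Eq. (3)": np.log,
}
for name, f in transforms.items():
    m0, m = fit(x, y, f)
    sse = float(np.sum((y - (m0 + m * f(x))) ** 2))
    print(f"{name}: m0 = {m0:.3f}, m = {m:.3e}, SSE = {sse:.3e}")
```

The model stays linear in its coefficients; only the feature changes, which is why a standard least-squares solver handles all three cases.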

The primary benefit of linear regression is its ease of use and intuitiveness [77]. The weight matrix can be used to directly determine how each variable contributes to the prediction. Additionally, nonlinear problems can be addressed using extended linear regression. However, its limitations must also be recognized: (1) the general form of linear regression cannot capture nonlinear relationships between inputs and outputs; (2) the prediction performance of extended linear regression depends heavily on choosing an appropriate transformation function, which may be a major challenge; and (3) multicollinearity among input features degrades the linear regression prediction result. Consequently, it is advisable to apply feature extraction techniques before building linear regression models.
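Limitation (3) can be checked before fitting. A common diagnostic, assumed here rather than taken from the cited work, is the variance inflation factor VIF_j = 1/(1 − R_j²), where R_j² comes from regressing feature j on the remaining features; values above roughly 5 to 10 are often treated as a warning sign. The sketch below uses hypothetical floor-area, volume, and occupancy data in which volume is nearly proportional to floor area.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), regressing column j of X on the other columns plus an intercept."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Hypothetical features for 200 buildings.
rng = np.random.default_rng(0)
floor_area = rng.uniform(1_000.0, 20_000.0, 200)           # m2
volume = 3.2 * floor_area + rng.normal(0.0, 500.0, 200)    # m3, nearly proportional to floor area
occupants = rng.uniform(10.0, 500.0, 200)                  # unrelated to the other two
print(np.round(variance_inflation_factors(np.column_stack([floor_area, volume, occupants])), 1))
```

Here floor area and volume both receive very large VIFs, suggesting that one of them should be dropped or combined before fitting.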

ARMA and ARIMA are the two most commonly used techniques for time series analysis [10]. The two primary components of ARMA are an autoregressive (AR) model of order N and a moving average (MA) model of order M.

y_{t} = c + ε_{t} + \sum_{i = 1}^{N} \phi_{i} y_{t - i} + \sum_{i = 1}^{M} \varphi_{i} ε_{t - i}

(4)

where y_{t} is the value of the series at time t. The AR and MA weights are \phi_{1}, …, \phi_{N} and \varphi_{1}, …, \varphi_{M}, respectively, while ε_{t} represents white noise and c is a constant.

ARMA is applicable only to stationary time series. Because ARIMA includes an initial differencing step to remove non-stationarity, it is the better option for forecasting non-stationary time series [78].

ARMA and ARIMA perform acceptably when the output depends strongly on its past values, since both explicitly model historical data. However, determining the AR and MA orders, as well as the number of differencing steps, can be difficult [79].
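The differencing-then-autoregression idea can be sketched with NumPy alone. To stay short, this example fits only the AR part by least squares, i.e., an ARIMA(N, 1, 0) model with no MA terms (the code’s p plays the role of N in Equation (4)), and the orders are chosen by hand, which is exactly the selection problem noted above. Function names and the synthetic load series are hypothetical; a production workflow would typically rely on a dedicated time-series library with maximum-likelihood estimation and information-criterion order selection.

```python
import numpy as np


def fit_ar(series: np.ndarray, p: int) -> np.ndarray:
    """OLS estimate of an AR(p) model with intercept: s_t = c + sum_i phi_i * s_{t-i}."""
    # Each row holds [1, s_{t-1}, ..., s_{t-p}] for one target value s_t.
    rows = [np.r_[1.0, series[t - p:t][::-1]] for t in range(p, len(series))]
    coef, *_ = np.linalg.lstsq(np.array(rows), series[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]


def forecast_arima_d1(y: np.ndarray, p: int, steps: int) -> np.ndarray:
    """ARIMA(p, 1, 0): difference once, fit AR(p) to the differences,
    forecast the differences recursively, then integrate back to levels."""
    diffs = np.diff(y)
    coef = fit_ar(diffs, p)
    history = list(diffs)
    level, out = float(y[-1]), []
    for _ in range(steps):
        lags = history[::-1][:p]  # most recent first, matching the column order in fit_ar
        next_diff = coef[0] + np.dot(coef[1:], lags)
        history.append(next_diff)
        level += float(next_diff)
        out.append(level)
    return np.array(out)


# Hypothetical daily load (kWh): linear trend plus AR(1) noise.
rng = np.random.default_rng(42)
noise = np.zeros(200)
for t in range(1, 200):
    noise[t] = 0.6 * noise[t - 1] + rng.normal(0.0, 1.0)
load = 500.0 + 0.5 * np.arange(200) + noise
print(np.round(forecast_arima_d1(load, p=2, steps=3), 1))
```

The trend makes the raw load non-stationary; after one difference the series fluctuates around a constant, which is what the AR part can model.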

Top-down methods of evaluating energy performance are now feasible due to the availability of building energy data. A popular statistical method for modeling building performance and energy consumption using historical building data is regression analysis [80]. These models are frequently referred to as data-driven surrogate models since they rely on readily available data rather than complex system features [80].

Statistical models are used in benchmarking by providing an expected value of energy use for each building. Energy consumption is generally normalized and expressed as energy use intensity (EUI), i.e., annual energy use per unit floor area. This method uses building characteristics as input variables and EUI as the target to develop a linear or non-linear model that predicts the EUI of other buildings [81].
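A minimal sketch of regression-based EUI benchmarking, assuming OLS as the model and building age and weekly operating hours as the only characteristics (both hypothetical choices): the model predicts an expected EUI for each building, and the ratio of actual to expected EUI flags buildings that consume more than their characteristics would suggest. Function names and data are illustrative.

```python
import numpy as np


def eui(annual_kwh: np.ndarray, floor_area_m2: np.ndarray) -> np.ndarray:
    """Energy use intensity (kWh per m2 per year)."""
    return annual_kwh / floor_area_m2


def benchmark_ratios(features: np.ndarray, actual_eui: np.ndarray) -> np.ndarray:
    """Fit expected EUI from building characteristics by OLS; return actual / expected.

    A ratio above 1 means the building uses more energy than peers with similar
    characteristics would be expected to use."""
    X = np.column_stack([np.ones(len(actual_eui)), features])
    coef, *_ = np.linalg.lstsq(X, actual_eui, rcond=None)
    return actual_eui / (X @ coef)


# Hypothetical peer group of six office buildings.
annual_kwh = np.array([410e3, 980e3, 300e3, 2.1e6, 520e3, 1.4e6])
floor_area = np.array([3e3, 6e3, 2.5e3, 9e3, 4e3, 8e3])
# Columns: building age (years), weekly operating hours.
features = np.array([[10, 40], [25, 60], [5, 50], [40, 80], [15, 45], [30, 70]], dtype=float)
print(np.round(benchmark_ratios(features, eui(annual_kwh, floor_area)), 2))
```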

Simple Multivariate Regression Models (MRM) are a frequently used classical statistical method in the building industry, and ASHRAE [82] provides general guidelines for using them. The Change Point Regression Model (CPRM), another popular technique, captures the non-linear response of energy use to input characteristics. CPRM is the best option for predicting building energy loads when they are governed by temperature or other climate-dependent variables [83].
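One common CPRM form is the three-parameter (3P) cooling model, in which energy use is flat below a change-point temperature and rises linearly above it. The sketch below fits it by grid search over candidate change points with an OLS fit at each; the grid-search strategy, the function name, and the synthetic daily data are assumptions rather than a prescribed ASHRAE procedure.

```python
import numpy as np


def fit_3p_cooling(temp: np.ndarray, energy: np.ndarray, candidates: np.ndarray):
    """Three-parameter cooling change-point model E = b0 + b1 * max(T - Tcp, 0).

    The change point Tcp is found by grid search over `candidates`; for each
    candidate, b0 and b1 are estimated by OLS, and the lowest-SSE fit is kept."""
    best = None
    for tcp in candidates:
        X = np.column_stack([np.ones_like(temp), np.maximum(temp - tcp, 0.0)])
        coef, *_ = np.linalg.lstsq(X, energy, rcond=None)
        sse = float(np.sum((energy - X @ coef) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(tcp), float(coef[0]), float(coef[1]))
    _, tcp, b0, b1 = best
    return tcp, b0, b1


# Hypothetical daily data: base load below ~18 degC, cooling load above it, plus noise.
rng = np.random.default_rng(7)
temp = rng.uniform(5.0, 35.0, 365)
energy = 800.0 + 40.0 * np.maximum(temp - 18.0, 0.0) + rng.normal(0.0, 30.0, 365)
print(fit_3p_cooling(temp, energy, np.arange(10.0, 30.0, 0.25)))
```

The fitted b0 is the weather-independent base load and b1 the cooling sensitivity per degree above the change point, which is what makes the model convenient for weather-normalized baselines.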

Let EE_b denote the baseline energy efficiency and V the vector of input features (e.g., age of the building, energy system, roof type, floor area) during the monitoring stage. EE_b can then be calculated using Equation (5):

EE_{b} = EE_{0} + \sum_{i = 1}^{n} c_{i} V_{i}

(5)

Here, EE_0 is a constant and c = (c_1, …, c_n) is the vector of coefficients estimated by training on the n input features. The Ordinary Least Squares (OLS) problem can then be expressed as in Equation (6) [84]:

\min_{EE_{0}, c} \sum_{i = 1}^{b} ε_{i}^{2}, \quad ε_{i} = EE_{b, i} - \left( EE_{0} + \sum_{j = 1}^{n} c_{j} V_{i j} \right)

(6)

where b is the number of observations and ε_i is the stochastic error of the i-th observation.
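Equations (5) and (6) can be solved directly with a least-squares routine. The sketch below, assuming NumPy, prepends a column of ones so that the first estimated coefficient is the constant EE_0. The feature set and the synthetic data are hypothetical.

```python
import numpy as np


def fit_baseline_ols(V, ee_b):
    """Least-squares estimates of EE_0 and c in the baseline model of Equation (5).

    V: (b, n) matrix with one row per observation and one column per feature.
    """
    V = np.asarray(V, dtype=float)
    X = np.column_stack([np.ones(len(V)), V])        # column of ones -> intercept EE_0
    coef, *_ = np.linalg.lstsq(X, np.asarray(ee_b, dtype=float), rcond=None)
    return coef[0], coef[1:]


def predict_baseline(ee0, c, V):
    """Equation (5): EE_b = EE_0 + sum_i c_i V_i for each row of V."""
    return ee0 + np.asarray(V, dtype=float) @ c


# Synthetic data: the three features could stand for building age, floor area, and a roof-type flag.
rng = np.random.default_rng(42)
V = rng.normal(size=(40, 3))
ee_b = 10.0 + V @ np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=40)
ee0, c = fit_baseline_ols(V, ee_b)
print(round(ee0, 2), np.round(c, 2))
```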

Rather than relying on a simple error measure, Stochastic Frontier Analysis (SFA), an extension of OLS regression, provides a way of estimating inefficiency by splitting the error term into random noise and a one-sided inefficiency component [85]. Based on predetermined attributes, the SFA model constructs an efficiency frontier, and a building's distance from this frontier quantifies its inefficiency [86].

Data Envelopment Analysis (DEA) is another mathematical technique that has recently attracted interest in building energy modeling. DEA is a nonparametric technique that treats each building as a Decision-Making Unit (DMU) and estimates its efficiency relative to the best-performing DMUs, enabling a multi-factor productivity analysis [87]. Unlike linear regression, however, DEA provides no information about how the physical properties of buildings relate to their energy use, which makes the model difficult to interpret [77].
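DEA is normally solved as one linear program per DMU. In the special case of one input and one output under constant returns to scale (the CCR model), that program has a closed-form solution: each DMU's output/input ratio divided by the best ratio in the sample. The sketch below covers this special case only. Treating annual energy use as the input and served floor area as the output is an illustrative assumption, and the names are hypothetical.

```python
def dea_ccr_single(inputs, outputs):
    """Relative efficiency of each DMU under the CCR (constant returns to scale)
    model with exactly one input and one output.

    In this special case the DEA linear program reduces to the DMU's
    output/input ratio divided by the best ratio among all DMUs, so every
    score lies in (0, 1] and the best performer scores 1.0.
    """
    ratios = [out / inp for inp, out in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical buildings: input = annual energy use (MWh),
# output = served floor area (thousand m²).
energy_mwh = [100.0, 80.0, 120.0]
floor_area_k_m2 = [5.0, 5.0, 4.8]
scores = dea_ccr_single(energy_mwh, floor_area_k_m2)
print([round(s, 2) for s in scores])  # [0.8, 1.0, 0.64]
```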

There is strong interest in applying Artificial Intelligence (AI) techniques such as ML in the construction industry because they can exploit the large and growing volume of reliable, accessible building data. In this context, ANN, SVM, GPR, and ensemble models, such as RF, are the most commonly used ML techniques.

Limitations and application

Although OLS regression is still widely used, Shobha et al. [77] point out that it can “lead to biased coefficients that are inflated in magnitude, have the wrong signs, or radically change depending on the model and variables that are selected”. Regularization, also known as penalized regression, is the suggested remedy.
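A minimal sketch of this remedy, assuming ridge (L2-penalized) regression and NumPy: features and target are centered so that the intercept EE_0 is not penalized, and the coefficients solve (VᵀV + λI)c = Vᵀy on the centered data. With λ = 0 this reduces to OLS; increasing λ shrinks the coefficients, which counters the inflated or sign-flipping estimates that arise with nearly collinear features. Function and variable names are hypothetical.

```python
import numpy as np


def fit_ridge(V, y, lam):
    """Ridge regression with an unpenalized intercept.

    V: (b, n) feature matrix, y: (b,) target, lam: penalty strength (>= 0).
    Returns (intercept, coefficients).
    """
    V = np.asarray(V, dtype=float)
    y = np.asarray(y, dtype=float)
    v_mean, y_mean = V.mean(axis=0), y.mean()
    Vc, yc = V - v_mean, y - y_mean          # centering keeps the intercept out of the penalty
    n_features = V.shape[1]
    c = np.linalg.solve(Vc.T @ Vc + lam * np.eye(n_features), Vc.T @ yc)
    intercept = y_mean - v_mean @ c          # recover EE_0 on the original scale
    return intercept, c


# Two almost collinear features (hypothetically, floor area and conditioned area).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)
V = np.column_stack([x1, x2])
y = 3.0 + x1 + x2 + 0.1 * rng.normal(size=50)

for lam in (0.0, 1.0, 10.0):
    ee0, c = fit_ridge(V, y, lam)
    print(lam, np.round(c, 2))
```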

The main drawback of statistical models appears to be their lack of connection to physical phenomena, which makes it difficult to interpret the findings and identify errors. Favero et al. [88] conclude that more effort is required to develop simplified models with tangible physical meaning.

2.2.4. Machine Learning for Energy Performance in Non-Domestic Buildings

ANN, SVM, and GPR are the three primary supervised learning approaches that have been extensively used in the construction industry. For unsupervised learning, k-means and hierarchical clustering techniques have also been applied. Ensemble models have been used less often than ANN, SVM, and GPR. These approaches are covered in more detail in the following sections, followed by an overview of additional ML methods.

(a): Classical Machine Learning Approaches

Conventional ML techniques, such as SVM, Decision Trees (DT), RF, Extreme Gradient Boosting (XGBoost), and ANN, are well suited to modeling nonlinear relationships and frequently offer better prediction accuracy than statistical techniques [89].

SVMs use kernel functions to tackle nonlinear regression problems. These functions implicitly map the data into a higher-dimensional space, where a linear hyperplane can be constructed for regression. Thanks to this mapping, SVMs can learn complex patterns even when the data are not linearly separable in the original feature space. SVM is well suited to short-term load forecasting (STLF) because of its speed and prediction accuracy [90]. Amasyali et al. [91] used SVM to predict the lighting load of an office building in Philadelphia, Pennsylvania, based on day type and daily average sky cover. Using smart meter data from homes throughout Ireland, Vrablecova et al. [92] forecasted loads with support vector regression (SVR), showed that SVR is applicable to STLF, and concluded that it is appropriate for forecasting the aggregated loads of individual buildings and building groups. Chen et al. [23] used SVR to predict the hourly electricity demand of shopping malls and hotels, with errors of 4.0% and 6.0%, respectively, enabling cost savings and optimal energy use.
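Two ingredients of SVR can be shown compactly, assuming NumPy: the radial basis function (RBF) kernel, K(x, z) = exp(−γ‖x − z‖²), which evaluates inner products in the implicit high-dimensional space without constructing that space, and the ε-insensitive loss, which ignores errors smaller than ε. This sketches these two building blocks only, not a full SVR solver; γ and ε are illustrative values, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Z, gamma=0.5):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """SVR loss: zero inside the epsilon tube, growing linearly outside it."""
    residual = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return np.maximum(residual - epsilon, 0.0)


# Hypothetical inputs: [outdoor temperature (°C), hour of day]; inputs are usually scaled first.
X = np.array([[20.0, 8.0], [21.0, 8.0], [30.0, 14.0]])
print(np.round(rbf_kernel(X, X), 3))
print(epsilon_insensitive_loss([10.0, 10.0], [10.05, 10.5]))
```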

SVM was first used in the building sector in 2005 to estimate the monthly electricity use of non-domestic buildings in Singapore, a tropical country [93]. The study covered four buildings and considered three input parameters: temperature, humidity, and solar radiation. The model was trained and tested on data gathered over three years. Using the radial basis function (RBF) kernel, the SVM model predicted electrical loads with high accuracy and a low error rate of 4%. The authors concluded that SVM was superior to previously developed ANN models in terms of both accuracy and the small number of model parameters to be chosen. Wang et al. [24] carried out a follow-up study using SVM to forecast the monthly and short-term (i.e., daily) electricity consumption of a household building in China. Over one year, they collected power-usage data, using as input factors the temperature and humidity of the water, the living room, the bedroom, and the outdoors. In their short-term prediction of the electricity demand of non-domestic buildings, Massana et al. [94] compared SVM, ANN, and Multiple Linear Regression (MLR) and found that SVM offered better accuracy at a lower computational cost.

Shabunko et al. [95] applied three bottom-up techniques: engineering modeling (EM), SVM, and OLS. EM exposed the energy inefficiency of the studied buildings, indicating that their average measured use of 20.5 MWh/year could be cut in half. SVM could identify dwellings and buildings deemed inefficient, namely those whose annual consumption exceeds 16.0 MWh. Non-linear least squares, with R² > 0.8, outperformed the linear least squares fit, while OLS was easy to implement. The work also derived two EUI benchmarks, with average values of 2035 kWh/person/year and 56 kWh/m²/year. To accurately forecast the energy use of building heating, ventilation, and air conditioning (HVAC) systems, Wan et al. [96] proposed a prediction model based on a least squares support vector machine (LS-SVM) whose kernel and regularization parameters are optimized by an improved genetic algorithm to avoid overfitting and underfitting. Testing showed that the improved-genetic-algorithm LS-SVM converged faster than existing algorithms, requiring only 0.2 ms, and that its average relative error was less than 0.6%.

Multilayer perceptron (MLP) and support vector regression (SVR) are employed in [97] to forecast the heating and cooling loads of residential buildings. MLP is a type of ANN, and SVR is the regression form of SVM. Both are commonly used for modeling and regression and learn a nonlinear mapping between input and output variables. The methods are trained on the characteristics of each sample in the dataset. A simulated dataset was used, in which the technical parameters of the building serve as input variables and the heating and cooling loads are the output variables of each network.

Wei et al. [25] and the authors of [53] used a parallel SVR implementation to forecast the energy consumption of office buildings, with the goal of optimizing the building attributes of a model case. The energy requirements were determined with the EnergyPlus software. The findings indicate a marginal improvement in accuracy.

Later, in 2012, the same authors used correlation coefficients and gradient-guided feature selection to reduce the number of features for SVM models with polynomial and RBF kernels [98].

In 2014, Jain et al. [99] developed an SVM model using sensor-based data from a multi-family domestic building in New York City. The purpose of their study was to determine how different time intervals and data-collection locations affected the ability to forecast energy usage. The authors found that hourly intervals and data collected at the floor level gave the best model performance. In a comparison of SVM, LS-SVM, and ANN for hourly energy consumption forecasts of small residential buildings, Edwards et al. [100] found that ANN was the least accurate model.

A regression tree (RT) is a type of decision tree with a continuous target variable, as shown in Figure 6. An RT begins with a root node, from which the incoming data are divided among internal and leaf nodes. Internal nodes repeatedly split the inputs into subgroups, whereas leaf nodes give the RT model's output. Because each path from the root to a leaf tests only some of the features, an RT can produce predictions without using the whole feature space.
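The splitting rule of an RT can be sketched as follows, assuming the common sum-of-squared-errors (variance reduction) criterion for a single numeric feature. The function names and the temperature/load data are hypothetical. A full tree applies this search recursively over all features, and each leaf predicts the mean target of its samples.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Best binary split of one numeric feature for a regression tree.

    Returns (threshold, total_sse, left_mean, right_mean); samples with
    x <= threshold go to the left child.
    """
    pairs = sorted(zip(x, y))
    best = (None, float("inf"), None, None)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate identical feature values
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (threshold, total, sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical data: outdoor temperature (°C) vs. cooling load (kW)
temps = [10, 12, 14, 26, 28, 30]
loads = [5, 5, 5, 20, 20, 20]
print(best_split(temps, loads))  # (20.0, 0.0, 5.0, 20.0)
```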

DTs partition the data by feature values and produce interpretable structures. Yao et al. [101] used zero-energy houses in the severely cold regions of China as samples and offered an empirical approach to developing and implementing decision tree-based models. Their decision tree model was developed from EnergyPlus performance simulations by analyzing the relationships between several passive design parameters, and it can be used to find combinations of passive design characteristics that satisfy low energy consumption requirements. RF builds many trees using a bagging technique and averages their forecasts, which reduces overfitting. Because each DT is grown randomly on a different subset of features and data, RF models achieve better predictive accuracy than individual trees, and the trees can be trained in parallel. Zhou et al. [102] deployed RF to forecast the hourly load of several buildings using five datasets of hourly building energy consumption spanning one year, verifying the model's efficacy. Wang et al. [103] used an RF-based approach for integrated prediction. RF also ranks features by importance, making it easier to select the most important ones and ignore the less important ones, which speeds up computation. To forecast the day-ahead building load, Lahouar and Slama [104] used the RF model in conjunction with expert feature selection. This analysis identified day-ahead load, day type, and temperature as the most important influencing factors.

XGBoost is an ensemble learning technique that uses gradient boosting to combine weak learners into a strong one. Its built-in regularization controls model complexity and helps avoid overfitting, resulting in better predictions. Unlike RF, which trains its trees in parallel, XGBoost trains decision trees sequentially, each new tree correcting the errors of the current ensemble, and this sequential approach improves its accuracy. Wang et al. [105] used XGBoost with five important characteristics, including day type, time, holidays, and weather, to estimate long-term building loads; their results suggest that XGBoost outperforms SVM, RF, and LSTM. To achieve precise short-term predictions, Lu et al. [106] combined XGBoost with the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) approach for data decomposition. Lu et al. [107] gathered extensive data from 1325 air conditioners in Chongqing to build two single and four ensemble models for predicting the cooling energy consumption of residential air conditioning. According to their findings, the XGBoost model provided the most accurate predictions.
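The sequential idea behind boosting can be sketched with plain gradient boosting under squared loss: each new regression stump is fitted to the residuals of the current ensemble and added with a learning rate. This is a simplified stand-in, not XGBoost itself; XGBoost additionally uses a regularized objective and second-order gradient information, which this sketch omits. Names, data, and hyperparameters are hypothetical.

```python
def fit_stump(x, r):
    """Fit a one-split regression tree (stump) to residuals r by least squares.

    Returns (threshold, left_value, right_value); x <= threshold goes left.
    If no split is possible, the threshold is +inf and both values are the mean.
    """
    pairs = sorted(zip(x, r))
    mean_all = sum(r) / len(r)
    best = (float("inf"), mean_all, mean_all, float("inf"))  # threshold, left, right, sse
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best[3]:
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2, lm, rm, sse)
    return best[:3]


def gradient_boost(x, y, n_rounds=50, learning_rate=0.3):
    """Plain gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared loss (up to a constant factor).
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        pred = [p + learning_rate * (lv if xi <= t else rv) for p, xi in zip(pred, x)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, lr, stumps = model
    return base + sum(lr * (lv if xi <= t else rv) for t, lv, rv in stumps)


temps = [10, 12, 14, 26, 28, 30]
loads = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
model = gradient_boost(temps, loads)
print([round(predict(model, t), 2) for t in temps])  # [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
```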

ML, a branch of AI, includes a range of approaches, of which the ANN is one of the best known. An ANN is an information-processing system modeled on the networked neurons found in biological systems. McCulloch and Pitts [108] proposed a theory of how neurons work and modeled it with electrical circuits, building simple neural networks. Figure 7 illustrates a schematic of a typical ANN.

Figure 7. Schematic of a typical ANN [108].

\hat{y} = \phi (w_{out} h + b_{out}) = \phi [w_{out} σ (w x + b) + b_{out}]

(7)

where φ is the activation function of the output layer; h = σ(wx + b) is the output of the hidden layer; σ is the activation function of the hidden layer; w and w_out are the weight matrices of the hidden and output layers; and b and b_out are the corresponding bias terms.
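Equation (7) corresponds to the forward pass of a network with one hidden layer. The NumPy sketch below assumes a logistic sigmoid for σ and an identity output activation, a common choice for regression targets such as energy use. The layer sizes and names are hypothetical, and training (for example, by backpropagation) is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, w, b, w_out, b_out, out_activation=lambda z: z):
    """Single-hidden-layer network following Equation (7).

    x: (n_inputs,), w: (n_hidden, n_inputs), b: (n_hidden,),
    w_out: (n_outputs, n_hidden), b_out: (n_outputs,)
    """
    h = sigmoid(w @ x + b)                      # hidden layer: h = sigma(w x + b)
    return out_activation(w_out @ h + b_out)    # output layer: y_hat = phi(w_out h + b_out)


rng = np.random.default_rng(0)
x = np.array([25.0, 0.6, 1.0])       # e.g. temperature, relative humidity, occupied flag
w, b = rng.normal(size=(4, 3)), np.zeros(4)
w_out, b_out = rng.normal(size=(1, 4)), np.zeros(1)
print(forward(x, w, b, w_out, b_out))  # untrained weights, so the value itself is arbitrary
```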

A straightforward single-layer perceptron was first presented by Rosenblatt [109] in 1958 to classify a continuous collection of inputs into one of two possible categories. Since then, there have been notable developments and breakthroughs in ANNs, which have led to an increase in their popularity. Notably, ANNs are widely used in tasks like pattern identification and forecasting/prediction. ANNs learn to execute tasks without the requirement for explicit, task-specific programming. Rather, they minimize errors by modifying their internal parameters in response to data.
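Rosenblatt's learning rule can be sketched in a few lines: the weights change only when a sample is misclassified, by an amount proportional to the error. The illustrative example below, with hypothetical names, learns the linearly separable logical AND function; a single-layer perceptron cannot learn functions that are not linearly separable, such as XOR.

```python
def predict(w, b, x):
    """Threshold unit: 1 if the weighted sum exceeds 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, labels, lr=1.0, max_epochs=20):
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(samples, labels):
            error = target - predict(w, b, x)   # -1, 0 or +1
            if error:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                mistakes += 1
        if mistakes == 0:                       # converged: every sample classified correctly
            break
    return w, b


samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]                            # logical AND
w, b = train_perceptron(samples, labels)
print([predict(w, b, x) for x in samples])       # [0, 0, 0, 1]
```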

The use of ANNs is essential for tackling major building energy management issues. The dynamic and nonlinear character of building energy consumption patterns is one of the fundamental problems in this field. Conventional modeling techniques frequently fail to represent the complex interrelationships between different contributing elements. ANNs offer an efficient solution because of their capacity to represent intricate and nonlinear relationships. ANNs can learn and adapt to the various elements influencing energy usage by being trained on previous energy consumption data. This enables ANNs to accurately forecast future energy consumption [110].

An initial study on the application of ANN for energy demand prediction was carried out in 1995. The study concentrated on using the feedforward neural network (FFN) model to forecast a building’s electricity usage in a tropical environment. Temperature and occupancy data served as the foundation for the forecasts.

Mena et al. [111] used the ANN approach to estimate the short-term electricity demand of the CIESOL bioclimatic building in southeast Spain. Their results showed fast predictions with reasonable accuracy on real data: an average error of 11.48% for a short-term prediction horizon of 60 min. Mihalakakou et al. [112] used both recurrent neural networks (RNNs) and FFNs to forecast the hourly electricity usage of a residential building in Athens. Their models used six years of time-series data to account for climatic variables, including air temperature and solar radiation, and the relative errors ranged from 8% to 15%.

Gonzales and Zamarreno [113] used a feedback neural network to estimate short-term electrical energy usage. Their study examined how the amount of data used for the ANN and the number of neurons in the hidden layers affected the model's accuracy; the reported mean absolute percentage error (MAPE) was 1.945. Using a partial optimization approach, Li et al. [114] presented an optimized ANN for hourly electricity usage prediction, using PCA to remove irrelevant input variables from two datasets, ASHRAE Shootout I and the Hanzou Library Building.

Platon et al. [115] used PCA to examine the NAS's input factors to forecast the hourly power usage of a Canadian institutional building. According to the comparison, artificial neural regression (ANR) was more accurate than case-based reasoning (CBR): the ANN model's error was about 7%, compared with about 13% for CBR. Nevertheless, CBR stands out as a viable alternative for complex systems that depend on many variables, because it is more transparent than ANN and can learn efficiently from small datasets.

Hong et al. [116] assessed the energy efficiency of primary and secondary schools in the UK and estimated their electricity and heating usage using both statistical analysis and ANN. The results showed that ANN benchmarked energy performance more accurately than the statistical benchmarks. The study concluded that statistical benchmarks could be improved by adding variables, such as the number of students and school density, to produce more accurate evaluations. However, it was shown that the prediction accuracy of non-artificial intelligence systems (NAS) is not as good as that of engineering calculations and simulation.

Wong et al. [117] employed an ANN to evaluate the dynamic energy performance of a commercial building in Hong Kong. The building's daily energy consumption was calculated using EnergyPlus and interior reflection calculation algorithms. The main metric used to assess the network's prediction accuracy for heating, cooling, lighting, and total electricity use was the Nash-Sutcliffe Efficiency Coefficient (NSEC), with values of 0.994, 0.940, 0.993, and 0.996, respectively. The error analysis showed a coefficient of variation of the root mean square error from 3% to 5.6%, with lighting consumption having the lowest errors, from 0.2% to 3.6%.
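The error metrics reported in these studies can be computed as below. The definitions follow common usage (MAPE and CV-RMSE in percent, NSE as one minus the ratio of residual to total variance); some guidelines use n − p rather than n in the RMSE denominator, so published values are not always directly comparable. The data are hypothetical.

```python
import math


def mape(obs, pred):
    """Mean absolute percentage error (%); observations must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def cv_rmse(obs, pred):
    """Coefficient of variation of the RMSE (%): RMSE divided by the mean observation."""
    return 100.0 * rmse(obs, pred) / (sum(obs) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the mean."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


observed = [100.0, 200.0, 300.0]   # e.g. daily consumption in kWh
predicted = [110.0, 190.0, 300.0]
print(round(mape(observed, predicted), 2))     # 5.0
print(round(cv_rmse(observed, predicted), 2))  # 4.08
print(round(nse(observed, predicted), 3))      # 0.99
```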

The metrics for evaluating the energy performance of buildings can also be found using artificial neural networks. The total heat capacity, gain factor, and total heat loss coefficient are important components in estimating energy efficiency. Lundin et al. [118] provided a method for forecasting these values. RMSE values ranged from 2.5% to 9.4%, indicating good performance in the method’s validation using a test cell.

Using an ANN model and the Italian CENED database, Khayatian et al. [119] predicted energy performance certificates for residential buildings. The model inputs consist of a combined set of computed and direct characteristics, while the network outputs are heating demand indicators obtained with the CENED 1.2 software. The findings show that only 12 factors from an energy certificate are needed to estimate the heating demand indicator. To maximize accuracy, 100 neural network models were trained with stochastic initialization, yielding a frequency distribution and confidence interval. According to the final results, approximately 95% of the data fall within ±3 confidence ranges.

Ascione et al. [120] introduced an ANN to evaluate occupant thermal comfort as well as energy consumption, focusing on forecasting the energy performance of buildings constructed in southern Italy between 1920 and 1970. The EnergyPlus software was used to assess the energy performance of these buildings, and a simulation-based sensitivity/uncertainty analysis was used to suggest improvements to the network parameters. The study considered newly constructed buildings and the renovated stock separately, using the ANN to optimize retrofit parameters for the latter. Three single-output ANNs were created for the former; their three targets were the primary energy use for space heating, the primary energy use for space cooling, and the ratio of annual discomfort hours. The networks' input parameters included general building characteristics such as geometry, envelope, and HVAC/air-conditioning systems. The models performed strongly, with relative errors from 2% to 11% and correlation coefficients from 0.96 to 0.995.

An FFN model was developed by Karatasou et al. [26] to forecast the hourly energy loads of Athens’ residential structures. The study explores how various characteristics affect the trained model’s accuracy and finds that some factors, like wind speed and humidity, can be left out of the training parameters because they are not as important. Furthermore, the study illustrates how statistical analysis can improve the ANN model and obtain a prediction of energy usage 24 h in advance. During the model’s pre-processing and development phases, these statistical techniques include cross-validation, information criteria, and hypothesis testing.

Later, Dombayci [121] used the ANN approach to predict the hourly energy usage of a basic model house built according to Turkish standards. The hourly energy consumption needed to train the neural networks was obtained with the degree-hour method. Because the models do not account for many features, they are suited to managing the energy of a single, basic residential building. The ANN model with 29 neurons yielded the best prediction, with training RMSE, R², and MAPE values of 1.2575, 0.9907, and 0.2091, respectively; in a study by Antanasijevic [123] and associates, the corresponding test-phase values are 1.2125, 0.9880, and 0.2081.

Kialashaki and Reisel [122] compared the ANN approach with multilinear regression to determine the energy consumption of residential buildings in the US. Seven independent variables, representing population, GDP, house size, median household income, residential electricity costs, natural gas and oil prices, and building characteristics, were chosen from various data sources (1984–2010). Although the two forecasting models display distinct trends, their accuracy over the test period is comparable. This disparity may arise because the regression models only forecast general trends in individual parameters, whereas the NAS models are more sensitive to current economic volatility.

Antanasijević et al. [123] used building data from 26 European countries between 2004 and 2012 to compare ANR with linear and polynomial regression models for forecasting energy usage. The ANR improved accuracy, measured by mean absolute percentage error, by 4.5%. Neto and Fiorelli [124] examined how well an ANN model and the EnergyPlus simulation program predicted the energy needs of a Brazilian building. In the case analyzed, outdoor temperature was found to affect energy consumption more than humidity and solar radiation. The authors showed that the ANN was more accurate than the detailed simulation model, especially for short-term prediction (with a 10% relative error), and concluded that the main source of uncertainty in engineering models is the inadequate evaluation of lighting and occupancy.

Popescu et al. [125] developed ANN-based simulations and models to forecast, on an hourly basis, the heating energy consumption of buildings in Romania connected to district heating systems. The inputs are the mass flow and climatic factors over the preceding 24 h. Comparing the proposed models with conventional methods showed that the technique enables district-level management strategies with considerable and profitable energy-saving potential. Deb et al. [126] forecasted the daily cooling needs of three institutional buildings in Singapore by feeding the ANN model with data from the preceding five days; the results show that the model can accurately estimate the following day's energy use. The study also discusses the model's architecture and development in detail. Furthermore, by using each prediction as an input for forecasting the next day, the model projected energy consumption for the ensuing 20 days with an R² above 0.94. The authors note that the approach can be applied successfully to other institutional buildings.
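Using the preceding days as model inputs, as in the study by Deb et al. [126], amounts to building a sliding-window dataset, and feeding each prediction back in yields multi-day forecasts. The sketch below assumes a five-day window and accepts any trained model as a callable; the helper names and the toy predictor are hypothetical. Because predictions are reused as inputs, errors can accumulate as the horizon grows.

```python
def make_lagged_dataset(series, n_lags=5):
    """Turn a daily series into (inputs, target) pairs using the previous n_lags days."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(list(series[t - n_lags:t]))   # days t-n_lags .. t-1
        y.append(series[t])                    # day t
    return X, y


def recursive_forecast(predict_fn, history, n_lags, horizon):
    """Multi-step forecast that feeds each prediction back as an input."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_fn(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]     # slide the window forward one day
    return forecasts


# Toy stand-in for a trained ANN: predicts the mean of the last five days.
def mean_model(window):
    return sum(window) / len(window)


daily_cooling = [100, 104, 98, 102, 101, 99, 103, 105]
X, y = make_lagged_dataset(daily_cooling, n_lags=5)
print(X[0], y[0])  # [100, 104, 98, 102, 101] 99
print(recursive_forecast(mean_model, daily_cooling, n_lags=5, horizon=3))
```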

The authors of [127] estimated the daily heating usage of six Swedish building families that were built in the 1970s and renovated in the early 1990s, with measurements taken both before and after the renovation. The ANN produced efficient and precise long-term energy-requirement forecasts from short-term observed data, with a strong correlation (R² between 0.90 and 0.95). Additionally, PCA was used to reduce the year of construction, the number of floors, the frame, the floor space, the number of occupants, and the ventilation system to four important parameters.
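A minimal PCA sketch via the singular value decomposition, assuming NumPy: the inputs are centered, projected onto the leading principal directions, and the share of variance explained by each direction is reported. In practice, inputs with different units (e.g., floor area and number of occupants) are usually standardized first; the data here are synthetic and the names hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project X (samples x features) onto its first n_components principal directions.

    Returns (scores, components, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # PCA operates on centered data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]             # rows are principal directions
    scores = Xc @ components.T                 # coordinates in the reduced space
    explained = S ** 2 / np.sum(S ** 2)        # variance share of every direction
    return scores, components, explained[:n_components]


# Synthetic example: six correlated building descriptors driven by two latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 6))
scores, components, ratio = pca(X, n_components=2)
print(scores.shape, round(ratio.sum(), 4))
```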

Ekici and Aksoy [128] used ANR modeling to forecast the heating demands of three distinct buildings while accounting for meteorological data. To determine the heating energy demand of the buildings under study, the one-dimensional transient heat conduction problem was solved numerically with a finite-difference method. Compared with these numerical results, the ANN model's output had an average accuracy of between 94.8% and 98.5%. Focusing on the short-term operating heating power levels and the building occupancy profile, Paudel et al. [129] employed a pseudo-dynamic ANN to forecast heating energy usage. The pseudo-dynamic model was applied to a case study of a French institutional building, and the outcomes were compared with those of static neural network models. For the static and pseudo-dynamic models, the correlation coefficients were 0.82 and 0.89 (with an energy consumption error of 0.02%) during the learning phase and 0.61 and 0.85 during the prediction phase.

Ben-Nakhi [130] used a general regression neural network (GRNN) to predict the next-day energy profile of public buildings from hourly energy usage data, with the aim of optimizing HVAC and thermal energy storage. Data from a public office building in Kuwait, constructed between 1997 and 2001, were used to train and evaluate the ANN model. The building's energy consumption was determined with the ESP-r modeling software (version 10 series), accounting for various occupancy loads, orientation factors, and meteorological data. The findings showed that, whereas the modeling software requires detailed meteorological parameters, the ANR needs only the outdoor temperature to predict cooling demand accurately.

Hou et al. [131] combined rough set theory with an ANN model to predict the hourly cooling loads of an air-conditioned building in China. Rough set theory is used to analyze and optimize the parameters relevant to the cooling load, which determine the ANN's input characteristics. With various combinations of input sets, the proposed model was more accurate than the ARIMA model. Yan and Yao [132] conducted a survey to determine how climatic conditions affect energy use across China's different climate zones. To support the design of new buildings, a backpropagation ANR was used to predict heating and cooling demand, achieving CV-RMSE values from 1.71% to 2.86%.

Biswas et al. [133] later applied a comparable strategy, using a MATLAB toolkit, to residential buildings and demonstration houses in the USA. Aydinalp et al. [134] modeled appliances, lighting, and space cooling (ALC) in residential buildings in Canada; the ANR used for energy consumption prediction was more accurate than engineering calculation methods. Subsequently, they used ANR to predict domestic hot water and space heating consumption for the same buildings [135]. Azadeh and Sohrabkhani [136] demonstrated the value of the ANN model in forecasting electricity use in the manufacturing sector, using a multilayer perceptron model to forecast the long-term annual consumption of industrial sectors in Iran. The results were on par with, or better than, those of conventional regression models that use ANOVA. Later, Kialashaki [137] estimated the energy requirements of the US industrial sector using population, national product, and gross domestic product. According to the ANR model, energy demand is expected to rise by 16% by 2030, pointing to the need for new, reasonably priced energy sources. The ANN model is regarded as a dependable method for mapping inputs to outputs; to verify its effectiveness, the outcomes were compared with forecasts from the US Department of Energy.

Researchers have used Gaussian process (GP) regression since the early 2000s for a variety of purposes [138,139,140]. Because it can quantify forecast uncertainty, GP has recently been adopted in the building energy field. In building energy modeling, uncertainty typically arises in the selection of appropriate values for specific parameters (e.g., envelope insulation). For this reason, GP is now considered an alternative to traditional and other ML regression models when evaluating how input uncertainty affects predicted results.
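A minimal sketch of GP regression, assuming a zero prior mean, a squared-exponential (RBF) kernel with fixed hyperparameters, and NumPy; in practice the length scale, signal variance, and noise level are fitted to data. Besides a mean prediction, the GP returns a predictive variance that grows away from the training data, which is what makes it attractive for uncertainty analysis. The data and names are hypothetical.

```python
import numpy as np


def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d2 = (np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale ** 2)


def gp_predict(x_train, y_train, x_test, noise=1e-2, length_scale=1.0, variance=1.0):
    """Posterior mean and variance of the latent function at x_test."""
    K = rbf(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)                    # diag of prior test covariance is `variance`
    return mean, var


# Hypothetical: scaled outdoor temperature vs. normalized load
x_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(x_train)
mean, var = gp_predict(x_train, y_train, np.array([1.0, 10.0]))
print(np.round(mean, 3), np.round(var, 3))
```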

Heo [141,142] used the GP model to forecast the overall energy consumption and compute the energy savings of the retrofitted building. The occupancy count, relative humidity, and outdoor temperature were the model’s input variables. To estimate the levels of uncertainty, the output measurement errors were taken into account. Zhang et al. [143] employed GP regression later in 2013 to forecast the energy consumption of a post-retrofit office building’s heating and cooling systems. They demonstrated how the training and testing data ranges had a significant impact on the GP model’s accuracy.

Using meteorological data and smart meter readings, Noh and Rajagopal [144] developed a long-term GP prediction model for the overall energy usage of a campus building. Noh et al. [144] also presented a GP-based model that forecasts building energy consumption for demand response services. In emulating a building performance simulation, Rastogi et al. [145] compared GP with linear regression and showed that, in EnergyPlus-simulated case studies in the US, GP was four times more accurate than linear regression.

Burkhart et al. [146] combined GP with a Monte Carlo expectation maximization technique to train the model under uncertain data, with the aim of predicting the daily energy demand of an office building to optimize system performance. The study analyzed relative humidity, ambient temperature, and daily occupancy under two distinct scenarios (moderate and vigorous) to identify particular input variables and questionable data. The findings showed that, by using a rough approximation and data range in place of sensor data, the models can be trained even with sparse measurements or little data.

Manfren et al. [29] developed a technique for calibrating a building energy simulation model and analyzing its uncertainty. To predict the monthly electricity and gas consumption of heating and cooling systems, they combined detailed simulation with GP regression using an RBF kernel and with MLR. The findings showed that GP is more accurate than a piecewise regression model and provides a tool for building energy model optimization and uncertainty analysis.

Srivastava et al. [147] used a Gaussian mixture model (GMM) to forecast the daily and hourly energy consumption of commercial buildings, using data generated from DOE reference models of a supermarket and a retail shop building. This parametrized model enables locally adaptive uncertainty quantification.

Zhang et al. [148] compared change-point models, GP, GMM, and FF-ANN models for predicting the hot water energy consumption of an office building's HVAC system, using weather data (ambient dry-bulb temperature) as input. The ANN used in this work has one hidden layer with a tangent sigmoid transfer function. GMM gave the best performance and ANN the worst; the authors concluded that the ANN was unsuitable for the case study because it was not given enough data. GMM and GP were somewhat more accurate than change-point regression, but the latter is recommended because of its simplicity. Overall, Gaussian methods are the most effective option for analyzing uncertainty and capturing complex building behavior.

Clustering is a popular ML method for identifying distributions, patterns, and implicit correlations in datasets. It is an unsupervised learning technique that can uncover the latent structure of a set of unlabeled data. In building energy, where it is particularly useful for benchmarking, clustering is mainly used to categorize buildings by a range of characteristics and attributes rather than by type or typology alone.

According to [149], clustering for such an application involves four steps: (a) data collection; (b) feature identification, selection, and suitable adaptation; (c) application of the clustering algorithm; and (d) benchmarking of every building within the resulting categories. The most popular clustering algorithm is k-means, which iteratively converges to a local minimum of the within-cluster sum of squared distances. It starts by randomly choosing k centroids (cluster centers) and assigns each data point to the nearest centroid. Each centroid is then recalculated as the average of all data points in its cluster. This procedure repeats until a stopping criterion is met, for example, when the assignments no longer change or the total within-cluster distance stops decreasing.
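The k-means procedure described above can be sketched in NumPy as follows. This is a minimal illustration assuming Euclidean distance and standardized features; the building features and their values are hypothetical, and practical work would typically use several random restarts (or k-means++ initialization) to reduce sensitivity to the starting centroids.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on the rows of X; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stopping criterion: centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical features per building: [energy use intensity (kWh/m2/yr), weekly operating hours]
X = np.array([[120, 45], [130, 50], [125, 48],
              [260, 80], [250, 84], [270, 78]], dtype=float)
Xz = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize so both features count equally
labels, centroids = kmeans(Xz, k=2)
```

Standardizing first matters here: without it, the energy use intensity, with its larger numeric range, would dominate the Euclidean distance.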

Santamouris et al. [150] presented a fuzzy clustering-based approach for classifying the energy performance of 320 Greek schools. Three years of data on operating hours, student population, building features, and total energy use (heating and electricity) were gathered, and the clustering approach identified five building energy rating classes. Compared with a similar frequency-based rating technique, the clustering-based classification avoided sparsely populated, imbalanced, or overly large classes, yielding more robust classes. The authors assessed the energy conservation potential by applying the findings to ten case studies. To evaluate possible energy savings, Gaitani et al. [151] developed a rating system for heating energy use from a sample of 1100 schools, combining PCA with k-means clustering to identify representative buildings for each cluster and create five rating classes. Pieri et al. [152] presented a cluster-based energy audit of Greek hotel buildings that took their cooling and heating loads into account.
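Unlike k-means, fuzzy clustering assigns each building a degree of membership in every class. The sketch below implements the standard fuzzy c-means update rules with fuzzifier m = 2; it is a generic illustration that may differ from the specific variant used in [150], and the energy use intensities are hypothetical.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Fuzzy c-means: returns membership matrix U (n x c) and cluster centers (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m                                  # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                       # avoid division by zero
        # Standard update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Hypothetical annual energy use intensity (kWh/m2/yr) of six schools
X = np.array([[40.0], [45.0], [50.0], [140.0], [150.0], [160.0]])
U, centers = fuzzy_cmeans(X, c=2)
crisp_labels = U.argmax(axis=1)                     # hardened class assignment, if needed
```

Schools lying between class centers receive split memberships, which is one way a fuzzy scheme can avoid forcing borderline buildings into overly large or imbalanced classes.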

The ability of the clustering algorithm to integrate all the architectural factors that influence energy consumption makes it more accurate and resilient than the US Energy Star scheme for energy performance benchmarking, as shown by Gao & Malkawi [149].

Using clustering, Uddin et al. [28] showed how occupancy behavior affects building energy use. Clusters were formed from similarities in building characteristics unrelated to occupant behavior, and the effect of occupant actions on the energy demand of each cluster was then examined. Petcharat et al. [153] presented a clustering approach to evaluate potential energy savings in the lighting systems of Thailand's non-domestic building stock. According to the authors, cluster-based analysis performs better than comparing the target building's lighting power density with reference values specified by the nation's Energy Act.

Yang et al. [154] used the k-shape algorithm, originally proposed for clustering time series, to find patterns in energy usage, and then used SVM to improve the prediction accuracy of building energy demand. This approach further enhances data-driven energy forecasting models and, in turn, BMS performance.
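At the core of k-shape is a shape-based distance (SBD) derived from normalized cross-correlation, which compares load profiles regardless of their level, scale, or time shift. The sketch below computes only this distance for hypothetical 24-hour profiles; the full k-shape algorithm additionally extracts cluster centroids through an eigenvector-based shape-extraction step, which is omitted here.

```python
import numpy as np

def znorm(x):
    """z-normalize a sequence so that only its shape matters, not its level or scale."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def sbd(x, y):
    """Shape-based distance: 1 - max normalized cross-correlation over all lags (range 0 to 2)."""
    x, y = znorm(x), znorm(y)
    cc = np.correlate(x, y, mode="full")            # cross-correlation at every lag
    return 1.0 - cc.max() / (np.linalg.norm(x) * np.linalg.norm(y))

hours = np.arange(24)
office = np.where((hours >= 8) & (hours < 18), 50.0, 10.0)   # hypothetical daytime peak (kW)
shifted = np.roll(office, 2)                                   # same shape, two hours later
inverse = 60.0 - office                                        # night-peaking profile
```

The shifted office profile stays close to the original under SBD, while the night-peaking profile is far away, even though a point-by-point Euclidean comparison would penalize the two-hour shift.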

(b): Deep Learning Models

Deep neural networks (DNN), which include convolutional neural networks (CNN) and recurrent neural networks (RNN), are more complex versions of ANN with multiple hidden layers between the input and output layers [155]. Typically, a DNN is a feedforward network without loops [156]. The term DNN generally refers to fully connected networks (shown in Figure 8a), meaning that each neuron in one layer receives information from all neurons in the previous layer. Goodfellow et al. [157] give the following reasons for using DNN instead of simple ANN: (1) DNN requires fewer neurons than simple ANN to represent complex tasks; (2) in practice, DNN generally achieves higher prediction accuracy than ANN. However, DNN models should be implemented with careful attention to two common issues: overfitting and computational intensity.

CNN, a specific class of DNN, uses convolutional layers (Figure 8b) that group neighboring input units and apply the same function, with the same set of weights, to each group (i.e., parameter sharing). Because this reduces structural complexity and the number of connections compared with a general DNN, CNN lowers the risk of overfitting. Consequently, CNN can also be regarded as a regularized form of a standard DNN.
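Parameter sharing can be made concrete with a one-dimensional convolution over an hourly load series. In this minimal sketch (hypothetical data and kernel), a single three-weight kernel is slid across 24 values, and its parameter count is compared with that of a dense layer producing the same number of outputs.

```python
import numpy as np

def conv1d_valid(x, kernel, bias=0.0):
    """Slide one shared kernel over x at 'valid' positions, as a 1-D conv layer does."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel + bias for i in range(len(x) - k + 1)])

x = np.arange(24, dtype=float)          # e.g., 24 hourly load values (hypothetical)
kernel = np.array([0.25, 0.5, 0.25])    # 3 weights shared across all positions
y = conv1d_valid(x, kernel)             # 22 outputs

# Conv layer: 3 shared weights + 1 bias, reused at all 22 output positions.
# Dense layer mapping 24 inputs to 22 outputs: 24 * 22 weights + 22 biases.
conv_params = kernel.size + 1
dense_params = x.size * y.size + y.size
```

As in common deep learning libraries, the sliding operation here is a cross-correlation (the kernel is not flipped); with this symmetric kernel it coincides with `np.convolve`. The gap between 4 and 550 parameters is the reduction in connectivity that lets CNN act as a regularized DNN.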

CNN is well known in visual imagery analysis, including medical image analysis [159], image recognition, and image classification [158], and has also been applied to natural language processing [158]. To incorporate CNN into load prediction, Sadaei et al. [160] transformed hourly load data, hourly temperature data, and fuzzified load data into multi-channel images, which they then fed into a CNN model. The CNN model outperformed Long Short-Term Memory (LSTM) models, a type of RNN, in prediction performance.

RNNs differ from other deep learning algorithms in that their structure contains loops (shown as the cycle in Figure 8c), which allow information from earlier time steps to feed back into later ones. The use of RNNs in energy prediction has attracted increasing research interest in recent years; however, because the same loop weight is applied at every time step, gradients in traditional RNNs tend to explode or vanish when the loop runs many times [161]. This is known as the long-term dependency problem, and a commonly used RNN variant, LSTM, can be applied to retain information over long periods [161].
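The gating that lets LSTM retain information over long sequences can be sketched as a single forward time step in NumPy. This assumes the common formulation with input, forget, and output gates and a candidate cell state; the weights are random and the input sequence is hypothetical, so the example shows the data flow rather than a trained forecaster.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,); gate order i, f, g, o."""
    H = h_prev.size
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])          # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])      # candidate cell state
    o = sigmoid(z[3 * H:4 * H])      # output gate
    c_t = f * c_prev + i * g         # additive cell update
    h_t = o * np.tanh(c_t)           # hidden state passed to the next step
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 2, 4                          # inputs per step (e.g., load, temperature), hidden size
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(24, D)):  # 24 hypothetical hourly input vectors
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state is updated additively (`f * c_prev + i * g`) rather than by repeated multiplication with the same recurrent weight, which is what helps gradients survive across many time steps.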

Data-driven models have steadily progressed from simple ML methods to more sophisticated deep learning as computing power and data volumes have increased. This move to deep learning makes it possible to directly extract features from data through multiple network layers, allowing for thorough end-to-end learning and demonstrating powerful model expression capabilities. As the volume of training data grows, deep learning models’ prediction accuracy gradually improves [162].

Autoencoders (AE) [163], RNN [164], LSTM [102], CNN [165], and Generative Adversarial Networks (GAN) [163] are the most widely used deep learning techniques in building load prediction. Figure 9 shows the architecture of an AE: an input x (e.g., the value 4) is encoded and then decoded, ideally yielding a reconstructed output x' equal to the original input (4). Rahman et al. [164] used RNNs to forecast the medium- and long-term electricity demand curves of residential and commercial buildings at hourly resolution. Zhou et al. [165] used an LSTM model to predict the HVAC system load of a college library in Guangzhou and showed that LSTM outperformed ARIMA and back-propagation neural networks (BPNN). Wang et al. [159] used LSTM to predict heating load in office buildings in the United States and achieved lower prediction errors than the ASHRAE standard schedule (reducing errors from 12% to 8% and from 26% to 16%). Kim et al. [166] developed a Recurrent Inception Convolutional Neural Network (RICNN) optimized for short-term electric load prediction, in which a one-dimensional convolutional inception module calibrates the prediction times and the hidden-state vector values computed from nearby time steps. Sadaei et al. [160] combined CNN with fuzzy time series (FTS) to extract features automatically for short-term load forecasting (STLF), using images generated from the sequence values of multivariate time series. Cai et al. [25] found that RNNs and CNNs outperformed ARIMAX in accuracy, robustness, computational efficiency, and generalization ability when forecasting the day-ahead load of commercial buildings.

Figure 8. Schematic of (a) fully connected layer, (b) convolutional layer, (c) loop in RNNs [164].

Figure 9. Illustration of autoencoder model architecture [164].

The selection of a deep learning architecture is critical for time-series forecasting of building energy. Each architecture has distinct strengths and weaknesses, making it more or less suitable for specific tasks. Table 2 provides a comparison of deep learning architectures for building energy forecasting.

In practice, for most building-level forecasting with a few years of hourly data, well-tuned LSTM or CNN-LSTM models often strike a good balance. However, for portfolio-level forecasting across thousands of buildings or for capturing complex, long-term seasonal effects, transformer architectures are emerging as state-of-the-art.

(c): Ensemble Models

An ensemble approach aggregates the results of several learning algorithms to improve on the prediction performance of individual data-driven models [167]. Three commonly used types of ensemble methods are bagging, boosting, and stacking (also known as parallel homogeneous, sequential homogeneous, and heterogeneous ensemble approaches, respectively [167]). Figure 10 shows schematics of these three categories.

Bagging, or “bootstrap aggregating,” trains the same base model in parallel on several sub-datasets drawn uniformly, with replacement, from the original input dataset, and aggregates their outputs (e.g., by averaging) to predict the output. Because each base model is trained independently, this technique tends to reduce variance when the trained model is applied to the validation set.
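The bootstrap-and-average logic of bagging can be sketched as follows. For brevity, a cubic polynomial stands in for the decision trees used in RF, and the temperature and load data are hypothetical; the spread of the member predictions gives a rough indication of the variance that averaging smooths out.

```python
import numpy as np

def bagging_predict(x, y, x_new, n_models=50, degree=3, seed=0):
    """Bootstrap aggregating with a cubic-polynomial base model (a stand-in for trees)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    member_preds = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)              # bootstrap sample: n draws with replacement
        coeffs = np.polyfit(x[idx], y[idx], degree)   # same base model, different data
        member_preds.append(np.polyval(coeffs, x_new))
    member_preds = np.array(member_preds)
    return member_preds.mean(axis=0), member_preds.std(axis=0)

# Hypothetical hourly load (kW) vs outdoor temperature (degC), U-shaped with noise
rng = np.random.default_rng(1)
temp = rng.uniform(5.0, 35.0, size=40)
load = 50.0 + 0.8 * (temp - 18.0) ** 2 + rng.normal(scale=2.0, size=40)
mean_pred, spread = bagging_predict(temp, load, np.array([10.0, 18.0, 30.0]))
```

Because the members are fitted independently, they could be trained in parallel, which is the "parallel homogeneous" character of bagging noted above.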

Decision trees serve as the base models for RF, the most widely used bagging technique [167]. According to Wang et al. [103], RF is more accurate than RT and SVR for predicting hourly electricity use. Wang et al. [168] presented an ensemble bagging tree model to forecast the hourly electricity demand of educational buildings and showed that it outperforms RT in accuracy. A drawback, however, is that the bagging tree model takes longer to train than RT. In addition, the proposed bagging method is limited by the extra step needed to generate sub-datasets and by being less interpretable than RT [167].

Bagging and boosting vary in that boosting trains the baseline models gradually, meaning that each new model attempts to correct the errors of its predecessors.

The basic mechanism, used for example in AdaBoost, is to assign greater weight to misclassified data points, represented by the orange markers in Figure 10b. This reweighting makes the boosting algorithm focus on harder-to-classify instances, thereby reducing training error.

A gradient-boosting regression model was used by Robinson et al. [169] to forecast the yearly energy usage of several commercial building types situated in various locations. According to their findings, the gradient boosting regression model performs better than bagging models with a small number of features as well as conventional linear models (LR and SVR, for example). Furthermore, the gradient boosting decision trees (GBDT) are accurate and versatile for very short-term factory load predictions, according to Walther et al. [170].
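Gradient boosting, as in the models used by Robinson et al. [169] and in GBDT, corrects earlier errors by fitting each new weak learner to the current residuals rather than by reweighting samples as in AdaBoost. The sketch below assumes squared-error loss, depth-one regression trees (stumps), and a hypothetical hourly load profile; it is an illustration, not the configuration used in [169] or [170].

```python
import numpy as np

def fit_stump(x, r):
    """Best single split of 1-D x minimizing the squared error of targets r."""
    best_sse, best = np.inf, (np.inf, r.mean(), r.mean())
    for t in np.unique(x)[:-1]:                        # candidate thresholds
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, left.mean(), right.mean())
    return best                                        # (threshold, left value, right value)

def stump_predict(stump, x):
    t, left_val, right_val = stump
    return np.where(x <= t, left_val, right_val)

def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Least-squares gradient boosting: each new stump is fitted to the current residuals."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        residuals = y - pred                           # proportional to the negative gradient
        stump = fit_stump(x, residuals)
        pred = pred + learning_rate * stump_predict(stump, x)
        stumps.append(stump)
    return f0, learning_rate, stumps

def boost_predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(stump_predict(s, x) for s in stumps)

# Hypothetical hourly load profile (kW): occupied-hours step plus a small ripple
hours = np.arange(24.0)
load = np.where((hours >= 8) & (hours < 18), 50.0, 20.0) + 2.0 * np.sin(hours)
model = gradient_boost(hours, load)
mse_boost = np.mean((load - boost_predict(model, hours)) ** 2)
mse_const = np.mean((load - load.mean()) ** 2)
```

No single stump can represent both the morning and evening edges of the occupied period, but successive stumps fitted to the residuals add them up, which is the sequential error correction that distinguishes boosting from bagging.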

In addition to comparing prediction accuracy, it is important to examine the interpretability, robustness, and efficiency of different models. Using a case study of a 2-h-ahead heating load forecast for a residential quarter, Wang et al. [171] examined these four aspects for five models (XGBoost, GBDT, RF, ANN, and SVM). Considering all four aspects, they concluded that no single model is ideal: XGBoost, for example, is the most efficient, whereas RF exhibits the highest accuracy, interpretability, and robustness.

Unlike bagging and boosting, which use the same base model, stacking combines a heterogeneous collection of models. To produce the final prediction, a meta-model is trained on the outputs of several models that have themselves been trained on the given input data, as shown in Figure 10c.
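A stacking ensemble can be sketched with a linear meta-model trained on out-of-fold predictions of the base models, a common way to keep the meta-model from simply rewarding base models that overfit the training set. The base learners (two polynomial fits and a k-nearest-neighbour average), the least-squares meta-model, and the data are all hypothetical stand-ins; Huang et al. [172] used XGBoost, ELM, LR, and SVR, and their meta-model may differ.

```python
import numpy as np

def poly_model(degree):
    """Base learner: polynomial regression of the given degree."""
    def fit_predict(x_tr, y_tr, x_te):
        return np.polyval(np.polyfit(x_tr, y_tr, degree), x_te)
    return fit_predict

def knn_model(k):
    """Base learner: average of the k nearest training targets (1-D inputs)."""
    def fit_predict(x_tr, y_tr, x_te):
        nearest = np.argsort(np.abs(x_te[:, None] - x_tr[None, :]), axis=1)[:, :k]
        return y_tr[nearest].mean(axis=1)
    return fit_predict

def stack(x, y, x_new, base_models, n_folds=5, seed=0):
    """Stacking: a linear meta-model learns how to combine out-of-fold base predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    oof = np.zeros((len(x), len(base_models)))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(x)), test_idx)
        for j, model in enumerate(base_models):
            oof[test_idx, j] = model(x[train_idx], y[train_idx], x[test_idx])
    # Meta-model: least-squares weights (with intercept) on the out-of-fold predictions
    A = np.column_stack([np.ones(len(x)), oof])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Refit the base models on all data and combine their predictions for new inputs
    B = np.column_stack([np.ones(len(x_new))] + [m(x, y, x_new) for m in base_models])
    return B @ w, A @ w, oof

# Hypothetical daily load (kWh) vs outdoor temperature (degC)
rng = np.random.default_rng(2)
temp = rng.uniform(0.0, 35.0, size=60)
load = 300.0 + 0.4 * (temp - 17.0) ** 2 + rng.normal(scale=8.0, size=60)
models = [poly_model(1), poly_model(2), knn_model(5)]
pred_new, stacked_oof, oof = stack(temp, load, np.array([5.0, 17.0, 30.0]), models)
```

Because any single base model is one feasible setting of the meta-model weights, the stacked out-of-fold error can be no worse than the best individual model's out-of-fold error on the same data.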

Huang et al. [172] used an ensemble learning approach that included XGBoost, extreme learning machine (ELM), LR, and SVR to anticipate the heating demand for a ground source heat pump that provides space heating for a residential area two hours in advance. According to their findings, the suggested ensemble model outperforms XGBoost, ELM, LR, and SVR in terms of accuracy.

Despite an established track record in other disciplines, ensemble ML techniques (e.g., RF and gradient-boosted regression trees) have seen limited application in the building energy domain in recent years [103,173,174,175]. Li et al. [176] compared SVM, ANN, and ensemble models for predicting building energy performance, using a trust metric to assess the models' dependability, and found SVM and ANN to be better than linear and ensemble models. However, the authors did not optimize the models to produce the Pareto frontier. Papadopoulos et al. [174] also compared various ensemble models for estimating the energy performance of residential buildings, using 768 variants of a model building assessed with Ecotect software.

Limitations and application

ML techniques are typically applied to “big data” because growing levels of metering and monitoring make vast amounts of data accessible for study. When enough data is available, this makes them effective computational methods; nevertheless, it also makes them reliant on the gathering of this data [177].

Similar to statistical models, it appears that one of the primary disadvantages of ML techniques is that their “black-box” nature makes it challenging to provide tangible, real-world interpretations for the models’ results [178].

Nevertheless, when suitable datasets and monitoring data are available, this emerging field clearly holds considerable promise for improving the assessment and forecasting of building energy performance [179,180].

(d): Hybrid and Multi-Category Models

Hybrid modeling and classification are based not on building type and energy end-use but on the modeling process and building application [181]. Figure 11 depicts the two primary modeling techniques, data-driven and physics-based, with their distinct approaches, advantages, and disadvantages. The physics-based “white box” approach uses physical equations to describe building energy use [182]. The hybrid model is a “gray box” that combines data-driven and physics-based models.

The core logic of the physics-based “white box” model is to use comprehensive information about the building system, construction materials, and surrounding environment to simulate internal energy use. The calculation procedure is complex, requiring expert knowledge and considerable time [184]. Its use is further complicated by limited access to material data and to the sizes and specifications of mechanical systems.

The data-driven model, sometimes known as the “black box,” is very flexible and performs well in prediction. It analyzes past energy usage and forecasts future trends using statistics and machine learning [185], drawing on response surface methodology (RSM), SVR, MLR, ANN, and other methods that require minimal physical information [186,187]. Creating this type of AI-based model involves four processes: data collection, data preprocessing, model training, and model testing. The model's accuracy depends on the choice of input data; highly influential and relevant inputs yield more accurate results [188].

The hybrid model, often known as the “gray box,” blends the black box with the white box, using optimization techniques to improve integrated machine learning algorithms and single data-driven methods. By combining the strengths and mitigating the weaknesses of both approaches while remaining grounded in real-world phenomena, the gray box maintains a balance between them [183].

A significant trend in the recent literature is the move away from relying on a single model category towards hybrid approaches that combine the strengths of multiple methods. These models often cross the boundaries between classical ML, DL, and ensemble methods, or integrate data-driven with physics-based concepts, to achieve superior performance and robustness [189]. For example, Wang et al. [105] combined Random Forest (an ensemble) with LSTM (a deep learning model) to obtain more accurate hourly load forecasts than either method alone. Physics-informed neural networks (PINNs) are a class of hybrid models in which the loss function of a neural network is regularized by physical laws (e.g., thermodynamic equations), forcing the model to produce physically plausible predictions and thereby improving generalizability with less data [190]. Some studies employ models in sequence. For example, Lu et al. [105] first used the CEEMDAN algorithm for data decomposition and then applied XGBoost for prediction, leveraging the strengths of signal processing and ensemble learning. Huang et al. [172] used a stacking ensemble in which predictions from XGBoost, ELM, LR, and SVR served as inputs to a meta-model that made the final prediction, outperforming each individual model. The use of hybrid models demonstrates a pragmatic approach to tackling the complexity of building energy performance, moving towards more robust and deployable solutions. Table 3 below summarizes the hybrid and multi-category modeling approaches.

Although all three approaches have limitations, hybrid models are mostly applied before construction in contemporary building design for performance optimization, cost savings, and risk reduction, with the aim of creating robust, efficient, and sustainable buildings [51]. They require sufficient information to support their outputs and ensure prediction accuracy. The performance gap between simulation results and actual building performance is influenced by unpredictable variables such as geographic location, material alterations, and user behavior [191].

(e): Frameworks for Model Integration and Collaboration

The successful implementation of hybrid models relies on well-defined integration frameworks that specify the collaborative mechanisms between physics-based and data-driven components. Moving beyond the conceptual “gray box” label, we identify three primary architectural patterns for model coupling:

Sequential Calibration Framework: In this architecture, a physics-based model generates an initial simulation. A data-driven model is then used to calibrate the simulation outputs against real-world measurement data, learning the residual error. The final prediction is the sum of the simulation output and the data-driven correction term. This is particularly effective for simulation calibration and post-retrofit evaluation, where the physical model provides a structurally sound baseline and the ML component fine-tunes it for a specific building.
Surrogate-Assisted Optimization Framework: Here, a data-driven model is trained to act as a fast-to-evaluate surrogate for a computationally expensive physics-based simulation. This surrogate is then embedded within an optimization loop to rapidly explore thousands of design or control options (e.g., setpoint schedules, retrofit packages). This framework is invaluable for design-phase optimization and real-time optimal control, where directly using the simulation would be prohibitively slow.
Physics-Informed Learning Framework: This is a tighter form of integration, in which physical laws are embedded directly into the loss function or architecture of a neural network. For example, a Physics-Informed Neural Network (PINN) for building temperature forecasting would have a loss function comprising both the data mismatch (relative to sensor data) and the residual of the governing heat equation. This penalizes physically implausible solutions, significantly improving generalizability and robustness, especially in data-sparse regimes.
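The sequential calibration framework reduces to “physics output plus learned correction.” In the minimal sketch below, a hypothetical steady-state heating model stands in for the simulation, and a linear regression of the residuals on outdoor temperature stands in for the data-driven component; in practice the correction model could be a GP, GBDT, or neural network, and all numbers are illustrative.

```python
import numpy as np

def physics_model(t_out, ua=2.0, t_set=21.0):
    """Hypothetical steady-state heating model: Q = UA * max(T_set - T_out, 0), in kW."""
    return ua * np.maximum(t_set - t_out, 0.0)

# Hypothetical measurements: the real building has gains the physics model ignores
t_out = np.array([-5.0, 0.0, 5.0, 10.0, 15.0])
measured = np.array([47.0, 37.5, 28.0, 18.5, 9.0])

simulated = physics_model(t_out)
residual = measured - simulated                  # what the physics model misses

# Data-driven correction: here, a simple linear regression of the residual on T_out
slope, intercept = np.polyfit(t_out, residual, 1)

def calibrated_predict(t):
    """Final prediction = physics output + learned correction term."""
    return physics_model(t) + (slope * t + intercept)
```

The physics model supplies the structure (heating load rising as it gets colder), and the learned term only has to capture the comparatively small, building-specific discrepancy.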

The above frameworks provide a blueprint for developers, outlining how to structurally combine the interpretability and physical consistency of white-box models with the adaptability and pattern-recognition power of black-box models.
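The composite loss of the physics-informed learning framework can be illustrated with a lumped one-resistance, one-capacitance (1R1C) thermal model. This is only a sketch under stated assumptions: a real PINN differentiates the network output with automatic differentiation and minimizes this loss over the network weights, whereas here the loss is simply evaluated, with finite differences, for two candidate temperature trajectories. The RC model, its parameters, and the data are hypothetical.

```python
import numpy as np

def physics_informed_loss(t, T_pred, T_out, Q, sensor_idx, T_meas, R=0.005, C=2.0e6, lam=1.0):
    """Composite loss = data mismatch + lam * residual of a hypothetical 1R1C thermal model.

    Assumed model: R*C * dT/dt = (T_out - T) + R*Q, with R in K/W, C in J/K, Q in W.
    """
    data_loss = np.mean((T_pred[sensor_idx] - T_meas) ** 2)    # fit to sensor readings
    dTdt = np.gradient(T_pred, t)                              # finite-difference dT/dt
    residual = R * C * dTdt - (T_out - T_pred + R * Q)         # in kelvin; ~0 if physics holds
    physics_loss = np.mean(residual ** 2)
    return data_loss + lam * physics_loss, data_loss, physics_loss

# Hypothetical case: constant outdoor temperature and heating power
t = np.arange(0.0, 6 * 3600.0, 60.0)                 # six hours at one-minute steps (s)
T_out, Q, R, C = 5.0, 4000.0, 0.005, 2.0e6
T_eq, tau = T_out + R * Q, R * C                      # 25 degC equilibrium, 10,000 s time constant
T_exact = T_eq + (15.0 - T_eq) * np.exp(-t / tau)    # analytical solution starting from 15 degC
sensor_idx = np.arange(0, len(t), 30)                 # one reading every 30 minutes
T_meas = T_exact[sensor_idx]

_, _, phys_ok = physics_informed_loss(t, T_exact, T_out, Q, sensor_idx, T_meas)
_, _, phys_bad = physics_informed_loss(t, np.full_like(t, 15.0), T_out, Q, sensor_idx, T_meas)
```

The analytical solution scores a near-zero physics term, while a flat trajectory that ignores the heating input is heavily penalized; this is the mechanism by which the physics term steers a PINN toward plausible predictions between sparse sensor readings. The weight `lam`, which balances the two terms, is an assumption and would normally be tuned.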

(f): Regional and Temporal Analysis of Research Focus

The application and development of energy performance models are not uniformly distributed globally. Research focus is often influenced by regional energy policies, climate challenges, economic development, and data availability. To provide a clear picture, we analyzed the geographic distribution and temporal trends of the studies cited in this review. See Table 4 below for a summary of the regional analysis and Figure 12 for the temporal analysis.

To illustrate the geographic and temporal distribution of research focus within the scope of our review, we analyzed the number of key publications per region over time. As shown in Figure 12, the research landscape is dominated by studies from North America and Europe across all time periods. While Asia shows a significant and growing body of work, there is a pronounced and persistent lack of research focused on Africa, a critical gap that this review aims to highlight, especially given the continent’s unique challenges and opportunities in improving energy performance in non-domestic buildings.

2.2.5. Evaluation Metrics

Evaluation metrics are indispensable quantitative tools for assessing the performance and effectiveness of predictive models. By measuring a model's performance on a test set, they give researchers clear benchmarks for comparing different models [192]. Among the numerous evaluation metrics, the mean absolute error (MAE) (Equation (8)), mean absolute percentage error (MAPE) (Equation (9)), mean squared error (MSE) (Equation (10)), root mean squared error (RMSE) (Equation (11)), and R² score (Equation (12)) are frequently used [193].

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left| y_{i} - x_{i} \right| \tag{8}$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left| \frac{y_{i} - x_{i}}{y_{i}} \right| \tag{9}$$

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left( y_{i} - x_{i} \right)^{2} \tag{10}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left( y_{i} - x_{i} \right)^{2}} \tag{11}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left( y_{i} - x_{i} \right)^{2}}{\sum_{i=1}^{n}\left( y_{i} - \bar{y} \right)^{2}} \tag{12}$$

where $n$ is the number of samples, $x_{i}$ denotes the $i$-th simulated (predicted) value, $y_{i}$ the $i$-th actual measurement, $\bar{x}$ the mean of the simulated values, and $\bar{y}$ the mean of the actual measurements.
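Equations (8)–(12) translate directly into code. The sketch below follows the notation above, with x as the simulated (predicted) values and y as the actual measurements; the example values are hypothetical.

```python
import numpy as np

def evaluation_metrics(x, y):
    """x: simulated/predicted values, y: actual measurements (Equations (8)-(12))."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    err = y - x
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y))     # undefined if any y_i == 0
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "RMSE": rmse, "R2": r2}

# Hypothetical daily consumption (kWh): actual vs predicted
actual = [100.0, 120.0, 80.0, 110.0]
predicted = [90.0, 125.0, 85.0, 100.0]
metrics = evaluation_metrics(predicted, actual)
```

For these values the errors are 10, -5, -5, and 10 kWh, giving MAE = 7.5 kWh, MSE = 62.5 kWh², RMSE ≈ 7.91 kWh, MAPE ≈ 7.38%, and R² ≈ 0.714. Note that the RMSE exceeds the MAE because squaring weights the two 10 kWh errors more heavily.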

These assessment criteria are closely related to the various algorithms used to predict building energy use. Consider LR, which seeks the straight line that best describes the relationship between the influencing factors and the energy use data. Here, the MAE intuitively represents the average deviation between actual building energy consumption and the values predicted by linear regression [194]. Because LR is sensitive to the general trend of the data, the MAE provides a reliable measure of its prediction error: a few anomalous energy usage data points will not significantly affect the overall error assessment. However, when the data contain complex nonlinear relationships, the MAE may not accurately capture how the model's performance differs across regions of the data distribution. In such cases, it can be used together with the MSE or RMSE to better assess the model [195].

The DT algorithm classifies or predicts building energy usage by building a tree structure. The MAPE is particularly useful for assessing decision trees: because it expresses the relative prediction error as a percentage, it helps researchers understand the accuracy of the tree's predictions across different energy consumption ranges when the tree processes distinct categories of data [196]. However, small variations in the input can change the node splits, and hence the tree, significantly, and outliers in the data can inflate the MAPE noticeably, distorting the assessment of the model's overall performance [197].

NN techniques are frequently used to predict energy usage because their complex structure allows them to recognize intricate patterns in data. When a neural network is trained on the MSE, the squaring magnifies larger errors, encouraging the network to focus on data points with larger prediction deviations; the network parameters are iteratively adjusted by backpropagation to raise overall prediction accuracy [198]. The RMSE, the square root of the MSE, is expressed in the same units as the target and therefore offers a more intuitive way to interpret the network's prediction errors. Using the RMSE, researchers can quickly gauge how closely the network's predicted values match actual building energy consumption [199].

The R² value is commonly used to evaluate all of these algorithms. The closer R² is to 1 for the neural network, decision tree, and linear regression models discussed above, the better the model fits the building energy consumption data, that is, the larger the proportion of the variance in the energy consumption data that the model explains. In a neural network model, for instance, researchers can use R² to judge whether the network structure is reasonable and whether the model has captured the underlying patterns in the energy consumption data [200].

It is often impossible to fully assess a building energy consumption forecasting model with a single metric. Because of its sensitivity to outliers, the MAPE alone may misjudge the model's overall performance on account of individual abnormal energy consumption data, whereas relying only on the MAE may overlook differences in the model's predictions across regions of the data distribution [200]. A combination of several indicators is therefore usually advised. When the MAE and RMSE are used together, for example, the MAE gives a robust estimate of the typical error, while the RMSE weights large errors more heavily, so the two assess the model's performance from complementary perspectives. Cross-validation can also be applied: the dataset is split into several subsets, and training and testing are conducted on different subsets, which reduces the assessment bias caused by any single split and gives a more reliable estimate of the model's ability to generalize when predicting building energy consumption [200].
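The k-fold procedure described above can be sketched for a simple linear regression of daily consumption on cooling degree days, using hypothetical data. One caveat applies: for hourly or daily time series, a random split lets the model see observations adjacent in time to the test points, so forward-chaining (time-ordered) splits are often preferred; the random split is used here only to keep the example short.

```python
import numpy as np

def kfold_rmse(x, y, k=5, seed=0):
    """k-fold cross-validation of a simple linear regression; returns one RMSE per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    scores = []
    for test_idx in folds:
        train = np.ones(len(x), dtype=bool)
        train[test_idx] = False                                 # hold out this fold
        slope, intercept = np.polyfit(x[train], y[train], 1)    # fit on the other k-1 folds
        pred = slope * x[test_idx] + intercept
        scores.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))
    return np.array(scores)

# Hypothetical daily consumption (kWh) vs cooling degree days
rng = np.random.default_rng(3)
cdd = rng.uniform(0.0, 15.0, size=60)
kwh = 200.0 + 12.0 * cdd + rng.normal(scale=5.0, size=60)
scores = kfold_rmse(cdd, kwh)
```

Reporting the mean and spread of `scores`, rather than a single test-set figure, shows how much the assessment depends on the particular division of the data.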

2.2.6. Comparative Performance Analysis

This section assesses the effectiveness of several data-driven models for forecasting energy use in non-domestic buildings. The comparison rests on three main criteria: accuracy, scalability, and interpretability. The models fall into three categories: traditional machine learning, deep learning, and ensemble approaches. Table 5 summarizes the comparison.

2.2.7. Quantitative Benchmarks from Contemporary Research

While Table 5 provides a generalized performance overview across model categories, it is instructive to examine quantitative results from specific, advanced applications in the broader energy systems domain. These studies often tackle complex, multi-variable problems that share the non-linear, dynamic nature of building energy performance. Table 6 summarizes performance metrics from three such studies, highlighting the accuracy achieved and the context of each application.

These benchmarks demonstrate that advanced ML models, particularly those integrated with physical knowledge, can yield significant improvements in both accuracy and economic and operational outcomes. This reinforces the potential of similar hybrid and AI-driven approaches for non-domestic building energy management, where the goals of cost, efficiency, and system longevity also align.

2.2.8. Conclusion for Energy Performance in Non-Domestic Buildings Using ML

Statistical benchmarking works well for determining the energy performance of a building but does not provide complete insight into the mechanisms and reasons behind that performance. ML techniques likewise do not capture the physical aspects of buildings, which may reduce their ability to identify areas for improving energy performance. Although not very reliable, engineering calculation procedures are useful for approximating the improvement potential of individual retrofit measures. Although simulation models can be very realistic, their application is limited because they demand detailed building information and calibration. These approaches are contrasted in Table 7.

Table 7 lists four primary methods of building analysis: engineering calculations, simulation, statistical methods, and machine learning. The methods differ in their foundation, underlying approach, sources of information, level of accuracy, typical deployment scenarios, and inherent limitations.

Engineering calculations, as in the referenced source [54], are based on fundamental engineering principles and simplified building information. This method offers variable accuracy but is extremely flexible and can be applied to design and end-use calculations. Its main weakness is limited accuracy owing to its simplified inputs. Simulations, as in references [60–203], use detailed building data to achieve high accuracy. They are ideally suited to design use and compliance checking, as well as to detailed analyses of difficult situations or complex buildings that demand accurate responses. Their disadvantages are that they depend heavily on user skill and require extensive data input.

Statistical methods, such as those in refs. [76] and [77], draw on sets of building data and provide average accuracy. They are used principally for benchmarking schemes and for simple assessments. Their main shortcomings are their dependence on the quality and coverage of the statistical data and their tendency toward limited accuracy.

Finally, machine learning, as in sources [88–184], uses large datasets and can achieve accuracy ranging from average to high. This technique suits large, data-rich buildings and multi-parameter problems. Although promising, such models can be challenging to build, and their primary weakness is their inability to incorporate physical building qualities directly.

Hong et al. [116] point out that top-down and bottom-up benchmarking techniques are typically poorly calibrated against each other, which degrades the reliability of both. Despite extensive research on the topic, reconciliation (calibration) between the two remains a significant challenge and an area of active research. Beyond mapping end-uses and identifying the performance of particular systems and areas for improvement, an efficient benchmarking framework would be able to rate the overall energy efficiency of buildings using insights from statistical or machine learning evaluations.

Buildings present real-world challenges with intricate relationships between humans, systems, and the environment. Fitness for purpose is perhaps the most important factor to take into account when evaluating building processes. To determine the proper distribution of resources and, thus, the modeling approach for creating benchmarks and assessing performance, the evaluation’s objective and the data at hand must be well-defined [6].

3. Methodologies for Data Preparation

3.1. Current State of Building Energy Consumption Data

Automation, smart technologies, and massive data storage have made building energy data collection easier. The data fall into two categories: data on specific energy consumption, and data on the factors that influence energy consumption (also known as energy consumption influencing variables).

On-site data accurately reflect how much energy buildings actually use. However, the quality of the collected data is often low because of problems with metering devices and data transmission, which produce persistent issues such as anomalies, breakpoints, and noise that have always been challenging to resolve. In addition, information and data from energy consumption monitoring platforms are often inaccessible and underutilized owing to low richness (few static data points about the buildings) and poor data quality (frequent missing and abnormal values).

Government agencies and research groups have created building energy consumption databases. Table 8 compares several publicly accessible databases of building energy consumption.

3.2. Data Preprocessing Methods

The main goal of preprocessing is to enhance data quality so that the prediction model can learn and generalize efficiently. According to pertinent research, data preprocessing is one of the most crucial tasks and accounts for more than 80% of the effort in the overall prediction process [214].

Data cleansing, transformation, time-series processing, dimensionality reduction, and augmentation are common preprocessing techniques for HVAC system load forecasting.

Data cleansing covers two aspects: anomaly detection and the handling of missing values. Sensor failures, communication breakdowns, and recording errors produce incomplete or erroneous building energy consumption records [215]. Several imputation techniques exist for missing values. Single-variable imputation methods include mean imputation, forward and backward filling, linear and polynomial interpolation, Kalman filtering, and moving averages. Gaps in multivariate data can be filled with techniques such as K-Nearest Neighbors (KNN), RF, Multivariate Singular Spectrum Analysis, and Matrix Factorization. The best imputation technique depends on the size of the data gap and the characteristics of the method itself [216].
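The univariate methods listed above can be compared on a synthetic series. The minimal sketch below assumes an hourly load with a daily sine cycle and a hypothetical 6-hour gap. It fills the gap by forward filling and by linear interpolation, then scores both with NRMSE. Here NRMSE is normalized by the range of the true values, which is one of several common conventions.

```python
import numpy as np


def forward_fill(x: np.ndarray) -> np.ndarray:
    """Replace each NaN with the last observed value (leading NaNs stay NaN)."""
    out = x.copy()
    for i in range(1, len(out)):
        if np.isnan(out[i]):
            out[i] = out[i - 1]
    return out


def linear_interpolate(x: np.ndarray) -> np.ndarray:
    """Fill NaNs by linear interpolation between the nearest observed neighbours."""
    out = x.copy()
    idx = np.arange(len(x))
    known = ~np.isnan(x)
    out[~known] = np.interp(idx[~known], idx[known], x[known])
    return out


def nrmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean square error normalized by the range of the true values."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return float(rmse / (y_true.max() - y_true.min()))


# Synthetic hourly load with a 24 h cycle and an artificial 6-hour gap.
t = np.arange(48)
true_load = 50 + 10 * np.sin(2 * np.pi * t / 24)
observed = true_load.copy()
gap = slice(10, 16)
observed[gap] = np.nan

for name, method in [("forward fill", forward_fill), ("linear", linear_interpolate)]:
    filled = method(observed)
    print(f"{name:>12}: NRMSE on gap = {nrmse(true_load[gap], filled[gap]):.3f}")
```

On this smooth profile, linear interpolation gives the lower NRMSE. For gaps spanning a large part of the daily cycle, neither method tends to recover the daily shape, which is consistent with the gap-size effect discussed next. For multivariate gaps, scikit-learn's `sklearn.impute.KNNImputer` implements KNN imputation.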

Cho et al. [217] compared the accuracy and computing requirements of six data imputation techniques, using the Normalized Root Mean Square Error (NRMSE) as the primary criterion of effectiveness. They found that the size of the missing data gap has the largest influence on the effectiveness of an imputation technique. Techniques for detecting anomalies include (1) distance-based techniques such as KNN; (2) statistical techniques such as box plots and standard deviation rules; and (3) decision tree-based techniques [101]. Chen et al. [132] proposed wavelet analysis algorithms for filtering data during preprocessing and demonstrated their high efficiency and good applicability for handling noisy data in large datasets.
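The distance-based and statistical detectors named above can be sketched in a few lines of NumPy. This is an illustration on hypothetical meter readings, not the method of ref. [101]. The 1.5 × IQR fence, the 3σ threshold, and k = 3 neighbors are conventional defaults chosen as assumptions.

```python
import numpy as np


def iqr_outliers(x: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Box-plot rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)


def zscore_outliers(x: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Standard-deviation rule: flag values more than `threshold` std devs from the mean."""
    z = (x - x.mean()) / x.std()
    return np.abs(z) > threshold


def knn_outliers(x: np.ndarray, k: int = 3) -> np.ndarray:
    """Distance-based rule for 1-D data: score each point by its mean distance to
    its k nearest neighbours, then apply the box-plot rule to the scores."""
    dist = np.abs(x[:, None] - x[None, :])
    nearest = np.sort(dist, axis=1)[:, 1:k + 1]   # column 0 is the zero self-distance
    return iqr_outliers(nearest.mean(axis=1))


# Hypothetical hourly readings (kWh) with one meter spike at index 9.
loads = np.array([50, 52, 49, 51, 53, 50, 48, 52, 51, 120], dtype=float)
print(np.flatnonzero(iqr_outliers(loads)))        # [9]
print(np.flatnonzero(zscore_outliers(loads)))     # [] : the 3-sigma rule misses it
print(np.flatnonzero(knn_outliers(loads)))        # [9]
```

The z-score rule misses the spike because, with the population standard deviation, no single point among n = 10 can exceed |z| = √(n − 1) = 3. Quartile-based rules are less distorted by the outlier itself. For the tree-based family, scikit-learn provides `sklearn.ensemble.IsolationForest`.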

Data transformation commonly refers to the normalization and standardization of data. It is crucial for most ML techniques, particularly those that are sensitive to variable scale. Data transformation also includes the encoding of non-numeric information such as categorical variables; label encoding and one-hot encoding are popular feature encoding techniques. To reduce data volume, Fan et al. [214] applied Symbolic Aggregate Approximation (SAX), which converts raw Building Automation System (BAS) time series into intelligible symbol streams.
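The transformations above can be sketched with NumPy. The block shows min-max normalization, z-score standardization, one-hot encoding, and a compact SAX encoder. The 4-symbol alphabet, the segment count, and the toy daily profile are assumptions for illustration, not the configuration used by Fan et al. [214].

```python
import numpy as np


def min_max(x: np.ndarray) -> np.ndarray:
    """Normalization: rescale values linearly to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())


def standardize(x: np.ndarray) -> np.ndarray:
    """Standardization: zero mean and unit (population) standard deviation."""
    return (x - x.mean()) / x.std()


def one_hot(labels: list[str]) -> tuple[list[str], np.ndarray]:
    """One-hot encoding: one binary column per category (categories sorted)."""
    categories = sorted(set(labels))
    matrix = np.array([[int(lab == c) for c in categories] for lab in labels])
    return categories, matrix


# Quartiles of N(0, 1): breakpoints for a 4-symbol SAX alphabet.
SAX_BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])


def sax(x: np.ndarray, n_segments: int, alphabet: str = "abcd") -> str:
    """Symbolic Aggregate approXimation: z-normalize, average equal-length
    segments (piecewise aggregate approximation), map each mean to a symbol."""
    if len(x) % n_segments:
        raise ValueError("series length must be divisible by n_segments")
    paa = standardize(x).reshape(n_segments, -1).mean(axis=1)
    return "".join(alphabet[i] for i in np.searchsorted(SAX_BREAKPOINTS, paa))


# Hypothetical 24 h load profile: night, morning, afternoon, evening (6 h each).
profile = np.repeat([10.0, 30.0, 40.0, 20.0], 6)
print(sax(profile, n_segments=4))                 # acdb
print(one_hot(["office", "retail", "office"]))
```

Each 6-hour block collapses to one letter, so 24 readings become the four-symbol word `acdb`; this is how SAX reduces data volume.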

Time series analysis is an effective method for building performance analysis. Analyzing time series that include building environment, energy use, and operational information makes it possible to find trends in energy use, spot irregularities in building operations, and ultimately improve building performance. Capturing periodicity usually requires processing time labels and transforming timestamps into multiple features. Deriving dynamic features also requires past data points, such as load statistics from earlier hours or days [218].
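The timestamp and lag features described above can be sketched using only the Python standard library. The sine/cosine encoding, the 1 h, 24 h and 168 h lags, and the 24 h rolling mean are common choices assumed here for illustration. Ref. [218] may use a different feature set.

```python
import math
from datetime import datetime
from typing import Optional


def calendar_features(ts: datetime) -> dict[str, float]:
    """Encode a timestamp so that periodicity is visible to the model.

    Sine/cosine pairs place 23:00 next to 00:00 (and Sunday next to Monday),
    which a plain integer hour or weekday would not.
    """
    hour_angle = 2 * math.pi * ts.hour / 24
    dow_angle = 2 * math.pi * ts.weekday() / 7
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "dow_sin": math.sin(dow_angle),
        "dow_cos": math.cos(dow_angle),
        "is_weekend": float(ts.weekday() >= 5),
    }


def lag_features(load: list[float], t: int,
                 lags: tuple[int, ...] = (1, 24, 168)) -> dict[str, Optional[float]]:
    """Loads 1 h, 1 day and 1 week before hourly index t, plus the mean of the
    previous 24 h (None where the history is too short)."""
    feats: dict[str, Optional[float]] = {
        f"lag_{lag}": (load[t - lag] if t - lag >= 0 else None) for lag in lags
    }
    window = load[max(0, t - 24):t]
    feats["mean_prev_24h"] = sum(window) / len(window) if window else None
    return feats


print(calendar_features(datetime(2024, 1, 6, 18)))   # a Saturday evening
print(lag_features([float(i) for i in range(200)], t=170))
```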

Dimensionality reduction aims to retain as much of the original information as possible while minimizing the number of dataset dimensions. Besides increasing computational efficiency, it improves the efficacy of ML algorithms by removing features that contribute little discriminative information. PCA and Kernel Principal Component Analysis (KPCA) are two techniques for reducing data dimensionality. Xuemei et al. [219] used PCA and KPCA to reduce data dimensions and compared the performance of SVM combined with PCA, SVM combined with KPCA, and SVM without any dimensionality reduction.
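PCA can be computed directly from the singular value decomposition of the centered data matrix. The sketch below assumes a hypothetical dataset in which the third feature is nearly the sum of the first two, so two components should retain almost all the variance. KPCA would replace the linear projection with a kernel; scikit-learn offers it as `sklearn.decomposition.KernelPCA`.

```python
import numpy as np


def pca(X: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Principal component analysis via SVD of the centered data matrix.

    Returns the projected data (scores) and the fraction of total variance
    explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                          # center each feature
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                  # variance share per component
    return Xc @ vt[:n_components].T, explained[:n_components]


# Hypothetical inputs: two independent drivers plus a third, nearly redundant
# feature that is (almost) their sum, e.g. a derived index.
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
X = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=200)])
scores, ratio = pca(X, n_components=2)
print(scores.shape, ratio.round(3), ratio.sum().round(4))
```

Features measured in different units should normally be standardized before PCA; otherwise large-valued features dominate the components.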

3.3. Data Fusion Methods

Data fusion combines data from multiple sources to produce a richer and more complete view of the information than any single source can provide. It can also serve as a form of data augmentation to address insufficient data. Data collection plans and suitable information fusion strategies should therefore be designed around prioritized goals [152]. Himeur et al. [55] discussed data fusion strategies and their possible uses in energy-saving systems from several angles, including fusion levels and approaches. Depending on the processing stage, data fusion approaches fall into three primary categories: data-level fusion, feature-level fusion, and decision-level fusion.

The simplest type of data fusion is data-level fusion, in which unprocessed sensor data are combined directly to produce an extensive dataset [219]. It is typically applied to sensor or image data. Because it takes place before any higher-level processing, data-level fusion preserves the most original information. Li et al. [114] used a Kalman filter for data fusion to improve the robustness of load prediction; integrating multi-modal data in this way produced more trustworthy predictions. In the absence of historical data for a target building, building morphology, on-site test energy data, and simulated energy data from comparable buildings have also been combined to forecast the target building’s energy use [218].
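Data-level fusion with a Kalman filter can be illustrated with a scalar example. The sketch assumes a random-walk load model and two independent sensors with known noise variances. The signal, the noise levels, and the process noise `q` are hypothetical, and this is not the formulation of Li et al. [114].

```python
import numpy as np


def kalman_fuse(z1, z2, r1, r2, q, x0, p0):
    """Fuse two noisy measurement streams of one quantity with a scalar Kalman filter.

    State model: random walk x_t = x_{t-1} + w_t with Var(w_t) = q.
    Sensor i measures z_i = x_t + v_i with Var(v_i) = r_i.
    """
    x, p = x0, p0
    estimates = []
    for m1, m2 in zip(z1, z2):
        p += q                                   # predict: uncertainty grows
        for z, r in ((m1, r1), (m2, r2)):        # sequential update, one sensor at a time
            k = p / (p + r)                      # Kalman gain
            x += k * (z - x)
            p *= 1 - k
        estimates.append(x)
    return np.array(estimates)


# Hypothetical load signal measured by a precise meter and a noisier sub-meter.
rng = np.random.default_rng(1)
true_load = 100 + np.cumsum(0.1 * rng.normal(size=500))   # slowly drifting load (kW)
z1 = true_load + 2.0 * rng.normal(size=500)               # sensor 1: std 2 kW
z2 = true_load + 5.0 * rng.normal(size=500)               # sensor 2: std 5 kW
fused = kalman_fuse(z1, z2, r1=4.0, r2=25.0, q=0.01, x0=z1[0], p0=4.0)


def rmse(estimate: np.ndarray) -> float:
    return float(np.sqrt(np.mean((estimate - true_load) ** 2)))


print(f"sensor 1: {rmse(z1):.2f}  sensor 2: {rmse(z2):.2f}  fused: {rmse(fused):.2f}")
```

Because each update weights a sensor by its noise variance, the fused estimate has a lower error than either raw stream.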

In contrast to data-level fusion, feature-level fusion first extracts pertinent features from the raw data and then fuses them. Because it handles less data, this approach is typically more efficient. After investigating the complementary characteristics of several home appliances, Himeur et al. [55] proposed an effective feature extraction technique, based on the fusion of time-domain features, that was designed to improve feature discrimination. Wang et al. [220] tested several feature combinations, such as illumination, occupant count, WiFi connections, and other electrical demands, with an LSTM prediction algorithm to determine which combination yields the most accurate forecast.

At the highest level, decision-level fusion combines the output decisions of many models, such as classification and prediction results. It is frequently used in multi-expert systems, where each system makes an independent decision and the decisions are then combined with a strategy (voting, weighted averaging, etc.) to exploit the benefits of several independent decisions and improve the accuracy and reliability of the final decision. To design efficient energy-saving measures, Fotopoulou et al. [221] gathered data from various sensor modes, translated it into an entropy semantic model, and applied several data fusion techniques. Xiao et al. [51] proposed an ontology-based semantic retrieval technique to improve building energy management in BIM systems. Depending on the data processing approach, data fusion algorithms are classified as statistical-based, mathematical model-based, or ML-based.
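Decision-level fusion by voting can be sketched as follows. The labels, model weights, and function name are hypothetical, and weighted voting is shown as one common combination strategy, not the method of refs. [221] or [51].

```python
from collections import defaultdict
from typing import Optional, Sequence


def fuse_decisions(labels: Sequence[str], weights: Optional[Sequence[float]] = None) -> str:
    """Decision-level fusion of independent classifier outputs.

    With no weights this is plain majority voting; otherwise each model's vote
    counts in proportion to its weight (e.g. its validation accuracy).
    Ties go to the label that appears first.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    totals: dict[str, float] = defaultdict(float)
    for label, w in zip(labels, weights):
        totals[label] += w
    return max(totals, key=totals.get)


# Three hypothetical fault-detection models judge the same HVAC operating window.
votes = ["normal", "fault", "fault"]
print(fuse_decisions(votes))                              # fault (2 votes to 1)
print(fuse_decisions(votes, weights=[0.95, 0.40, 0.40]))  # normal (0.95 > 0.80)
```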

Statistical-based fusion algorithms focus on the properties of the data itself, such as mean, variance, and correlation coefficients, rather than relying primarily on theoretical or physical knowledge of how the data are produced. In multi-sensor systems, for example, information from many sensors can be fused by weighted averaging, with weights determined by the accuracy or reliability of each sensor. Huang et al. [172] combined indirectly measured chiller energy consumption and evaporation/condensation temperatures with directly measured chilled water flow and supply/return water temperature difference, which allowed them to predict cooling loads more accurately and optimize the unit’s control.
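The reliability-weighted averaging described above is commonly implemented with inverse-variance weights. This is a minimal sketch, assuming independent, unbiased sensors whose noise variances are known, and the readings are hypothetical.

```python
def inverse_variance_fusion(readings: list[float], variances: list[float]) -> tuple[float, float]:
    """Weighted average of redundant sensor readings with weights 1/variance.

    Assumes independent, unbiased sensors; returns the fused value and its variance,
    which is smaller than any single sensor's variance.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * r for w, r in zip(weights, readings)) / total
    return fused, 1.0 / total


# Two hypothetical supply-water temperature sensors (°C) of different reliability.
value, var = inverse_variance_fusion([7.0, 7.3], [0.04, 0.16])
print(round(value, 3), round(var, 4))   # 7.06 0.032
```

The fused variance (0.032) is lower than that of the better sensor (0.04), which is the main benefit of combining redundant measurements.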

3.4. Transfer Learning

Transfer learning makes it possible to apply knowledge gained from solving one problem to a related but different new problem [222]. In essence, it uses pre-existing data and models to accelerate learning for new tasks, so that elements of the source task can enhance the performance and learning efficiency of the target task. Transfer learning is especially useful when there is not enough data for the target task.

One of the main benefits of transfer learning in building load forecasting is its ability to address data inadequacy. New buildings, or buildings without long-term energy consumption monitoring, often lack the historical data needed to train prediction models properly. In these situations, transfer learning allows models to exploit rich data gathered in other environments or buildings, and applying knowledge from related but distinct tasks can greatly improve load forecasting for buildings with limited data. Kim et al. [80] used cross-building transfer learning for unsupervised energy consumption prediction, transferring knowledge from commercial to residential buildings and from residential buildings with static electricity pricing to those with time-based pricing. Zhou et al. [223] used transfer learning, with BiGAN-based data augmentation, to build efficient load prediction for residential and commercial customers under data scarcity. Fang et al. [224] predicted the energy use of office buildings with limited historical measurements under various energy consumption characteristics and climate conditions, using LSTM to obtain temporal features and Domain Adversarial Neural Networks (DANN) to extract domain-invariant features for cross-building prediction.

A building’s location strongly influences how much energy it uses, because regional climate and seasonal patterns affect energy consumption. Cross-geographical transfer enables a model to adapt to energy consumption patterns under different climatic and environmental conditions, even with relatively little data from the new region. To improve prediction accuracy for new buildings, Gaurav et al. [225] used seasonally and trend-adjusted transfer learning to forecast energy use across four schools in Newfoundland, Canada. Incorporating data from several schools increased prediction accuracy by 11.2% compared with a model that used only one month of data from the target school.

The effectiveness of transfer learning can be demonstrated with a concrete example. In a study by Fang et al. [224], a model was pre-trained on a source office building with a full year of data. This model was then fine-tuned using only two weeks of data from a target building with different architectural characteristics and located in a different climate zone. The transfer learning approach achieved a 20% lower MAPE compared to a model trained from scratch on the limited target data alone.
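The pre-train/fine-tune pattern can be illustrated with a linear model trained by gradient descent. This toy sketch assumes synthetic source and target "buildings" whose true weights differ slightly. It stands in for, and is much simpler than, the LSTM-based setup of Fang et al. [224], and the sample counts and learning rates are arbitrary.

```python
import numpy as np


def gd_fit(X: np.ndarray, y: np.ndarray, w_init: np.ndarray, lr: float, epochs: int) -> np.ndarray:
    """Full-batch gradient descent on the mean squared error of y ≈ X @ w."""
    w = w_init.copy()
    for _ in range(epochs):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)
    return w


def target_mse(w: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((X @ w - y) ** 2))


rng = np.random.default_rng(0)
d = 8
w_source = rng.normal(size=d)
w_target = w_source + 0.1 * rng.normal(size=d)     # related but different building

# Source building: plenty of (synthetic) history, used for pre-training.
X_src = rng.normal(size=(1000, d))
y_src = X_src @ w_source + 0.1 * rng.normal(size=1000)
w_pre = gd_fit(X_src, y_src, np.zeros(d), lr=0.1, epochs=500)

# Target building: only six labelled samples.
X_few = rng.normal(size=(6, d))
y_few = X_few @ w_target + 0.1 * rng.normal(size=6)
w_scratch = gd_fit(X_few, y_few, np.zeros(d), lr=0.02, epochs=2000)
w_tuned = gd_fit(X_few, y_few, w_pre, lr=0.02, epochs=2000)   # fine-tuning

X_test = rng.normal(size=(500, d))
y_test = X_test @ w_target
print(f"from scratch: {target_mse(w_scratch, X_test, y_test):.3f}")
print(f"fine-tuned:   {target_mse(w_tuned, X_test, y_test):.3f}")
```

With only six target samples and eight features, training from zero cannot pin down every weight. Starting from the source weights keeps the unconstrained directions close to a sensible prior.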

When even transferable source data are scarce, data augmentation techniques can be employed. Generative Adversarial Networks (GANs) can be trained to synthesize realistic building energy consumption time series that preserve the statistical properties of the original small dataset. This artificially expanded dataset can then be used to train more robust models and mitigate overfitting. Similarly, few-shot learning techniques can be applied to tasks such as diagnosing specific faults in HVAC systems, where only a handful of labeled examples are available.

Using transfer learning to estimate building loads can greatly improve the effectiveness and applicability of prediction models, especially when dealing with data scarcity and geographic variety.

Data preprocessing transforms data into a format that is more appropriate for machine learning models, which is crucial for improving the accuracy and generalizability of the models.

The efficacy of transfer learning and data augmentation in overcoming data scarcity is best demonstrated through concrete applications. Table 9 summarizes documented performance gains from recent studies.

The examples in Table 9 underscore that transfer learning is not merely a conceptual remedy but a practical tool that can deliver substantial improvements in prediction accuracy and model robustness when historical data is limited, a common scenario for new buildings or those with recently installed metering.

4. Barriers, Challenges, and Lessons Learned

The review’s primary objective was to explore and synthesize the models and methods used to estimate and predict energy performance and CO₂ emissions, and to identify the shortcomings of traditional energy management systems. Energy performance metrics for non-domestic buildings and their forecasting capabilities were evaluated from a data-driven perspective. The study also examined more powerful, emerging AI and ML methods for increasing the efficiency of energy-related processes in buildings. The main findings are as follows:

Non-domestic buildings account for a considerable share of the world’s energy consumption, at 40% of energy use and 24% of CO₂ emissions. Energy usage is often tied directly to HVAC systems, building engineering, occupancy trends, and behavioral routines [226].

Engineering calculations are useful during the design stage. However, they tend to be overly simplistic [226].

Simulation models (e.g., EnergyPlus, TRNSYS) achieve higher accuracy, but they require extensive input data and expert calibration [22,73].

Statistical models such as regression or stochastic frontier analysis generally produce results of moderate accuracy, which are valuable for benchmarking but lack physically interpretable indicators [227]. ML techniques (ANN, SVM, Gaussian processes), by contrast, have been applied with great success to forecasting evolving, non-linear energy flows owing to their high accuracy and flexibility [228]. Additional techniques, such as ensemble modeling and transfer learning, improved generalization even further, with some methods achieving prediction errors between 1.9% and 11.5% [229].

Kalman filtering, wavelet analysis, and PCA greatly enhanced overall data quality [230]. Transfer learning techniques also helped, drawing data from analogous buildings or similar climate conditions and reusing data from nearby structures under similar conditions [225].

ML models suffer from “black-box” interpretability problems and need massive amounts of data. Engineering and simulation methods are less scalable and resource-intensive. Local climate variability and building diversity make model generalization challenging [231].

Integrating IoT with ANN-based and other hybrid models has potential for real-time energy management and retrofit planning [179].

4.1. Practical Implementation Barriers

Data-related issues pose many challenges to building energy consumption forecasting in practice. In terms of quality and integrity, the data are characterized by noise, missing information, and format variability. Defective sensors, for instance, cause data loss, and format discrepancies among devices impede integration. Effective preprocessing and data-cleaning methods are therefore essential. Data security and confidentiality are also critical: preventing information leaks and protecting user privacy while sharing and transmitting the data needed for accurate forecasting is a challenging problem. For example, access rights should be tightly controlled and data encrypted when cloud infrastructure handles them. Merging data from different sources is also difficult, because data from different sources vary widely in quality, spatial granularity, and time resolution; smart meter data and weather data, for example, are recorded at different temporal scales. Further study is required on how to reconcile and align such data to improve forecast accuracy.
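One common way to reconcile different temporal resolutions is to sum extensive quantities (energy) to the coarser step and interpolate intensive quantities (temperature) to the finer one. This is a minimal sketch, assuming 15-minute meter data and hourly weather. These are illustrative resolutions, not figures from the text.

```python
import numpy as np


def to_hourly_energy(kwh_15min) -> np.ndarray:
    """Aggregate 15-minute energy readings (kWh) to hourly totals by summation."""
    kwh = np.asarray(kwh_15min, dtype=float)
    if len(kwh) % 4:
        raise ValueError("expected whole hours (4 readings per hour)")
    return kwh.reshape(-1, 4).sum(axis=1)


def interpolate_weather(target_times_h, weather_times_h, weather_values) -> np.ndarray:
    """Linearly interpolate hourly weather (e.g. temperature) onto finer timestamps (in hours)."""
    return np.interp(target_times_h, weather_times_h, weather_values)


meter_15min = [1, 1, 2, 2, 3, 3, 3, 3]               # two hours of hypothetical readings
print(to_hourly_energy(meter_15min))                 # hourly totals of 6 and 12 kWh
temps = interpolate_weather(np.arange(8) * 0.25, [0, 1, 2], [20.0, 22.0, 21.0])
print(temps)                                         # temperature at each 15-min step
```

Which direction to align in is a design choice: aggregating to the coarser step discards detail, while interpolating weather assumes it varies smoothly between observations.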

The application of building energy consumption prediction models in operation is also limited by problems with the models themselves. In terms of simplicity and explainability, deep learning models make precise predictions but have intricate structures and operate as “black boxes”.

It is hard for building managers and energy professionals to understand their decision-making process. For instance, when a Transformer model is used to predict the cooling load of a commercial building, its predictions are difficult to interpret. Integrating explainability techniques or developing explainable deep learning models is therefore imperative. Model generality and flexibility are also poor. Buildings differ in settings, usage patterns, and structural characteristics, and current models struggle to adapt quickly to new situations; for example, a model developed for a northern residential building cannot be applied directly to a southern commercial building. Models also increasingly need dynamic adjustment and real-time performance. Because the energy system is dynamic, models must be refreshed rapidly with real-time data to adapt to changes in the environment and in building usage patterns. For instance, a model would need to scale up its predictions immediately if a sudden increase in employee activity raised energy demand.
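Permutation importance is one simple, model-agnostic explainability technique that could be applied to a black-box predictor such as a Transformer. The sketch below uses a hypothetical linear stand-in model and synthetic inputs. It shuffles one feature at a time and measures the increase in error. scikit-learn provides a fuller version as `sklearn.inspection.permutation_importance`.

```python
import numpy as np


def permutation_importance(predict, X: np.ndarray, y: np.ndarray,
                           rng: np.random.Generator, n_repeats: int = 10) -> np.ndarray:
    """Model-agnostic explanation: mean increase in MSE when one feature column
    is randomly shuffled, breaking its relationship with the target."""
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            importances[j] += np.mean((predict(X_perm) - y) ** 2) - baseline
    return importances / n_repeats


# Hypothetical fitted cooling-load model using outdoor temperature, occupancy
# and an irrelevant feature; any trained model's predict() could be used instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
coef = np.array([3.0, 1.0, 0.0])
y = X @ coef
imp = permutation_importance(lambda A: A @ coef, X, y, rng)
for name, value in zip(["outdoor_temp", "occupancy", "irrelevant"], imp):
    print(f"{name:>12}: {value:.2f}")
```

The irrelevant feature scores zero because shuffling it leaves the predictions unchanged.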

Several further factors complicate building energy forecasting in real application scenarios. A building’s operating environment is constantly changing, and existing models struggle to capture real-time energy usage patterns because of aging equipment, retrofits, and variations in end-user behavior. For instance, replacing lighting with energy-saving equipment or changing air-conditioning usage patterns can easily degrade the model’s prediction accuracy.

4.2. Policy and Economic Barriers

Market, regulatory, and policy influences cannot be ignored. Changes in energy laws and regulations and in energy market prices can alter building energy consumption behavior, yet models rarely account fully for such changes. For instance, a model intended to simulate real conditions would need to include the effect of an energy-saving subsidy program. It is also hard to optimize more than one system at a time. Estimating building energy consumption requires expertise from local energy grids and smart grids, but the coordination procedures between these systems are intricate and involve numerous stakeholders. For example, one of the biggest challenges in creating a win-win scenario is how to adjust power consumption appropriately, based on building energy consumption estimates, during peak grid loads.

Addressing these barriers requires concerted effort. Policymakers can update building codes to be performance-based, incentivizing outcomes rather than prescribing methods. Governments and industry bodies can fund demonstration projects to showcase return on investment (ROI) and develop standardized data schemes to improve interoperability. Finally, academic and training programs must evolve to create a new generation of building technology experts fluent in both the physical and digital domains.

5. Current Trends and Open Research Areas on Data-Driven Energy Performance Models

ML for Energy Prediction: Given the enormous amount of energy consumed by urban buildings today, especially in developed nations, building energy use should be emphasized as one of the major global concerns. A range of assessment techniques needs to be developed to create the best predictive tools for monitoring and forecasting energy consumption in buildings. The Internet of Things (IoT) is currently among the most researched topics in practical applications, and ML approaches have greatly improved its ability to regulate energy use [232].

IoT technology in smart buildings improves the functionality and efficiency of many building components through advanced sensors, data analytics, and real-time control systems. Automated climate control systems use a network of IoT sensors to track temperature, humidity, and occupancy levels in different areas of the building [233]. These sensors feed centralized controllers, which use machine learning algorithms to adjust HVAC settings dynamically, maximizing energy efficiency and thermal comfort. IoT-enabled thermostats can reduce unnecessary energy use by lowering HVAC output in vacant rooms while maintaining optimal temperatures in frequently occupied areas [234]. Intelligent lighting systems use IoT sensors to detect occupancy and ambient light levels and combine them with the time of day to adjust artificial illumination. Occupancy sensors are frequently added to these systems so that lights turn off automatically when a room is empty. Advanced IoT lighting systems may also use dimming to adjust brightness in response to real-time data, improving energy efficiency [235]. By combining lighting control systems with building management software, which provides real-time insights into power consumption trends, facility managers can improve energy efficiency and reduce operating costs.

IoT technology improves security and surveillance in smart buildings through networked devices such as biometric sensors, motion detectors, and smart cameras [236]. These devices integrate with local servers or cloud-based systems to provide access control, event detection, and real-time video streaming. Advanced IoT surveillance systems frequently use AI-driven analytics, such as facial recognition and anomaly detection, to detect suspicious activity automatically and alert security personnel. Mobile devices greatly enhance the ability to monitor and manage security systems remotely from any location [237]. Energy management is one of the main concerns in IoT-enabled smart buildings. IoT-capable energy meters and power monitors continuously track how much energy building systems, including HVAC, lighting, and electrical equipment, consume. These data are collected and analyzed on on-site computers or cloud computing platforms to identify energy inefficiencies [238]. By combining these technologies with predictive analytics models, smart buildings can reduce peak load demand, improve power distribution, and proactively adjust energy consumption patterns. IoT systems also support the optimal use of clean energy by enabling the integration of renewable sources, such as solar panels, into the building’s energy grid [239].

Digital Twins and Simulation: Digital twins make it possible to model and simulate building energy use predictively, forecast energy requirements, and optimize systems for future conditions. The main goal of these studies is to modify energy systems proactively using real-time data and sophisticated algorithms. For example, ref. [240] combines CNN and LSTM models for predictive modeling and HVAC system management, forecasting energy demand in real time while protecting data privacy through thermal noise injection.

Figure 13 illustrates a conceptual architecture for digital twin-based building energy management. The framework typically consists of five layers: (1) a Physical Layer comprising the building, its systems (HVAC, lighting), and IoT sensors; (2) a Data Layer that ingests and fuses real-time sensor data, weather forecasts, and static BIM data; (3) a Model Layer, the core digital twin, which hosts both physics-based simulation models and data-driven models that are continuously updated with live data; (4) a Services and Analytics managers. This closed-loop system enables virtual testing of control strategies before real-world implementation.

Clausen et al. [241] present a case study of this architecture in practice, implementing a digital twin for a commercial building in Denmark. The twin integrated a detailed thermal simulation model with real-time IoT data on occupancy and weather. It was used to perform model predictive control (MPC) for the HVAC system, dynamically adjusting setpoints based on forecasts. This implementation resulted in a documented 23% reduction in heating energy consumption while maintaining thermal comfort, showcasing the tangible benefits of a fully deployed digital twin.

Likewise, ref. [240] uses a microgrid, a real-time simulator, and an LSTM model to adjust energy systems according to load forecasts. Reference [241] applies model predictive control (MPC) through a digital twin in public and commercial buildings, allowing HVAC systems to be adjusted dynamically based on occupancy patterns and weather forecasts. Ref. [194] proposes a digital twin framework for HVAC system management in commercial buildings that combines BIM data and real-time measurements with ANN and MOGA (multi-objective genetic algorithm) methods, while ref. [242] uses neural networks to optimize HVAC system performance in an office building in Norway. To simulate appliance operation while accounting for past consumption and weather conditions in a smart building, ref. [243] proposes an improved whale optimization algorithm (IWOA) combined with an LSTM model.

To optimize energy resources in residential structures, ref. [244] compares several predictive models, including LSTM, regression, and Prophet. Furthermore, ref. [244] simulates and optimizes HVAC systems in real time in a test environment using OpenModelica and machine learning methods. Reference [245] introduces a digital twin based on a graph neural network (GNN) deep learning model to examine occupant-building interactions, integrating physiological and environmental data to enhance indoor comfort in real time. To simulate building energy performance, ref. [246] uses evolutionary algorithms (NSGA-II), tools such as EnergyPlus, and observed data (thermal inertia, thermal bridges). Similarly, ref. [246] trains an ANN model in Spain that can optimize thermal and energy management in real time, using EnergyPlus to refine energy models with data gathered from IoT sensors; however, this study is restricted to a single building. To optimize HVAC and lighting systems in residential structures while accounting for thermal envelope factors, ref. [247] creates a digital twin using Autodesk Revit and Insight. In a different setting, ref. [248] models a virtual pigsty to investigate feeding conditions and extrapolate the findings to real situations. Although restricted to a single building, ref. [249] trains an ANN model that can optimize thermal and energy management in real time using data gathered from a university building in Spain.

Data-Driven Retrofit Planning: Data-driven approaches enable more targeted and effective building retrofits by identifying specific areas where energy efficiency can be improved. Examining the optimization goals and parameters shows that many factors influence building performance during the decarbonization of buildings through retrofits, so a methodical and effective optimization approach is necessary to increase efficiency [250]. To find the best solutions, retrofit strategies can be sampled, screened, and iteratively improved using mathematical techniques such as optimization algorithms. The non-dominated sorting genetic algorithm II (NSGA-II) is the most commonly used optimization technique.
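At the core of NSGA-II is non-dominated sorting, in which candidate retrofit packages are ranked into successive Pareto fronts. The sketch below implements only that ranking step; NSGA-II adds crowding distance, selection, crossover, and mutation. The retrofit options and their cost and energy figures are hypothetical.

```python
Objectives = tuple[float, float]   # (investment cost, annual energy use), both minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(options: dict[str, Objectives]) -> list[str]:
    """Names of the options that no other option dominates."""
    return [name for name, obj in options.items()
            if not any(dominates(other, obj) for other in options.values())]


def non_dominated_sort(options: dict[str, Objectives]) -> list[list[str]]:
    """Peel off successive Pareto fronts (rank 1, rank 2, ...), as in NSGA-II's ranking step."""
    remaining = dict(options)
    fronts = []
    while remaining:
        front = pareto_front(remaining)
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# Hypothetical retrofit packages: (cost in kUSD, annual energy in MWh).
options = {
    "baseline": (0, 500),
    "LED lighting": (20, 450),
    "roof insulation": (60, 430),
    "HVAC upgrade": (120, 380),
    "glazing": (150, 440),
    "LED + HVAC": (140, 340),
}
print(non_dominated_sort(options))
```

Glazing falls to the second front because the HVAC upgrade is both cheaper and saves more energy. Full NSGA-II implementations are available in optimization libraries such as pymoo.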

Optimization of data fusion and augmentation techniques: Data fusion and augmentation methods should be optimized to minimize injected noise while increasing data diversity and model transferability in multi-source and sparse-data settings. Smart transfer learning approaches that dynamically tune model parameters according to the target task and data characteristics could further maximize transfer performance and model adaptability across different buildings and settings.

Use of transfer learning and multi-source data fusion in feature engineering: Applying multi-source data fusion and transfer learning to feature engineering allows better-performing features to be derived from data on similar buildings, improving model performance in sparse and imbalanced data environments. This also involves improving the overall generalization of models and creating more generalized building load forecasting models by integrating ML with domain expertise into standard, generalized feature engineering frameworks that can be adapted to the requirements of different building types and conditions.

General models for building load prediction: This includes constructing generalized models that apply across different building and environmental classes, to mitigate the poor generalization and heavy customization of current models, and exploring online and adaptive learning methods so that model parameters can be tuned in real time as the environment changes. Dynamically adjusted hybrid models, which combine the strengths of different model types, can improve generalizability and prediction accuracy in data-limited and multi-source settings. Their applicability can be extended to other building types and climates, especially in under-represented regions such as Africa.
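Recursive least squares (RLS) with a forgetting factor is a classic example of the online, adaptive learning mentioned above. This is a minimal sketch, assuming a hypothetical linear load model whose base load drops after a retrofit. The forgetting factor of 0.95 and the initial covariance scale are illustrative choices.

```python
import numpy as np


class RecursiveLeastSquares:
    """Online linear model y ≈ w · x, updated one sample at a time.

    A forgetting factor lam < 1 down-weights old samples geometrically, so the
    weights can track gradual or sudden changes (e.g. after a retrofit).
    """

    def __init__(self, n_features: int, lam: float = 0.99, delta: float = 1000.0):
        self.w = np.zeros(n_features)
        self.P = delta * np.eye(n_features)   # large initial covariance = weak prior on w
        self.lam = lam

    def predict(self, x: np.ndarray) -> float:
        return float(self.w @ x)

    def update(self, x: np.ndarray, y: float) -> None:
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        self.w = self.w + gain * (y - self.w @ x)
        self.P = (self.P - np.outer(gain, Px)) / self.lam


# Hypothetical building: load = base + 2 kW/°C * outdoor temperature;
# a retrofit at hour 200 lowers the base load from 50 kW to 40 kW.
rng = np.random.default_rng(0)
model = RecursiveLeastSquares(n_features=2, lam=0.95)
for hour, temp in enumerate(rng.uniform(10, 35, size=400)):
    base = 50.0 if hour < 200 else 40.0
    x = np.array([1.0, temp])
    model.update(x, base + 2.0 * temp)
print(model.w.round(2))   # close to [40, 2] after the change
```

Smaller forgetting factors adapt faster but react more strongly to noise; values close to 1 give smoother but slower tracking.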

Efficient model training and inference: This includes several strands of work:

- investigating efficient training and inference methods that speed up computation using edge and distributed computing;
- building lightweight models that can run in real time in resource-constrained environments;
- reducing computational cost with minimal loss of model performance.
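Post-training quantization is one common way to make a trained model lighter for edge devices. The weights are stored as 8-bit integers plus a single scale factor. The sketch applies symmetric int8 quantization to the weights of a hypothetical linear load model and measures how much the predictions change. The model, data, and function names are illustrative assumptions. Real deployments would more likely rely on a framework's own quantization tooling.

```python
import numpy as np


def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero for all-zero weights
    q = np.round(w / scale).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float64) * scale


# Hypothetical weights of a linear hourly-load model with 24 input features.
rng = np.random.default_rng(0)
w = rng.normal(size=24)
X = rng.normal(size=(100, 24))

q, scale = quantize_int8(w)
y_full = X @ w
y_small = X @ dequantize(q, scale)
rel_err = np.abs(y_small - y_full).max() / np.abs(y_full).max()
print(q.nbytes, w.nbytes, rel_err)  # int8 storage is 1/8 of float64 here
```

Rounding bounds each weight error by half the scale factor. The printed relative error shows how much accuracy is traded for the eightfold reduction in weight storage.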

Development of room-scale load prediction models: Building on existing room-scale prediction research, suitable room-level load prediction models should be developed from accurate data collection and modeling. This will improve the accuracy of HVAC system control and occupant comfort.

This also includes integrating sensor data, indoor environmental conditions, and occupant behavior to improve the responsiveness and accuracy of the prediction models.

Blending domain knowledge and data-driven techniques: This involves combining building-industry domain knowledge with data-driven techniques in two ways. First, knowledge of building physical characteristics and usage patterns is incorporated into model development. Second, explainable ML models are developed so that their predictions can be interpreted, which makes the models more credible and usable.
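A residual (grey-box) hybrid is one straightforward way to blend physical knowledge with data. A physics term predicts the bulk of the load, and a small data-driven model learns only the remainder, so its coefficients stay interpretable. The sketch below assumes a steady-state heat-loss term Q = UA·(T_in - T_out) with a known UA value. It uses synthetic data with hypothetical occupancy and solar-gain features and is an illustration, not a model from the reviewed studies.

```python
import numpy as np


def physics_heating_load(ua_w_per_k: float, t_in: np.ndarray, t_out: np.ndarray) -> np.ndarray:
    """Steady-state transmission/infiltration loss Q = UA * (T_in - T_out) in W, floored at 0."""
    return np.maximum(ua_w_per_k * (t_in - t_out), 0.0)


def design_matrix(features: np.ndarray) -> np.ndarray:
    return np.column_stack([np.ones(len(features)), features])


def fit_residual(features: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Least-squares fit of residual ~ [1, features] @ beta; each beta entry is readable."""
    beta, *_ = np.linalg.lstsq(design_matrix(features), residual, rcond=None)
    return beta


def hybrid_predict(ua, t_in, t_out, features, beta) -> np.ndarray:
    # Physics baseline plus the learned, interpretable correction.
    return physics_heating_load(ua, t_in, t_out) + design_matrix(features) @ beta


# Synthetic "measured" data; UA = 500 W/K is assumed known from envelope data.
rng = np.random.default_rng(2)
n = 500
t_in = np.full(n, 21.0)
t_out = rng.uniform(-5.0, 15.0, n)
occupants = rng.integers(0, 21, n).astype(float)
solar = rng.uniform(0.0, 600.0, n)  # W/m² on the facade (hypothetical)
measured = (physics_heating_load(500.0, t_in, t_out)
            - 80.0 * occupants - 0.5 * solar + rng.normal(0.0, 10.0, n))

features = np.column_stack([occupants, solar])
beta = fit_residual(features, measured - physics_heating_load(500.0, t_in, t_out))
print(np.round(beta, 2))  # intercept, W per occupant, W per (W/m²) of solar gain
```

Because the physics term carries the temperature dependence, the fitted coefficients can be read directly. Here they recover the internal and solar gains built into the synthetic data, roughly -80 W per occupant and -0.5 W per W/m².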

EnergyPlus model library retrieval algorithms: This involves designing more efficient retrieval algorithms for larger-scale EnergyPlus model libraries. Such algorithms would find cases more similar to the target building and further improve inference of its key variables.
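A simple baseline for retrieving similar cases from a model library is k-nearest-neighbour search over standardised building descriptors. The sketch assumes each library entry is summarised by a hypothetical feature vector: floor area, window-to-wall ratio, number of floors, and year built. It does not read EnergyPlus files. A real library would need its own metadata extraction and probably feature weighting.

```python
import numpy as np


def retrieve_similar(library: np.ndarray, target: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k library models closest to the target building.

    Features are z-score standardised with the library's mean and std so that
    attributes on large scales (e.g. floor area) do not dominate the distance.
    """
    mu = library.mean(axis=0)
    sd = library.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant features
    lib_z = (library - mu) / sd
    tgt_z = (target - mu) / sd
    dist = np.linalg.norm(lib_z - tgt_z, axis=1)  # Euclidean distance in z-space
    return np.argsort(dist)[:k]


# Hypothetical library: floor area (m²), window-to-wall ratio, floors, year built.
library = np.array([
    [5000.0, 0.40, 5, 1995],
    [5200.0, 0.45, 5, 2000],
    [20000.0, 0.60, 20, 2015],
    [800.0, 0.20, 1, 1970],
    [4800.0, 0.35, 4, 1990],
])
target = np.array([5100.0, 0.40, 5, 1998])

idx = retrieve_similar(library, target, k=3)
print(idx)  # [0 1 4]
```

The retrieved entries would then serve as starting points for inferring the target building's key simulation inputs.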

6. Conclusions

This work provides a thorough analysis of the general procedure for developing data-driven models, covering feature engineering, data-driven algorithms, and the factors reflected in model outputs, to close research gaps left by earlier review studies. This analysis is preceded by a brief introduction to data collection and data cleaning. Aspects to consider in data collection include:

- the number of data points gathered for model training and validation;
- the number of collection units (such as buildings or meters);
- the data sources;
- the types of features gathered.

To guide data collection, potential feature types are first compiled for feature engineering: weather, indoor environmental, occupancy-related, time index, building characteristics, socioeconomic, and historical data.

This paper demonstrates how data-driven models can transform building energy efficiency. ML techniques, especially ANN and SVM, enable accurate load forecasting and retrofit prioritization and outperform conventional methods in accuracy and scalability. However, regional adaptability, model transparency, and data scarcity remain open issues. Combining preprocessing, data fusion, and transfer learning mitigates these problems and provides useful information for energy managers, architects, and policymakers. We hope that the paper provides valuable insights for improving forecasting accuracy and implementing energy efficiency measures, particularly in the context of rising energy costs and environmental concerns in Africa.

Author Contributions

Conceptualization, L.P. and T.O.O.; methodology, L.P. and T.E.M.; software, L.P. and T.O.O.; formal analysis, T.O.O. and T.E.M.; writing—original draft preparation, L.P.; writing—review and editing, L.P., T.O.O. and T.E.M. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was paid from the Author’s Research Funds, the Department of Electrical Engineering/F’SATI, and the Faculty of Engineering and the Built Environment at the Tshwane University of Technology.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

Acronym	Definition
ANN	Artificial Neural Networks
AR	Autoregression
BEMS	Building Energy Management System
BMS	Building Management System
CV-RMSE	Coefficient of Variation of the Root Mean Square Error
DEA	Data Envelopment Analysis
DSM	Demand Side Management
EEM	Energy Efficiency Measures
EMS	Energy Management System
EPBD	Energy Performance of Buildings Directive
ESP-r	Environmental Systems Performance-research
EUI	Energy Use Intensity
FFN	Feedforward Neural Network
GMM	Gaussian Mixture Model
GP	Gaussian Process
GPR	Gaussian Process Regression
HVAC	Heating, Ventilation, and Air Conditioning
IEA	International Energy Agency
ISO	International Organization for Standardization
LSTM	Long Short-Term Memory
MAPE	Mean Absolute Percentage Error
ML	Machine Learning
MLR	Multiple Linear Regression
MRM	Multivariate Regression Models
OLS	Ordinary Least Squares
PCA	Principal Component Analysis
RF	Random Forest
RMSE	Root Mean Square Error
RNNs	Recurrent Neural Networks
SVM	Support Vector Machines

References

Global Warming of 1.5 °C. Available online: https://www.ipcc.ch/sr15/ (accessed on 24 June 2025).
Opoku, E.E.O.; Boachie, M.K. The environmental impact of industrialization and foreign direct investment. Energy Policy 2020, 137, 111178.
Yang, J.; Yu, S.; Sun, Y.-F. Restructuring effects of industrial and energy structures on sectoral CO₂ emission peak trajectories in China. iScience 2024, 27, 110541.
Xuan, V.N. Energy factors affecting environmental pollution for sustainable development goals: The case of India. Energy Explor. Exploit. 2024, 43, 410–450.
Leherbauer, D.; Schulz, J.; Egyed, A.; Hehenberger, P. Demand-side management in less energy-intensive industries: A systematic mapping study. Renew. Sustain. Energy Rev. 2025, 212, 115315.
Santamouris, M.; Vasilakopoulou, K. Present and future energy consumption of buildings: Challenges and opportunities towards decarbonisation. e-Prime 2021, 1, 100002.
Gupta, J.; Chakraborty, M. Energy Efficiency in Buildings. In Sustainable Fuel Technologies Handbook; Academic Press: Cambridge, MA, USA, 2020; pp. 457–480.
Lin, B.; Li, Z. Is more use of electricity leading to less carbon emission growth? An analysis with a panel threshold model. Energy Policy 2020, 137, 111121.
Fuhr, H. The rise of the Global South and the rise in carbon emissions. Third World Q. 2021, 42, 2724–2746.
Nunes, L.J.R. The Rising Threat of Atmospheric CO₂: A Review on the Causes, Impacts, and Mitigation Strategies. Environments 2023, 10, 66.
Climate Change: Atmospheric Carbon Dioxide|NOAA Climate.gov. Available online: https://www.climate.gov/news-features/understanding-climate/climate-change-atmospheric-carbon-dioxide (accessed on 24 June 2025).
Hansen, J.E.; Sato, M.; Simons, L.; Nazarenko, L.S.; Sangha, I.; Kharecha, P.; Zachos, J.C.; von Schuckmann, K.; Loeb, N.G.; Osman, M.B.; et al. Global warming in the pipeline. Oxf. Open Clim. Change 2023, 3, 8.
González-Torres, M.; Pérez-Lombard, L.; Coronel, J.F.; Maestre, I.R.; Yan, D. A review on buildings energy information: Trends, end-uses, fuels and drivers. Energy Rep. 2022, 8, 626–637.
Aldhshan, S.R.S.; Maulud, K.N.A.; Jaafar, W.S.W.M.; Karim, O.A.; Pradhan, B. Energy Consumption and Spatial Assessment of Renewable Energy Penetration and Building Energy Efficiency in Malaysia: A Review. Sustainability 2021, 13, 9244.
Buildings—Energy System—IEA. Available online: https://www.iea.org/energy-system/buildings (accessed on 24 June 2025).
Use of Energy in Commercial Buildings—U.S. Energy Information Administration (EIA). Available online: https://www.eia.gov/energyexplained/use-of-energy/commercial-buildings.php (accessed on 24 June 2025).
Kathirgamanathan, A.; De Rosa, M.; Mangina, E.; Finn, D.P. Data-driven predictive control for unlocking building energy flexibility: A review. Renew. Sustain. Energy Rev. 2021, 135, 110120.
Wu, J.; Chen, S.; Ying, X.; Shu, J. Influencing Factors on Air Conditioning Energy Consumption of Naturally Ventilated Research Buildings Based on Actual HVAC Behaviours. Buildings 2023, 13, 2710.
Awoyera, P.O.; Effiong, J.; Nagaraju, V.; Haque, A.; Mydin, A.O.; Onyelowe, K. Alternative construction materials: A point of view on energy reduction and indoor comfort parameters. Discov. Sustain. 2024, 5, 419.
Mokhtari, R.; Jahangir, M.H. The effect of occupant distribution on energy consumption and COVID-19 infection in buildings: A case study of university building. Build. Environ. 2021, 190, 107561.
Bayat, H.; Kashani, A. Reducing material and energy consumption in single-story buildings through 3D-printed wall designs. Energy Build. 2025, 333, 115497.
Mazzeo, D.; Matera, N.; Cornaro, C.; Oliveti, G.; Romagnoni, P.; De Santoli, L. EnergyPlus, IDA ICE and TRNSYS predictive simulation accuracy for building thermal behaviour evaluation by using an experimental campaign in solar test boxes with and without a PCM module. Energy Build. 2020, 212, 109812.
Chen, Y.; Xu, P.; Chu, Y.; Li, W.; Wu, Y.; Ni, L.; Bao, Y.; Wang, K. Short-term electrical load forecasting using the Support Vector Regression (SVR) model to calculate the demand response baseline for office buildings. Appl. Energy 2017, 195, 659–670.
Wang, M.; Yu, J.; Zhou, M.; Quan, W.; Cheng, R. Joint Forecasting Model for the Hourly Cooling Load and Fluctuation Range of a Large Public Building Based on GA-SVM and IG-SVM. Sustainability 2023, 15, 16833.
Cai, W.; Wen, X.; Li, C.; Shao, J.; Xu, J. Predicting the energy consumption in buildings using the optimized support vector regression model. Energy 2023, 273, 127188.
Karatasou, S.; Santamouris, M.; Geros, V. Modeling and predicting building’s energy use with artificial neural networks: Methods and results. Energy Build. 2006, 38, 949–958.
Ascione, F.; Bianco, N.; Iovane, T.; Mastellone, M.; Mauro, G.M. Conceptualization, development and validation of EMAR: A user-friendly tool for accurate energy simulations of residential buildings via few numerical inputs. J. Build. Eng. 2021, 44, 102647.
Uddin, M.N.; Wei, H.-H.; Chi, H.L.; Ni, M. Influence of Occupant Behavior for Building Energy Conservation: A Systematic Review Study of Diverse Modeling and Simulation Approach. Buildings 2021, 11, 41.
Manfren, M.; Aste, N.; Moshksar, R. Calibration and uncertainty analysis for computer models—A meta-model based approach for integrated building energy simulation. Appl. Energy 2013, 103, 627–641.
Alexakis, K.; Benekis, V.; Kokkinakos, P.; Askounis, D. Genetic algorithm-based multi-objective optimisation for energy-efficient building retrofitting: A systematic review. Energy Build. 2025, 328, 115216.
Owolabi, A.B.; Yahaya, A.; Li, H.X.; Suh, D. Analysis of the Energy Performance of a Retrofitted Low-Rise Residential Building after an Energy Audit. Sustainability 2023, 15, 12129.
Grillone, B.; Danov, S.; Sumper, A.; Cipriano, J.; Mor, G. A review of deterministic and data-driven methods to quantify energy efficiency savings and to predict retrofitting scenarios in buildings. Renew. Sustain. Energy Rev. 2020, 131, 110027.
Thravalou, S.; Michopoulos, A.; Alexandrou, K.; Artopoulos, G. Energy retrofit strategies of built heritage: Using Building Information Modelling tools for streamlined energy and economic analysis. IOP Conf. Ser. Earth Environ. Sci. 2023, 1196, 012115.
Carlander, J.; Thollander, P. Barriers to implementation of energy-efficient technologies in building construction projects—Results from a Swedish case study. Resour. Environ. Sustain. 2023, 11, 100097.
Sardar, A.; Islam, R.; Anantharaman, M.; Garaniya, V. Advancements and obstacles in improving the energy efficiency of maritime vessels: A systematic review. Mar. Pollut. Bull. 2025, 214, 117688.
Johari, F.; Lindberg, O.; Ramadhani, U.; Shadram, F.; Munkhammar, J.; Widén, J. Analysis of large-scale energy retrofit of residential buildings and their impact on the electricity grid using a validated UBEM. Appl. Energy 2024, 361, 122937.
Tuominen, P.; Reda, F.; Dawoud, W.; Elboshy, B.; Elshafei, G.; Negm, A. Economic appraisal of energy efficiency in buildings using cost-effectiveness assessment. Procedia Econ. Financ. 2015, 21, 422–430.
Mandel, T.; Pató, Z. Towards effective implementation of the energy efficiency first principle: A theory-based classification and analysis of policy instruments. Energy Res. Soc. Sci. 2024, 115, 103613.
Frederiks, E.R.; Stenner, K.; Hobman, E.V.; Fischle, M. Evaluating energy behavior change programs using randomized controlled trials: Best practice guidelines for policymakers. Energy Res. Soc. Sci. 2016, 22, 147–164.
Labaran, Y.H.; Mathur, V.S.; Muhammad, S.U.; Musa, A.A. Carbon footprint management: A review of construction industry. Clean. Eng. Technol. 2022, 9, 100531.
He, L.; Zhang, L. A bi-objective optimization of energy consumption and investment cost for public building envelope design based on the ε-constraint method. Energy Build. 2022, 266, 112133.
Min, J.; Yan, G.; Abed, A.M.; Elattar, S.; Khadimallah, M.A.; Jan, A.; Ali, H.E. The effect of carbon dioxide emissions on the building energy efficiency. Fuel 2022, 326, 124842.
Adom, P.K. An evaluation of energy efficiency performances in Africa under heterogeneous technologies. J. Clean. Prod. 2019, 209, 1170–1181.
Tachega, M.A.; Yao, X.; Liu, Y.; Ahmed, D.; Li, H.; Mintah, C. Energy efficiency evaluation of oil producing economies in Africa: DEA, malmquist and multiple regression approaches. Clean. Environ. Syst. 2021, 2, 100025.
Agradi, M.; Adom, P.K.; Vezzulli, A. Towards sustainability: Does energy efficiency reduce unemployment in African societies? Sustain. Cities Soc. 2022, 79, 103683.
Gava, E.; Seabela, M.; Ogujiuba, K. Energy Efficiency, Consumption, and Economic Growth: A Causal Analysis in the South African Economy. Economies 2025, 13, 118.
African Union Summit Adopts Bold Strategies for Clean and Sustainable Energy and Transport Pathways|African Union. Available online: https://au.int/en/pressreleases/20250219/african-union-summit-adopts-bold-strategies-clean-and-sustainable-energy-and (accessed on 24 June 2025).
Zambia Energy Efficiency Strategy and Action Plan—Ministry of Energy Integrated Resource Plan. Available online: https://www.moe.gov.zm/irp/?wpdmpro=zambia-energy-efficiency-strategy-and-action-plan-rar (accessed on 24 June 2025).
Oyejobi, D.; Firoozi, A.A. Innovations in energy-efficient construction: Pioneering sustainable building practices. Clean. Eng. Technol. 2025, 26, 100957.
Yin, S.; Wu, J.; Zhao, J.; Nogueira, M.; Lloret, J. Green buildings: Requirements, features, life cycle, and relevant intelligent technologies. Internet Things Cyber-Phys. Syst. 2024, 4, 307–317.
Pan, Y.; Zhu, M.; Lv, Y.; Yang, Y.; Liang, Y.; Yin, R.; Yang, Y.; Jia, X.; Wang, X.; Zeng, F.; et al. Building energy simulation and its application for building performance optimization: A review of methods, tools, and case studies. Adv. Appl. Energy 2023, 10, 100135.
Deb, C.; Schlueter, A. Review of data-driven energy modelling techniques for building retrofit. Renew. Sustain. Energy Rev. 2021, 144, 110990.
Borgstein, E.; Lamberts, R.; Hensen, J. Evaluating energy performance in non-domestic buildings: A review. Energy Build. 2016, 128, 734–755.
Sun, Y.; Haghighat, F.; Fung, B.C. A review of the-state-of-the-art in data-driven approaches for building energy prediction. Energy Build. 2020, 221, 110022.
Himeur, Y.; Elnour, M.; Fadli, F.; Meskin, N.; Petri, I.; Rezgui, Y.; Bensaali, F.; Amira, A. AI-big data analytics for building automation and management systems: A survey, actual challenges and future perspectives. Artif. Intell. Rev. 2023, 56, 4929–5021.
Liu, Z.; Li, P.; Wang, F.; Osmani, M.; Demian, P. Building Information Modeling (BIM) Driven Carbon Emission Reduction Research: A 14-Year Bibliometric Analysis. Int. J. Environ. Res. Public Health 2022, 19, 12820.
Carvalho, J.P.; Bragança, L.; Mateus, R. BIM-Based Sustainability Assessment: Insights for Building Circularity. In Creating a Roadmap Towards Circularity in the Built Environment; Springer: Berlin/Heidelberg, Germany, 2024; Volume Part F1844, pp. 395–406.
Kaczmarczyk, M. Building energy characteristic evaluation in terms of energy efficiency and ecology. Energy Convers. Manag. 2024, 306, 118284.
Shahin, M.; Babar, M.A.; Chauhan, M.A. Architectural Design Space for Modelling and Simulation as a Service: A Review. J. Syst. Softw. 2020, 170, 110752.
Azar, E.; O’Brien, W.; Carlucci, S.; Hong, T.; Sonta, A.; Kim, J.; Andargie, M.S.; Abuimara, T.; El Asmar, M.; Jain, R.K.; et al. Simulation-aided occupant-centric building design: A critical review of tools, methods, and applications. Energy Build. 2020, 224, 110292.
Natividade, J.; Cruz, C.O.; Silva, C.M. Improving the Efficiency of Energy Consumption in Buildings: Simulation of Alternative EnPC Models. Sustainability 2022, 14, 4228.
Degerfeld, F.B.M.; Piro, M.; De Luca, G.; Ballarini, I.; Corrado, V. The application of EN ISO 52016-1 to assess building cost-optimal energy performance levels in Italy. Energy Rep. 2023, 10, 1702–1717.
Thebuwena, A.C.H.J.; Samarakoon, S.M.S.M.K.; Ratnayake, R.M.C. Optimization of energy consumption in vertical mobility systems of high-rise office buildings: A case study from a developing economy. Energy Effic. 2024, 17, 68.
Qaisar, I.; Zhao, Q. Energy baseline prediction for buildings: A review. Results Control Optim. 2022, 7, 100129.
Wu, T.; Wang, B.; Zhang, D.; Zhao, Z.; Zhu, H. Benchmarking Evaluation of Building Energy Consumption Based on Data Mining. Sustainability 2023, 15, 5211.
Shahee, A.; Abdoos, M.; Aslani, A.; Zahedi, R. Reducing the energy consumption of buildings by implementing insulation scenarios and using renewable energies. Energy Inform. 2024, 7, 18.
Jenkins, D.; McCallum, P.; Patidar, S.; Semple, S. Accommodating new calculation approaches in next-generation energy performance assessments. J. Build. Perform. Simul. 2024, 17, 406–421.
Zhou, J.; Fennell, P.; Korolija, I.; Fang, Z.; Tang, R.; Ruyssevelt, P. Review of non-domestic building stock modelling studies under socio-technical system framework. J. Build. Eng. 2024, 97, 110873.
Mohamed, O.; Fakhoury, S.; Aldalou, G.; Almasri, G. Energy Auditing and Conservation for Educational Buildings: A Case Study on Princess Sumaya University for Technology. Process Integr. Optim. Sustain. 2022, 6, 901–920.
Lam, K.P. Sustainability Performance Simulation Tools for Building Design. In Sustainable Built Environments; Springer: Berlin/Heidelberg, Germany, 2020; pp. 589–655.
Chaudhary, G.; Johra, H.; Georges, L.; Austbø, B. Synconn_build: A python based synthetic dataset generator for testing and validating control-oriented neural networks for building dynamics prediction. MethodsX 2023, 11, 102464.
Field, J.; Soper, J.; Jones, P.; Bordass, W.; Grigg, P. Energy performance of occupied non-domestic buildings: Assessment by analysing end-use energy consumptions. Build. Serv. Eng. Res. Technol. 1997, 18, 39–46.
EnergyPlus Simulation Software. Available online: https://energyplus.net (accessed on 26 June 2025).
DOE2.com Home Page. Available online: https://www.doe2.com/ (accessed on 26 June 2025).
ESP-r|University of Strathclyde. Available online: https://www.strath.ac.uk/research/energysystemsresearchunit/applications/esp-r/ (accessed on 26 June 2025).
Bjørnskov, J.; Jradi, M.; Wetter, M. Automated model generation and parameter estimation of building energy models using an ontology-based framework. Energy Build. 2025, 329, 115228.
Shobha, G.; Rangaswamy, S. Machine Learning. In Handbook of Statistics; Elsevier: Amsterdam, The Netherlands, 2018; Volume 38, pp. 197–228.
Tjøstheim, D.; Otneim, H.; Støve, B. Time Series Dependence and Spectral Analysis. In Statistical Modeling Using Local Gaussian Approximation; Academic Press: Cambridge, MA, USA, 2022; pp. 261–299.
Rajaee, T.; Khani, S.; Ravansalar, M. Artificial intelligence-based single and hybrid models for prediction of water quality in rivers: A review. Chemom. Intell. Lab. Syst. 2020, 200, 103978.
Kim, M.K.; Kim, Y.-S.; Srebric, J. Predictions of electricity consumption in a campus building using occupant rates and weather elements with sensitivity analysis: Artificial neural network vs. linear regression. Sustain. Cities Soc. 2020, 62, 102385.
Piscitelli, M.S.; Giudice, R.; Capozzoli, A. A holistic time series-based energy benchmarking framework for applications in large stocks of buildings. Appl. Energy 2024, 357, 122550.
Home|ashrae.org. Available online: https://www.ashrae.org/technical-resources/bookstore/standards-15-34 (accessed on 26 June 2025).
Milić, V.; Rohdin, P.; Moshfegh, B. Further development of the change-point model—Differentiating thermal power characteristics for a residential district in a cold climate. Energy Build. 2020, 231, 110639.
Chung, W.; Yeung, I.M. Benchmarking by convex non-parametric least squares with application on the energy performance of office buildings. Appl. Energy 2017, 203, 454–462.
Delnava, H.; Khosravi, A.; Assad, M.E.H. Metafrontier frameworks for estimating solar power efficiency in the United States using stochastic nonparametric envelopment of data (StoNED). Renew. Energy 2023, 213, 195–204.
Koyuncuoğlu, M.U.; Yeşilyurt, M.E.; Akbaş-Yeşilyurt, F.; Şahin, E.; Elbi, M.D. A New Approach to Efficiency Measurement: Hybrid JAYA Algorithm and Data Envelopment Analysis. Expert Syst. Appl. 2024, 268, 126342.
Arabmaldar, A.; Sahoo, B.K.; Ghiyasi, M. A generalized robust data envelopment analysis model based on directional distance function. Eur. J. Oper. Res. 2023, 311, 617–632.
Favero, M.; Luparelli, A.; Carlucci, S. Analysis of subjective thermal comfort data: A statistical point of view. Energy Build. 2023, 281, 112755.
Li, X.; Sun, W.; Qin, C.; Yan, Y.; Zhang, L.; Tu, J. Evaluation of supervised machine learning regression models for CFD-based surrogate modelling in indoor airflow field reconstruction. Build. Environ. 2024, 267, 112173.
Du, K.-L.; Jiang, B.; Lu, J.; Hua, J.; Swamy, M.N.S. Exploring Kernel Machines and Support Vector Machines: Principles, Techniques, and Future Directions. Mathematics 2024, 12, 3935.
Amasyali, K.; El-Gohary, N. Building Lighting Energy Consumption Prediction for Supporting Energy Data Analytics. Procedia Eng. 2016, 145, 511–517.
Vrablecová, P.; Ezzeddine, A.B.; Rozinajová, V.; Šárik, S.; Sangaiah, A.K. Smart grid load forecasting using online support vector regression. Comput. Electr. Eng. 2018, 65, 102–117.
Kumar, A.; Kumar, A.; Al Zohbi, G.; Salau, A.O.; Maitra, S.K. Prediction of building cooling load: An innovative design for GA-SVM and IG-SVM system for the estimation of hourly precision and computational fluctuation range. Adv. Build. Energy Res. 2024, 18, 668–695.
Massana, J.; Pous, C.; Burgas, L.; Melendez, J.; Colomer, J. Short-term load forecasting in a non-residential building contrasting models and attributes. Energy Build. 2015, 92, 322–330.
Shabunko, V.; Lim, C.; Brahim, S.; Mathew, S. Developing building benchmarking for Brunei Darussalam. Energy Build. 2014, 85, 79–85.
Wan, X.; Cai, X.; Dai, L. Prediction of building HVAC energy consumption based on least squares support vector machines. Energy Inform. 2024, 7, 113.
Moradzadeh, A.; Mansour-Saatloo, A.; Mohammadi-Ivatloo, B.; Anvari-Moghaddam, A. Performance Evaluation of Two Machine Learning Techniques in Heating and Cooling Loads Forecasting of Residential Buildings. Appl. Sci. 2020, 10, 3829.
Prajapati, G.L.; Patle, A. On performing classification using SVM with radial basis and polynomial kernel functions. In Proceedings of the 3rd International Conference on Emerging Trends in Engineering and Technology, ICETET 2010, Washington, DC, USA, 19–21 November 2010; pp. 512–515.
Jain, R.K.; Smith, K.M.; Culligan, P.J.; Taylor, J.E. Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy. Appl. Energy 2014, 123, 168–178.
Edwards, R.E.; New, J.; Parker, L.E. Predicting future hourly residential electrical consumption: A machine learning case study. Energy Build. 2012, 49, 591–603.
Yao, G.; Chen, Y.; Han, C.; Duan, Z. Research on the Decision-Making Method for the Passive Design Parameters of Zero Energy Houses in Severe Cold Regions Based on Decision Trees. Energies 2024, 17, 506.
Zhou, F.; Yang, C.; Wang, Z. Prediction of building energy consumption for public structures utilizing BIM-DB and RF-LSTM. Energy Rep. 2024, 12, 4631–4640.
Wang, Z.; Wang, Y.; Zeng, R.; Srinivasan, R.S.; Ahrentzen, S. Random Forest based hourly building energy prediction. Energy Build. 2018, 171, 11–25.
Lahouar, A.; Slama, J.B.H. Day-ahead load forecast using random forest and expert input selection. Energy Convers. Manag. 2015, 103, 1040–1051.
Wang, Z.; Hong, T.; Piette, M.A. Building thermal load prediction through shallow machine learning and deep learning. Appl. Energy 2020, 263, 114683.
Lu, H.; Cheng, F.; Ma, X.; Hu, G. Short-term prediction of building energy consumption employing an improved extreme gradient boosting model: A case study of an intake tower. Energy 2020, 203, 117756.
Zhu, J.; Dong, H.; Zheng, W.; Li, S.; Huang, Y.; Xi, L. Review and prospect of data-driven techniques for load forecasting in integrated energy systems. Appl. Energy 2022, 321, 119269.
McCulloch-Pitts Neuron—Mankind’s First Mathematical Model of a Biological Neuron|by Akshay L Chandra|TDS Archive|Medium. Available online: https://medium.com/data-science/mcculloch-pitts-model-5fdf65ac5dd1 (accessed on 27 June 2025).
Rosenblatt’s Perceptron, the First Modern Neural Network|by Jean-Christophe B. Loiseau|TDS Archive|Medium. Available online: https://medium.com/data-science/rosenblatts-perceptron-the-very-first-neural-network-37a3ec09038a (accessed on 27 June 2025).
El Alaoui, M.; Rougui, M. Examining the Application of Artificial Neural Networks (ANNs) for Advancing Energy Efficiency in Building: A Comprehensive Reviews. J. Sustain. Res. 2024, 6, e240001.
Mena, R.; Rodríguez, F.; Castilla, M.; Arahal, M. A prediction model based on neural networks for the energy consumption of a bioclimatic building. Energy Build. 2014, 82, 142–155.
Mihalakakou, G.; Santamouris, M.; Tsangrassoulis, A. On the energy consumption in residential buildings. Energy Build. 2002, 34, 727–736.
González, P.A.; Zamarreño, J.M. Prediction of hourly energy consumption in buildings based on a feedback artificial neural network. Energy Build. 2005, 37, 595–601.
Li, K.; Hu, C.; Liu, G.; Xue, W. Building’s electricity consumption prediction using optimized artificial neural networks and principal component analysis. Energy Build. 2015, 108, 106–113.
Platon, R.; Dehkordi, V.R.; Martel, J. Hourly prediction of a building’s electricity consumption using case-based reasoning, artificial neural networks and principal component analysis. Energy Build. 2015, 92, 10–18.
Hong, S.-M.; Paterson, G.; Mumovic, D.; Steadman, P. Improved benchmarking comparability for energy consumption in schools. Build. Res. Inf. 2014, 42, 47–61.
Wong, S.; Wan, K.K.; Lam, T.N. Artificial neural networks for energy analysis of office buildings with daylighting. Appl. Energy 2010, 87, 551–557.
Lundin, M.; Andersson, S.; Östin, R. Development and validation of a method aimed at estimating building performance parameters. Energy Build. 2004, 36, 905–914.
Khayatian, F.; Sarto, L.; Dall’o’, G. Application of neural networks for evaluating energy performance certificates of residential buildings. Energy Build. 2016, 125, 45–54. [Google Scholar] [CrossRef]
Ascione, F.; Bianco, N.; De Stasio, C.; Mauro, G.M.; Vanoli, G.P. Artificial neural networks to predict energy performance and retrofit scenarios for any member of a building category: A novel approach. Energy 2017, 118, 999–1017.
Dombaycı, Ö.A. The prediction of heating energy consumption in a model house by using artificial neural networks in Denizli–Turkey. Adv. Eng. Softw. 2010, 41, 141–147.
Kialashaki, A.; Reisel, J.R. Modeling of the energy demand of the residential sector in the United States using regression models and artificial neural networks. Appl. Energy 2013, 108, 271–280.
Antanasijević, D.; Pocajt, V.; Ristić, M.; Perić-Grujić, A. Modeling of energy consumption and related GHG (greenhouse gas) intensity and emissions in Europe using general regression neural networks. Energy 2015, 84, 816–824.
Neto, A.H.; Fiorelli, F.A.S. Comparison between detailed model simulation and artificial neural network for forecasting building energy consumption. Energy Build. 2008, 40, 2169–2176.
Popescu, D.; Ungureanu, F.; Hernández-Guerrero, A. Simulation models for the analysis of space heat consumption of buildings. Energy 2009, 34, 1447–1453.
Deb, C.; Eang, L.S.; Yang, J.; Santamouris, M. Forecasting diurnal cooling energy load for institutional buildings using Artificial Neural Networks. Energy Build. 2016, 121, 284–297.
Olofsson, T.; Andersson, S. Long-term energy demand predictions based on short-term measured data. Energy Build. 2001, 33, 85–91.
Ekici, B.B.; Aksoy, U.T. Prediction of building energy consumption by using artificial neural networks. Adv. Eng. Softw. 2009, 40, 356–362.
Paudel, S.; Elmtiri, M.; Kling, W.L.; Le Corre, O.; Lacarrière, B. Pseudo dynamic transitional modeling of building heating energy demand using artificial neural network. Energy Build. 2014, 70, 81–93.
Ben-Nakhi, A.E.; Mahmoud, M.A. Cooling load prediction for buildings using general regression neural networks. Energy Convers. Manag. 2004, 45, 2127–2141.
Hou, Z.; Lian, Z.; Yao, Y.; Yuan, X. Cooling-load prediction by the combination of rough set theory and an artificial neural-network based on data-fusion technique. Appl. Energy 2006, 83, 1033–1046.
Cheng-Wen, Y.; Jian, Y. Application of ANN for the prediction of building energy consumption at different climate zones with HDD and CDD. In Proceedings of the 2010 2nd International Conference on Future Computer and Communication, ICFCC 2010, Wuhan, China, 21–24 May 2010; Volume 3.
Biswas, M.R.; Robinson, M.D.; Fumo, N. Prediction of residential building energy consumption: A neural network approach. Energy 2016, 117, 84–92.
Aydinalp, M.; Ugursal, V.I.; Fung, A.S. Modeling of the appliance, lighting, and space-cooling energy consumptions in the residential sector using neural networks. Appl. Energy 2002, 71, 87–110.
Gholami, R.; Nishant, R.; Emrouznejad, A. Modeling Residential Energy Consumption. J. Glob. Inf. Manag. 2021, 29, 166–193.
Azadeh, A.; Ghaderi, S.; Sohrabkhani, S. Annual electricity consumption forecasting by neural network in high energy consuming industrial sectors. Energy Convers. Manag. 2008, 49, 2272–2278.
Kialashaki, A.; Reisel, J.R. Development and validation of artificial neural network models of the energy demand in the industrial sector of the United States. Energy 2014, 76, 749–760.
Xiaoqian, J.; Bing, D.; Le, X.; Latanya, S. Adaptive gaussian process for short-term wind speed forecasting. Front. Artif. Intell. Appl. 2010, 215, 661–666.
Grosicki, E.; Abed-Meraim, K.; Hua, Y. A weighted linear prediction method for near-field source localization. IEEE Trans. Signal Process. 2005, 53, 3651–3660.
Cheng, C.; Sa-Ngasoongsong, A.; Beyca, O.; Le, T.; Yang, H.; Kong, Z.; Bukkapatnam, S.T. Time series forecasting for nonlinear and non-stationary processes: A review and comparative study. IIE Trans. 2015, 47, 1053–1071.
Heo, Y.; Choudhary, R.; Augenbroe, G. Calibration of building energy models for retrofit analysis under uncertainty. Energy Build. 2012, 47, 550–560.
Heo, Y.; Zavala, V.M. Gaussian process modeling for measurement and verification of building energy savings. Energy Build. 2012, 53, 7–18.
Zhang, Y.; O’Neill, Z.; Wagner, T.; Augenbroe, G. An Inverse Model with Uncertainty Quantification to Estimate the Energy Performance of an Office Building. Available online: https://www.researchgate.net/publication/286062289_An_inverse_model_with_uncertainty_quantification_to_estimate_the_energy_performance_of_an_office_building (accessed on 28 June 2025).
Noh, H.Y.; Rajagopal, R. Data-driven forecasting algorithms for building energy consumption. SPIE 2013, 8692, 86920T.
Rastogi, P.; Khan, M.E.; Andersen, M. Gaussian-Process-Based Emulators for Building Performance Simulation. In Proceedings of the 15th IBPSA Conference 2017, San Francisco, CA, USA, 7–9 August 2017; pp. 1701–1709.
Burkhart, M.C.; Heo, Y.; Zavala, V.M. Measurement and verification of building systems under uncertain data: A Gaussian process modeling approach. Energy Build. 2014, 75, 189–198.
Srivastav, A.; Tewari, A.; Dong, B. Baseline building energy modeling and localized uncertainty quantification using Gaussian mixture models. Energy Build. 2013, 65, 438–447.
Zhang, Y.; O’Neill, Z.; Dong, B.; Augenbroe, G. Comparisons of inverse modeling approaches for predicting building energy performance. Build. Environ. 2015, 86, 177–190.
Gao, X.; Malkawi, A. A new methodology for building energy performance benchmarking: An approach based on intelligent clustering algorithm. Energy Build. 2014, 84, 607–616.
Santamouris, M.; Mihalakakou, G.; Patargias, P.; Gaitani, N.; Sfakianaki, K.; Papaglastra, M.; Pavlou, C.; Doukas, P.; Primikiri, E.; Geros, V.; et al. Using intelligent clustering techniques to classify the energy performance of school buildings. Energy Build. 2007, 39, 45–51.
Gaitani, N.; Lehmann, C.; Santamouris, M.; Mihalakakou, G.; Patargias, P. Using principal component and cluster analysis in the heating evaluation of the school building sector. Appl. Energy 2010, 87, 2079–2086.
Pieri, S.P.; Tzouvadakis, I.; Santamouris, M. Identifying energy consumption patterns in the Attica hotel sector using cluster analysis techniques with the aim of reducing hotels’ CO₂ footprint. Energy Build. 2015, 94, 252–262.
Petcharat, S.; Chungpaibulpatana, S.; Rakkwamsuk, P. Assessment of potential energy saving using cluster analysis: A case study of lighting systems in buildings. Energy Build. 2012, 52, 145–152.
Yang, J.; Ning, C.; Deb, C.; Zhang, F.; Cheong, D.; Lee, S.E.; Sekhar, C.; Tham, K.W. k-Shape clustering algorithm for building energy usage patterns analysis and forecasting model accuracy improvement. Energy Build. 2017, 146, 27–37.
Bengio, Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127.
Teso-Fz-Betoño, A.; Zulueta, E.; Cabezas-Olivenza, M.; Teso-Fz-Betoño, D.; Fernandez-Gamiz, U. A Study of Learning Issues in Feedforward Neural Networks. Mathematics 2022, 10, 3206.
Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; pp. 1–23. Available online: https://mitpress.mit.edu/9780262035613/deep-learning/ (accessed on 28 June 2025).
Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
Wang, Z.; Hong, T.; Piette, M.A. Data fusion in predicting internal heat gains for office buildings through a deep learning approach. Appl. Energy 2019, 240, 386–398.
Sadaei, H.J.; Silva, P.C.d.L.e.; Guimarães, F.G.; Lee, M.H. Short-term load forecasting by using a combined method of convolutional neural networks and fuzzy time series. Energy 2019, 175, 365–377.
Mienye, I.D.; Swart, T.G.; Obaido, G. Recurrent Neural Networks: A Comprehensive Review of Architectures, Variants, and Applications. Information 2024, 15, 517.
Caron, C.; Lauret, P.; Bastide, A. Machine Learning to speed up Computational Fluid Dynamics engineering simulations for built environments: A review. Build. Environ. 2025, 267, 112229.
Fan, C.; Sun, Y.; Zhao, Y.; Song, M.; Wang, J. Deep learning-based feature engineering methods for improved building energy prediction. Appl. Energy 2019, 240, 35–45.
Rahman, A.; Srikumar, V.; Smith, A.D. Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks. Appl. Energy 2018, 212, 372–385.
Zhou, C.; Fang, Z.; Xu, X.; Zhang, X.; Ding, Y.; Jiang, X.; Ji, Y. Using long short-term memory networks to predict energy consumption of air-conditioning systems. Sustain. Cities Soc. 2020, 55, 102000.
Kim, J.; Moon, J.; Hwang, E.; Kang, P. Recurrent inception convolution neural network for multi short-term load forecasting. Energy Build. 2019, 194, 328–341.
Sanjeev Kumar, T.M.; Kurian, C.P.; Varghese, S.G. Ensemble Learning Model-Based Test Workbench for the Optimization of Building Energy Performance and Occupant Comfort. IEEE Access 2020, 8, 96075–96087.
Wang, Z.; Wang, Y.; Srinivasan, R.S. A novel ensemble learning approach to support building energy use prediction. Energy Build. 2018, 159, 109–122.
Robinson, C.; Dilkina, B.; Hubbs, J.; Zhang, W.; Guhathakurta, S.; Brown, M.A.; Pendyala, R.M. Machine learning approaches for estimating commercial building energy consumption. Appl. Energy 2017, 208, 889–904.
Walther, J.; Spanier, D.; Panten, N.; Abele, E. Very short-term load forecasting on factory level—A machine learning approach. Procedia CIRP 2019, 80, 705–710.
Wang, C.; Yuan, J.; Zhang, J.; Deng, N.; Zhou, Z.; Gao, F. Multi-criteria comprehensive study on predictive algorithm of heating energy consumption of district heating station based on timeseries processing. Energy 2020, 202, 117714.
Huang, Y.; Yuan, Y.; Chen, H.; Wang, J.; Guo, Y.; Ahmad, T. A novel energy demand prediction strategy for residential buildings based on ensemble learning. Energy Procedia 2019, 158, 3411–3416.
Tsanas, A.; Xifara, A. Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools. Energy Build. 2012, 49, 560–567.
Papadopoulos, S.; Azar, E.; Woon, W.-L.; Kontokosta, C.E. Evaluation of tree-based ensemble learning algorithms for building energy performance estimation. J. Build. Perform. Simul. 2018, 11, 322–332.
Deng, H.; Fannon, D.; Eckelman, M.J. Predictive modeling for US commercial building energy use: A comparison of existing statistical and machine learning algorithms using CBECS microdata. Energy Build. 2018, 163, 34–43.
Li, Z.; Han, Y.; Xu, P. Methods for benchmarking building energy consumption against its past or intended performance: An overview. Appl. Energy 2014, 124, 325–334.
Barja-Martinez, S.; Aragüés-Peñalba, M.; Munné-Collado, Í.; Lloret-Gallego, P.; Bullich-Massagué, E.; Villafafila-Robles, R. Artificial intelligence techniques for enabling Big Data services in distribution networks: A review. Renew. Sustain. Energy Rev. 2021, 150, 111459.
Shadi, M.R.; Mirshekali, H.; Shaker, H.R. Explainable artificial intelligence for energy systems maintenance: A review on concepts, current techniques, challenges, and prospects. Renew. Sustain. Energy Rev. 2025, 216, 115668.
Natarajan, Y.; Sri Preethaa, K.R.; Wadhwa, G.; Choi, Y.; Chen, Z.; Lee, D.-E.; Mi, Y. Enhancing Building Energy Efficiency with IoT-Driven Hybrid Deep Learning Models for Accurate Energy Consumption Prediction. Sustainability 2024, 16, 1925.
Daki, H.; Saad, B.; El Hannani, A.; Haidine, A.; Ouahmane, H. Forecasting Energy Consumption in Educational Buildings with Big Data Analytics. In ICT for Smart Grid-Recent Advances, New Perspectives, and Applications; IntechOpen: London, UK, 2024.
Dong, B.; Li, Z.; Rahman, S.M.; Vega, R. A hybrid model approach for forecasting future residential electricity consumption. Energy Build. 2016, 117, 341–351.
Tardioli, G.; Kerrigan, R.; Oates, M.; O’Donnell, J.; Finn, D. Data Driven Approaches for Prediction of Building Energy Consumption at Urban Level. Energy Procedia 2015, 78, 3378–3383.
Schweidtmann, A.M.; Zhang, D.; von Stosch, M. A review and perspective on hybrid modeling methodologies. Digit. Chem. Eng. 2024, 10, 100136.
Althaus, P.; Redder, F.; Ubachukwu, E.; Mork, M.; Xhonneux, A.; Müller, D. Enhancing Building Monitoring and Control for District Energy Systems: Technology Selection and Installation within the Living Lab Energy Campus. Appl. Sci. 2022, 12, 3305.
Chen, Y.; Guo, M.; Chen, Z.; Chen, Z.; Ji, Y. Physical energy and data-driven models in building energy prediction: A review. Energy Rep. 2022, 8, 2656–2671.
Alnuwaiser, M.A.; Javed, M.F.; Khan, M.I.; Ahmed, M.W.; Galal, A.M. Support vector regression and ANN approach for predicting the ground water quality. J. Indian Chem. Soc. 2022, 99, 100538.
Ghattas, B.; Manzon, D. Machine Learning Alternatives to Response Surface Models. Mathematics 2023, 11, 3406.
Sarker, I.H. AI-Based Modeling: Techniques, Applications and Research Issues Towards Automation, Intelligent and Smart Systems. SN Comput. Sci. 2022, 3, 158.
Haq, I.U.; Kumar, A.; Rathore, P.S. Machine learning approaches for wind power forecasting: A comprehensive review. Discov. Appl. Sci. 2025, 7, 1139.
Parsa, S.M. Physics-informed machine learning meets renewable energy systems: A review of advances, challenges, guidelines, and future outlooks. Appl. Energy 2025, 402, 126925.
Ibrahim, A.; Zayed, T.; Lafhaj, Z. Enhancing Construction Performance: A Critical Review of Performance Measurement Practices at the Project Level. Buildings 2024, 14, 1988.
Boutahri, Y.; Tilioua, A. Machine learning-based predictive model for thermal comfort and energy optimization in smart buildings. Results Eng. 2024, 22, 102148.
Kato, T. Chapter 4—Prediction of photovoltaic power generation output and network operation. In Integration of Distributed Energy Resources in Power Systems; Funabashi, T., Ed.; Academic Press: Cambridge, MA, USA, 2016; pp. 77–108.
Hosamo, H.; Hosamo, M.H.; Nielsen, H.K.; Svennevig, P.R.; Svidt, K. Digital Twin of HVAC system (HVACDT) for multiobjective optimization of energy consumption and thermal comfort based on BIM framework with ANN-MOGA. Adv. Build. Energy Res. 2023, 17, 125–171.
Satan, A.; Zhakiyev, N.; Nugumanova, A.; Friedrich, D. Hybrid feature-based neural network regression method for load profiles forecasting. Energy Inform. 2025, 8, 19.
Golafshani, E.; Chiniforush, A.A.; Zandifaez, P.; Ngo, T. An artificial intelligence framework for predicting operational energy consumption in office buildings. Energy Build. 2024, 317, 114409.
Ball, R.; Rague, B. Machine Learning. In The Beginner’s Guide to Data Science; Springer: Berlin/Heidelberg, Germany, 2022; pp. 155–194.
D’Amico, A.; Ciulla, G.; Traverso, M.; Brano, V.L.; Palumbo, E. Artificial Neural Networks to assess energy and environmental performance of buildings: An Italian case study. J. Clean. Prod. 2019, 239, 117993.
Biswal, B.; Deb, S.; Datta, S.; Ustun, T.S.; Cali, U. Review on smart grid load forecasting for smart energy management using machine learning and deep learning techniques. Energy Rep. 2024, 12, 3654–3670.
Walker, S.; Khan, W.; Katic, K.; Maassen, W.; Zeiler, W. Accuracy of different machine learning algorithms and added-value of predicting aggregated-level energy performance of commercial buildings. Energy Build. 2020, 209, 109705.
Alazemi, T.; Darwish, M.; Radi, M. Renewable energy sources integration via machine learning modelling: A systematic literature review. Heliyon 2024, 10, e26088.
Jia, X.; Xia, Y.; Yan, Z.; Gao, H.; Qiu, D.; Guerrero, J.M.; Li, Z. Coordinated operation of multi-energy microgrids considering green hydrogen and congestion management via a safe policy learning approach. Appl. Energy 2025, 401, 109705.
Sayed, K.; Aref, M.; Almalki, M.M.; Mossa, M.A. Optimizing fast charging protocols for lithium-ion batteries using reinforcement learning: Balancing speed, efficiency, and longevity. Results Eng. 2025, 25, 104302.
Gonzalo, F.D.A.; Santamaría, B.M.; Burgos, M.J.M. Assessment of Building Energy Simulation Tools to Predict Heating and Cooling Energy Consumption at Early Design Stages. Sustainability 2023, 15, 1920.
GitHub—wwzjustin/CER-Smart-Meter-Project-by-Irish-Social-Science-Data-Archive: CER Smart Meter Project by Irish Social Science Data Archive. Available online: https://github.com/wwzjustin/CER-Smart-Meter-Project-by-Irish-Social-Science-Data-Archive (accessed on 29 June 2025).
Part of Energy Data Set (REDD). Available online: https://www.kaggle.com/datasets/pawelkauf/redd-part (accessed on 29 June 2025).
Energy Information Administration (EIA)-Commercial Buildings Energy Consumption Survey (CBECS). Available online: https://www.eia.gov/consumption/commercial/ (accessed on 29 June 2025).
Hong, T.; Pinson, P.; Fan, S. Global energy forecasting competition 2012. Int. J. Forecast. 2014, 30, 357–363.
Hong, T.; Pinson, P.; Fan, S.; Zareipour, H.; Troccoli, A.; Hyndman, R.J. Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond. Int. J. Forecast. 2016, 32, 896–913.
Hong, T.; Xie, J.; Black, J. Global energy forecasting competition 2017: Hierarchical probabilistic load forecasting. Int. J. Forecast. 2019, 35, 1389–1399.
Building Performance Database|Building Technology and Urban Systems. Available online: https://buildings.lbl.gov/cbs/bpd (accessed on 29 June 2025).
Miller, C.; Meggers, F. The Building Data Genome Project: An open, public data set from non-residential building electrical meters. Energy Procedia 2017, 122, 439–444.
Miller, C.; Kathirgamanathan, A.; Picchetti, B.; Arjunan, P.; Park, J.Y.; Nagy, Z.; Raftery, P.; Hobson, B.W.; Shi, Z.; Meggers, F. The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition. Sci. Data 2020, 7, 368.
Fan, C.; Chen, M.; Wang, X.; Wang, J.; Huang, B. A Review on Data Preprocessing Techniques Toward Efficient and Reliable Knowledge Discovery from Building Operational Data. Front. Energy Res. 2021, 9, 652801.
Liu, H.; Liang, J.; Liu, Y.; Wu, H. A Review of Data-Driven Building Energy Prediction. Buildings 2023, 13, 532.
Mantuano, C.; Omoyele, O.; Hoffmann, M.; Weinand, J.M.; Panella, M.; Stolten, D. Data imputation methods for intermittent renewable energy sources: Implications for energy system modeling. Energy Convers. Manag. 2025, 339, 119857.
Kim, T.-Y.; Cho, S.-B. Predicting residential energy consumption using CNN-LSTM neural networks. Energy 2019, 182, 72–81.
Duan, Z.; de Wilde, P.; Attia, S.; Zuo, J. Challenges in predicting the impact of climate change on thermal building performance through simulation: A systematic review. Appl. Energy 2025, 382, 125331.
Xuemei, L.; Lixing, D.; Jinhu, L.; Gang, X.; Jibin, L. A Novel Hybrid Approach of KPCA and SVM for Building Cooling Load Prediction. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, WKDD 2010, Phuket, Thailand, 9–10 January 2010; pp. 522–526.
Wang, C.; Song, J.; Shi, D.; Reyna, J.L.; Horsey, H.; Feron, S.; Zhou, Y.; Ouyang, Z.; Li, Y.; Jackson, R.B. Impacts of climate change, population growth, and power sector decarbonization on urban building energy use. Nat. Commun. 2023, 14, 6434.
Fotopoulou, E.; Zafeiropoulos, A.; Terroso-Sáenz, F.; Şimşek, U.; González-Vidal, A.; Tsiolis, G.; Gouvas, P.; Liapis, P.; Fensel, A.; Skarmeta, A. Providing Personalized Energy Management and Awareness Services for Energy Efficiency in Smart Buildings. Sensors 2017, 17, 2054.
Sheng, Y.; Arbabi, H.; Ward, W.O.; Mayfield, M. Learning from other cities: Transfer learning based multimodal residential energy prediction for cities with limited existing data. Energy Build. 2025, 338, 115723.
Zhou, D.; Ma, S.; Hao, J.; Han, D.; Huang, D.; Yan, S.; Li, T. An electricity load forecasting model for Integrated Energy System based on BiGAN and transfer learning. Energy Rep. 2020, 6, 3446–3461.
Fang, X.; Gong, G.; Li, G.; Chun, L.; Li, W.; Peng, P. A hybrid deep transfer learning strategy for short term cross-building energy prediction. Energy 2021, 215, 119208.
Chaudhary, G.; Johra, H.; Georges, L.; Austbø, B. Transfer learning in building dynamics prediction. Energy Build. 2025, 330, 115384.
Harputlugil, T.; de Wilde, P. The interaction between humans and buildings for energy efficiency: A critical review. Energy Res. Soc. Sci. 2021, 71, 101828.
Parmeter, C.F.; Zelenyuk, V. Combining the virtues of stochastic frontier and data envelopment analysis. Oper. Res. 2019, 67, 1628–1658.
Mystakidis, A.; Koukaras, P.; Tsalikidis, N.; Ioannidis, D.; Tjortjis, C. Energy Forecasting: A Comprehensive Review of Techniques and Technologies. Energies 2024, 17, 1662.
Kim, H.; Dorjgochoo, S.; Park, H.; Lee, S. Personalized Federated Transfer Learning for Building Energy Forecasting via Model Ensemble with Multi-Level Masking in Heterogeneous Sensing Environment. Electronics 2025, 14, 1790.
Li, D.; Qi, Z.; Zhou, Y.; Elchalakani, M. Machine Learning Applications in Building Energy Systems: Review and Prospects. Buildings 2025, 15, 648.
Wang, M.; Zhou, J.; Liang, Y.; Yu, H.; Jing, R. Climate change impacts on city-scale building energy performance based on GIS-informed urban building energy modelling. Sustain. Cities Soc. 2025, 125, 106331.
Li, Y.; Feng, H. Integrating urban building energy modeling (UBEM) and urban-building environmental impact assessment (UB-EIA) for sustainable urban development: A comprehensive review. Renew. Sustain. Energy Rev. 2025, 213, 115471.
Daissaoui, A.; Boulmakoul, A.; Karim, L.; Lbath, A. IoT and Big Data Analytics for Smart Buildings: A Survey. Procedia Comput. Sci. 2020, 170, 161–168.
Zhou, S.; Shah, A.; Leung, P.; Zhu, X.; Liao, Q. A comprehensive review of the applications of machine learning for HVAC. DeCarbon 2023, 2, 100023.
Omar, A.; AlMaeeni, S.; Attia, H.; Takruri, M.; Altunaiji, A.; Sanduleanu, M.; Shubair, R.; Ashhab, M.S.; Al Ali, M.; Al Hebsi, G. Smart City: Recent Advances in Intelligent Street Lighting Systems Based on IoT. J. Sens. 2022, 2022, 5249187.
Hakawati, B.; Mousa, A.; Draidi, F. Smart energy management in residential buildings: The impact of knowledge and behavior. Sci. Rep. 2024, 14, 1702.
Li, C.; Wang, J.; Wang, S.; Zhang, Y. A review of IoT applications in healthcare. Neurocomputing 2024, 565, 127017.
Aljohani, A. Deep learning-based optimization of energy utilization in IoT-enabled smart cities: A pathway to sustainable development. Energy Rep. 2024, 12, 2946–2957.
Karuna, G.; Ediga, P.; Akshatha, S.; Anupama, P.; Sanjana, T.; Mittal, A.; Rajvanshi, S.; Habelalmateen, M.I. Smart energy management: Real-time prediction and optimization for IoT-enabled smart homes. Cogent Eng. 2024, 11, 2390674.
Song, Y.; Xia, M.; Chen, Q.; Chen, F. A data-model fusion dispatch strategy for the building energy flexibility based on the digital twin. Appl. Energy 2023, 332, 120496.
Clausen, A.; Arendt, K.; Johansen, A.; Sangogboye, F.C.; Kjærgaard, M.B.; Veje, C.T.; Jørgensen, B.N. A digital twin framework for improving energy efficiency and occupant comfort in public and commercial buildings. Energy Inform. 2021, 4, 40.
Renganayagalu, S.K.; Bodal, T.; Bryntesen, T.-R.; Kvalvik, P. Optimising Energy Performance of buildings through Digital Twins and Machine Learning: Lessons learnt and future directions. In Proceedings of the 2024 4th International Conference on Applied Artificial Intelligence, ICAPAI 2024, Halden, Norway, 16 April 2024; pp. 1–6.
Li, W.; Xu, X. A hybrid evolutionary and machine learning approach for smart building: Sustainable building energy management design. Sustain. Energy Technol. Assess. 2024, 65, 103709.
Henzel, J.; Wróbel, Ł.; Fice, M.; Sikora, M. Energy Consumption Forecasting for the Digital-Twin Model of the Building. Energies 2022, 15, 4318.
Gnecco, V.M.; Vittori, F.; Pisello, A.L. Digital Twins for Decoding Human-Building Interaction in Multi-Domain Test-Rooms for Environmental Comfort and Energy Saving Via Graph Neural Networks. Energy Build. 2022, 279, 112652.
González, V.G.; Ruiz, G.R.; Bandera, C.F. Empirical and Comparative Validation for a Building Energy Model Calibration Methodology. Sensors 2020, 20, 5003.
Agouzoul, A.; Tabaa, M.; Chegari, B.; Simeu, E.; Dandache, A.; Alami, K. Towards a Digital Twin model for Building Energy Management: Case of Morocco. Procedia Comput. Sci. 2021, 184, 404–410.
Jo, S.-K.; Park, D.-H.; Park, H.; Kwak, Y.; Kim, S.-H. Energy Planning of Pigsty Using Digital Twin. In Proceedings of the ICTC 2019—10th International Conference on ICT Convergence: ICT Convergence Leading the Autonomous Future, Jeju Island, Republic of Korea, 16–18 October 2019; pp. 723–725.
Santos-Herrero, J.M.; Lopez-Guede, J.M.; Abascal, I.F.; Zulueta, E. Energy and thermal modelling of an office building to develop an artificial neural networks model. Sci. Rep. 2022, 12, 8935.
Huang, Y.; Niu, J.-L. Optimal building envelope design based on simulated performance: History, current status and new potentials. Energy Build. 2016, 117, 387–398.

Figure 1. Research Methodology Flowchart.

Figure 2. Application of Data-driven Energy Efficiency in Buildings [50].

Figure 3. Illustration of the end-user energy use [53].

Figure 4. Illustration of energy performance assessment using simulation method for new designs [72].

Figure 5. Illustration of energy performance assessment using simulation for existing buildings [72].

Figure 6. Schematic of a decision tree.

Figure 10. Schematic of different ensemble methods: (a) Bagging, (b) Boosting, (c) Stacking (an ensemble learning framework that combines the outputs of base models M₁, …, Mₙ, together with the original dataset, into a meta-model M*). Black dots denote base-model outputs; yellow denotes the meta-model output [167].

Figure 11. Hybrid model illustration [183].

Figure 12. The Geographic and Temporal Distribution of Research Focus.

Figure 13. A conceptual architecture for digital twin-based building energy management.

Table 1. A Systematic Framework for Building Energy Modeling Tasks.

Task	Objective	Key Metrics	Common Methods
Short-Term Load Forecasting	Predict energy use over a short period (e.g., hourly, daily)	Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE), Coefficient of Variation of RMSE (CV(RMSE))	ARIMA, LSTM, RNNs, SVR [23,24]
Energy Benchmarking	Compare a building’s energy performance against similar buildings or standards.	Energy Use Intensity (EUI), Energy Cost	Regression Analysis, Neural Networks, Cluster Analysis [25,26]
Simulation Calibration	Adjust a physics-based model to match real-world energy consumption data	Coefficient of Determination (R²), CV(RMSE)	Inverse Modeling, Optimization Algorithms (e.g., Genetic Algorithms) [27–29]
Post-Retrofit Evaluation	Assess the energy savings achieved after implementing efficiency measures	Energy Savings, Energy Use Reduction	Baseline Models, Inverse Regression Models [30–33]
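Table 1 lists its key metrics without defining them. The functions below are a minimal sketch in plain Python (no external libraries) that computes MAPE, RMSE, CV(RMSE), R², and Energy Use Intensity (EUI) from measured and predicted series. The function names and sample values are hypothetical. CV(RMSE) is normalized here by the mean measured value with n in the denominator. Calibration guidelines such as ASHRAE Guideline 14 apply a degrees-of-freedom correction (n − p), so check their exact definition before comparing results against their thresholds.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must be non-empty and aligned sample by sample.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error (%); undefined if any measured value is zero."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when a measured value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Square Error, in the units of the data (e.g., kWh)."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def cv_rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """CV(RMSE) (%): RMSE divided by the mean measured value (n in the denominator)."""
    return 100.0 * rmse(actual, predicted) / (sum(actual) / len(actual))


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(actual, predicted)
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def eui(annual_energy_kwh: float, floor_area_m2: float) -> float:
    """Energy Use Intensity in kWh per m² per year, as used for benchmarking."""
    return annual_energy_kwh / floor_area_m2


measured = [100.0, 120.0, 80.0, 100.0]  # hypothetical hourly readings (kWh)
forecast = [110.0, 110.0, 90.0, 90.0]   # hypothetical model output (kWh)
print(f"MAPE={mape(measured, forecast):.2f}%  RMSE={rmse(measured, forecast):.2f} kWh  "
      f"CV(RMSE)={cv_rmse(measured, forecast):.2f}%  R2={r_squared(measured, forecast):.2f}")
```

In this example, four hourly readings of 100, 120, 80, and 100 kWh are forecast as 110, 110, 90, and 90 kWh. The metrics are an RMSE of 10 kWh, a CV(RMSE) of 10%, a MAPE of about 10.2%, and an R² of 0.5. MAPE weights the same 10 kWh error more heavily at low-load hours, which is one reason CV(RMSE) is often preferred for calibration.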

Table 2. Comparison of Deep Learning Architectures for Building Energy Forecasting.

Architecture	Core Strength	Typical Forecasting Horizon	Data Requirements	Key Limitations for Building Energy
CNN (1D)	Excellent at extracting local, translation-invariant patterns (e.g., daily load shapes)	Short-Term (Hourly, Daily)	Moderate	Struggles with long-term dependencies (e.g., the effect of a cold day several days prior). Because it was designed primarily for spatial data, it requires careful framing for time series.
LSTM/GRU	Designed to capture long-term temporal dependencies and sequential patterns.	Short to Medium-Term (Hourly to Weekly)	High	Computationally intensive to train, and slow for very long sequences because time steps are processed sequentially. May still lose relevant information from the distant past.
Transformer	Superior at modeling extremely long-range dependencies via self-attention, which relates all parts of the sequence to one another directly.	All horizons; excels in the long term	Very High	High computational and memory complexity (quadratic in sequence length for standard self-attention); requires large amounts of data to train effectively from scratch; prone to overfitting on small building datasets.
Hybrid (e.g., CNN-LSTM)	CNN extracts features from sub-sequences; LSTM models the temporal evolution of these features.	Short to Medium-Term	High	Complex model architecture, requiring careful tuning. Combines the computational demands of both components.

Table 3. A Summary of the Hybrid and Multi-category Modeling Approaches.

Study Reference	Combined Categories	Hybrid Approach Description	Reported Advantage
[110]	Ensemble + DL	RF used for feature selection, LSTM for sequence prediction	Higher accuracy than individual models
[32]	Data-Driven + Physics-Based	Physics laws embedded as constraints in neural network training	Improved generalizability, reduced data needs
[110]	Signal processing + ensemble ML	CEEMDAN for decomposition, XGBoost for regression	Enhanced prediction stability and accuracy
[179]	Multiple ML (stacking)	Predictions from four base models fed into a linear regression (LR) meta-learner	Superior accuracy and robustness
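Row [179] in Table 3 describes stacking. In this approach, the predictions of several base models become the inputs to a linear regression (LR) meta-learner. The sketch below covers only that final step, in plain Python. It assumes the base models' out-of-fold predictions already exist: these come from cross-validation, so each prediction is made on data the base model was not fitted on. The function names are hypothetical. The weights are fitted by ordinary least squares through the normal equations, with no regularization or non-negativity constraint. In practice, scikit-learn's `StackingRegressor` generates the out-of-fold predictions internally.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_meta_learner(base_preds: Sequence[Sequence[float]],
                     target: Sequence[float]) -> List[float]:
    """Fit [intercept, w_1, ..., w_K] by least squares (X^T X w = X^T y).

    base_preds[i][k] is base model k's out-of-fold prediction for sample i.
    """
    X = [[1.0] + list(row) for row in base_preds]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_meta(weights: Sequence[float], base_row: Sequence[float]) -> float:
    """Combine one sample's base-model predictions using the fitted meta-weights."""
    return weights[0] + sum(w * p for w, p in zip(weights[1:], base_row))
```

With four base models, `fit_meta_learner` returns five numbers: an intercept and one weight per model. Solving the normal equations is adequate for a handful of base models. Base-model predictions are usually strongly correlated, however, so a regularized (e.g., ridge) meta-learner is a common, more stable alternative.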

Table 4. A Summary of the Regional Analysis.

Region	Share of Studies (Sample)	Primary Modeling Focus	Remarks
Americas	~35%	Advanced ML, Simulation Calibration	Strong policy drivers, high data availability
Europe	~30%	Statistical Benchmarking, DL, Hybrid Models	Driven by the EPBD (Energy Performance of Buildings Directive); focus on existing building stock
Asia	~25%	SVM, ANN, Ensemble methods	Rapid urbanisation, growing smart building focus
Africa	~10%	Classical ML, statistical models	Emerging field, challenges with data scarcity

Table 5. Comparative Performance Analysis.

References	Model Category	Example Models	MAE	R²	RMSE	CV	Scalability	Interpretability
[76,77]	Statistical Models	Linear Regression, ARIMA	8–15%	0.70–0.85	10–20%	0.12–0.25	Moderate	High
[88–149]	Machine Learning	SVM, Random Forest, XGBoost	5–12%	0.80–0.95	8–15%	0.08–0.20	High	Moderate
[150–161]	Deep Learning	LSTM, CNN, Autoencoders	3–8%	0.90–0.98	5–10%	0.05–0.15	High	Low
[162–174]	Ensemble Methods	Bagging, Boosting, Stacking	4–10%	0.85–0.97	6–12%	0.07–0.18	High	Moderate to Low
[175–185]	Hybrid Models	GP + ANN, LSTM + SVM	2–6%	0.92–0.99	4–8%	0.04–0.12	Moderate	Low

Table 6. Performance Benchmarks from Contemporary Energy System Models.

Study and Focus	Core Methodology	Application Context	Key Performance Metrics
Unit Commitment with Renewables (Alazemi et al., 2024) [201]	An “add-on tailor” model for prediction error correction, using advanced regression or ML.	Improving day-ahead unit commitment by enhancing the accuracy of renewable generation and reverse predictions.	RMSE Reduction: 15–30% in wind/solar forecasting vs. baseline models. Economic Impact: 2–5% reduction in total operational costs for the grid
Multi-Energy Microgrid Operation (Jia et al., 2025) [202]	Safe policy learning (safe reinforcement learning)	Coordinating electricity, heat, and green hydrogen storage in microgrids while managing network congestion.	Achieved ~12% lower operating costs compared to rule-based strategies. Zero constraint violations during testing, ensuring system reliability.
Fast Charging for Batteries (Sayed et al., 2025) [203]	Random Forest-enhanced electro-thermal-degradation model.	Optimizing electric vehicle charging protocols to balance speed, battery temperature, and long-term health.	Charging time reduced by 25% compared to standard constant-current protocols. Limited capacity fade to <2% over 1000 simulated cycles, significantly better than conventional fast-charging.

Table 7. Comparison of Principal Building Energy Performance Evaluation Methodologies.

Reference	Method	Data Sources	Accuracy	Deployment	Drawbacks
[54]	Engineering Calculations	Simplified building information	Variable	Design. End-use evaluations. Highly flexible.	Limited accuracy.
[60–204]	Simulations	Detailed building information	High	Design. Compliance. Complex buildings. Cases where high accuracy is necessary.	Dependent on user skill and significant data collection.
[76,77]	Statistical	Dataset of existing buildings	Average	Benchmarking systems. Simple evaluations.	Dependent on statistical data. Limited accuracy.
[88–185]	Machine Learning	Large dataset	Average to high	Buildings with highly detailed data collection. Complex problems with many parameters.	Model construction is complex. Does not directly account for physical characteristics.

Table 8. Publicly Accessible Databases on Building Energy Consumption.

Dataset	Dataset Source	Description	Building Type	Survey Content	Spatial Scale
CER Smart Metering Project [205]	Irish Social Science Data Archive	Half-hourly electricity meter data from 5000 Irish homes and small businesses, collected in the Electricity Smart Metering Customer Behaviour Trial (CBT) to evaluate the effect of smart metering on consumer electricity consumption.	Residential and Small/Medium Enterprises	-	Smart meter data
Reference Energy Disaggregation Data Set (REDD) [206]	Massachusetts Institute of Technology	Contains consumption data for six households spanning eighteen days in the spring of 2011; used for energy disaggregation research, which identifies the contributions of individual appliances from a composite electrical signal.	Residential Buildings		Electricity meter, Appliance
Commercial Buildings Energy Consumption Survey (CBECS) [207]	U.S. Energy Information Administration (EIA)	A national sample survey of U.S. commercial buildings; the 2018 survey estimated approximately 6 million commercial buildings.	Commercial Buildings	Building information, equipment information	Regional, Single-building
National Non-Domestic Building Stock (NDBS) [68]	UK Department for Environment and Transport	Creates a comprehensive profile of the non-residential buildings in England and Wales by integrating real data at the building level, including information on energy use and efficiency, the geometry, and materials of each non-residential building	Non-residential buildings, including industrial buildings	Building physical characteristics, building geometric dimensions, and the main equipment overview	Single-building
Global Energy Forecasting Competition (GEFCom) [208,209,210]	Dr. Tao Hong’s team	GEFCom, which focused on short-term load forecasting using hourly load and outside temperature data, was held in 2012, 2014, and 2017.	Distribution Area	Hourly meteorological data (temperature) and load data, holiday information	Distribution area
Building Performance Database (BPD) [211]	Lawrence Berkeley National Lab	An extensive database of energy-related information for U.S. residential and commercial buildings	Building energy consumption; Regional energy consumption	Building type, location, and physical characteristics	Regional, Single-building
The Building Data Genome Project [212,213]	Clayton Miller et al.	Data on energy use for buildings serving a range of purposes, including households, workplaces, schools, and healthcare institutions, in nations like the USA and the UK	Building energy consumption	The whole building’s electricity meter data	Single-building
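Several datasets in Table 8, such as the CER trial, provide half-hourly kWh readings, whereas many forecasting studies work at hourly or daily resolution. The sketch below shows that resampling step together with a basic completeness check. It assumes a hypothetical in-memory layout of `(meter_id, day, half_hour_slot, kwh)` records with slots numbered 1 to 48. This is not the actual file format of any listed dataset, which should be checked against that dataset's own documentation.

```python
from collections import defaultdict

# Hypothetical record layout: (meter_id, day, half_hour_slot 1..48, kWh)

def to_hourly(records):
    """Sum pairs of half-hourly readings: slots 1-2 -> hour 0, 3-4 -> hour 1, ..."""
    hourly = defaultdict(float)
    for meter_id, day, slot, kwh in records:
        if not 1 <= slot <= 48:
            raise ValueError(f"half-hour slot out of range: {slot}")
        hourly[(meter_id, day, (slot - 1) // 2)] += kwh
    return dict(hourly)

def to_daily(records):
    """Total consumption per meter per day."""
    daily = defaultdict(float)
    for meter_id, day, _slot, kwh in records:
        daily[(meter_id, day)] += kwh
    return dict(daily)

def missing_slots(records):
    """List the absent half-hour slots for every (meter, day) with fewer than 48 readings."""
    seen = defaultdict(set)
    for meter_id, day, slot, _kwh in records:
        seen[(meter_id, day)].add(slot)
    full_day = set(range(1, 49))
    return {key: sorted(full_day - slots) for key, slots in seen.items() if len(slots) < 48}

# Illustrative records: one complete day for "m1", two readings only for "m2"
records = [("m1", 1, s, 0.25) for s in range(1, 49)]
records += [("m2", 1, 1, 0.4), ("m2", 1, 2, 0.6)]

print(to_hourly(records)[("m1", 1, 0)])
print(to_daily(records)[("m1", 1)])
print(len(missing_slots(records)[("m2", 1)]))
```

Gaps flagged by `missing_slots` would normally be handled by the preprocessing methods discussed in Section 3.2 before any aggregated series is used for model training.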

Table 9. Documented Performance Gains from Transfer Learning and Data Augmentation.

Technique	Application Details	Reported Performance Gain
Cross-Building Transfer Learning [224]	An LSTM model pre-trained on a source office building (1 year of data) was fine-tuned on a target building with only two weeks of data.	20% lower MAPE compared to a model trained from scratch only on the two weeks of target data.
Geographical Transfer with Adjustment [225]	Forecasting energy use for a school in Canada by transferring knowledge from schools in different regions, with seasonal and trend adjustments.	11.2% improvement in prediction accuracy (R²) over a model using only one month of the target school’s data.
Data Augmentation with GANs [223]	Used a Bidirectional GAN (BiGAN) to generate synthetic building load data to augment a small real dataset for model training.	Achieved comparable accuracy to models trained with 80% more real data, effectively mitigating data scarcity.
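The cross-building transfer row in Table 9 [224] describes pre-training on a data-rich source building and fine-tuning on a target building with little data. The sketch below shows only the warm-start mechanism behind that workflow, and it makes substitutions that should be read as assumptions. A linear model trained by gradient descent stands in for the LSTM. The source and target buildings are synthetic, with similar but not identical load responses. The feature set is hypothetical. The script compares fine-tuning from the source weights with training from zero weights under the same small update budget. It does not reproduce the 20% figure reported in [224]. With only four parameters, a closed-form fit on two weeks of target data would also work well here.

```python
import numpy as np

def features(temp, hour):
    """Hypothetical design matrix: intercept, outdoor temperature, daily cycle."""
    angle = 2 * np.pi * hour / 24
    return np.column_stack([np.ones_like(temp), temp, np.sin(angle), np.cos(angle)])

def make_building(n_hours, coefs, rng, noise=1.0):
    """Synthetic hourly load (kWh) for a building with a linear response."""
    temp = rng.uniform(5, 35, n_hours)
    hour = rng.integers(0, 24, n_hours)
    X = features(temp, hour)
    return X, X @ coefs + rng.normal(0, noise, n_hours)

def train(X, y, w_init, lr=1e-3, steps=200):
    """Gradient descent on mean squared error, starting from w_init."""
    w = np.array(w_init, dtype=float)
    for _ in range(steps):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

def mape(y, y_hat):
    return 100 * np.mean(np.abs((y - y_hat) / y))

rng = np.random.default_rng(42)
source_coefs = np.array([50.0, 2.0, 5.0, -3.0])  # assumed source-building response
target_coefs = np.array([55.0, 2.2, 5.5, -3.3])  # similar but not identical target

X_src, y_src = make_building(24 * 365, source_coefs, rng)   # one year of source data
X_tgt, y_tgt = make_building(24 * 14, target_coefs, rng)    # two weeks of target data
X_test, y_test = make_building(24 * 30, target_coefs, rng)  # held-out target data

# "Pre-training": the source building has ample data, so fit it directly
w_src = np.linalg.lstsq(X_src, y_src, rcond=None)[0]

# Same small update budget on the target data, different starting points
w_finetuned = train(X_tgt, y_tgt, w_src)
w_scratch = train(X_tgt, y_tgt, np.zeros(4))

print(f"fine-tuned MAPE: {mape(y_test, X_test @ w_finetuned):.1f}%")
print(f"from-scratch MAPE: {mape(y_test, X_test @ w_scratch):.1f}%")
```

For a deep model such as the LSTM in [224], the same idea applies: initialise from the source-trained weights and continue training (often with a lower learning rate or some layers frozen) on the small target dataset.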

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

