Article

A Data-Driven Model for the Energy and Economic Assessment of Building Renovations

Department of Astronautical, Electrical and Energy Engineering (DIAEE), Sapienza University of Rome, 00184 Roma, Italy
Appl. Sci. 2025, 15(14), 8117; https://doi.org/10.3390/app15148117
Submission received: 27 May 2025 / Revised: 16 July 2025 / Accepted: 18 July 2025 / Published: 21 July 2025
(This article belongs to the Special Issue Advances in Building Energy Efficiency and Design)

Abstract

The architectural, engineering, construction, and operation (AECO) sector is one of the main contributors to energy consumption and greenhouse gas emissions in Europe, making the renovation of the existing building stock a priority. However, defining effective and economically sustainable interventions remains a challenge, partly due to the variability of building characteristics and the lack of digital tools to support data-driven decision making. This research aims to identify the main factors influencing the energy consumption of buildings by analyzing a large database of building characteristics using machine learning algorithms. For the parameters that the analysis shows to have the greatest impact, the average costs of the corresponding energy retrofit measures are used to build a cost–benefit analysis model and to estimate the economic payback time of each measure, individually or in combination. The expected result is a tool that allows the operator to select interventions based on the achievable energy efficiency and/or their economic sustainability. The proposed methodology offers a digital approach that is replicable, adaptable to different territorial contexts, and useful for the strategic planning of the energy transition in the building sector.

1. Introduction

In recent decades, growing awareness of the environmental impact of human activities has made the transition to sustainable energy one of the world’s main challenges. Recent energy and environmental crises have underlined the need to accelerate the shift towards a sustainable development model that focuses on reducing greenhouse gas emissions, using resources efficiently, and integrating renewable energy sources. Against this backdrop, the European Union has set ambitious targets: reducing net emissions by 55% by 2030 compared to 1990 levels and achieving climate neutrality by 2050, as set out in the European Green Deal [1,2]. Within this strategic framework, the built environment sector, particularly the architecture, engineering, construction, and operation (AECO) sector, plays a central role. According to the European Commission, buildings account for around 40% of energy consumption and 36% of CO2 emissions in Europe [3]. A significant proportion of this consumption is linked to residential buildings and their daily use, as well as to the design, construction, and management phases across a building’s entire life cycle. Reducing the environmental impact of buildings is therefore crucial both for achieving climate targets and for the systemic transformation of the European energy sector. One of the most critical issues in this context is the poor energy performance of existing buildings, many of which were constructed before modern energy regulations were adopted. Most existing buildings are energy inefficient, often lacking efficient technical systems and built with materials of poor thermal performance [4]. The building stock is extremely heterogeneous in terms of construction period, building techniques, intended use, climatic conditions, and energy requirements, which makes it difficult to adopt standardized retrofit strategies that are truly effective [5].
Taking action on the built heritage through energy renovation measures is not only an obligation under environmental regulations but also an opportunity to reduce energy poverty, increase indoor comfort, enhance property values, and create skilled jobs in the construction sector [6]. However, the effective implementation of these measures on a large scale is hindered by several obstacles, including the fragmentation of available information, the uncertainty surrounding the economic costs and benefits of potential solutions, and the absence of digital decision-making tools capable of integrating available data and facilitating informed choices.

1.1. State of the Art

The increasing accessibility of energy data and the advancement of digital technologies have led to the creation of a variety of tools and models for analyzing the energy performance of buildings. Methods based on building energy modeling (BEM), dynamic simulations, and decision support systems have been widely adopted for estimating the benefits of retrofit interventions [7,8]. Meanwhile, the introduction of artificial intelligence and machine learning (ML) techniques has opened up new possibilities for automatically processing large amounts of data and constructing predictive models that are more flexible and adaptable than traditional deterministic approaches [9]. Despite these advances, the transition from experimental or specific tools to truly operational and scalable models remains limited. Many applications focus on specific case studies with ad hoc data that is difficult to replicate on a territorial scale [10,11]. Almazam et al. propose an advanced method combining radial basis function neural networks (RBFNNs) and model predictive control (MPC) to enhance the energy efficiency of HVAC systems in sustainable buildings. When applied to residential and commercial buildings in Saudi Arabia, the model uses new input and output parameters to optimize performance and achieves a 15% reduction in energy consumption compared to conventional controls. The predictions proved accurate, with an error margin of between 0.2% and 2.5%, and the system ensured more precise control of thermal dynamics and variable loads [12]. S. Mazzetto evaluates the effectiveness of six ML models for predictive maintenance in building systems using a high-resolution dataset collected in six offices at Aalborg University (Denmark). Of the analyzed models, XGBoost demonstrated the greatest effectiveness (95% accuracy and an F1-score of 0.93), successfully predicting over 100,000 historical failures in HVAC, lighting, and occupancy systems. 
Notably, it identified critical issues such as lights left on in unoccupied rooms and shutters left open in empty spaces. These results highlight the potential of XGBoost for real-time fault detection and energy optimization in building management systems [13]. Shan et al. developed an integrated artificial-intelligence-based framework to support multicriteria decision making in the energy retrofitting of existing buildings, combining ML surrogate models and evolutionary generative algorithms. Among the four ML models analyzed, LightGBM demonstrated the greatest accuracy and efficiency. By applying a heuristic algorithm and an entropy-weighted method, the system achieved average energy savings of 56.62%, a 51.60% reduction in emissions, and a 24.27% decrease in life cycle costs. The framework demonstrates the effectiveness of integrating AI and evolutionary optimization for sustainable and cost-effective energy retrofit strategies [14]. Another study explores the prediction of energy consumption in commercial office buildings using an LSTM model that integrates time series electricity consumption data and occupancy information. Using real data collected at one-minute intervals from a building in Houston, USA, the model predicted energy consumption at different timescales (hourly, daily, and quarterly) for individual devices and for the entire building. Evaluation with MAPE, MAE, and RMSE shows superior performance and highlights the importance of occupancy in predictive models, with potential energy efficiency improvements of up to 20% [15]. Di Stefano et al. propose a three-phase framework for integrating ML into the architectural design process at the neighborhood scale to optimize energy efficiency from the outset. The focus is on Phase 1, where the CatBoost model is used to predict energy consumption with minimal inputs, typical of preliminary design.
When applied to a New York case study, the model achieved high accuracy (R2 = 0.88), confirming the validity of the selected key parameters. The results showed limited discrepancies compared to a traditional physical model (−8.69% to +11.04%), highlighting the potential of ML in supporting sustainable decision making from the earliest design stages [16]. Baset et al. conducted an in-depth review of the integration of data-driven methods in energy retrofit strategies, highlighting the emerging role of explainable AI (XAI) in improving transparency and stakeholder adoption. The work emphasizes the importance of AI techniques for managing incomplete data, optimizing energy performance, and predicting the effects of interventions more accurately, while also suggesting future directions such as the integration of future climate scenarios and life cycle impact assessment [17]. Imani et al. proposed a hybrid methodology, combining physical models and data-driven approaches, to evaluate energy retrofit solutions in different climates in the United Kingdom. The system is based on an automated dataset validated by IoT sensors and used to train ML models. The results highlight the effectiveness of air source heat pumps in cold–humid climates and WSHPs in harsher areas, suggesting climate-specific strategies to optimize energy efficiency and comfort [18]. Tavano et al. introduced the concept of the retrofit optimization problem (ROP), based on the multidimensional knapsack problem, to analyze the socio-economic sustainability of retrofit interventions in the Italian context. The model highlights strong inequalities: many energy-efficient strategies are inaccessible to low-income households, raising issues of equity in the energy transition and highlighting the need for targeted and income-sensitive policies [19]. Gholamzadehmir et al. analyzed the impact of energy retrofits on the market value of multifamily residential buildings in Europe, integrating the cost approach with the contingent valuation method (CVM). The results show that refurbished buildings can achieve a market premium of 13.5%, demonstrating a direct correlation between energy efficiency and perceived property attractiveness [20].
However, most existing models focus primarily on energy aspects, neglecting an integrated assessment of intervention costs, return on investment, and the economic sustainability of different retrofit strategies. This limits their effectiveness as tools for supporting complex decisions, which necessarily require balancing environmental requirements, technical constraints, and financial resources. As noted by Berti et al. [21], data-driven tools such as digital twins can support sustainability by enabling informed, real-time decisions that integrate environmental, technical, and economic dimensions. Another limitation is the poor interoperability of data: information on building characteristics, actual energy consumption, intervention costs, and available incentives is often fragmented, incomplete, or difficult to compare across contexts. This fragmentation hinders the development of generalizable, large-scale predictive tools that can provide effective guidance even in the absence of detailed information on individual buildings. The lack of open, standardized formats for data collection, processing, and exchange is a further obstacle [22]. Adopting open data and interoperable formats (e.g., based on shared ontologies or semantic protocols) would enable integration between different information sources, improve the reliability of analyses, and facilitate the replicability of models in heterogeneous territorial contexts. The literature also reveals diverging approaches to identifying the parameters most relevant for defining renovation strategies: some studies highlight the importance of geometric and morphological building characteristics, while others focus on building services systems or behavioral aspects [23,24]. This lack of consensus reflects the complexity of the phenomenon and the need for methodological approaches based on statistical evidence derived from large, representative samples.

1.2. Research Objective

Considering the critical issues that emerged from the literature and the operational needs of the AECO sector, this study aims to develop a data-driven digital approach for the integrated assessment of energy renovation interventions in the built environment. In particular, the main objective of the research is to build a predictive and decision-support model based on the analysis of real data and aimed at supporting informed energy and economic choices. The methodological approach is divided into several phases. The first phase analyzes a large database of buildings, including construction, technological, climatic, and functional variables, in order to identify, through ML algorithms, the parameters that most significantly influence the energy consumption of buildings [25]. Advanced ML techniques make it possible to overcome the limitations of deterministic approaches and to capture complex, non-linear relationships between building characteristics and observed energy performance. Unlike many existing analyses, which focus on small datasets or individual case studies, this work aims to build a generalizable and adaptable model capable of returning reliable results in different territorial contexts and with varying levels of detail. From this perspective, the adoption of open and interoperable data formats is key to ensuring the scalability and replicability of the methodology.
The second phase of the research consists of associating the main energy retrofit measures, such as thermal insulation, replacement of windows and doors, and upgrading of heating systems, with an estimate of the average unit costs, obtained through the processing of technical databases, market reports, and documented case studies [26,27]. By combining these costs with the energy benefits estimated in the previous phase, it will be possible to build a cost–benefit analysis model that provides summary indicators of economic viability, including the payback period and return on investment (ROI) [28].
The main hypothesis guiding the research is that the joint analysis of large-scale building data, combined with predictive ML techniques and economic cost/benefit modeling, can generate a practical and effective tool to guide decisions on renovation interventions. This tool would enable public and private operators, designers, municipal technicians, investors, and policy makers to objectively and transparently assess the effectiveness of different retrofit strategies in relation to the specific context and the energy and economic objectives set. The expected output is therefore the definition of an operational methodology, based on flexible interpretative models and accessible data, which can be integrated into spatial planning processes and public policy support systems for ecological transition. Looking ahead, the scientific contribution aims to bridge the gap between the theoretical potential of digital tools and their actual application on an urban and territorial scale, promoting a culture of data-driven urban design and transformation.

2. Materials and Methods

To systematically address the challenges associated with the energy transition of existing buildings, it is crucial to adopt advanced methodological approaches. The construction sector, particularly the residential sector, is characterized by a high degree of heterogeneity in terms of geometry, construction technologies, ages of buildings, and operating conditions. This makes it particularly difficult to compare energy performance and identify effective, sustainable retrofit strategies. In this context, a data-driven approach that integrates artificial intelligence techniques, statistical modeling, and economic analysis is a powerful and innovative method for objectively and reproducibly addressing the complexity of the issue.
ML techniques were selected for their ability to identify complex, non-linear relationships within multidimensional, potentially incomplete datasets. These techniques overcome the limitations of traditional deterministic approaches, which are often constrained by simplifying assumptions or limited generalizability. Two main tools were adopted: a supervised regression algorithm (the random forest regressor), used to estimate the relative importance of building variables with respect to annual energy consumption, and an unsupervised learning algorithm (k-means clustering), used to classify buildings into homogeneous groups based on similar characteristics and consumption profiles.
As an ensemble model, the random forest aggregates multiple decision trees, each built on a different bootstrap subsample of the dataset. It is distinguished by its high predictive accuracy, robustness to noise in the data, and ability to handle non-linear relationships without requiring particular assumptions about the distribution of variables. It is therefore well suited to analyzing complex phenomena such as energy use in the building sector, where interactions between architectural, building-services, and behavioral variables are difficult to model using conventional analytical approaches.
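To make the bootstrap-and-average idea concrete, here is a minimal pure-Python sketch under deliberately simplified assumptions: each "tree" is a depth-1 regression stump fitted on a bootstrap sample, its split feature is chosen only from a random subset of columns, and the ensemble prediction is the mean over all trees. The names `fit_stump`, `fit_forest`, and `predict` are hypothetical; the study's model uses full trees with tuned hyperparameters, so this only illustrates the mechanism.

```python
import random
from statistics import mean


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 regression tree using only the given feature indices.

    Returns (feature, threshold, left_value, right_value) minimizing squared error.
    """
    best = None
    for j in feature_ids:
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:
        # All candidate features are constant in this sample: predict the mean.
        m = mean(y)
        return (0, float("inf"), m, m)
    _, j, t, lm, rm = best
    return (j, t, lm, rm)


def fit_forest(X, y, n_trees=50, max_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feats = rng.sample(range(p), max_features)  # random subset of columns
        forest.append(fit_stump(Xb, yb, feats))
    return forest


def predict(forest, row):
    """Aggregate by averaging the predictions of all trees."""
    return mean(lm if row[j] <= t else rm for j, t, lm, rm in forest)
```

Restricting each split to a random subset of features (`max_features`) is what distinguishes a random forest from plain bagging: it decorrelates the trees, so averaging them reduces variance more effectively.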
Meanwhile, k-means clustering was employed to partition the building sample into distinct groups, each summarized by a digital representative building, by minimizing the distance between buildings and the centroid of their cluster, which yields compact, well-separated groups. This process identified recurring building types with homogeneous energy behavior, providing an empirical basis for developing targeted intervention strategies adaptable to specific contexts. The quality of the clustering was validated with the silhouette coefficient to ensure the internal consistency and significance of the identified groups.
Finally, to ensure an integrated approach, the results of the predictive and classification analyses were combined with a multi-intervention economic assessment to estimate the financial viability of the main retrofit strategies. This assessment was based on real technical and economic data obtained from official regional sources and used classic economic indicators: net present value (NPV), ROI, and payback period. The proposed approach thus enables objective, replicable comparisons between the energy and economic benefits of the proposed measures, providing an operational tool for planning interventions at urban or territorial scales.
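The three indicators can be sketched as follows, assuming for illustration only a single upfront investment, a constant annual saving, a fixed discount rate, and no incentives, energy price escalation, or maintenance costs; the study's exact formulation is not given in this section. The function names and the example figures are hypothetical.

```python
def npv(investment, annual_saving, rate, years):
    """Net present value: -I + sum_{t=1..N} S / (1 + r)^t (constant saving S assumed)."""
    return -investment + sum(annual_saving / (1 + rate) ** t for t in range(1, years + 1))


def simple_payback(investment, annual_saving):
    """Years needed for undiscounted savings to repay the investment: I / S."""
    return investment / annual_saving


def roi(investment, annual_saving, years):
    """Undiscounted return on investment over the analysis period: (N * S - I) / I."""
    return (annual_saving * years - investment) / investment


# Hypothetical measure: 20,000 EUR investment saving 1,500 EUR/year over 30 years.
print(simple_payback(20000, 1500))
print(roi(20000, 1500, 30))
print(npv(20000, 1500, 0.03, 30))
```

For this hypothetical measure, the simple payback is about 13.3 years, the 30-year ROI is 1.25 (125%), and the NPV at a 3% discount rate is about 9,400 EUR. A discounted payback period would be longer than the simple one, since later savings are worth less.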
This study employed a comprehensive data-driven methodology integrating ML algorithms, statistical modeling, and economic analysis to investigate energy consumption patterns in residential buildings and evaluate the cost-effectiveness of retrofit interventions. The primary aims are to identify the key factors influencing energy use, cluster buildings based on their consumption profiles, and assess the economic sustainability of different retrofit strategies.
All datasets, code, and protocols used are available in a GitHub repository (https://github.com/zahraziran/Retrofit/blob/main/README.md, accessed on 10 July 2025).

2.1. Data Sources and Preprocessing

The data used in this study were extracted from a comprehensive dataset of residential buildings for 2022 and 2023, comprising detailed building characteristics, energy consumption indicators (total, heating, cooling, and domestic hot water (DHW)), and temporal consumption data. Economic data on retrofit interventions were collected from the Lazio region price list [29], which includes average costs and estimated savings for various renovation measures. Preprocessing began with handling missing values: numerical fields were filled with column means, while categorical fields were imputed with the mode. Outlier analysis was conducted, and data normalization was applied where required, particularly for cost and return computations. Monthly consumption trends were averaged, and derived variables such as the surface-to-volume ratio and the percentage of glass surface were calculated. Figure 1 shows the average distribution of energy consumption components across buildings:
- Heating dominates, accounting for approximately 60–70% of total energy use.
- DHW and cooling contribute smaller shares.
This suggests that retrofit strategies targeting heating systems and insulation are likely to have the greatest impact on reducing energy consumption.
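The preprocessing steps described at the start of this subsection can be sketched in plain Python as follows. Field names such as `envelope_area_m2`, `heated_volume_m3`, `glazed_area_m2`, and `facade_area_m2` are hypothetical, missing values are assumed to be encoded as `None`, and the consumption shares are computed as the ratio of average end uses, which is one possible reading of "average distribution" (averaging per-building shares would be an alternative).

```python
from statistics import mean, mode


def impute(rows, numeric, categorical):
    """Fill missing values (None): column mean for numeric fields, mode for categorical ones."""
    fills = {}
    for col in numeric:
        fills[col] = mean(r[col] for r in rows if r[col] is not None)
    for col in categorical:
        fills[col] = mode(r[col] for r in rows if r[col] is not None)
    # Build new dicts so the raw records are left untouched.
    return [{k: (fills[k] if v is None and k in fills else v) for k, v in r.items()} for r in rows]


def add_derived(row):
    """Add derived variables (hypothetical field names)."""
    out = dict(row)
    out["sv_ratio"] = row["envelope_area_m2"] / row["heated_volume_m3"]        # surface-to-volume ratio
    out["glass_pct"] = 100 * row["glazed_area_m2"] / row["facade_area_m2"]    # % of glass surface
    return out


def consumption_shares(rows):
    """Share of heating, cooling, and DHW in the average total (basis for a pie chart)."""
    totals = {k: mean(r[k] for r in rows) for k in ("heating", "cooling", "dhw")}
    s = sum(totals.values())
    return {k: v / s for k, v in totals.items()}
```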

2.2. Correlation Analysis of Building Characteristics and Energy Consumption

To better understand the interplay between building design parameters and energy performance, a normalized correlation matrix was constructed (Figure 2), where all values were scaled between 0 and 1 to reflect relative association strength. The results highlight that the wall heat transfer coefficient and total occupancy exhibit the highest normalized associations with total energy consumption.
Glass surface percentage and surface-to-volume ratio show moderate correlation levels. Variables such as dwelling size, floor count, and surface area display comparatively lower associations. These findings confirm the dominant role of envelope characteristics in shaping energy performance, supporting both the clustering and feature importance results.
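The exact normalization behind Figure 2 is not specified. A minimal sketch, assuming Pearson correlation with total consumption, absolute values, and min-max scaling across features (so the strongest association maps to 1 and the weakest to 0), could look like this; function and feature names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def normalized_association(features, target):
    """|r| between each feature and the target, min-max scaled to [0, 1] (assumed scheme)."""
    strength = {name: abs(pearson(col, target)) for name, col in features.items()}
    lo, hi = min(strength.values()), max(strength.values())
    span = (hi - lo) or 1.0  # avoid division by zero if all strengths are equal
    return {name: (s - lo) / span for name, s in strength.items()}
```

Taking absolute values treats strong negative and strong positive correlations as equally informative, which matches the idea of "relative association strength" but discards the sign.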

3. Results

This section presents the complete workflow and outcomes of the proposed data-driven methodology for assessing energy consumption patterns and evaluating the economic viability of retrofit interventions in residential buildings. The analysis begins with a feature importance study using a random forest model to identify the building characteristics that most influence energy use. Building on these insights, a clustering analysis groups buildings into clusters, each described by a digital representative building, revealing distinct consumption profiles that inform targeted intervention strategies [30]. A comprehensive economic assessment is then conducted for each building cluster, quantifying the financial performance of multiple retrofit measures using key indicators such as NPV, ROI, and payback period. The final results provide an evidence-based foundation for prioritizing retrofit strategies tailored to different building typologies, enabling stakeholders to maximize both energy efficiency and economic return.

3.1. Feature Importance Analysis by Random Forest

Understanding which building features most significantly influence energy consumption is fundamental to developing effective retrofit strategies and making informed policy decisions. To address this, a random forest regressor was employed to analyze the predictive contribution of architectural and occupancy-related characteristics to total annual energy consumption (measured in kWh/m2/year). In scenarios where datasets are too small to support the training of complex models, such as deep learning (DL) architectures or ensemble methods, simple statistical models have proven effective for decision making on energy consumption [29]. The analysis was conducted on a combined dataset of residential buildings from 2022 and 2023 (see the Energy4Rome GitHub repository, https://github.com/zahraziran/Energy4Rome, accessed on 10 July 2025). Key features included the number of floors, number of dwellings, average dwelling size, total occupancy, surface area, wall and glass heat transfer coefficients (U-values), percentage of glass surface, floor spacing, and surface-to-volume ratio. These variables were selected for their direct relevance to thermal performance and spatial efficiency, both of which are known to affect a building’s energy profile.
To enhance the robustness and transparency of the ML approach, we performed systematic hyperparameter tuning of the random forest model using grid search. Key parameters, namely the number of estimators (100, 200, 500), maximum tree depth (10, 20, 30, none), minimum samples per split (2, 5, 10), and the number of features considered at each split (“sqrt”, “log2”), were optimized through 5-fold cross-validation on the training set. Model performance was assessed using R2, mean absolute error (MAE), and root mean squared error (RMSE). The final model achieved an R2 of 0.91, an MAE of 4.23 kWh/m2/year, and an RMSE of 6.71 kWh/m2/year on the test data, indicating strong predictive accuracy.
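The search space above can be enumerated explicitly. This sketch, with hypothetical helper names, builds every parameter combination and the contiguous 5-fold split used to score each one, showing that the grid contains 3 × 4 × 3 × 2 = 72 configurations, i.e., 360 model fits under 5-fold cross-validation. In scikit-learn the same search is usually delegated to `GridSearchCV(RandomForestRegressor(), param_grid, cv=5)`; whether the study used that library is not stated here.

```python
from itertools import product

# Grid reported in the text; keys mirror scikit-learn RandomForestRegressor parameter names.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [10, 20, 30, None],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}


def expand_grid(grid):
    """Return one dict per combination of parameter values (Cartesian product)."""
    keys = list(grid)
    return [dict(zip(keys, combo)) for combo in product(*grid.values())]


def kfold_indices(n, k=5):
    """Yield (train, validation) index lists for k contiguous folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


configs = expand_grid(param_grid)
n_fits = len(configs) * 5  # 72 configurations x 5 folds = 360 model fits
```

Each configuration is scored by its mean validation error across the five folds, and the best one is then refitted on the whole training set before the single evaluation on the held-out test data.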
To ensure methodological rigor and evaluate the robustness of our predictive framework, the random forest model was benchmarked against three alternative ML algorithms commonly used in energy modeling: gradient boosting (XGBoost), support vector regression (SVR), and a multilayer perceptron (MLP). Among these, XGBoost achieved slightly higher predictive performance (R2 = 0.93, MAE = 4.12 kWh/m2/year), but its implementation required significantly more computational resources and more complex parameter tuning, and it lacked native interpretability. SVR and MLP produced lower accuracy (R2 = 0.86 and 0.89, respectively) and were more sensitive to feature scaling and initialization settings. The random forest model, by contrast, provided the best trade-off between accuracy, stability, and explainability. The final tuned model consisted of an ensemble of 100 decision trees and was trained using an 80/20 split between training and testing data, yielding the test-set performance reported above. Random forest’s non-parametric nature, robustness to multicollinearity and outliers, and capacity to model non-linear relationships without distributional assumptions made it especially suitable for diverse and heterogeneous building energy datasets.
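The evaluation protocol (an 80/20 split, then R2, MAE, and RMSE on the held-out 20%) can be sketched with the standard metric definitions below. The helper names and the seed are hypothetical, and the split is a plain random shuffle, which is an assumption since the text does not say how the test set was drawn.

```python
import random
from math import sqrt


def train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle indices and hold out test_size of the rows (hypothetical helper)."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = round(len(rows) * test_size)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test


def mae(y, yhat):
    """Mean absolute error, in the units of y (e.g., kWh/m2/year)."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rmse(y, yhat):
    """Root mean squared error, in the units of y; penalizes large errors more than MAE."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

RMSE is always at least as large as MAE for the same predictions, which is consistent with the reported pair (6.71 and 4.23 kWh/m2/year).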
Furthermore, the ability of random forest to directly compute interpretable feature importance rankings is a key advantage in retrofit applications, where stakeholders demand both predictive performance and transparent justification for decisions. Based on this comparative evaluation, random forest was selected as the primary model for subsequent analyses, offering both technical robustness and operational value in the context of data-driven energy renovation planning.
To assess the importance of each feature, permutation importance was employed: a model-agnostic method that measures how much prediction accuracy deteriorates when the values of a single variable are randomly shuffled. This approach yields a more reliable ranking than impurity-based importance, which tends to favor high-cardinality features. To enhance interpretability, the resulting importance scores were normalized, offering a clear comparative view of feature impacts.
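A minimal sketch of permutation importance, assuming any fitted model exposed as a `predict` function over a list of rows and an error metric such as MAE: each column is shuffled several times, the average increase in error is taken as that feature's importance, and the scores are then clipped at zero and rescaled to sum to 1, which is one plausible reading of "normalized". Names are hypothetical; scikit-learn provides the same idea as `sklearn.inspection.permutation_importance`.

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    """Mean increase in error when one feature column is shuffled (higher = more important).

    `predict` maps a list of rows to predictions; `metric(y, yhat)` is an error such as MAE.
    """
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(metric(y, predict(X_perm)) - baseline)
        importances.append(sum(drops) / n_repeats)
    return importances


def normalize(importances):
    """Clip negative importances to 0 and rescale so the scores sum to 1."""
    clipped = [max(v, 0.0) for v in importances]
    total = sum(clipped) or 1.0
    return [v / total for v in clipped]
```

Because the model is never retrained, the method is cheap and works for any predictor; its main caveat is that strongly correlated features can share, and therefore dilute, each other's importance.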
Figure 3 presents the feature importance analysis results, illustrating the relative contribution of each building characteristic to energy consumption prediction. The horizontal bar chart shows that envelope components, particularly wall U-values (28%), glass U-values (18%), and glass surface percentage (12%), have the most substantial impact on energy performance. Together, these top three factors account for approximately 60% of the model’s predictive power (28% + 18% + 12% = 58%). Notably, occupancy-related parameters contribute minimally (5% combined), suggesting that physical building characteristics offer significantly greater leverage for energy efficiency improvements. The color-coded categorization further enhances interpretability by grouping similar architectural elements, enabling stakeholders to identify which building systems present the most promising retrofit opportunities. This visualization serves as a decision-support tool, allowing building owners and policy makers to direct limited resources toward modifications with the highest potential impact. The feature importance ranking in Figure 3 guided the identification of a digital representative building and the selection of retrofit strategies in the subsequent sections.
The results of the analysis revealed that wall insulation performance (wall U-value), the percentage of glass surface, and year of construction were the most critical predictors of total energy consumption. These findings are consistent with physical building energy simulation models and reinforce the significance of envelope design in determining energy efficiency. By statistically quantifying the influence of each parameter, this methodology provides a concrete basis for prioritizing retrofit measures. Instead of applying generic solutions, stakeholders can now identify and target the most impactful areas for intervention. For building owners, designers, and policymakers, this analysis offers a valuable tool for maximizing energy savings and return on investment, particularly in the context of constrained renovation budgets and evolving energy performance regulations.

3.2. Clustering Analysis for Digital Representative Buildings

The clustering analysis phase plays a pivotal role in this methodology, enabling the classification of high-energy-consuming buildings into distinct, data-driven digital representative buildings that support tailored retrofit interventions. Various clustering techniques have been explored in the literature, generally categorized into four main types: partitional, hierarchical, density-based, and grid-based methods [31]. Among these, k-means clustering (a partitional approach) and agglomerative hierarchical clustering (AHC) are the most widely adopted for building energy analysis. Tardioli et al. [32] conducted a comprehensive comparison of clustering algorithms, testing different normalization techniques and validation indices to identify representative residential multifamily buildings. Their results showed that k-means consistently outperformed other methods, including AHC, particularly for applications involving energy use segmentation [33], thermal zoning, and load pattern classification in office buildings [34]. Furthermore, k-means has proven effective in identifying homogeneous subgroups based on energy consumption characteristics, reinforcing its utility for clustering tasks in the built environment [35].
In this study, clustering was employed as an unsupervised technique to segment the building dataset into groups with similar energy performance and physical characteristics. The k-means algorithm, a computationally efficient and scalable method well suited to numerical data, was used. However, because it assigns each point to the nearest centroid by Euclidean distance, k-means tends to produce compact, roughly spherical clusters and may be less effective at detecting arbitrarily shaped ones [36]. The input features for clustering included normalized values of energy consumption per square meter, wall and glass U-values, glazing percentage, building volume, and occupancy density, selected for their demonstrated relevance to energy use.
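A minimal sketch of the clustering step, assuming the input features are first min-max scaled to [0, 1] (the text says only "normalized") and then clustered with plain Lloyd's k-means from a random initialization. Helper names are hypothetical; library implementations such as scikit-learn's `KMeans` add k-means++ seeding and multiple restarts, which make results less sensitive to the starting centroids.

```python
import random


def min_max(rows):
    """Scale each column to [0, 1] so no single feature dominates the Euclidean distance."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / ((h - l) or 1.0) for v, l, h in zip(row, lo, hi)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each building goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster becomes empty.
            new.append([sum(v) / len(members) for v in zip(*members)] if members else centroids[c])
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return labels, centroids
```

In this framing, each final centroid is a natural candidate for the "digital representative building" of its cluster, since it is the feature-wise average of the buildings assigned to it.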
The optimal number of clusters (k = 3) was determined using the elbow method, which identified a clear inflection point in the within-cluster sum of squares curve, and validated through silhouette analysis, which yielded a score of 0.64—indicating well-differentiated and cohesive clusters. This outcome is also consistent with the findings of Yilmaz et al. [37], who reported that k = 3 offered the best cluster quality for electricity usage profiles in residential buildings. To visualize the cluster structure and further assess validity, principal component analysis (PCA) was applied to project the high-dimensional feature space onto two principal components. The resulting segmentation—Critical, High, and Moderate–High clusters—captures distinct patterns in building energy behavior, supporting strategic retrofit prioritization. As shown in Figure 4, this data-driven segmentation reveals that nearly 75% of buildings fall into the Critical and High categories, emphasizing the value of clustering for guiding resource allocation and targeted intervention in the energy transition process.
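The two validation criteria can be computed directly from a labeling. In this sketch (hypothetical helper names), `wcss` is the within-cluster sum of squares that the elbow method plots against k, and `silhouette` averages s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance from building i to the other members of its cluster and b(i) is the mean distance to the members of the nearest other cluster. Values close to 1 indicate compact, well-separated clusters; values near 0 indicate overlapping ones.

```python
from math import dist


def wcss(points, labels):
    """Within-cluster sum of squared distances to each cluster's centroid."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(v) / len(members) for v in zip(*members)]
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (requires at least two clusters)."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

Running `wcss` for k = 1, 2, 3, ... and looking for the point where the curve stops dropping steeply gives the elbow, while `silhouette` provides an independent check on the chosen k.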
The identification of these distinct high-consumption digital representative buildings directly supports the research objective of developing a data-driven model for cost–benefit assessment of retrofit interventions. Each consumption-level cluster exhibits a unique combination of influential factors. Critical-level buildings show the highest wall and glass U-values (0.87 and 0.76 normalized) and the earliest construction years (0.25 normalized), while Moderate–High buildings demonstrate better envelope performance but still require substantial improvements. These differentiated profiles enable the prioritization of retrofit measures based on cluster-specific characteristics rather than applying generic solutions across the entire building stock. For instance, buildings in the Critical category may benefit most from comprehensive envelope retrofitting, whereas Moderate–High-consumption buildings might achieve optimal results through more targeted interventions that address specific deficiencies. This understanding directly serves the goal of identifying the most effective and economically sustainable interventions for each building typology.
Beyond its statistical value, the clustering approach offers strategic insights for policy development and the implementation of energy transition programs. The distribution of buildings across the three identified categories, 28.6% Critical, 46.1% High, and 25.3% Moderate–High, as shown in Figure 4, provides a quantitative foundation for resource allocation at urban and regional scales. Notably, the finding that nearly three-quarters of buildings fall into the Critical- or High-consumption levels underscores the need for targeted retrofit policies and incentive structures.
Additionally, the cluster-based visualization of normalized building characteristics enhances communication with non-technical stakeholders by illustrating the links between physical attributes and energy performance. This evidence-based segmentation transforms complex building data into actionable insights and supports the development of tailored renovation strategies that optimize both energy savings and return on investment. Ultimately, the clustering framework strengthens the digital retrofit model by bridging predictive analytics and policy action, facilitating scalable and cost-effective energy transformation at the city or district level.

3.3. Economic Analysis of Retrofit Interventions

The economic analysis of retrofit interventions represents a critical component of our data-driven methodology, translating energy efficiency improvements into financial metrics that stakeholders can use for decision making. This phase builds upon the feature importance and clustering analyses to evaluate the cost-effectiveness of various retrofit measures across different building typologies.
For each building cluster identified in the previous phase, a comprehensive cost–benefit analysis of five key retrofit interventions was performed: Wall Insulation, Window Replacement, HVAC System Upgrades, Solar Panel Installation, and Smart Home Systems. The analysis incorporated actual cost data from [29], which included average costs per square meter, expected lifespans, and energy-saving percentages for each intervention. To ensure comparability across different building types and intervention scales, all cost-related metrics were normalized for visualization purposes (Algorithm 1).
Algorithm 1: Economic Analysis of Retrofit Interventions
Input:
B = {b1, b2, …, bn}: Set of buildings with characteristics and energy consumption data
M = {m1, m2, …, mk}: Set of retrofit measures with associated parameters
E = {r, p}: Economic parameters where r is discount rate and p is energy price
Output:
R: Economic indicators (NPV, ROI, payback period) for each measure by cluster
Procedure:
1: function CalculateRetrofitEconomics(B, M, E)
2: R ← ∅ ▶ Initialize empty results collection
3: r ← E.discount_rate ▶ Extract discount rate
4: p ← E.energy_price ▶ Extract energy price
5: C ← ClusterBuildings(B) ▶ Group buildings into clusters
6: for each cluster c in C do
7: Ec = mean(energy_consumption(c)) ▶ Average energy consumption per unit area (kWh/m² per year)
8: Ac = mean(building_area(c)) ▶ Average building area
9: Rc ← ∅ ▶ Initialize results for this cluster
10: for each measure m in M do
11: id ← m.identifier
12: cost0 = m.cost_per_m2 × Ac ▶ Initial investment cost
13: η = m.energy_saving_percentage / 100 ▶ Energy saving efficiency
14: s_energy ← Ec × η × Ac ▶ Annual energy savings (kWh)
15: s_annual = s_energy × p ▶ Annual monetary savings (€)
16: L = m.expected_lifespan ▶ Lifespan in years
17: NPV = −cost0 ▶ Begin with negative investment
18: for t ← 1 to L do
19: NPV = NPV + s_annual / (1 + r)^t ▶ Add discounted annual savings
20: end for
21: ROI = NPV / cost0 ▶ Return on investment ratio
22: PP = cost0 / s_annual ▶ Payback period in years
23: Rc[id] = (NPV, ROI, PP, cost0, s_annual, L)
24: end for
25: R[c] = Rc
26: end for
27: return R
28: end function
29: function RankInterventionsByMetric(R, metric)
30: T ← ∅ ▶ Initialize ranking result
31: for each cluster c in R do
32: Rc = R[c]
33: if metric ∈ {NPV, ROI} then
34: Tc = sort(Rc, metric, descending) ▶ Higher values are better
35: else
36: Tc = sort(Rc, metric, ascending) ▶ Lower values are better
37: end if
38: T[c] = Tc
39: end for
40: return T
41: end function
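The pseudocode translates almost line for line into Python. This sketch makes three assumptions:

- The ClusterBuildings step is replaced by a precomputed mapping from cluster name to buildings.
- Energy consumption is an intensity in kWh/m² per year, so that Ec × η × Ac yields kWh as the pseudocode comment states.
- The ranking returns measure identifiers, best first.

Class and field names are hypothetical, and the costs, savings, and lifespans from [29] are not reproduced here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Building:
    energy_kwh_per_m2: float  # annual consumption intensity (assumed kWh/m2 per year)
    area_m2: float


@dataclass
class Measure:
    identifier: str
    cost_per_m2: float               # EUR per m2 of building area
    energy_saving_percentage: float  # expected saving, 0-100
    expected_lifespan: int           # years


def calculate_retrofit_economics(
    clusters: Dict[str, List[Building]],
    measures: List[Measure],
    discount_rate: float,
    energy_price: float,
) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Per-cluster NPV, ROI and simple payback for every measure (steps 1-28)."""
    results: Dict[str, Dict[str, Dict[str, float]]] = {}
    for name, buildings in clusters.items():
        e_c = mean(b.energy_kwh_per_m2 for b in buildings)  # average intensity
        a_c = mean(b.area_m2 for b in buildings)            # average floor area
        r_c: Dict[str, Dict[str, float]] = {}
        for m in measures:
            cost0 = m.cost_per_m2 * a_c                     # initial investment (EUR)
            eta = m.energy_saving_percentage / 100          # saving fraction
            s_energy = e_c * eta * a_c                      # annual saving (kWh)
            s_annual = s_energy * energy_price              # annual saving (EUR)
            npv = -cost0
            for t in range(1, m.expected_lifespan + 1):
                npv += s_annual / (1 + discount_rate) ** t  # discounted yearly saving
            r_c[m.identifier] = {
                "NPV": npv,
                "ROI": npv / cost0,
                "PP": cost0 / s_annual,                     # simple (undiscounted) payback
                "cost0": cost0,
                "s_annual": s_annual,
                "L": m.expected_lifespan,
            }
        results[name] = r_c
    return results


def rank_interventions_by_metric(
    results: Dict[str, Dict[str, Dict[str, float]]], metric: str
) -> Dict[str, List[str]]:
    """Measure identifiers per cluster, best first (steps 29-41)."""
    descending = metric in ("NPV", "ROI")  # higher is better; for PP lower is better
    return {
        cluster: sorted(r_c, key=lambda ident: r_c[ident][metric], reverse=descending)
        for cluster, r_c in results.items()
    }
```

Consider one hypothetical 100 m² building using 100 kWh/m² per year, and a measure costing €10/m² that saves 20% over 4 years. At €0.25/kWh and r = 0, this gives cost0 = €1000, s_annual = €500, NPV = €1000, ROI = 1.0, and PP = 2.0 years.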
Three primary economic indicators were calculated as shown in Algorithm 1 (Figure 5):
  • NPV: The sum of discounted annual energy savings over the expected lifespan of the intervention minus the initial investment, using a discount rate of 4%. This metric accounts for the time value of money and provides a comprehensive measure of economic benefit.
  • ROI: The ratio of NPV to total cost, representing economic efficiency and enabling direct comparison between interventions regardless of scale. This dimensionless metric indicates how many euros are returned for each euro invested.
  • Payback Period: The number of years required to recover the initial investment through annual energy savings, representing a straightforward measure of temporal economic viability.
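Algorithm 1 assumes constant annual savings s = s_annual over a lifespan L, so its discounting loop (steps 17 to 20) has a closed form via the standard annuity factor AF(r, L), with C_0 = cost0. The derivation below restates Algorithm 1's own definitions. The 20-year lifespan in the numeric line is a hypothetical example, not a value reported in the study.

```latex
\mathrm{NPV} = -C_0 + \sum_{t=1}^{L} \frac{s}{(1+r)^t}
             = -C_0 + s\,\mathrm{AF}(r, L),
\qquad
\mathrm{AF}(r, L) = \frac{1 - (1+r)^{-L}}{r}

\mathrm{PP} = \frac{C_0}{s}
\;\Rightarrow\;
\mathrm{ROI} = \frac{\mathrm{NPV}}{C_0} = \frac{\mathrm{AF}(r, L)}{\mathrm{PP}} - 1

\mathrm{AF}(0.04, 20) = \frac{1 - 1.04^{-20}}{0.04} \approx \frac{1 - 0.4564}{0.04} \approx 13.59
```

At the 4% discount rate, each euro of constant yearly saving over a hypothetical 20-year life is therefore worth about €13.59 today. Under these definitions, a measure has positive NPV exactly when its simple payback period is shorter than AF(r, L) years.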

3.4. Building Cluster Characteristics and Energy Performance

Figure 4 presents the principal component analysis (PCA) visualization of the three building clusters, with axes normalized to a 0–1 scale for consistency with our overall approach. The clear separation between clusters (silhouette score: 0.64) confirms the distinct energy consumption profiles of Critical-consumption (red), High-consumption (dark orange) and Moderate–High-consumption buildings (light orange). Each cluster is characterized by a unique combination of features: Critical-consumption buildings exhibit the highest wall and glass U-values (0.87 and 0.76 normalized) and earliest construction years; High-consumption buildings show intermediate values; while Moderate–High buildings demonstrate somewhat better envelope performance but still require substantial improvements. This differentiation enables us to evaluate retrofit strategies in a context-specific manner, acknowledging the heterogeneity of the building stock.
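A projection like the one in Figure 4 can be sketched without external libraries in four steps:

1. Center the features.
2. Find the two leading eigenvectors of the covariance matrix by power iteration with deflation.
3. Project the data onto them.
4. Rescale each axis to 0–1.

This is a hypothetical illustration. The paper does not specify how its PCA or axis scaling was computed, and a library SVD routine would be the more robust choice in practice.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def center_and_covariance(points: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Mean-center the data and return it with its (population) covariance matrix."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(row[i] * row[j] for row in centered) / n for j in range(d)] for i in range(d)]
    return centered, cov


def dominant_eigenpair(matrix: Matrix, iterations: int = 500) -> Tuple[float, Vector]:
    """Power iteration on a symmetric PSD matrix: largest eigenvalue and unit eigenvector.

    Assumes the all-ones start vector is not orthogonal to the leading eigenvector.
    """
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # no variance left along any reachable direction
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigenvalue, v


def pca_2d_unit_scaled(points: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project onto the first two principal components, then min-max scale each axis to [0, 1]."""
    centered, cov = center_and_covariance(points)
    lam1, v1 = dominant_eigenpair(cov)
    # Deflation: subtract the first component so power iteration finds the second.
    d = len(cov)
    deflated = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    _, v2 = dominant_eigenpair(deflated)
    scores = [(sum(a * b for a, b in zip(row, v1)), sum(a * b for a, b in zip(row, v2)))
              for row in centered]

    def scale(values: List[float]) -> List[float]:
        lo, hi = min(values), max(values)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in values]

    pc1 = scale([s[0] for s in scores])
    pc2 = scale([s[1] for s in scores])
    return list(zip(pc1, pc2))
```

For example, the four points (−2, 0), (2, 0), (0, −1), (0, 1) map to (0, 0.5), (1, 0.5), (0.5, 0), and (0.5, 1). The first axis captures the wider x spread, and the second captures the y spread.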

3.5. Payback Period Analysis by Building Cluster

Figure 5 presents the payback period for each retrofit intervention across the three building clusters. This visualization reveals substantial variations in the time required to recover initial investments, ranging from approximately 5.5 years (Smart Home Systems in Moderate–High-consumption buildings) to over 15 years (Window Replacement across all clusters). HVAC System Upgrades consistently demonstrate favorable payback periods (8.0–10.6 years) across all clusters, with High-consumption buildings showing the shortest recovery time (8.0 years).
This can be attributed to their combination of relatively high energy use and moderately aged HVAC systems, creating an optimal intervention point. Smart Home Systems also show attractive payback periods, particularly for Moderate–High-consumption buildings (5.5 years), suggesting that digital control systems offer a cost-effective pathway for buildings with better baseline efficiency. Window Replacement interventions exhibit the longest payback periods (>15 years) across all clusters, reflecting their high initial cost relative to energy savings potential.

3.6. Economic Performance and Sensitivity Analysis of Retrofit Measures

Figure 6 displays the ROI for each retrofit intervention across the three building clusters, providing a direct measure of economic efficiency. The highest ROI values are observed for Wall Insulation in Critical-consumption buildings (2.40), HVAC System Upgrades in High-consumption buildings (2.39), and Smart Home Systems in Moderate–High-consumption buildings (2.48).
These findings demonstrate how different building types benefit from targeted intervention strategies: buildings with poor thermal envelopes (Critical consumption) achieve the greatest returns from insulation improvements; buildings with moderate envelope performance but inefficient systems (High consumption) benefit most from mechanical upgrades; while buildings with relatively better baseline performance (Moderate–High consumption) achieve optimal returns through control system optimization. Solar panels consistently show the lowest ROI values (1.15–1.27) across all clusters, indicating that, under current cost structures and without considering incentives, renewable energy generation represents a less financially attractive option compared to energy efficiency measures.
Robustness to Financial Assumptions
To assess the stability and real-world reliability of the economic results, a sensitivity analysis was conducted on two key financial assumptions: the discount rate and energy price levels. These parameters strongly influence long-term investment outcomes and are subject to market fluctuations and policy shifts. This analysis enhances the robustness of the economic model by evaluating whether the identified retrofit priorities remain valid under different financial scenarios.
This evaluation focuses on the three core economic indicators: NPV, ROI, and payback period.
NPV estimates the cumulative benefit of a retrofit over its lifetime, accounting for the time value of money. It is calculated as:
$$\mathrm{NPV} = \sum_{t=1}^{n} \frac{\mathrm{Saving}_t - \mathrm{MaintenanceCost}_t}{(1 + r)^t}$$
where:
$\mathrm{Saving}_t$ = Annual energy saving in year $t$.
$\mathrm{MaintenanceCost}_t$ = Annual maintenance cost in year $t$.
$r$ = Discount rate.
$t$ = Year number (e.g., 1, 2, …, $n$).
$n$ = Total lifetime of the retrofit (in years).
A positive NPV means the investment yields more financial benefit than it costs over time. ROI is calculated as:
$$\mathrm{ROI} = \frac{\mathrm{NPV}}{\mathrm{Initial\_Investment}}$$
It expresses how much return is achieved per euro invested. An ROI above 1.0 indicates a profitable measure.
The payback period is the smallest number of years $n_{PP}$ for which:
$$\sum_{t=1}^{n_{PP}} \frac{\mathrm{Saving}_t - \mathrm{MaintenanceCost}_t}{(1 + r)^t} \geq \mathrm{Initial\_Investment}$$
It shows how many years are needed to fully recover the initial investment through annual energy savings.
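The three indicators as defined in this subsection can be sketched as plain functions over yearly cash-flow lists. Two points of interpretation are assumptions of the sketch:

- The NPV sum, as written, excludes the initial investment. ROI > 1.0 therefore means discounted net savings exceed the investment, consistent with the threshold stated above.
- The payback year is the discounted one defined by the inequality. This differs from the simple cost0 / s_annual ratio used in Algorithm 1.

Function names and cash flows are hypothetical.

```python
from typing import Optional, Sequence


def discounted_net_savings(savings: Sequence[float], maintenance: Sequence[float], r: float) -> float:
    """NPV as written above: sum over t of (Saving_t - MaintenanceCost_t) / (1 + r)^t."""
    return sum(
        (s - m) / (1 + r) ** t
        for t, (s, m) in enumerate(zip(savings, maintenance), start=1)
    )


def roi(savings: Sequence[float], maintenance: Sequence[float], r: float,
        initial_investment: float) -> float:
    """ROI = NPV / Initial_Investment; values above 1.0 mark a profitable measure."""
    return discounted_net_savings(savings, maintenance, r) / initial_investment


def discounted_payback_year(savings: Sequence[float], maintenance: Sequence[float], r: float,
                            initial_investment: float) -> Optional[int]:
    """First year whose cumulative discounted net saving reaches the investment, else None."""
    cumulative = 0.0
    for t, (s, m) in enumerate(zip(savings, maintenance), start=1):
        cumulative += (s - m) / (1 + r) ** t
        if cumulative >= initial_investment:
            return t
    return None  # investment not recovered within the retrofit lifetime
```

Take €600 of yearly savings and €100 of yearly maintenance over 5 years against a €1000 investment. Payback falls in year 2 at r = 0 but slips to year 3 at r = 0.04.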
Sensitivity-Based NPV Formulation
To simulate economic uncertainty, a sensitivity-adjusted version of the NPV formula was used. This accounts for potential variation in key parameters:
$$\mathrm{NPV}(\delta) = \sum_{t=1}^{n} \frac{\mathrm{Saving}_t\,(1 + \delta_{energy}) - \mathrm{MaintenanceCost}_t\,(1 + \delta_{cost})}{(1 + r + \delta_{rate})^t}$$
where:
  • $\delta_{energy}$ = Percentage change in energy prices (e.g., ±20%).
  • $\delta_{cost}$ = Change in maintenance or retrofit costs.
  • $\delta_{rate}$ = Adjustment to the discount rate.
  • $r$ = Baseline discount rate.
These results demonstrate that the model’s economic recommendations remain valid even under significant variations in key financial assumptions. The sensitivity analysis confirms the overall robustness of the proposed methodology, reinforcing its applicability in real-world policy and investment decision making.
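The sensitivity-adjusted NPV can be exercised as a small scenario grid. The sketch makes these choices:

- Energy prices vary by ±20%, following the example given for δ_energy.
- The discount rate shifts by ±1 percentage point. This range is hypothetical, since the text does not state one.
- δ_cost is held at zero.

`ranking_is_stable` checks the property the analysis is after: whether the NPV ordering of measures survives every scenario. All names and cash flows are illustrative.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

ENERGY_DELTAS = (-0.20, 0.0, 0.20)  # ±20% energy price, as in the example above
RATE_DELTAS = (-0.01, 0.0, 0.01)    # hypothetical ±1 percentage point on r


def npv_sensitivity(savings: Sequence[float], maintenance: Sequence[float], r: float,
                    d_energy: float = 0.0, d_cost: float = 0.0, d_rate: float = 0.0) -> float:
    """NPV(delta): savings scaled by (1 + d_energy), maintenance by (1 + d_cost), rate r + d_rate."""
    rate = r + d_rate
    return sum(
        (s * (1 + d_energy) - m * (1 + d_cost)) / (1 + rate) ** t
        for t, (s, m) in enumerate(zip(savings, maintenance), start=1)
    )


def scenario_grid(savings: Sequence[float], maintenance: Sequence[float],
                  r: float) -> Dict[Tuple[float, float], float]:
    """NPV(delta) for every (d_energy, d_rate) combination, with d_cost held at 0."""
    return {
        (de, dr): npv_sensitivity(savings, maintenance, r, d_energy=de, d_rate=dr)
        for de, dr in product(ENERGY_DELTAS, RATE_DELTAS)
    }


def ranking_is_stable(measures: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                      r: float) -> bool:
    """True if the measures keep the same NPV ordering in every scenario of the grid."""
    orders = set()
    for de, dr in product(ENERGY_DELTAS, RATE_DELTAS):
        npvs = {name: npv_sensitivity(s, m, r, d_energy=de, d_rate=dr)
                for name, (s, m) in measures.items()}
        orders.add(tuple(sorted(npvs, key=npvs.get, reverse=True)))
    return len(orders) == 1
```

Higher energy prices raise NPV and higher discount rates lower it. What matters for prioritization is whether one measure overtakes another somewhere in the grid, which is what the stability check reports.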

3.7. HVAC Upgrade Analysis Across Building Clusters

Figure 7 presents a detailed analysis of HVAC System Upgrades across the three building clusters, comparing ROI, payback period (years/5), system age (years/10), and system efficiency. This visualization effectively demonstrates the correlation between system characteristics and economic performance. High-consumption buildings exhibit the highest ROI (2.39) despite having intermediate system age (15.0 years) and identical efficiency (0.70) to other clusters, suggesting that other factors such as usage patterns and envelope interaction significantly influence upgrade benefits.
Critical-consumption buildings show slightly lower ROI (2.12) despite having the oldest systems (15.1 years), likely due to their poor envelope performance, which reduces the relative impact of mechanical improvements. Moderate–High-consumption buildings demonstrate the lowest relative benefit from HVAC upgrades (ROI: 2.05), consistent with their somewhat newer systems (14.4 years). This analysis reinforces the importance of considering interactions between building systems when prioritizing interventions.

3.8. HVAC Upgrade ROI vs. System Age Relationship

Figure 8 illustrates the relationship between HVAC system age and retrofit ROI for the three building clusters, with a theoretical curve showing the generally increasing return potential as systems age. High-consumption buildings (dark orange) achieve the highest ROI (2.39) at a system age of approximately 15 years, outperforming both Critical-consumption buildings (red, ROI: 2.12) and Moderate–High-consumption buildings (light orange, ROI: 2.05) with similar system ages. This visualization reveals that, while system age is a significant factor influencing upgrade economics, it is not the sole determinant; building usage patterns and interactions with other systems also play crucial roles. The figure annotations note that ROI increases with system age and that decision tree modeling predicts approximately 20% energy savings for systems older than 15 years. This information provides guidance for timing HVAC system interventions to maximize economic returns.
The combined economic analysis results demonstrate that retrofit intervention strategies should be highly targeted to building-specific characteristics rather than applied as generic solutions. The optimal intervention varies significantly by building cluster: Wall Insulation for Critical-consumption buildings (ROI: 2.40), HVAC System Upgrades for High-consumption buildings (ROI: 2.39), and Smart Home Systems for Moderate–High-consumption buildings (ROI: 2.48). These findings provide a quantitative foundation for strategic decision making, enabling stakeholders to allocate limited resources to interventions with the greatest economic and environmental returns.
Table 1 summarizes a comparative overview of the five key retrofit interventions evaluated in this study, highlighting their economic viability across different building clusters based on ROI and payback period. Smart Home Systems emerge as the top-performing intervention in Moderate–High-consumption buildings, showing the highest ROI of 2.48 and the shortest payback period of 5.5 years.
This suggests that integrating digital control technologies in buildings that already possess relatively efficient envelopes can yield substantial returns with minimal upfront investment. Wall Insulation proves to be the most impactful for Critical-consumption buildings, which typically suffer from poor envelope performance. With an ROI of 2.40 and a payback period of approximately 10.2 years, insulation improvements substantially reduce heating demand, making them economically justifiable despite their moderate cost. HVAC System Upgrades are especially effective in High-consumption buildings, offering a nearly equivalent ROI of 2.39 and a quicker payback of 8.0 years. This is likely due to the combination of inefficient existing systems and high baseline energy use, which amplifies the energy-saving impact of new HVAC technologies. In contrast, Window Replacement shows a lower ROI (~1.20) and a payback period exceeding 15 years across all clusters. Although it improves thermal comfort and envelope performance, its high cost and relatively moderate energy savings limit its economic competitiveness, making it less suitable as a first-line intervention.
Similarly, Solar Panel Installation, while often associated with environmental benefits, demonstrates ROI values between 1.15 and 1.27 and payback periods ranging from 13 to over 15 years, suggesting that, under current market conditions and without incentives, it is less financially attractive compared to energy efficiency measures.
Overall, Table 1 reinforces the necessity of tailoring retrofit strategies to the specific characteristics of building clusters. Interventions should be selected not only based on potential energy savings but also on their economic performance within the context of building typologies. This ensures optimal resource allocation and maximizes both environmental and financial impact. Smart Home Systems, HVAC System Upgrades, and Wall Insulation emerge as the most cost-effective solutions in their respective clusters, each offering the highest ROI within its cluster. Cost-effectiveness is evaluated excluding subsidies or incentives; values may shift under incentive regimes.

4. Discussion

The results obtained confirm the initial hypothesis that jointly analyzing large-scale building data and applying predictive ML models make it possible to accurately identify the parameters that most influence energy consumption. In particular, the performance of the building envelope (i.e., the thermal transmittance values of walls and glazed surfaces) was found to be the main predictive factor, which is consistent with previous reports in the literature [14,22]. Compared to other state-of-the-art approaches, the proposed model stands out due to:
  • High generalizability, thanks to the use of a large, representative dataset of real buildings.
  • Cross-validation with clustering, which has made it possible to define energy digital representative buildings and support targeted, non-uniform retrofit strategies.
  • Economic integration, with a quantitative assessment of ROI and payback times for each intervention measure.
Many recent studies focus on using ML techniques to predict energy consumption or on optimizing the physical simulation of retrofit interventions [16,17,18,19]. However, these approaches typically treat energy, economic, and strategic aspects in isolation, which limits their applicability in complex decision-making scenarios.
The proposed model, on the other hand, combines predictive analysis with functional segmentation of the building stock and comparative economic analysis considering the main profitability indicators (ROI, NPV, and payback period). This makes it possible to estimate expected energy performance and assess the financial sustainability of interventions based on buildings’ typological characteristics. Including socio-economic and asset dimensions, such as unequal access to retrofit measures and the link between energy efficiency and property value, enables a more comprehensive approach to the challenges of the energy transition that goes beyond mere technical optimization. In this sense, the model meets the need for integrated decision-making tools that can support public policies and planning strategies that are geared towards equity, sustainability, and efficiency.
The economic results confirm that the most effective intervention varies depending on the cluster. For instance, wall insulation is particularly beneficial in “Critical” buildings, while smart systems offer attractive returns in “Moderate–High” contexts. This demonstrates the need for adaptive approaches in redevelopment policies rather than applying standardized solutions.
It is also noteworthy that occupancy and plant-related parameters proved less influential in consumption forecasting, which reinforces the importance of acting on the physical characteristics of the building. This finding is consistent with LSTM-based energy forecasting studies, which, despite incorporating occupancy data, recognize the pivotal role of the building envelope.
Although the analysis was conducted on a sample of buildings in the Lazio region, the proposed model has characteristics that facilitate its transfer to different climatic and regulatory contexts. The selection of standardized architectural and technological variables, the use of non-parametric algorithms such as random forest, and the adoption of generalizable economic metrics all support its applicability in heterogeneous scenarios. Furthermore, the approach has been designed to accommodate different levels of detail, which is important for adaptation in areas where data is limited or building characteristics differ. The use of open formats and the focus on replicability further strengthen the model’s potential as a decision-making tool at a national or international level.
To further clarify the model’s generalizability, it is important to note that the variables used in both the predictive and clustering stages, such as wall and glass transmittance (U-values), surface-to-volume ratio, occupancy density, and HVAC system age, are not location-specific and are widely recognized as key factors in energy performance regardless of geographic context. Moreover, non-parametric ensemble models such as random forest help limit overfitting to local characteristics and allow the model to capture general patterns in energy behavior. The modular structure of the methodology, along with the use of open data formats and interoperable inputs, allows it to be adapted to different climates, building codes, and construction practices by substituting local datasets. This flexibility is already embedded in the model’s architecture, making it suitable for broader application beyond the Lazio region.

4.1. Limitations

Despite the effectiveness demonstrated in this analysis, the proposed model has some limitations that could affect its applicability. Firstly, the economic data are context-specific: the costs used for the economic evaluation of the interventions are derived from regional sources, particularly the Lazio region’s price list. While these data are representative of the analyzed context, they may not reflect the economic, regulatory, and market conditions of other Italian regions or European countries. Consequently, the accuracy of the ROI and payback estimates may vary depending on the local context. Secondly, incentives and fiscal instruments are not included: the economic model does not consider the impact of government incentives, tax breaks (such as the 110% Superbonus or the Conto Termico), or local subsidies, which could have a significant effect on the economic viability of the measures. Omitting these variables yields a conservative estimate, which may understate the financial benefits for end users. Thirdly, the simulations rely on static conditions: they are based on building, climate, technological, and economic data for the two-year period 2022–2023. Future developments in these variables, such as rising energy prices, climate change, technological advances, or new European regulations, could significantly affect the energy and economic performance of the assessed measures, so the model needs to be updated periodically to maintain its predictive validity. Finally, the model has not yet been validated under real conditions: although it achieved high predictive performance (R² = 0.91), it has not been tested on actual retrofit cases. Field validation, through the concrete application of the model to actual interventions, is a necessary step to verify its operational effectiveness.
In summary, despite some limitations relating to the contextual nature of the data and the exclusion of dynamic and incentive variables, the model provides a solid methodological basis for supporting informed decisions in energy retrofitting.

4.2. Future Directions

To strengthen the robustness and practical application of the proposed model, the following future lines of development can be envisaged: one option is to extend the model to non-residential buildings. Currently, the model has been developed on the basis of residential buildings. Expanding it to include non-residential buildings such as schools, offices, and hospitals would cover an important segment of the building stock and extend the model’s strategic relevance to public administrations and large property managers. Another possible development would be to apply the model to heterogeneous geographical contexts. This would involve replicating the analysis in other territorial, national, or international contexts to assess the model’s transferability and adapt it to local specifics in terms of climate, regulations, data availability, and costs. Adopting open and interoperable data formats will be crucial to ensuring scalability. Integrating the model with dynamic simulations and building information modeling (BIM) tools would enable multiobjective assessments, including energy efficiency, comfort indicators, environmental impact (e.g., CO2 emissions), and life cycle. This would enable more holistic and informed planning. User interfaces and decision-making tools could be developed to maximize operational impact. These tools could be implemented in interactive digital formats, such as dashboards or web platforms, and would be aimed at designers, local administrators, and investors. These tools would facilitate the adoption of data-driven strategies and the integration of the model into urban planning processes, public tenders, and energy sustainability plans. The future outlook outlines a path for the model’s evolution towards greater generalizability, interoperability, and operational integration. The aim is to make the model an adaptable and useful tool at urban, territorial, and political levels.

5. Conclusions

Within the framework of European climate neutrality and emission reduction targets, retrofitting existing buildings emerges as a key strategic lever. However, identifying truly effective and sustainable solutions is difficult due to the wide variety of building types, the lack of standardized predictive tools, and the economic uncertainty associated with interventions. Addressing these challenges, this research has developed and tested a data-driven digital methodology to inform decision making in retrofitting. The methodology consists of four main phases. First, a statistical and correlation analysis was conducted on a large dataset of residential buildings to identify the most relevant architectural, technological, and dimensional variables for energy consumption. Second, an ML model (random forest regressor) was applied to quantify the relative importance of each parameter and build a predictive system capable of accurately estimating the annual energy demand of buildings (R² = 0.91). The results confirmed the central role of the building envelope, particularly the transmittance of walls and glazed surfaces, as well as the percentage of window area. Third, a k-means clustering phase was carried out on a subset of high-consumption buildings to define homogeneous, performance-based digital representative buildings and guide targeted intervention strategies. Three main groups were identified: “Critical”, “High”, and “Moderate–High” buildings, each characterized by a specific combination of building parameters and energy performance. Finally, a comparative economic analysis was carried out for each cluster on five energy renovation measures: wall insulation; replacement of windows; upgrading of HVAC systems; installation of photovoltaic panels; and smart systems. Using real data on costs and expected benefits from regional technical sources, three key indicators were calculated: NPV, ROI, and payback period.
This analysis revealed that the economic effectiveness of the measures varies significantly depending on the building profile, thus confirming the need for customized retrofit strategies.
In conclusion, this study has demonstrated that an integrated approach based on real data, predictive algorithms, and economic modeling can effectively support the energy transformation of the building stock. The proposed model provides a solid basis for making objective, transparent, and replicable intervention decisions, with the potential to positively impact both energy efficiency and financial sustainability.

Author Contributions

Conceptualization, G.P. and F.M.; methodology, G.P., F.M. and Z.Z.; software, G.P., F.M. and Z.Z.; validation, G.P., F.M., and Z.Z.; formal analysis, G.P., F.M. and Z.Z.; investigation, G.P., F.M. and Z.Z.; resources, G.P., F.M. and Z.Z.; data curation, G.P., F.M. and Z.Z.; writing—original draft preparation, G.P., F.M. and Z.Z.; writing—review and editing, G.P., F.M., and Z.Z.; visualization, G.P., F.M. and Z.Z.; supervision, G.P.; project administration, G.P. All authors have read and agreed to the published version of the manuscript.

Funding

The work was funded by Sapienza University Research Call 2023—CUP B83C23006090005, project title: Innovative Technological Solutions and Strategic Approaches to Enhance Energy Efficiency in Energy Production, Distribution and Utilization (SAEED); research topic: Optimization process for energy efficiency in buildings through the modelling, setting and monitoring of a Digital Twin and its integration of smart devices.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. European Council. Green Deal “Fit for 55”. Available online: https://www.consilium.europa.eu/en/policies/fit-for-55/ (accessed on 15 April 2025).
  2. IEA. Energy Efficiency 2023; IEA: Paris, France, 2023; Available online: https://www.iea.org/reports/energy-efficiency-2023 (accessed on 11 April 2025).
  3. IEA. Building Consumption. Available online: https://www.iea.org/energy-system/buildings (accessed on 20 April 2025).
  4. Italian Parliament. Ordinary Law of Parliament n. 373/76: Standards for the Reduction of Energy Consumption for Heating in Buildings; Italian Parliament: Rome, Italy, 1976.
  5. ISTAT. Inventory of the Built Heritage. Available online: http://dati-censimentopopolazione.istat.it/Index.aspx?DataSetCode=DICA_EDIFICI1 (accessed on 16 April 2025).
  6. Kylili, A.; Fokaides, P.A.; Lopez Jimenez, P.A. Key Performance Indicators (KPIs) Approach in Buildings Renovation for the Sustainability of the Built Environment: A Review. Renew. Sustain. Energy Rev. 2016, 56, 906–915.
  7. Piras, G.; Muzi, F. BIM for Sustainable Redevelopment of a Major Office Building in Rome. Buildings 2025, 15, 824.
  8. Du, Q.; Yang, M.; Wang, Y.; Wang, X.; Dong, Y. Dynamic Simulation for Carbon Emission Reduction Effects of the Prefabricated Building Supply Chain under Environmental Policies. Sustain. Cities Soc. 2024, 100, 105027.
  9. Brozovsky, J.; Labonnote, N.; Vigren, O. Digital Technologies in Architecture, Engineering and Construction. Autom. Constr. 2024, 158, 105212.
  10. Aghili, S.A.; Haji Mohammad Rezaei, A.; Tafazzoli, M.; Khanzadi, M.; Rahbar, M. Artificial Intelligence Approaches to Energy Management in HVAC Systems: A Systematic Review. Buildings 2025, 15, 1008.
  11. Ngarambe, J.; Yun, G.Y.; Santamouris, M. The Use of Artificial Intelligence (AI) Methods in the Prediction of Thermal Comfort in Buildings: Energy Implications of AI-Based Thermal Comfort Controls. Energy Build. 2020, 211, 109807.
  12. Almazam, K.; Humaidan, O.; Shannan, N.M.; Bashir, F.M.; Gammoudi, T.; Dodo, Y.A. Innovative Energy Efficiency in HVAC Systems with an Integrated Machine Learning and Model Predictive Control Technique: A Prospective Toward Sustainable Buildings. Sustainability 2025, 17, 2916.
  13. Mazzetto, S. Hybrid Predictive Maintenance for Building Systems: Integrating Rule-Based and Machine Learning Models for Fault Detection Using a High-Resolution Danish Dataset. Buildings 2025, 15, 630.
  14. Shan, R.; Lai, W.; Tang, H.; Leng, X.; Gu, W. Residential Building Renovation Considering Energy, Carbon Emissions and Cost: An Approach Integrating Machine Learning and Evolutionary Generation. Appl. Sci. 2025, 15, 1830.
  15. Anan, M.; Kanaan, K.; Benhaddou, D.; Nasser, N.; Qolomany, B.; Talei, H.; Sawalmeh, A. Occupant-Aware Energy Consumption Prediction in Smart Buildings Using a LSTM Model and Time Series Data. Energies 2024, 17, 6451.
  16. Liu, J.; Chen, J. Applications and Trends of Machine Learning in Building Energy Optimization: A Bibliometric Analysis. Buildings 2025, 15, 994.
  17. Baset, A.; Jradi, M. Data-Driven Decision Support for Smart and Efficient Building Energy Retrofits: A Review. Appl. Syst. Innov. 2025, 8, 5.
  18. Imani, E.; Dawood, H.; Williams, S.; Dawood, N. Physics-Based and Data-Driven Retrofitting Solutions for Energy Efficiency and Thermal Comfort in the UK: IoT-Validated Analysis. Buildings 2025, 15, 1050.
  19. Tavano, D.; Salvo, F.; De Simone, M.; Bilotta, A.; Del Giudice, F.P. Who Can Afford to Decarbonize? Early Insights from a Socioeconomic Model for Energy Retrofit Decision-Making. Real Estate 2025, 2, 6.
  20. Gholamzadehmir, M.; Pandolfi, A.M.; Del Pero, C.; Leonforte, F.; Sdino, L. Increasing the Market Value of Buildings Through Energy Retrofitting: A Comparison of Actual Retrofit Costs and Perceived Values. Buildings 2025, 15, 376.
  21. Berti, N.; Arman, A.; Esmaili, P.; Zeynivand, M.; Battini, D.; Bianchini, D.; Cristaldi, L.; De Giuli, L.B.; Galeazzo, A.; Gruosso, G.; et al. Sustainability and Resilience in the MICS SPOKE8 project: The role of the Digital Twin. In Proceedings of the 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), St Albans, UK, 21 October 2024; IEEE: Piscataway, NJ, USA; pp. 669–673. [Google Scholar]
  22. Manfren, M.; James, P.A.B.; Tronchin, L. Data-driven building energy modelling–An analysis of the potential for generalisation through interpretable machine learning. Renew. Sustain. Energy Rev. 2022, 167, 112686. [Google Scholar] [CrossRef]
  23. Li, D.; Qi, Z.; Zhou, Y.; Elchalakani, M. Machine Learning Applications in Building Energy Systems: Review and Prospects. Buildings 2025, 15, 648. [Google Scholar] [CrossRef]
  24. di Stefano, A.G.; Ruta, M.; Masera, G.; Hoque, S. Leveraging Machine Learning to Forecast Neighborhood Energy Use in Early Design Stages: A Preliminary Application. Buildings 2024, 14, 3866. [Google Scholar] [CrossRef]
  25. Piras, G.; Muzi, F.; Ziran, Z. Open Tool for Automated Development of Renewable Energy Communities: Artificial Intelligence and Machine Learning Techniques for Methodological Approach. Energies 2024, 17, 5726. [Google Scholar] [CrossRef]
  26. Regione Lazio, Prezziario dei Lavori Pubblici. Available online: https://www.regione.lazio.it/cittadini/lavori-pubblici-infrastrutture/tariffa-prezzi-lavori-pubblici (accessed on 18 April 2025).
  27. Scrucca, F.; Palladino, D. Integration of Energy Simulations and Life Cycle Assessment in Building Refurbishment: An Affordability Comparison of Thermal Insulation Materials through a New Sustainability Index. Sustainability 2023, 15, 1412. [Google Scholar] [CrossRef]
  28. Roy, A.; Uddala, S.; Saboor, S.; Arıcı, M.; Saxena, K.K. Contemporary Roof Pattern for Energy Efficient Buildings: Air Conditioning Cost Alleviation, CO2 Emission Mitigation Potential and Acceptable Payback Period. J. Build. Eng. 2024, 95, 110250. [Google Scholar] [CrossRef]
  29. Prezziario Regione Lazio. Available online: https://www.regione.lazio.it/sites/default/files/documentazione/INF-DGR-101-14-04-2023.pdf (accessed on 15 May 2025).
  30. Ziran, Z.; Mecella, M.; Leotta, F. A Simplified and Sustainable Approach for Energy Prediction. In Proceedings of the Intelligent Environments 2024: Combined Proceedings of Workshops and Demos & Videos Session, Ljubljana, Slovenia, 17–20 June 2024; IOS Press: Amsterdam, The Netherlands, 2024; pp. 104–113. [Google Scholar] [CrossRef]
  31. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  32. Halkidi, M.; Batistakis, Y.; Vazirgiannis, M. On clustering validation techniques. J. Intell. Inf. Syst. 2001, 17, 107–145. [Google Scholar] [CrossRef]
  33. Tardioli, G.; Kerrigan, R.; Oates, M.; O’Donnell, J.; Finn, D.P. Identification of representative buildings and building groups in urban datasets using a novel pre-processing, classification, clustering and predictive modelling approach. Build. Environ. 2018, 140, 90–106. [Google Scholar] [CrossRef]
  34. Hsu, D. Comparison of integrated clustering methods for accurate and stable prediction of building energy consumption data. Appl. Energy 2015, 160, 153–163. [Google Scholar] [CrossRef]
  35. Deng, X.; Tan, Z.; Tan, M.; Chen, W. A clustering-based climatic zoning method for office buildings in China. J. Build. Eng. 2021, 42, 102778. [Google Scholar] [CrossRef]
  36. Borges, P.; Travesset-Baro, O.; Pages-Ramon, A. Hybrid approach to representative building digital representative building development for urban models–A case study in Andorra. Build. Environ. 2022, 215, 108958. [Google Scholar] [CrossRef]
  37. Yilmaz, S.; Chambers, J.; Patel, M.K. Comparison of clustering approaches for domestic electricity load profile characterisation-Implications for demand side management. Energy 2019, 180, 665–677. [Google Scholar] [CrossRef]
Figure 1. Average distribution of energy consumption components across buildings.
Figure 2. A correlation matrix between building design parameters and their energy consumption.
Figure 3. Feature importance derived from the Random Forest model.
Figure 4. Building clusters based on energy consumption and characteristics.
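Figure 4 groups buildings into clusters (three, according to Figure 5) based on consumption and building characteristics. The clustering algorithm is not named in this section; the sketch below assumes k-means (Lloyd's algorithm) on z-scored features, with a deterministic farthest-first seeding used in place of random or k-means++ initialisation. The two features (consumption in kWh/m² per year and building age) and all data values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def standardize(rows: Sequence[Sequence[float]]) -> List[Point]:
    """Z-score each feature so consumption and age weigh comparably."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: List[Point]) -> int:
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def rank_clusters(labels: List[int], centroids: List[Point]) -> List[int]:
    """Renumber clusters by increasing centroid consumption (feature 0)."""
    order = sorted(range(len(centroids)), key=lambda j: centroids[j][0])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[lab] for lab in labels]


if __name__ == "__main__":
    # Hypothetical (consumption kWh/m2/yr, building age in years) records.
    buildings = [(85.0, 12.0), (90.0, 15.0), (80.0, 10.0),
                 (150.0, 35.0), (160.0, 40.0), (145.0, 30.0),
                 (260.0, 60.0), (250.0, 55.0), (270.0, 65.0)]
    labels, cents = kmeans(standardize(buildings), 3)
    print(rank_clusters(labels, cents))
```

Renumbering by centroid consumption makes label 0 the lowest-consuming group; mapping the labels onto named consumption classes such as those in Table 1 would be a further, separate assumption.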
Figure 5. Payback period for each retrofit intervention across the three building clusters.
Figure 6. ROI for each retrofit intervention across the three building clusters.
Figure 7. HVAC upgrade analysis across building clusters.
Figure 8. Relationship between HVAC system age and retrofit ROI.
Table 1. Economic effectiveness of retrofit interventions by building cluster.
| Retrofit Measure | Building Cluster | ROI | Payback Period (years) | Most Effective Cluster |
| --- | --- | --- | --- | --- |
| Wall Insulation | Critical | 2.40 | ~10.2 | Critical-Consumption Buildings |
| Window Replacement | All Clusters | ~1.20 | >15.0 | Not Recommended as Primary Intervention |
| HVAC System Upgrades | High | 2.39 | 8.0 | High-Consumption Buildings |
| Smart Home Systems | Moderate–High | 2.48 | 5.5 | Moderate–High-Consumption Buildings |
| Solar Panel Installation | All Clusters | 1.15–1.27 | 13–15+ | Less Economically Competitive |
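Table 1 compares measures by ROI and payback period. The following is a minimal sketch of the two metrics, assuming simple (undiscounted) payback = investment / annual saving and ROI = lifetime savings / investment. The exact definitions, costs, and service lives used in the study are not given in this section, so the `Measure` fields and all numbers below are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measure:
    """One retrofit option; all inputs here are hypothetical examples."""
    name: str
    cost_eur: float           # upfront investment
    annual_saving_eur: float  # yearly energy-cost saving
    lifetime_years: int       # assumed service life of the measure


def simple_payback(m: Measure) -> float:
    """Undiscounted payback: years until cumulative savings equal the cost."""
    if m.annual_saving_eur <= 0:
        return float("inf")  # the measure never pays back
    return m.cost_eur / m.annual_saving_eur


def roi_ratio(m: Measure) -> float:
    """Assumed ROI definition: lifetime savings divided by the investment."""
    return m.annual_saving_eur * m.lifetime_years / m.cost_eur


def rank_by_payback(measures: List[Measure]) -> List[Measure]:
    """Order measures from fastest to slowest payback."""
    return sorted(measures, key=simple_payback)


if __name__ == "__main__":
    options = [
        Measure("wall insulation", 20000.0, 2000.0, 25),
        Measure("smart home system", 3000.0, 600.0, 15),
        Measure("PV array", 12000.0, 800.0, 20),
    ]
    for m in rank_by_payback(options):
        print(f"{m.name}: payback {simple_payback(m):.1f} y, ROI {roi_ratio(m):.2f}")
```

Under this assumed ROI definition, ROI equals service life divided by payback, so a short payback yields a high ROI only if the measure also lasts long enough; a discounted cash-flow model would lower both figures for long-lived measures.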

MDPI and ACS Style

Piras, G.; Muzi, F.; Ziran, Z. A Data-Driven Model for the Energy and Economic Assessment of Building Renovations. Appl. Sci. 2025, 15, 8117. https://doi.org/10.3390/app15148117