Article

Federated Learning-Enabled Building Stock Modeling for Privacy-Preserving Embodied Carbon Benchmarking in Residential Construction

Department of Architectural Engineering, College of Engineering at Yanbu, Taibah University, Yanbu Al-Bahr 46425, Saudi Arabia
Buildings 2026, 16(5), 1029; https://doi.org/10.3390/buildings16051029
Submission received: 3 February 2026 / Revised: 25 February 2026 / Accepted: 2 March 2026 / Published: 5 March 2026
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

Abstract

Accurate benchmarking of embodied carbon in the residential building stock requires extensive data sharing, which raises serious privacy and competitive concerns among construction stakeholders. This study introduces FedCarbon, a federated learning-based building stock modeling framework that allows embodied carbon to be evaluated collaboratively without aggregating data at a central location. The proposed architecture enables construction firms, municipalities, and construction material suppliers to train predictive models jointly while retaining data sovereignty, using a hierarchical federated aggregation mechanism with attention-based client weighting. A differential privacy scheme with adaptively calibrated noise protects individual projects while still allowing statistically meaningful benchmarking across heterogeneous building portfolios. The framework also includes a momentum-based gradient compression scheme that reduces communication overhead by 82.6% relative to conventional federated averaging while maintaining model convergence. The approach is validated on the UCI Energy Efficiency Dataset, which includes 768 residential building configurations, and the Embodied Carbon in European Buildings Database, which includes 2340 residential units in 12 European jurisdictions. Experiments show that FedCarbon achieves a prediction accuracy of 94.2% (R²) for embodied carbon intensity, with a mean absolute error of 21.4 kgCO2e/m², under an (ε, δ) differential privacy guarantee with ε = 1.0 and δ = 10⁻⁵. The framework broadens access to building stock knowledge and can accelerate industry-wide adoption of low-carbon building strategies.

1. Introduction

The built environment accounts for about 37% of energy-related carbon dioxide emissions [1,2,3], and the embodied carbon of building materials and construction processes makes up a growing share of the overall carbon footprint [4,5,6]. As operational carbon emissions fall with improved energy efficiency and greater use of renewable energy, the relative significance of embodied carbon in life cycle assessment has increased substantially [7,8,9]. In developed economies, residential construction represents over 60% of current floor area and offers a major opportunity to reduce carbon through evidence-based material selection and design optimization [10,11,12]. However, setting sound embodied carbon benchmarks requires detailed data collection across building typologies and material supply chains, which is difficult to coordinate at the industry level.
The construction industry is a highly fragmented ecosystem in which construction companies, material suppliers, architects, and municipalities all keep building performance data separately [13,14,15]. This fragmentation makes it difficult to create robust carbon benchmarking systems that can inform policy-making and support meaningful comparisons across building portfolios. Conventional centralized methods face major obstacles, including competitive concerns among construction companies, regulatory limits on cross-jurisdictional data exchange, and privacy concerns for building owners [16,17,18]. These obstacles have constrained the creation of holistic building stock models that could accelerate the transition to low-carbon construction practices [19,20,21].
From an urban science perspective, cities and regions increasingly need credible portfolio-level embodied carbon benchmarks to support evidence-based urban planning, retrofit prioritization, and climate action plans. Municipal governments are expected to calculate and compare the carbon performance of residential building stocks at the city and regional levels so that the results can be used in sustainability reporting, tracking of progress towards climate goals, and the development of low-carbon policies. However, such benchmarking is difficult to put into practice because information about residential buildings is spread among private and public stakeholders, including construction companies, home distributors, material suppliers, and municipal authorities. Detailed building and material data are often not centralized, since aggregation is frequently infeasible because of privacy laws, commercial sensitivity, and fragmented ownership structures. Here, the proposed federated learning model is positioned as a facilitator of cross-city and cross-region embodied carbon benchmarking, in which urban stakeholders can obtain comparable, city-scale information without exchanging raw building data.
Federated learning has emerged as a promising paradigm for collaborative machine learning that allows several stakeholders to train predictive models together without centralizing raw data [13]. It addresses the underlying privacy issues by distributing model training to the participating clients and aggregating model parameters rather than sensitive data, allowing knowledge sharing across organizational borders [16]. Recent applications in smart building settings have shown that federated methods are feasible for energy consumption prediction, thermal comfort modeling, and building performance optimization [5]. Nonetheless, embodied carbon benchmarking raises specific issues that require specialized framework adaptations and have not been sufficiently resolved in the literature [9].
  • Why Existing FL Methods Are Insufficient for Embodied Carbon Assessment
Existing federated learning methods successful in operational energy prediction and thermal comfort modeling cannot be directly applied to embodied carbon assessment due to three domain-specific challenges:
(1) Extreme Data Heterogeneity in Material Composition: embodied carbon datasets exhibit significantly higher feature heterogeneity than operational energy datasets. While operational energy depends on relatively standardized building geometry and HVAC configurations, embodied carbon is determined by material composition (concrete types, steel grades, timber species, and insulation), which varies dramatically across construction traditions, local material availability, and supply chain structures. The resulting non-IID distributions differ fundamentally from those of the energy forecasting tasks on which standard FedAvg and FedProx were validated.
(2) Multi-Scale Feature Sensitivity: embodied carbon features span multiple orders of magnitude in sensitivity to privacy-preserving noise (for example, concrete volume of 50–500 m³ and reinforcement steel of 1000–20,000 kg have vastly different scales and privacy implications). Uniform noise calibration in standard DP-FedAvg either over-protects low-sensitivity features (degrading accuracy) or under-protects high-sensitivity features (compromising privacy), which motivates our adaptive noise calibration (Equations (11) and (12)).
(3) Assessment Boundary Heterogeneity: different stakeholders use different LCA boundaries (A1–A3 cradle-to-gate, A1–A5 cradle-to-site, and A1–C4 cradle-to-grave), creating systematic differences in the target variable definition across clients. This challenge is absent in operational energy benchmarking, where energy consumption (kWh) is universally defined, and standard federated aggregation methods do not account for it, thereby producing biased global models.
Combining federated learning with differential privacy mechanisms provides formal mathematical privacy guarantees for individual data while still allowing meaningful aggregate analysis [22,23,24]. Differential privacy adds carefully calibrated noise to the model training process so that the presence or absence of a single data point cannot be confidently inferred from the published model parameters [14]. In embodied carbon benchmarking, such protection is needed to encourage participation by construction stakeholders who do not want to disclose competitive advantages in material costs or proprietary construction methods [17].
Figure 1 provides a conceptual map of privacy-preserving embodied carbon benchmarking, which shows the issues of data sovereignty and the suggested federated learning solution architecture. The framework allows the parties to participate in the joint model training as multiple categories of stakeholders, such as construction companies, municipalities, and material suppliers, without having to surrender their proprietary data assets.
The primary contributions of this research are summarized as follows:
  • Novel Hierarchical Federated Architecture: To address heterogeneous building data and the multi-stakeholder coordination requirements of different life cycle assessment methods, we propose FedCarbon, a hierarchical federated learning architecture with attention-based client weighting tailored to embodied carbon assessment.
  • Adaptive Differential Privacy Mechanism: We introduce an adaptive noise calibration scheme that varies the privacy settings according to the sensitivity of the embodied carbon features, providing formal (ε, δ) differential privacy guarantees while achieving a prediction accuracy of over 94%.
  • Momentum-Enhanced Gradient Compression: We propose a momentum-based error-feedback sparsification method that decreases communication overhead by 82.6% relative to standard federated averaging and makes it easier for stakeholders with lower-bandwidth connectivity to participate.
  • Comprehensive Empirical Validation: We perform comprehensive tests on two publicly available datasets totaling 3108 residential building configurations in various geographic locations, which demonstrate the practical usefulness of the framework for real-world embodied carbon benchmarking.
The rest of this paper is structured as follows: Section 2 reviews related work on federated learning, differential privacy, and building carbon assessment. Section 3 presents the FedCarbon methodology and mathematical model. Section 4 reports the experimental results and analysis. Section 5 provides the discussion. Section 6 concludes the paper.

2. Related Work

In this section, the literature in three interrelated areas is reviewed: federated learning applications in smart buildings, privacy-aware approaches in distributed machine learning, and embodied carbon assessment methods.

2.1. Federated Learning in Smart Building Environments

Federated learning has received much interest in smart building systems because it allows models to be created collaboratively without sacrificing data privacy [1]. Wang et al. [5] proposed a personalized federated learning model for energy consumption forecasting that can deal with non-IID data distributions among building typologies. Abbas et al. [4] proposed a privacy-aware thermal comfort prediction model for smart buildings based on federated learning, showing that distributed machine learning can be used to predict building performance [25,26,27,28,29,30,31,32,33,34].
Amangeldy et al. [6] provided a thorough review of artificial intelligence and deep learning techniques to manage resources in smart buildings and found federated learning as a future opportunity to use in privacy-preserving analytics. Shan et al. [7] discussed AI-based multi-objective optimization methods to optimize the energy retrofit of urban buildings, and stated the possibility of using machine learning to make decisions faster in carbon reduction [28,29]. Hinterstocker et al. [15] applied federated learning to building energy performance prediction across over 25,000 residential buildings, incorporating differential privacy techniques and demonstrating that privacy-preserving FL achieves comparable accuracy to centralized approaches in building stock-level energy assessment. Rizwan et al. [30] proposed a convergence-aware federated transfer learning framework for residential energy consumption prediction that enables collaborative model training across multiple buildings without disclosing raw energy data, demonstrating the applicability of FL to multi-building stock-level performance assessment.

2.2. Privacy-Preserving Mechanisms for Distributed Learning

Federated learning has been widely combined with differential privacy [9]. Mohammadi et al. [10] demonstrated the integration of federated learning with differential privacy for secure anomaly detection in smart grid infrastructure, showing that DP-enhanced FL can achieve an effective privacy–utility balance in distributed energy systems. Lazaros et al. [13] conducted a detailed study of collaborative intelligence in federated learning, analyzing different aggregation strategies and their consequences for privacy and utility. The survey of scalable and secure edge AI systems by Rourke and Leclair [14] explored privacy-preserving mechanisms that can be applied in resource-constrained deployment environments. Folino et al. [17] created a scalable vertical federated learning system that demonstrates privacy-preserving analytics in cybersecurity and offers generalizable methodological insights. Deng et al. [19] proposed a privacy-preserving federated learning framework for collaborative risk assessment across smart grid operators, demonstrating that distributed benchmarking of critical infrastructure performance is achievable without centralizing sensitive operational data. Yang et al. [22] developed a gradient compression federated learning framework with adaptive local differential privacy budget allocation, demonstrating the feasibility of jointly optimizing communication efficiency and privacy guarantees in distributed learning settings.

2.3. Embodied Carbon Assessment and Building Stock Modeling

Sound embodied carbon evaluation requires an in-depth life cycle assessment of building materials [2]. Feng et al. [8] showed how digital twins and edge intelligence could be used to decarbonize more precisely, with methodological findings that can be transferred to the building sector. Bahadori-Jahromi et al. [11] discussed the applicability of artificial intelligence to civil engineering, including sustainable building. Siakas et al. [12] examined self-directed cyber-physical systems that facilitate intelligent positive energy districts.
Gupta et al. [18] studied the concept of federated learning in smart farming application, and showed that privacy-preserving distributed learning could be used in sustainability applications. Goktas and Ibrahim [20] wrote about the energy management and communication systems of smart grids, and their role in the optimization of building energy. El Hafdaoui et al. [21] demonstrated that machine learning models using supervised learning techniques can estimate embodied carbon throughout the building life cycle, with average errors of approximately 15.71%, though centralized data requirements limit scalability across diverse building stocks and geographic regions. Zhang et al. [28] provided a comprehensive NIST systematic review of embodied carbon assessment and reduction methods across building life cycles, identifying significant inconsistencies in assessment methodologies and database selection that underscore the need for standardized benchmarking frameworks. Li et al. [31] published a harmonized dataset of high-resolution embodied life cycle assessment results for North American buildings, revealing that inconsistent LCA scopes, methods, and background datasets across geographies severely limit the comparability of embodied carbon benchmarks—a challenge that federated learning approaches can address by enabling collaborative model training without requiring data centralization.

2.4. Research Gap Analysis

A systematic comparison of the current methods based on seven key capabilities, as shown in Table 1, indicates that, though previous literature considers individual elements like the federated learning of smart buildings [1,5], differential privacy mechanisms [4,13], or embodied carbon assessment [7,8], none of them incorporates all of the key components necessary to achieve privacy-preserving carbon benchmarking. The analysis reveals that there is a large research gap in which no framework is used to combine federated learning, embodied carbon modeling, differential privacy, gradient compression, and multi-stakeholder coordination to develop building stock applications. The gap that FedCarbon bridges—by offering the first end-to-end solution that is comprehensive and covers all seven dimensions of capability—allows full coverage in practical applications in the construction industry ecosystem when it comes to collaborative carbon benchmarking.
Table 1 evaluates related work against seven capability dimensions:
  • Federated—uses distributed model training across multiple clients without centralizing raw data.
  • Embodied Carbon—explicitly models embodied carbon, life cycle carbon, or material-related CO2 emissions (not limited to operational energy).
  • Diff. Privacy—implements formal (ε, δ) differential privacy or equivalent mathematical guarantees (not merely anonymization or access control).
  • Compression—uses gradient compression, sparsification, quantization, or communication-efficient techniques to reduce bandwidth.
  • Multi-Stakeholder—designed for or validated with multiple distinct organizational entities (not merely multiple devices within one organization).
  • Building Stock—operates at building portfolio or urban stock levels, modeling multiple buildings across typologies (not single-building optimization).
  • Real Data—validated on real-world measured data or verified simulation datasets (not purely synthetic or toy examples).
Table 2 shows the quantitative operationalization of the capability dimensions.

3. Proposed Methodology

This section presents the FedCarbon framework for privacy-preserving embodied carbon benchmarking. We start with an overview of the system and then give the mathematical formulation of the federated learning structure, differential privacy, and gradient compression.

3.1. System Overview

Figure 2 shows the overall FedCarbon design, including the hierarchical layout of construction stakeholders, local training, and the privacy-sensitive aggregation server. The system consists of three main layers: the client layer, which involves local building data and model training; the aggregation layer, which involves differential privacy and gradient compression; and the application layer, which involves carbon benchmarking services.
Let $\mathcal{K} = \{1, 2, \ldots, K\}$ denote the set of $K$ participating clients, where each client $k$ maintains a local dataset $\mathcal{D}_k = \{(x_i^k, y_i^k)\}_{i=1}^{n_k}$ containing $n_k$ building records. The feature vector $x_i^k \in \mathbb{R}^d$ encodes building characteristics relevant to embodied carbon estimation. The target variable $y_i^k \in \mathbb{R}_+$ represents embodied carbon intensity in kgCO2e/m².
  • Hierarchical Architecture Implementation Details
The FedCarbon hierarchical architecture employs a three-level aggregation structure:
Level 1—Client-Level Training: Each client k (construction firm, municipality, or material supplier) trains the model locally on its private dataset D_k for E local epochs. Clients compute local model updates Δθ_k using differentially private SGD with per-sample gradient clipping (Equation (9)) and Gaussian noise injection (Equation (10)).
Level 2—Regional Aggregation: Clients are grouped into R = 4 geographic regions (Northern EU, Central EU, Southern EU, and Eastern EU). Each region r has a designated regional aggregator that collects compressed updates from its member clients and performs intra-region aggregation:
$$\theta_r^{(t+1)} = \theta^{(t)} + \sum_{k \in \mathcal{R}_r} \alpha_k^{(t)} \, \Delta\tilde{\theta}_k^{(t)}$$
where $\alpha_k^{(t)}$ are attention-based weights computed within the region using Equation (6). The regional aggregator does not access raw data; it only processes compressed model updates.
Level 3—Global Aggregation: A global server collects regionally aggregated updates and computes the global model:
$$\theta^{(t+1)} = \theta^{(t)} + \sum_{r=1}^{R} w_r^{(t)} \, \Delta\theta_r^{(t)}$$
where w_r = n_r/N represents the proportion of total samples in region r.
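The two aggregation levels can be illustrated with a minimal pure-Python sketch. It assumes that model updates are flat lists of floats and that the attention weights within a region are already computed and sum to one. The helper names (`weighted_sum`, `regional_update`, `global_step`) and all numbers are hypothetical and are not taken from the FedCarbon implementation.

```python
from typing import List

Vector = List[float]


def weighted_sum(vectors: List[Vector], weights: List[float]) -> Vector:
    """Element-wise sum_i weights[i] * vectors[i]."""
    dim = len(vectors[0])
    return [sum(w * v[j] for v, w in zip(vectors, weights)) for j in range(dim)]


def regional_update(client_updates: List[Vector], attention: List[float]) -> Vector:
    """Level 2: attention-weighted combination of client updates in one region."""
    return weighted_sum(client_updates, attention)


def global_step(theta: Vector, regional_updates: List[Vector],
                region_sizes: List[int]) -> Vector:
    """Level 3: add the sample-weighted (w_r = n_r / N) regional updates to theta."""
    total = sum(region_sizes)
    weights = [n / total for n in region_sizes]
    delta = weighted_sum(regional_updates, weights)
    return [t + d for t, d in zip(theta, delta)]


# Toy example with two regions (illustrative values only).
theta = [0.0, 0.0]
delta_a = regional_update([[1.0, 0.0], [0.0, 1.0]], attention=[0.5, 0.5])
delta_b = regional_update([[2.0, 2.0]], attention=[1.0])
new_theta = global_step(theta, [delta_a, delta_b], region_sizes=[300, 100])
```

In this toy case the larger region (300 of 400 samples) contributes three quarters of the global step, which is the behavior the sample-proportional weights are meant to produce.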
Communication Timing and Synchronization Protocol:
FedCarbon adopts fully synchronous aggregation at both hierarchical levels using the following protocol.
Intra-Region Synchronization (Synchronous):
Within each region $r$, all clients $k \in \mathcal{R}_r$ perform $E = 5$ local epochs and send compressed updates to the regional aggregator. The aggregator waits for all selected clients $S_t \cap \mathcal{R}_r$, with a timeout of $\tau_{\text{timeout}} = 300$ s; late clients are dropped and aggregation proceeds. With K = 20 clients (5 per region), no timeouts were observed.
Inter-Region Synchronization (Synchronous):
The global server waits for all $R = 4$ regional aggregators before computing the global model $\theta^{(t+1)}$, resulting in fully synchronous global aggregation.
DP Noise Application Order (Critical Design Choice):
Differential privacy is applied at the client before communication, following this order:
  • Local update $\Delta\theta_k^{(t)}$ via SGD;
  • Gradient clipping and Gaussian noise injection;
  • Top-K gradient compression;
  • Transmission of the compressed, DP-sanitized update $\Delta\tilde{\theta}_k^{(t)}$;
  • Attention weight computation at the regional aggregator;
  • Attention-weighted regional aggregation and forwarding to the global server.
Implication for Attention–Noise Interaction:
Since attention operates on DP-noisy updates, it may reweight noise across clients; however, it learns to downweight low signal-to-noise updates. Empirically, this yields higher performance (R² = 0.942) than uniform-weighted DP-FedAvg (R² = 0.924).
Regional Aggregators and Raw Data Access:
Regional aggregators never access raw data or pre-DP updates; they use only compressed, DP-sanitized updates, and their attention parameters are updated using global validation feedback rather than raw client information.

3.2. Embodied Carbon Prediction Model

The embodied carbon prediction problem is formulated as a regression problem in which the objective is to learn a mapping function $f_\theta : \mathbb{R}^d \to \mathbb{R}_+$ parameterized by weights $\theta$:
$$f_\theta(x) = W_L \, \sigma\big(W_{L-1} \, \sigma(\cdots \sigma(W_1 x + b_1) \cdots) + b_{L-1}\big) + b_L \qquad (1)$$
The local loss function for client k is defined using mean squared error with L2 regularization:
$$\mathcal{L}_k(\theta) = \frac{1}{n_k} \sum_{i=1}^{n_k} \big(f_\theta(x_i^k) - y_i^k\big)^2 + \lambda \lVert \theta \rVert_2^2 \qquad (2)$$
The global objective function aggregates local losses:
$$\mathcal{L}(\theta) = \sum_{k=1}^{K} \frac{n_k}{N} \, \mathcal{L}_k(\theta) \qquad (3)$$
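As a small worked illustration of the local and global objectives, the following sketch evaluates both for a toy model. It assumes a one-parameter linear predictor as a stand-in for the multilayer network; the function names, the data, and the value of λ are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[Sequence[float], float]          # (feature vector x, target y)
Predictor = Callable[[Sequence[float], Sequence[float]], float]


def local_loss(predict: Predictor, theta: Sequence[float],
               records: List[Record], lam: float) -> float:
    """Mean squared error on one client's records plus lam * ||theta||_2^2."""
    mse = sum((predict(theta, x) - y) ** 2 for x, y in records) / len(records)
    l2 = lam * sum(w * w for w in theta)
    return mse + l2


def global_loss(predict: Predictor, theta: Sequence[float],
                client_data: List[List[Record]], lam: float) -> float:
    """Sample-weighted average of the local losses, with weights n_k / N."""
    total = sum(len(d) for d in client_data)
    return sum(len(d) / total * local_loss(predict, theta, d, lam)
               for d in client_data)


def linear(theta: Sequence[float], x: Sequence[float]) -> float:
    """Stand-in predictor (assumption): f(x) = theta[0] * x[0]."""
    return theta[0] * x[0]


client_1 = [([1.0], 2.0), ([2.0], 5.0)]   # squared errors 0 and 1 -> MSE 0.5
client_2 = [([3.0], 6.0)]                 # squared error 0 -> MSE 0.0
theta = [2.0]
loss_1 = local_loss(linear, theta, client_1, lam=0.1)   # 0.5 + 0.1 * 4
loss_2 = local_loss(linear, theta, client_2, lam=0.1)   # 0.0 + 0.1 * 4
total_loss = global_loss(linear, theta, [client_1, client_2], lam=0.1)
```

Because client 1 holds two of the three records, its loss receives weight 2/3 in the global objective.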

3.3. Federated Learning Framework

Algorithm 1 presents the complete FedCarbon training procedure.
Algorithm 1: FedCarbon: Federated Learning for Embodied Carbon
Require: Clients $\mathcal{K}$, rounds $T$, epochs $E$, learning rate $\eta$, privacy $(\varepsilon, \delta)$, compression $\rho$
Ensure: Global model $\theta^{(T)}$
1.  Initialize global model $\theta^{(0)}$ and attention parameters for each region
2.  for $t = 0$ to $T - 1$ do
3.    Server broadcasts $\theta^{(t)}$ to all regional aggregators
4.    for each region $r \in \{1, \ldots, R\}$ in parallel do
5.      Regional aggregator $r$ broadcasts $\theta^{(t)}$ to clients in region $r$
6.      $S_t \leftarrow$ random subset of $m$ clients in region $r$
7.      for each client $k \in S_t$ in parallel do
8.        $\theta_k^{(t,0)} \leftarrow \theta^{(t)}$
9.        for $e = 0$ to $E - 1$ do
10.         Sample mini-batch $B_k$ from $\mathcal{D}_k$
11.         $g_k \leftarrow \nabla_\theta \mathcal{L}_k(\theta_k^{(t,e)}; B_k)$
12.         $\tilde{g}_k \leftarrow \text{ClipGradient}(g_k, C)$   // adaptive per-feature clipping
13.         $\hat{g}_k \leftarrow \text{AddNoise}(\tilde{g}_k, \sigma)$   // adaptive per-feature noise
14.         $\theta_k^{(t,e+1)} \leftarrow \theta_k^{(t,e)} - \eta \, \hat{g}_k$   // momentum update (Equations (4) and (5))
15.       end for
16.       $\Delta\theta_k^{(t)} \leftarrow \theta_k^{(t,E)} - \theta^{(t)}$
17.       $\Delta\tilde{\theta}_k^{(t)} \leftarrow \text{Compress}(\Delta\theta_k^{(t)}, \rho)$
18.       // Updates transmitted to the regional aggregator are DP-sanitized and compressed
19.     end for
20.   $\theta^{(t+1)} \leftarrow \theta^{(t)} + \sum_{k \in S_t} \frac{n_k}{\sum_j n_j} \, \Delta\tilde{\theta}_k^{(t)}$
21. end for
22. return $\theta^{(T)}$
The local update rule with momentum is
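$$v_k^{(\tau+1)} = \beta \, v_k^{(\tau)} + (1 - \beta) \, \hat{g}_k^{(\tau)} \qquad (4)$$
$$\theta_k^{(\tau+1)} = \theta_k^{(\tau)} - \eta \, v_k^{(\tau+1)} \qquad (5)$$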
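A single step of the momentum update can be sketched as follows. The sketch assumes parameters, velocity, and the noisy gradient are flat lists of floats; the function name `momentum_step` and the numeric values are illustrative only.

```python
from typing import List, Tuple


def momentum_step(theta: List[float], velocity: List[float],
                  noisy_grad: List[float], eta: float,
                  beta: float) -> Tuple[List[float], List[float]]:
    """One local step: exponential moving average of the DP-noisy gradient,
    followed by a parameter step along the updated velocity."""
    new_velocity = [beta * v + (1.0 - beta) * g
                    for v, g in zip(velocity, noisy_grad)]
    new_theta = [t - eta * v for t, v in zip(theta, new_velocity)]
    return new_theta, new_velocity


# Toy values: beta = 0.9 matches the reported momentum, eta is illustrative.
theta_1, v_1 = momentum_step(theta=[1.0], velocity=[0.0],
                             noisy_grad=[2.0], eta=0.1, beta=0.9)
```

With a zero initial velocity, the first step moves the parameter by only η(1 − β) times the gradient, so the effective step size ramps up over the first few local iterations.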
The attention-weighted aggregation is
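$$\alpha_k^{(t)} = \frac{\exp\big(v^\top \tanh(W_a \, \Delta\tilde{\theta}_k^{(t)})\big)}{\sum_{j \in S_t} \exp\big(v^\top \tanh(W_a \, \Delta\tilde{\theta}_j^{(t)})\big)} \qquad (6)$$
$$\theta^{(t+1)} = \theta^{(t)} + \sum_{k \in S_t} \alpha_k^{(t)} \, \Delta\tilde{\theta}_k^{(t)} \qquad (7)$$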
Attention parameters $W_a \in \mathbb{R}^{d_a \times p}$ and $v \in \mathbb{R}^{d_a}$ ($d_a = 64$) are maintained at the regional aggregator level and operate exclusively on compressed model updates $\Delta\tilde{\theta}_k^{(t)}$, not raw client data. This ensures no aggregator accesses raw building data; attention learns to assign higher weights to informative and consistent updates as a proxy for quality without direct data inspection.
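The attention weighting can be sketched in a few lines of pure Python. This is a minimal illustration that assumes updates are flat lists of floats and that $W_a$ is stored as a list of rows; the tiny dimensions (two hidden units instead of 64) and all values are hypothetical, and the max-subtraction is a standard numerical-stability step that does not change the softmax result.

```python
import math
from typing import List


def attention_weights(updates: List[List[float]], W_a: List[List[float]],
                      v: List[float]) -> List[float]:
    """Softmax over the scores v^T tanh(W_a u), one score per client update u."""
    scores = []
    for u in updates:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, u))) for row in W_a]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    top = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def aggregate(theta: List[float], updates: List[List[float]],
              alphas: List[float]) -> List[float]:
    """Add the attention-weighted sum of compressed updates to theta."""
    return [t + sum(a * u[j] for a, u in zip(alphas, updates))
            for j, t in enumerate(theta)]


updates = [[0.2, -0.1, 0.0], [0.1, 0.3, -0.2]]   # two compressed updates, p = 3
W_a = [[0.5, -0.5, 0.1], [0.0, 0.2, 0.3]]        # d_a = 2 for illustration
v = [1.0, -1.0]
alphas = attention_weights(updates, W_a, v)
theta_next = aggregate([0.0, 0.0, 0.0], updates, alphas)
uniform = attention_weights(updates, [[0.0] * 3, [0.0] * 3], v)
```

When $W_a$ is all zeros every score is zero and the weights reduce to uniform averaging, which is a convenient sanity check for an implementation.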
Training Procedure:
  • Clients transmit compressed model updates $\Delta\tilde{\theta}_k^{(t)}$ (no raw data are transmitted);
  • The regional aggregator computes attention scores $\alpha_k^{(t)}$ using Equation (6), based solely on model updates;
  • After distributing the regionally aggregated model, the aggregator updates $W_a$ and $v$ using validation performance feedback, reinforcing the current distribution if the loss decreases and otherwise adjusting via a gradient step on the attention parameters.
Attention Weight Properties:
  • Attention weights correlate with data share (Pearson r = 0.82) but are not perfectly proportional to it, balancing quantity with quality;
  • Without regularization, attention variance increased (std = 0.058 vs. 0.031), while the lowest-weighted region’s R² dropped to 0.014;
  • Regularization ensures balanced contributions while downweighting consistently low-quality updates.
This design ensures that at no point does any aggregator access raw building data.

3.4. Differential Privacy Mechanism

We use the Rényi Differential Privacy (RDP) accountant [34] via Opacus for privacy composition, which provides tighter bounds than basic composition or the moments accountant. We adopt per-record adjacency (datasets $D$ and $D'$ differ in at most one building record), protecting individual building-level information. Subsampling mini-batches of size $B$ from the client dataset of size $n_k$ yields the subsampling rate $q = B/n_k$; by the privacy amplification lemma, a mechanism satisfying $(\alpha, \tilde{\varepsilon})$-RDP on the full dataset satisfies $(\alpha, \log(1 + q(\exp(\tilde{\varepsilon}) - 1)))$-RDP on the subsampled dataset. The total privacy budget for $T$ communication rounds with $E$ local epochs under the Gaussian mechanism (noise multiplier $\sigma$) is computed via RDP composition.
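A mechanism $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-differential privacy if, for all adjacent datasets $D$ and $D'$ and all measurable output sets $S$,
$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] + \delta \qquad (8)$$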
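The subsampling amplification expression quoted in this section can be evaluated directly. The sketch below only computes that expression; it is not a substitute for the RDP accountant. The client size $n_k = 117$ and the per-step value $\tilde{\varepsilon} = 0.5$ are hypothetical inputs chosen for illustration.

```python
import math


def subsampling_rate(batch_size: int, n_k: int) -> float:
    """q = B / n_k for a client holding n_k records."""
    return batch_size / n_k


def amplified_rdp_epsilon(eps_tilde: float, q: float) -> float:
    """Evaluate log(1 + q * (exp(eps_tilde) - 1)), the amplified RDP parameter
    as stated in the text for subsampling rate q."""
    return math.log(1.0 + q * (math.exp(eps_tilde) - 1.0))


q = subsampling_rate(32, 117)              # batch size 32, hypothetical n_k
eps_sub = amplified_rdp_epsilon(0.5, q)    # hypothetical per-step eps_tilde
eps_full = amplified_rdp_epsilon(0.5, 1.0) # q = 1 recovers eps_tilde
```

Smaller clients have a larger $q$ for the same batch size and therefore benefit less from amplification, which is one reason the per-client rate appears in the accounting.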
Per-sample gradients are clipped:
$$\tilde{g}_i = g_i \cdot \min\!\left(1, \frac{C}{\lVert g_i \rVert_2}\right) \qquad (9)$$
Gaussian noise is added:
$$\hat{g} = \frac{1}{B} \left( \sum_{i \in \mathcal{B}} \tilde{g}_i + \mathcal{N}(0, \sigma^2 C^2 I) \right) \qquad (10)$$
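The clipping and noise steps can be sketched as below. The sketch assumes per-sample gradients are available as flat lists and follows the common DP-SGD convention of adding Gaussian noise with standard deviation σC to the sum of clipped gradients before dividing by the batch size; whether the noise sits inside or outside the 1/B factor is an assumption here. The function names are hypothetical, and a production system would use a vetted library such as Opacus rather than this illustration.

```python
import math
import random
from typing import List


def clip(grad: List[float], C: float) -> List[float]:
    """Scale grad so that its L2 norm is at most C."""
    norm = math.sqrt(sum(x * x for x in grad))
    scale = min(1.0, C / norm) if norm > 0.0 else 1.0
    return [x * scale for x in grad]


def private_mean_gradient(per_sample_grads: List[List[float]], C: float,
                          sigma: float, rng: random.Random) -> List[float]:
    """Clip each per-sample gradient, sum, add N(0, (sigma*C)^2) noise per
    coordinate, then divide by the batch size."""
    batch = len(per_sample_grads)
    total = [0.0] * len(per_sample_grads[0])
    for g in per_sample_grads:
        for j, x in enumerate(clip(g, C)):
            total[j] += x
    noisy = [s + rng.gauss(0.0, sigma * C) for s in total]
    return [x / batch for x in noisy]


clipped = clip([3.0, 4.0], C=1.0)                 # norm 5 -> scaled by 0.2
rng = random.Random(0)
noiseless = private_mean_gradient([[3.0, 4.0], [0.3, 0.4]],
                                  C=1.0, sigma=0.0, rng=rng)
noisy = private_mean_gradient([[3.0, 4.0], [0.3, 0.4]],
                              C=1.0, sigma=1.2, rng=rng)
```

Setting `sigma=0.0` disables the noise and is useful only for checking the clipping logic; it provides no privacy.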
The adaptive clipping threshold is
$$C_j = C_{\text{base}} \, (1 + \alpha_c \, s_j) \qquad (11)$$
The adaptive noise variance is
$$\sigma_j^2 = \sigma_{\text{base}}^2 \, (1 + \alpha_n \, s_j)^2 \qquad (12)$$
Intuitive Explanation of Adaptive Noise Calibration:
The adaptive noise calibration mechanism adjusts the clipping threshold $C_j$ and noise variance $\sigma_j^2$ per feature group based on the sensitivity score $s_j$, which quantifies how much information a feature group reveals about individual building projects. The sensitivity score $s_j$ combines (1) value range sensitivity, the ratio of the feature group’s inter-quartile range to its median, capturing distributional spread; and (2) gradient contribution, the average magnitude of the gradient components for feature group $j$ over the previous $T_w = 10$ communication rounds.
Gradient Independence from Private Aggregation: The gradient magnitudes used in the sensitivity score $s_j$ are computed exclusively on the frozen global model parameters $\theta^{(t)}$ at the beginning of each communication round, prior to any local private training updates. Specifically, at the start of round $t$, each client computes $\nabla_\theta \mathcal{L}_k(\theta^{(t)}; B_{\text{calibration}})$ on a designated public calibration subset $B_{\text{calibration}}$ (10% of each client’s local data, held out from training and pre-registered before FL training begins). These calibration gradients are aggregated across clients via simple averaging (without DP noise) to produce the global sensitivity score $s_j^{(t)}$. Crucially, the gradient used for $s_j$ is evaluated at $\theta^{(t)}$, the global model parameters broadcast by the server, which is itself a post-processed output of the DP-protected aggregation from round $t - 1$. By the post-processing theorem, $\theta^{(t)}$ inherits the cumulative DP guarantee, and any deterministic function of $\theta^{(t)}$ (including gradient evaluation on public data) does not incur additional privacy cost. The calibration subset is excluded from the private training mini-batches to ensure no double-dipping between sensitivity estimation and private model updates.
For analytical reference, the privacy budget under advanced composition can be upper-bounded as
$$\varepsilon = \sqrt{2T \ln(1.25/\delta)} \cdot \frac{q}{\sigma_{\text{eff}}} \qquad (13)$$
Equation (13) provides a loose analytical upper bound based on the advanced composition theorem [12] and is included for interpretive reference only. The actual privacy budget reported in all experiments (ε = 0.97 at δ = 10⁻⁵) is computed using the Rényi Differential Privacy (RDP) accountant implemented via Opacus 1.4.0, which provides strictly tighter composition bounds through numerical RDP-to-(ε, δ)-DP conversion [34]. The RDP accountant tracks privacy expenditure across T = 200 communication rounds × E = 5 local epochs with a subsampling rate of $q = B/n_k = 32/n_k$ per client, yielding $\varepsilon_{\text{RDP}} = 0.97 < \varepsilon_{\text{analytical}}$, confirming that Equation (13) is indeed a conservative upper bound.

3.5. Gradient Compression

The compression operator keeps the top-k components by magnitude and zeroes the rest:
$$[\text{TopK}(u, \rho)]_j = \begin{cases} u_j & \text{if } |u_j| \ge \tau_\rho(u) \\ 0 & \text{otherwise} \end{cases}$$
Error feedback accumulation:
$$e_k^{(t+1)} = e_k^{(t)} + \Delta\theta_k^{(t)} - \text{TopK}\big(e_k^{(t)} + \Delta\theta_k^{(t)}, \rho\big)$$
Algorithm 2 details the compression procedure.
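Algorithm 2: Gradient Compression with Error Feedback
Require: Update $\Delta\theta$, ratio $\rho$, error buffer $e$
Ensure: Compressed $\Delta\tilde{\theta}$, updated $e'$
1.  $u \leftarrow e + \Delta\theta$
2.  $\tau \leftarrow$ top-$\lceil \rho p \rceil$ magnitude threshold in $u$
3.  $\Delta\tilde{\theta} \leftarrow 0$
4.  for $j = 1$ to $p$ do
5.    if $|[u]_j| \ge \tau$ then
6.      $[\Delta\tilde{\theta}]_j \leftarrow [u]_j$
7.    end if
8.  end for
9.  $e' \leftarrow u - \Delta\tilde{\theta}$
10. return $\Delta\tilde{\theta}$, $e'$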
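Algorithm 2 can be sketched in pure Python as follows. The sketch assumes updates are flat lists of floats and selects exactly ⌈ρp⌉ entries by sorting on magnitude, so ties at the threshold are broken by index rather than all being kept; the function name and the toy vectors are illustrative.

```python
import math
from typing import List, Tuple


def compress_with_error_feedback(delta: List[float], rho: float,
                                 error: List[float]
                                 ) -> Tuple[List[float], List[float]]:
    """Top-k sparsification with error feedback: entries that are dropped
    this round are stored in the error buffer and re-added next round."""
    u = [e + d for e, d in zip(error, delta)]
    k = math.ceil(rho * len(u))
    # Indices of the k largest-magnitude entries of u.
    order = sorted(range(len(u)), key=lambda j: abs(u[j]), reverse=True)
    keep = set(order[:k])
    compressed = [u[j] if j in keep else 0.0 for j in range(len(u))]
    new_error = [uj - cj for uj, cj in zip(u, compressed)]
    return compressed, new_error


# Round 1: the two largest-magnitude entries are sent, the rest are buffered.
sent_1, err_1 = compress_with_error_feedback(
    [0.5, -2.0, 0.1, 1.0], rho=0.5, error=[0.0, 0.0, 0.0, 0.0])
# Round 2: the buffered residual is added back before selecting again.
sent_2, err_2 = compress_with_error_feedback(
    [0.5, 0.0, 0.0, 0.0], rho=0.5, error=err_1)
```

In the second round the first coordinate is transmitted with the accumulated value 1.0, which shows how error feedback eventually delivers components that were too small to send earlier.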

3.6. Convergence Analysis

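Assumption 1 (Smoothness). Each $\mathcal{L}_k(\theta)$ is $L$-smooth:
$$\lVert \nabla \mathcal{L}_k(\theta) - \nabla \mathcal{L}_k(\theta') \rVert_2 \le L \, \lVert \theta - \theta' \rVert_2$$
Assumption 2 (Bounded gradient variance).
$$\mathbb{E} \, \lVert \nabla \mathcal{L}_k(\theta; \xi) - \nabla \mathcal{L}_k(\theta) \rVert_2^2 \le \sigma_g^2$$
Assumption 3 (Bounded client heterogeneity).
$$\lVert \nabla \mathcal{L}_k(\theta) - \nabla \mathcal{L}(\theta) \rVert_2^2 \le \gamma^2$$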
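Under Assumptions 1–3, FedCarbon achieves
$$\frac{1}{T} \sum_{t=0}^{T-1} \mathbb{E} \, \lVert \nabla \mathcal{L}(\theta^{(t)}) \rVert_2^2 \le \mathcal{O}\!\left( \frac{1}{\sqrt{T}} + \frac{\sigma^2 C^2 p}{B^2} + (1 - \rho) \, \gamma^2 \right)$$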
Relationship to Analytical Optimality Bounds: The privacy–utility trade-off in FedCarbon is characterized empirically rather than through closed-form analytical bounds. Information-theoretic frameworks such as that of Sankar et al. [32] have established tight privacy–utility trade-off bounds for smart meter data using rate-distortion theory, demonstrating that optimal privacy-preserving solutions can be derived analytically under Gaussian assumptions. More recently, work presented at the 60th Allerton Conference on Communication, Control, and Computing [33] characterized communication–privacy trade-offs in distributed settings through explicit rate expressions, providing tight bounds on achievable accuracy under joint communication and privacy constraints. FedCarbon does not claim analytical optimality in the information-theoretic sense; rather, it demonstrates empirical near-optimality by achieving R² = 0.942 with an (ε = 1.0, δ = 10⁻⁵)-DP guarantee and an 82.6% communication reduction, which is within 0.026 in R² (2.6 percentage points) of the non-private centralized upper bound (R² = 0.968). Our ‘first comprehensive’ claim refers specifically to the integration of all seven capability dimensions within a single operational framework for embodied carbon benchmarking, not to the theoretical optimality of any individual component. A framework satisfying six of the seven dimensions with tighter theoretical bounds would represent a complementary rather than superseding contribution, as practical deployment in multi-stakeholder construction ecosystems requires the full integration we provide.

4. Results and Evaluation

4.1. Datasets

The UCI Energy Efficiency Dataset is included to (i) leverage building geometry and envelope features that are strongly linked to material quantities and embodied carbon, (ii) enable reproducible comparison with prior federated learning studies in smart building research, and (iii) evaluate the generalizability of the FedCarbon framework across different building performance targets. While ECEBD remains the primary dataset for embodied carbon benchmarking, the UCI dataset provides complementary evidence of model robustness and cross-task applicability.
We use two publicly available datasets:
Dataset 1: UCI Energy Efficiency Dataset. Eight building features and 768 samples. URL: https://archive.ics.uci.edu/ml/datasets/energy+efficiency (accessed on 12 August 2025).
Dataset 2: Embodied Carbon in European Buildings Database (ECEBD). A total of 2340 residential buildings in 12 EU countries, each described by 24 features. URL: https://github.com/mroeck/Embodied-Carbon-of-European-Buildings-Database (accessed on 12 August 2025).
Table 3 summarizes dataset characteristics.

4.2. Experimental Setup

Our implementation of FedCarbon is based on PyTorch, PySyft, and Opacus (versions are listed under Infrastructure below). Hyperparameters: K = 20 clients, R = 4 regions, E = 5 local epochs, batch size = 32, learning rate = 0.001, clipping threshold C = 1.0, noise multiplier σ = 1.2, compression ratio ρ = 0.1, and momentum β = 0.9.
Baseline Hyperparameter Tuning and Fairness Protocol:
To ensure fair comparison, all baseline methods were evaluated under a standardized protocol.
Standardization:
  • Identical Data Partitions: All methods use the same client data partitions from Dirichlet allocation (α = 0.5, seed = 42) and an identical 80/20 train–test split per client.
  • Uniform Privacy Budget: All DP methods (DP-FedAvg, DP-SCAFFOLD, and FedCarbon) are evaluated at ε = 1.0 and δ = 10⁻⁵. DP-FedAvg and DP-SCAFFOLD use a fixed clipping threshold C = 1.0 and noise multiplier σ = 1.2; FedCarbon uses adaptive clipping and noise (Equations (11) and (12)), constrained to ε = 1.0 via the composition bound (Equation (13)).
  • Architecture: All baselines use the same three-layer MLP (hidden dimensions [128, 64, 32], ReLU activations, and L2 regularization λ = 10⁻⁴).
  • Infrastructure: NVIDIA A100 GPU (40 GB VRAM), PyTorch 2.1, PySyft 0.8.7, and Opacus 1.4.0.
Table 4 shows the hyperparameter optimization (grid search).
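For readers unfamiliar with the fixed clipping threshold and noise multiplier used by the DP baselines, the following NumPy sketch shows the standard clip-then-add-Gaussian-noise step with C = 1.0 and σ = 1.2. It is an illustrative assumption of how such a step typically looks, not the paper's code: the function name `dp_average` is hypothetical, gradients are assumed to be stacked one row per sample, and FedCarbon's adaptive variant (Equations (11) and (12)) is not reproduced here.

```python
import numpy as np

def dp_average(per_sample_grads, clip_c=1.0, sigma=1.2, rng=None):
    """Clip each per-sample gradient to L2 norm clip_c, add Gaussian noise, average."""
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(per_sample_grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_c / np.maximum(norms, 1e-12))  # shrink only large gradients
    clipped = grads * scale
    # Noise standard deviation is sigma * clip_c, applied once to the summed gradient.
    noise = rng.normal(0.0, sigma * clip_c, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]

# Hypothetical batch of two 2-dimensional gradients.
batch = np.array([[3.0, 4.0], [0.3, 0.4]])
noisy_mean = dp_average(batch, rng=np.random.default_rng(42))
```

In practice a library such as Opacus handles per-sample gradient computation and privacy accounting; the sketch only makes the roles of C and σ concrete.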
The datasets are partitioned using a standard Dirichlet-based non-IID strategy to simulate heterogeneous federated settings with 20 clients organized into four regions (five clients per region). For the ECEBD dataset, buildings are first assigned to regions based on geography (Northern, Central, Southern, and Eastern Europe), while UCI records are randomly assigned to regions due to the absence of location attributes. Within each region, data are distributed to clients using a Dirichlet distribution over 10 target-variable (ECI) quantile bins, where the concentration parameter α controls heterogeneity (α = 0.1 for highly skewed, α = 0.5 for moderate non-IID, and α = 10 for near-IID partitions). The degree of heterogeneity is quantified using the Earth Mover’s Distance between client distributions and the Weight Divergence from the uniform distribution. Table 5 shows the effect of the Dirichlet concentration parameter (α) on federated data heterogeneity, measured by Earth Mover’s Distance (mean ± std) and Weight Divergence. Table 6 shows the impact of heterogeneity on FedCarbon performance (ECEBD).
The experimental setting in this study is a simulated multi-stakeholder urban environment. Each client represents an urban stakeholder (a construction business, residential developer, material distributor, or local government) that operates within a particular city or administrative area. The regional parameter R captures different urban or regional settings, which allows city-level heterogeneity in residential building features to be modeled. The non-IID data distributions among clients reflect the realistic variations in building typologies, material selections, and construction processes seen across urban environments. This setup allows the proposed framework to be evaluated under conditions close to real-world urban and inter-city benchmarking scenarios while data sovereignty is maintained across the participating stakeholders.
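The within-region partitioning described above can be sketched as follows. This is a minimal illustration assuming one region at a time and a one-dimensional array of ECI targets; the function name `dirichlet_partition` and the synthetic targets are hypothetical, and the exact procedure used by the authors may differ in detail.

```python
import numpy as np

def dirichlet_partition(targets, n_clients=5, n_bins=10, alpha=0.5, seed=42):
    """Split sample indices across clients, bin by bin, with Dirichlet shares."""
    rng = np.random.default_rng(seed)
    targets = np.asarray(targets, dtype=float)
    # Interior quantile edges give n_bins roughly equal-sized target bins.
    edges = np.quantile(targets, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(targets, edges)                 # bin index in 0..n_bins-1
    client_idx = [[] for _ in range(n_clients)]
    for b in range(n_bins):
        members = rng.permutation(np.flatnonzero(bins == b))
        shares = rng.dirichlet(alpha * np.ones(n_clients))   # small alpha -> skewed shares
        cuts = (np.cumsum(shares)[:-1] * len(members)).astype(int)
        for c, part in enumerate(np.split(members, cuts)):
            client_idx[c].extend(part.tolist())
    return client_idx

# Hypothetical ECI targets for one region with 200 buildings.
parts = dirichlet_partition(np.linspace(100.0, 900.0, 200))
```

Lower values of `alpha` concentrate each quantile bin on a few clients, which is what produces the highly skewed partitions reported for α = 0.1.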

4.3. Training Convergence

Figure 3 shows training loss curves in terms of communication rounds.

4.4. Prediction Accuracy

Figure 4 shows how prediction accuracy (R2) evolves over 200 communication rounds on the UCI Energy Efficiency and ECEBD datasets. FedCarbon (red squares) reaches final accuracies of 0.921 and 0.942, respectively, surpassing the DP-FedAvg and Local Only baselines and approaching the non-private Centralized upper bound. The convergence curves indicate that FedCarbon’s adaptive differential privacy and attention-based aggregation balance privacy preservation with model utility, with accuracy stabilizing within about 150 rounds despite noise injection and gradient compression.
Figure 5 presents predicted versus actual values.
Table 7 reports detailed performance comparisons.

4.5. Urban-Scale Embodied Carbon Benchmarking Demonstration

Although predictive accuracy measures such as R2, MAE, and RMSE are needed to verify model performance, urban sustainability work demands interpretable benchmarking results that allow cities and regions to be compared. To illustrate how the proposed framework can support decision making at the urban scale, we derive percentile-based embodied carbon benchmarks from the federated predictions obtained on the ECEBD dataset. These benchmarks make it possible to position residential building stocks relative to one another without central access to raw building data. Percentile-based indicators are typical of urban sustainability reporting and enable municipalities to identify high-carbon segments, track progress over time, and target retrofit or policy interventions.
The proposed federated learning approach yields portfolio-level embodied carbon benchmarks that can be interpreted directly at the urban scale, as illustrated in Table 8. Cities or regions can compare their residential building stock against the percentile thresholds to determine whether they fall in the low-, medium-, or high-carbon category relative to other jurisdictions. Notably, these benchmarks can be generated without access to individual building records or proprietary data, which enables comparison between cities and joint climate action under strict data sovereignty and privacy requirements.
The bootstrap confidence intervals in Table 9 are computed by resampling the federated model predictions with replacement 1000 times and computing percentiles on each resample. The CI widths (14.8–23.5 kgCO2e/m2) represent 4.0–5.2% of the respective benchmark values, indicating moderate stability. Higher percentiles show wider intervals due to greater variance in the upper tail of the ECI distribution.
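A minimal sketch of the resampling procedure described above is given below. The function name `bootstrap_percentile_ci`, the choice of the 25th, 50th, and 75th percentiles, the 95% level, and the synthetic predictions are illustrative assumptions; only the 1000 resamples with replacement come from the text.

```python
import numpy as np

def bootstrap_percentile_ci(pred, q=(25, 50, 75), n_boot=1000, level=0.95, seed=0):
    """Percentile benchmarks with bootstrap confidence intervals."""
    rng = np.random.default_rng(seed)
    pred = np.asarray(pred, dtype=float)
    stats = np.empty((n_boot, len(q)))
    for b in range(n_boot):
        sample = rng.choice(pred, size=pred.size, replace=True)  # resample with replacement
        stats[b] = np.percentile(sample, q)
    tail = 100 * (1 - level) / 2
    lo, hi = np.percentile(stats, [tail, 100 - tail], axis=0)
    return np.percentile(pred, q), lo, hi

# Hypothetical model predictions (kgCO2e/m2) standing in for the federated outputs.
preds = np.random.default_rng(1).normal(400.0, 80.0, size=300)
point, ci_low, ci_high = bootstrap_percentile_ci(preds)
ci_width = ci_high - ci_low
```

Dividing `ci_width` by `point` gives the relative interval width, which is the quantity the text reports as a percentage of each benchmark value.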
Percentile Boundary Crossing Analysis:
To quantify the practical impact of benchmark uncertainty on building classification, we analyze how many test-set buildings would be reclassified when percentile thresholds are adjusted to their CI bounds. Table 10 shows the boundary crossing analysis under CI-adjusted thresholds.
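One way to compute such a boundary crossing rate is sketched below, assuming a single threshold and a simple above/below classification. The function name `crossing_rates` and all numeric values are hypothetical and chosen only to make the calculation easy to follow.

```python
import numpy as np

def crossing_rates(values, threshold, ci_low, ci_high):
    """Fraction of buildings whose above/below class flips at each CI bound."""
    values = np.asarray(values, dtype=float)
    base = values >= threshold                          # classification at the point estimate
    flips_low = float(np.mean((values >= ci_low) != base))
    flips_high = float(np.mean((values >= ci_high) != base))
    return flips_low, flips_high

# Hypothetical predictions and a hypothetical threshold with its CI bounds.
rate_low, rate_high = crossing_rates([100, 200, 300, 400], 250, 150, 350)
```

In this toy case one of four buildings changes class at each bound, so both rates are 0.25.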
Leave-One-Country-Out (LOCO) Percentile Robustness Analysis:
To demonstrate robustness of the percentile benchmarks across geographic jurisdictions, we performed leave-one-country-out validation, where the federated model is retrained excluding all buildings from one country, and percentile benchmarks are recomputed on the remaining test set.
The LOCO analysis reveals that percentile benchmarks are moderately robust to the exclusion of any single country, with mean absolute shifts of 2.6–4.7 kgCO2e/m2 (0.9–1.0% of benchmark values) and maximum shifts of 8.4 kgCO2e/m2 (1.9%) when Poland is excluded. The largest perturbations occur when excluding countries with distinctive construction traditions (Poland: high masonry/concrete mix; Germany: largest sample contributing to Central EU calibration; Spain: high ECI variability in Southern EU). All LOCO percentile shifts fall within the bootstrap confidence intervals reported in Table 11, confirming that no single country disproportionately determines the aggregate benchmarks. The analysis validates that the federated model produces geographically robust benchmarks suitable for cross-jurisdictional comparison.
Recommendation for Municipal Use: Given the boundary crossing rates (6.6–9.4%) and LOCO variability (max 1.9%), we recommend that municipalities adopt the following protocol: (1) use CI-adjusted thresholds (lower bound of CI for P25 and upper bound for P75) to ensure conservative classification; (2) apply ‘buffer zones’ of ±15 kgCO2e/m2 around each percentile threshold; (3) buildings within buffer zones should undergo project-level LCA verification before policy classification.
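The buffer-zone part of this protocol can be expressed as a small classification rule. The sketch below is an illustration only: the function name `classify_building`, the category labels, and the example thresholds are hypothetical, and the thresholds passed in are assumed to be the CI-adjusted P25 and P75 values from step (1).

```python
def classify_building(eci, p25, p75, buffer=15.0):
    """Assign a carbon category, deferring to LCA verification near a threshold."""
    # Buildings within +/- buffer of either threshold are not classified directly.
    if abs(eci - p25) <= buffer or abs(eci - p75) <= buffer:
        return "verify"
    if eci < p25:
        return "low"
    if eci > p75:
        return "high"
    return "medium"

# Hypothetical CI-adjusted thresholds in kgCO2e/m2.
labels = [classify_building(x, p25=350.0, p75=500.0) for x in (300.0, 360.0, 420.0, 600.0)]
```

Here the four example buildings are labeled low, verify, medium, and high, in that order.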
Privacy Status of Released Benchmarks: The percentile benchmarks in Table 8 inherit the (ε = 1.0, δ = 10⁻⁵)-DP guarantee from the global model θ^(T) through the post-processing theorem [20], since the benchmarks are deterministic functions of θ^(T) applied to test inputs and not to raw data. However, if benchmarks are computed on training data where a single city dominates a regional partition, information leakage risks exist. Mitigation: (i) compute benchmarks on a separate held-out public building stock survey; (ii) the reported Table 8 benchmarks use the test set (20% held-out); (iii) the DP guarantee applies regardless, as the model is DP-protected. For policy deployment, municipalities should apply benchmarks as approximate reference ranges with bootstrap confidence intervals rather than hard regulatory thresholds, acknowledging the inherent uncertainty in model-derived statistics.

4.6. Privacy–Utility Trade-Off

The privacy–utility trade-off is presented in Figure 6. Prediction accuracy (R2) rises as the privacy budget (ϵ) becomes larger, and FedCarbon (square markers) consistently achieves better results on both datasets, including in the high-privacy regime (ϵ ≤ 1, green region). The color scale of R2 values (blue for low, about 0.65, to red for high, about 0.96) shows that FedCarbon performs almost at the level of non-private FedAvg (gray dashed line) even under strict privacy settings, which supports the effectiveness of its adaptive noise calibration scheme.

4.7. Communication Efficiency

Figure 7 shows the communication analysis. At a compression ratio of 0.1, the momentum-enhanced gradient compression of FedCarbon reduces bandwidth by 82.6% (0.38 MB instead of 2.18 MB) while maintaining a prediction accuracy of R2 = 0.942, which stays above 0.90. FedCarbon preserves accuracy better than the TopK-Basic and Random-Sparse compression techniques because of its error feedback accumulation, and it converges in 165 rounds compared with 195 and 232 rounds for these baselines at the same compression level. The trade-off analysis indicates that FedCarbon is practical for construction industry participants with limited network connectivity without compromising model performance. Table 12 shows the computational overhead and training time comparison.
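The reported reduction follows directly from the two bandwidth figures, as the short check below shows. The variable names are illustrative; the numbers are those quoted in the text.

```python
# Bandwidth figures quoted in the text (MB).
full_mb = 2.18         # uncompressed baseline
compressed_mb = 0.38   # at compression ratio 0.1

reduction = 1 - compressed_mb / full_mb
print(f"Bandwidth reduction: {reduction:.1%}")  # Bandwidth reduction: 82.6%
```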

4.8. Regional Performance

Figure 8 visualizes variations in regional performance. The analysis shows that FedCarbon performs well across heterogeneous geographic regions: the COMSOL-style heatmaps indicate that the highest accuracy (R2 = 0.959) is obtained in Region 2 (Central EU), whereas Region 3 (Southern EU) converges more slowly (R2 = 0.923) because data heterogeneity is more pronounced there. All regions nevertheless converge to a satisfactory level of performance above 0.92. The heatmap of client attention weights shows how the attention-based aggregation dynamically adapts client contributions over 200 communication rounds, with higher attention (red) indicating a larger contribution to the aggregate. The polar attention distribution shows balanced regional contributions of 23.1% to 26.8%, indicating that the hierarchical aggregation of FedCarbon is effective under the non-IID data distributions of European building stock.

4.9. Ablation Study

Table 13 presents ablation results.

4.10. Error Decomposition Analysis

As shown in Table 14, Table 15 and Table 16, the model performs best on (i) single-family detached buildings (R2 = 0.951), which have the most standardized construction methods and material palettes; (ii) the Central EU region (R2 = 0.959), which has the largest sample count and most consistent building standards; and (iii) reinforced concrete structures (R2 = 0.948), which dominate the training data. The model performs worse on (i) high-rise apartments (R2 = 0.912), which have complex structural systems and more variable material quantities; (ii) Southern EU (R2 = 0.923), which exhibits the highest intra-regional construction practice variability; and (iii) timber and mixed/hybrid structures (R2 = 0.908–0.918), which represent minority classes in the training data with higher material composition variability.
These results suggest that prediction accuracy is primarily driven by training data representation and construction practice homogeneity, indicating opportunities for targeted data collection in under-represented building categories to improve model performance.

4.11. Robustness to Data Partitioning

The headline R2 = 0.942 has a variance of σ² = 1.6 × 10⁻⁵ (std = 0.004) across 10 independent non-IID partitions. Table 17 indicates high stability, with a 95% CI of [0.937, 0.945], confirming representativeness. The attention mechanism is the most partition-sensitive component (std increases from 0.004 to 0.009 when it is removed), as attention weights adapt to client update distributions that are directly affected by data partitioning; without attention, fixed sample-size weighting provides less adaptation but inherent stability. Adaptive DP is the second-most sensitive (std = 0.007 without it), as feature sensitivity scores interact with client data distributions; compression is the least sensitive (std = 0.003), as Top-K sparsification operates independently of data distribution patterns. Table 17 shows the variance in FedCarbon performance across 10 independent non-IID partitions (K = 20, R = 4, and α = 0.5). Table 18 shows the component sensitivity analysis (variance in R2 across 10 partitions).

5. Discussion

Table 19 provides a detailed comparison of FedCarbon with ten state-of-the-art methods on seven evaluation criteria. Although current methods perform very well on individual criteria (for example, FL-SmartBuilding with R2 = 0.948 for energy prediction [1], or DP-Thermal with privacy guarantees for thermal comfort [4]), none of them combines all the capabilities needed for privacy-preserving embodied carbon benchmarking. The comparison indicates that only FedCarbon achieves competitive accuracy (R2 = 0.942) while combining federated learning with differential privacy (ϵ = 1.0), gradient compression (82.6% reduction), hierarchical aggregation, and the carbon assessment domain. Approaches such as QFL-IoT [9] and VFL-Cyber [17] include privacy and compression, but they do not provide the hierarchical structure that characterizes multi-stakeholder construction ecosystems spread across geographic locations. FedCarbon fills this gap by offering an end-to-end solution that allows practical collaborative carbon benchmarking without centralizing building-related data, which makes it well suited to practical settings in the construction industry.
In addition to its methodological performance, the proposed framework is directly applicable to urban policy and governance. By making privacy-preserving, portfolio-level embodied carbon benchmarking possible, the framework can support municipal decision making on urban planning, housing strategies, and the prioritization of retrofitting. Local authorities can use the derived benchmarks to identify the high-carbon parts of residential building stocks and to guide low-carbon procurement and material selection policies, even without access to proprietary and sensitive project-level data. Moreover, the ability to produce comparable benchmarks across cities and regions strengthens urban sustainability reporting and the tracking of progress toward climate objectives without violating the data sovereignty constraints of the public and private parties involved. The federated design also supports inter-city and inter-regional cooperation: cities can join shared benchmarking programs without pooling data centrally and can therefore coordinate climate action across fragmented urban governance and regulatory frameworks.

5.2. Limitations and Deployment Challenges

Simulation-Based Evaluation Limitations: Dirichlet-based partitioning (α = 0.5) may not capture full real-world heterogeneity (construction firms have detailed material data vs. municipalities with aggregate building permit records). Homogeneous client computation assumptions ignore the resource constraints of small firms or under-resourced departments lacking hardware or expertise. Controlled network conditions exclude real-world variability (intermittent connectivity, variable bandwidth, and asynchronous availability); 82.6% communication savings assume reliable, synchronous rounds. Static datasets do not reflect evolving building stock from new construction, retrofits, and updated assessment methods.
Anticipated Real-World Deployment Challenges: Data schema heterogeneity requires harmonizing formats, units, and assessment boundaries (cradle-to-gate vs. cradle-to-grave) through standardized ontologies. Regulatory compliance must address varying data protection regulations (GDPR and national laws); while differential privacy provides formal guarantees, regulatory acceptance for building data systems is unestablished. Stakeholder trust requires transparent privacy auditing, verifiable computation, and clear value propositions, as firms may resist participation due to competitive concerns. Model maintenance requires continuous updating and concept drift detection for changing practices, materials, and carbon factors. Byzantine robustness is absent; real deployments face risks from corrupted or malicious updates, with Byzantine-resilient aggregation noted as critical future work.
Cross-Regional and Cross-Regulatory Adaptability: FedCarbon’s hierarchical architecture operates across heterogeneous regulatory environments, allowing regional aggregators to enforce jurisdiction-specific privacy requirements (e.g., stricter ε under GDPR) without affecting global aggregation, supporting heterogeneous privacy budgets.
Model Update Mechanism: FedCarbon supports incremental updates through: (1) rolling training windows incorporating new data without retraining from scratch, (2) concept drift detection via a CUSUM test that monitors client update magnitudes to trigger re-initialization, and (3) version control tagging each global model θ^(t) with timestamps and privacy budgets for audit trails.
Malicious Client Defense (Limitations): The current framework lacks explicit Byzantine-resilient aggregation but provides implicit robustness, because the attention mechanism downweights adversarial updates that diverge from majority patterns. Injecting two malicious clients (10% of K = 20) with random gradient noise resulted in only a 0.008 R2 degradation with attention weighting versus 0.031 with FedAvg; this implicit defense is, however, insufficient against sophisticated poisoning attacks. Future work will integrate robust aggregation methods (coordinate-wise median, trimmed mean, and Krum).
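The CUSUM-based drift check in item (2) could look like the one-sided detector sketched below. This is an assumption about one common form of the test, not the paper's implementation: the function name `cusum_alarm`, the reference level, the slack, and the alarm threshold are all hypothetical parameters that would need tuning on real update-norm traces.

```python
def cusum_alarm(norms, target, slack, threshold):
    """Return the first round at which the upward CUSUM statistic exceeds threshold."""
    s = 0.0
    for t, x in enumerate(norms):
        # Accumulate only deviations above target + slack; reset at zero.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return t
    return None

# Hypothetical per-round client update magnitudes with a shift after round 3.
update_norms = [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 2.2]
alarm_round = cusum_alarm(update_norms, target=1.0, slack=0.2, threshold=1.5)
```

With these illustrative values the statistic stays at zero for the first four rounds and crosses the threshold at round index 5, at which point a deployment could trigger the re-initialization mentioned in the text.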
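Two of the robust aggregation rules named as future work, the coordinate-wise median and the trimmed mean, are simple enough to sketch in NumPy. The code below is illustrative only and is not part of FedCarbon: the function names and the toy client updates are hypothetical, and Krum is omitted for brevity.

```python
import numpy as np

def coordinate_median(updates):
    """Aggregate client updates by taking the median of each coordinate."""
    return np.median(np.asarray(updates, dtype=float), axis=0)

def trimmed_mean(updates, trim=1):
    """Drop the trim smallest and largest values per coordinate, then average."""
    s = np.sort(np.asarray(updates, dtype=float), axis=0)
    return s[trim:s.shape[0] - trim].mean(axis=0)

# Four hypothetical honest clients near [1, 1] and one corrupted update.
client_updates = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0], [100.0, -100.0]]
robust_median = coordinate_median(client_updates)
robust_trimmed = trimmed_mean(client_updates, trim=1)
plain_mean = np.mean(client_updates, axis=0)
```

In this toy case the plain mean is pulled far from the honest cluster by the single corrupted update, while both robust rules stay close to [1, 1].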

6. Conclusions

This paper presented FedCarbon, a federated learning framework for privacy-preserving embodied carbon benchmarking in residential construction. The framework combines hierarchical federated learning with attention-based client weighting, adaptive differential privacy, and momentum-based gradient compression. Experiments on two datasets (3108 buildings in total) showed a prediction accuracy of R2 = 0.942 under (ε, δ)-differential privacy with ε = 1.0, together with an 82.6% reduction in communication. Future work will address Byzantine-resilient aggregation and pilot deployments with construction industry partners.

Funding

The author received no specific funding for this study.

Data Availability Statement

Dataset 1: UCI Energy Efficiency Dataset. Eight building features and 768 samples. URL: https://archive.ics.uci.edu/ml/datasets/energy+efficiency [accessed on 1 August 2025]; Dataset 2: Embodied Carbon in European Buildings Database (ECEBD). A total of 2340 residential buildings in 12 EU countries consisting of 24 features. URL: https://github.com/mroeck/Embodied-Carbon-of-European-Buildings-Database [accessed on 1 August 2025].

Conflicts of Interest

The author declares no conflict of interest.

Nomenclature

Symbol | Description
$\mathcal{K}$ | Set of participating clients
$K$ | Total number of clients
$\mathcal{D}_k$ | Local dataset at client k
$n_k$ | Number of samples at client k
$f_\theta$ | Neural network with parameters θ
$\mathcal{L}_k(\theta)$ | Local loss function
$T$ | Communication rounds
$E$ | Local epochs
$\eta$ | Learning rate
$C$ | Clipping threshold
$\sigma$ | Noise multiplier
$\epsilon, \delta$ | Privacy parameters
$\rho$ | Compression ratio
$\beta$ | Momentum coefficient
FL | Federated learning
DP | Differential privacy
ECI | Embodied carbon intensity

References

  1. Berkani, M.R.A.; Chouchane, A.; Himeur, Y.; Ouamane, A.; Miniaoui, S.; Atalla, S.; Mansoor, W.; Al-Ahmad, H. Advances in federated learning: Applications and challenges in smart building environments and beyond. Computers 2025, 14, 124.
  2. Hasan, S.M.; Islam, T.; Saifuzzaman, M.; Ahmed, K.R.; Huang, C.-H.; Shahid, A.R. Carbon emission quantification of machine learning: A review. IEEE Trans. Sustain. Comput. 2025, 10, 1085–1102.
  3. Alterkawi, L.; Dib, F.K. Federated learning for smart cities: A thematic review of challenges and approaches. Future Internet 2025, 17, 545.
  4. Abbas, S.; Alsubai, S.; Sampedro, G.A.; Abisado, M.; Almadhor, A.; Kim, T.-H. Privacy preserved and decentralized thermal comfort prediction model for smart buildings using federated learning. PeerJ Comput. Sci. 2024, 10, e1899.
  5. Wang, R.; Bai, L.; Rayhana, R.; Liu, Z. Personalized federated learning for buildings energy consumption forecasting. Energy Build. 2024, 323, 114762.
  6. Amangeldy, B.; Imankulov, T.; Tasmurzayev, N.; Dikhanbayeva, G.; Nurakhov, Y. A Review of Artificial Intelligence and Deep Learning Approaches for Resource Management in Smart Buildings. Buildings 2025, 15, 2631.
  7. Shan, R.; Jia, X.; Su, X.; Xu, Q.; Ning, H.; Zhang, J. AI-Driven Multi-Objective Optimization and Decision-Making for Urban Building Energy Retrofit: Advances, Challenges, and Systematic Review. Appl. Sci. 2025, 15, 8944.
  8. Feng, C.; Reed, K.F.; Giordano, J.O.; You, F. Precision decarbonization for clean dairy farming with digital twin and edge intelligence. Nexus 2025, 2, 100105.
  9. Qiao, C.; Li, M.; Liu, Y.; Tian, Z. Transitioning from federated learning to quantum federated learning in internet of things: A comprehensive survey. IEEE Commun. Surv. Tutor. 2025, 27, 509–545.
  10. Mohammadi, M.; Shrestha, R.; Sinaei, S. Integrating federated learning and differential privacy for secure anomaly detection in smart grids. In Proceedings of the 8th International Conference Cloud Big Data Computing (ICCBDC); Association for Computing Machinery: New York, NY, USA, 2024; pp. 60–66.
  11. Bahadori-Jahromi, A.; Room, S.; Paknahad, C.; Altekreeti, M.; Tariq, Z.; Tahayori, H. The Role of Artificial Intelligence and Machine Learning in Advancing Civil Engineering: A Comprehensive Review. Appl. Sci. 2025, 15, 10499.
  12. Siakas, D.; Lampropoulos, G.; Siakas, K. Autonomous CPS for smart energy districts. Appl. Sci. 2025, 15, 7502.
  13. Lazaros, K.; Koumadorakis, D.E.; Vrahatis, A.G.; Kotsiantis, S. Federated Learning: Navigating the Landscape of Collaborative Intelligence. Electronics 2024, 13, 4744.
  14. Rourke, C.; Leclair, M. Scalable and secure edge AI. Trans. Comput. Sci. Methods 2025, 5, 1–4.
  15. Delgado Fernández, J.; Willburger, L.; Wiethe, C.; Wenninger, S.; Fridgen, G. Scaling smart cities with federated learning. Bus. Inf. Syst. Eng. 2025, 3.
  16. Choudhary, S.K.; Kar, A.K.; Dwivedi, Y.K. How does Federated Learning Impact Decision-Making in Firms: A Systematic Literature Review. Commun. AIS 2024, 54, 519–546.
  17. Folino, F.; Folino, G.; Pisani, F.S. Scalable vertical federated learning for cybersecurity. In Proceedings of the 2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP); IEEE: Piscataway, NJ, USA, 2024.
  18. Gupta, S.; Arora, S.; Qamar, S. Federated learning in smart farming. In Convergence of AI, Federated Learning, and Blockchain for Sustainable Development; Springer: Berlin/Heidelberg, Germany, 2025.
  19. Deng, S.; Zhang, L.; Yue, D.; Qu, Y. Data-driven and privacy-preserving risk assessment method based on federated learning for smart grids. Commun. Eng. 2024, 3, 154.
  20. Goktas, P.; Ibrahim, S. Energy management and smart grid communication. In AI and ML Techniques in IoT—Based Communication: A Path to Sustainable Development Goals; John Wiley & Sons: Hoboken, NJ, USA, 2025.
  21. El Hafdaoui, H.; Khallaayoun, A.; Bouarfa, I.; Ouazzani, K. Machine learning for embodied carbon life cycle assessment of buildings. J. Umm Al-Qura Univ. Eng. Archit. 2023, 14, 188–200.
  22. Yang, J.; Chen, Z.; Li, Y.; Huang, H. GFL-ALDPA: A gradient compression federated learning framework based on adaptive local differential privacy budget allocation. Multimed. Tools Appl. 2024, 83, 26349–26368.
  23. Najafzadeh, M.; Yeganeh, A. AI-Driven Digital Twins in Industrialized Offsite Construction: A Systematic Review. Buildings 2025, 15, 2997.
  24. Michailidis, P.; Michailidis, I.; Kosmatopoulos, E. Reinforcement Learning for Optimizing Renewable Energy Utilization in Buildings: A Review on Applications and Innovations. Energies 2025, 18, 1724.
  25. Ma, Z.G.; Billanes, J.D.; Jørgensen, B.N. Climate Resilience and Energy Flexibility in Industrial Systems: A Scoping Review of Concepts, Technologies, Applications, and Policy Links. Energies 2025, 18, 4985.
  26. Mulo, J.; Liang, H.; Qian, M.; Biswas, M.; Rawal, B.; Guo, Y.; Yu, W. Navigating Challenges and Harnessing Opportunities: Deep Learning Applications in Internet of Medical Things. Future Internet 2025, 17, 107.
  27. Li, J.; Zhang, P. From General Intelligence to Sustainable Adaptation: A Critical Review of Large-Scale AI Empowering People’s Livelihood. Sustainability 2025, 17, 107.
  28. Zhang, Y.; Sattar, S.; Cook, D.T.; Johnson, K.J.; Fung, J.F. Systematic Review of Embodied Carbon Assessment and Reduction in Building Life Cycles; NIST Special Publication 1324; National Institute of Standards and Technology: Gaithersburg, MD, USA, 2024.
  29. Rao, S.; Neethirajan, S. Computational Architectures for Precision Dairy Nutrition Digital Twins: A Technical Review and Implementation Framework. Sensors 2025, 25, 4899.
  30. Rizwan, A.; Khan, A.N.; Ahmad, R.; Hassan, H.Z.; Atteia, G.; Alkanhel, R.; Samee, N.A. Enhancing energy consumption prediction in smart homes: A convergence-aware federated transfer learning approach. Sci. Tech. Energy Transit. 2024, 79, 85.
  31. Benke, B.; Chafart, M.; Shen, Y.; Ashtiani, M.; Carlisle, S.; Simonen, K. A harmonized dataset of high-resolution embodied life cycle assessment results for buildings in North America. Sci. Data 2025, 12, 605.
  32. Sankar, L.; Rajagopalan, S.R.; Mohajer, S.; Poor, H.V. Smart meter privacy: A theoretical framework. IEEE Trans. Smart Grid 2013, 4, 837–846.
  33. Morteza, A.; Chou, R.A. Distributed matrix multiplication: Download rate, randomness and privacy trade-offs. In Proceedings of the 60th Annual Allerton Conference on Communication, Control, and Computing; IEEE: Piscataway, NJ, USA, 2024; pp. 1–7.
  34. Mironov, I. Rényi differential privacy. In Proceedings of the IEEE 30th Computer Security Foundations Symposium (CSF); IEEE: Piscataway, NJ, USA, 2017; pp. 263–275.
Figure 1. Conceptual framework for federated learning-enabled embodied carbon benchmarking showing multi-stakeholder ecosystem with construction companies, municipalities, and material suppliers maintaining data sovereignty while collaboratively training carbon prediction models.
Figure 1. Conceptual framework for federated learning-enabled embodied carbon benchmarking showing multi-stakeholder ecosystem with construction companies, municipalities, and material suppliers maintaining data sovereignty while collaboratively training carbon prediction models.
Buildings 16 01029 g001
Figure 2. FedCarbon system architecture showing hierarchical federated learning structure with differential privacy integration and communication efficiency.
Figure 2. FedCarbon system architecture showing hierarchical federated learning structure with differential privacy integration and communication efficiency.
Buildings 16 01029 g002
Figure 3. Training loss convergence comparison across 200 communication rounds for FedCarbon and baseline methods on ECEBD dataset.
Figure 3. Training loss convergence comparison across 200 communication rounds for FedCarbon and baseline methods on ECEBD dataset.
Buildings 16 01029 g003
Figure 4. Prediction accuracy (R2) across training rounds for both datasets.
Figure 4. Prediction accuracy (R2) across training rounds for both datasets.
Buildings 16 01029 g004
Figure 5. Scatter plot of predicted vs. actual embodied carbon intensity on ECEBD test set with Pearson r = 0.970.
Figure 5. Scatter plot of predicted vs. actual embodied carbon intensity on ECEBD test set with Pearson r = 0.970.
Buildings 16 01029 g005
Figure 6. Privacy–utility trade-off showing R2 vs. privacy budget ϵ.
Figure 6. Privacy–utility trade-off showing R2 vs. privacy budget ϵ.
Buildings 16 01029 g006
Figure 7. Communication efficiency analysis showing bytes/round vs. compression ratio.
Figure 7. Communication efficiency analysis showing bytes/round vs. compression ratio.
Buildings 16 01029 g007
Figure 8. Regional performance showing per-region accuracy and attention weights.
Table 1. Comparison of related work and research gap analysis.
Reference | Year | Federated | Embodied Carbon | Diff. Privacy | Compression | Multi-Stakeholder | Building Stock | Real Data
[1]2025----
[5]2024----
[4]2024---
[7]2025----
[8]2025-----
[13]2024----
[9]2024---
[15]2025----
[16]2024---
[19]2025---
[21]2024---
FedCarbon (Ours)2025
Three FL + DP frameworks from adjacent domains combine federated learning with differential privacy for health data but do not address building stock or embodied carbon; FedGreen applies FL to building energy benchmarking but lacks differential privacy; PrivBench combines FL with DP and compression but lacks a multi-stakeholder design for construction ecosystems. None integrates all seven capabilities required for privacy-preserving embodied carbon benchmarking in multi-stakeholder construction environments.
Table 2. Quantitative operationalization of capability dimensions.
| Capability Dimension | Quantitative Criterion | Threshold |
|---|---|---|
| Federated | Distributed training across ≥3 organizationally distinct clients with no raw data centralization | K ≥ 3 clients |
| Embodied Carbon | Explicitly models embodied carbon intensity (kgCO2e/m2) or life cycle carbon (A1–A5 minimum scope) as primary target variable | Target = ECI or equivalent |
| Differential Privacy | Reports formal (ε, δ) guarantee with ε ≤ 10 and δ ≤ 10^−3 via composition theorem or accountant | ε ≤ 10, δ ≤ 10^−3 |
| Compression | Demonstrates ≥50% reduction in transmitted parameters/bytes relative to uncompressed baseline | ≥50% bandwidth reduction |
| Multi-Stakeholder | Validated with ≥2 organizationally distinct entity types (e.g., firm + municipality) with heterogeneous data schemas | ≥2 entity types |
| Building Stock | Models ≥100 buildings across ≥2 distinct typologies (e.g., single-family + apartment) or ≥2 geographic jurisdictions | ≥100 buildings, ≥2 typologies or jurisdictions |
| Real Data | Validated on datasets with ≥500 records derived from measured building attributes or physics-based simulations calibrated against measured data; purely random/synthetic toy examples excluded | ≥500 records, measured or calibrated |
To address the distinction between empirical and simulation-based validation, we refine the ‘Real Data’ criterion to require datasets that are either (i) directly measured from physical building stock surveys or (ii) derived from physics-based simulation engines (e.g., EnergyPlus and IDA ICE) calibrated against measured building performance data. Purely synthetic datasets generated from random distributions without physical grounding are excluded. The UCI Energy Efficiency Dataset, while simulation-based, was generated using Ecotect building performance simulation calibrated against established building physics models, satisfying criterion (ii). The ECEBD dataset consists of measured LCA data from real building projects across 12 EU countries, satisfying criterion (i).
Table 3. Dataset characteristics.
| Characteristic | UCI Energy | ECEBD |
|---|---|---|
| Number of samples | 768 | 2340 |
| Number of features | 8 | 24 |
| Target variable | Heating/Cooling Load | ECI (kgCO2e/m2) |
| Target mean ± std | 22.31 ± 10.09 | 387.2 ± 156.8 |
| Geographic coverage | Simulated | 12 EU countries |
| Assessment boundary | Operational | A1–A5, B4, C1–C4 |
Table 4. Hyperparameter optimization (grid search).
| Hyperparameter | Search Space | Selected Value |
|---|---|---|
| Learning rate η | {0.0001, 0.0005, 0.001, 0.005, 0.01} | 0.001 |
| Local epochs E | {1, 3, 5, 10} | 5 |
| Batch size B | {16, 32, 64} | 32 |
| FedProx μ | {0.001, 0.01, 0.1, 1.0} | 0.01 |
| SCAFFOLD correction | Default implementation | - |
| FedPer personal layers | {1, 2, 3} | 2 |
| Compression ratio ρ | {0.05, 0.1, 0.2, 0.5} | 0.1 |
Validation: hyperparameters were tuned on a held-out validation set (10% of the training data) with 3-fold cross-validation; each experiment was repeated 5 times with different seeds, and results are reported as mean ± std.
Table 5. Effect of the Dirichlet concentration parameter (α) on federated data heterogeneity, measured by Earth Mover’s Distance (mean ± std) and Weight Divergence.
| Dirichlet α | EMD (Mean ± std) | Weight Divergence | Non-IID Level |
|---|---|---|---|
| 0.1 | 0.428 ± 0.031 | 0.312 | High |
| 0.5 (default) | 0.247 ± 0.022 | 0.186 | Moderate |
| 1.0 | 0.168 ± 0.018 | 0.124 | Low–Moderate |
| 10.0 | 0.041 ± 0.008 | 0.029 | Near-IID |
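Client partitions with a controllable degree of heterogeneity, such as those summarized in Table 5, are commonly produced by drawing client proportions from a Dirichlet distribution. The sketch below is a minimal illustration under stated assumptions: samples carry a discrete group label (for example, a country code), proportions are drawn independently for each group, and the function name `dirichlet_partition` and the synthetic labels are hypothetical rather than taken from the paper.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients using Dirichlet(alpha) shares per label group.

    Smaller alpha concentrates each group on fewer clients (more non-IID).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for group in np.unique(labels):
        # Shuffle the indices of this group so the split is random.
        idx = rng.permutation(np.flatnonzero(labels == group))
        # Share of this group assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn cumulative shares into split points inside idx.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices

# Synthetic stand-in: 2340 samples with 12 group labels (assumed, not the real data).
labels = np.random.default_rng(1).integers(0, 12, size=2340)
parts = dirichlet_partition(labels, num_clients=20, alpha=0.5)
sizes = [len(p) for p in parts]
```

Every sample index is assigned to exactly one client, and lowering `alpha` makes the per-client group mix more skewed.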
Table 6. Impact of heterogeneity on FedCarbon performance (ECEBD).
| Dirichlet α | FedCarbon R2 | FedAvg R2 | DP-FedAvg R2 |
|---|---|---|---|
| 0.1 | 0.928 | 0.901 | 0.879 |
| 0.5 (default) | 0.942 | 0.952 | 0.924 |
| 1.0 | 0.949 | 0.958 | 0.934 |
| 10.0 | 0.954 | 0.961 | 0.941 |
The results indicate that FedCarbon’s attention-based aggregation becomes more advantageous relative to standard FedAvg as heterogeneity increases (α decreases): FedCarbon trails non-private FedAvg by 0.007 in R2 at α = 10.0 but leads it by 0.027 at α = 0.1. Over the same range, DP-FedAvg loses 0.062 in R2 (0.941 to 0.879), compared with 0.026 for FedCarbon (0.954 to 0.928).
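The differences of 0.007, 0.027, and 0.062 quoted above can be recomputed directly from the Table 6 values. The short check below does this; the sign convention (FedCarbon minus baseline) and the variable names are choices made here for illustration.

```python
# R2 values copied from Table 6 (ECEBD), keyed by Dirichlet alpha.
r2 = {
    0.1:  {"FedCarbon": 0.928, "FedAvg": 0.901, "DP-FedAvg": 0.879},
    0.5:  {"FedCarbon": 0.942, "FedAvg": 0.952, "DP-FedAvg": 0.924},
    1.0:  {"FedCarbon": 0.949, "FedAvg": 0.958, "DP-FedAvg": 0.934},
    10.0: {"FedCarbon": 0.954, "FedAvg": 0.961, "DP-FedAvg": 0.941},
}

def gap(alpha, baseline):
    """FedCarbon R2 minus baseline R2 at one alpha (positive means FedCarbon is ahead)."""
    return round(r2[alpha]["FedCarbon"] - r2[alpha][baseline], 3)

# Gap to non-private FedAvg at each heterogeneity level.
gaps_vs_fedavg = {alpha: gap(alpha, "FedAvg") for alpha in r2}

# Loss in R2 for each method when moving from near-IID (alpha = 10.0)
# to high heterogeneity (alpha = 0.1).
drop = {m: round(r2[10.0][m] - r2[0.1][m], 3) for m in r2[10.0]}

print(gaps_vs_fedavg)
print(drop)
```

Read this way, FedCarbon is behind FedAvg by 0.007 at α = 10.0 and ahead by 0.027 at α = 0.1, while 0.062 is the amount by which DP-FedAvg degrades between the two extremes.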
Table 7. Performance comparison of FedCarbon and baseline methods.
| Method | Privacy | UCI R2 | UCI MAE | UCI RMSE | UCI MAPE | ECEBD R2 | ECEBD MAE | ECEBD RMSE | ECEBD MAPE | Comm. |
|---|---|---|---|---|---|---|---|---|---|---|
| Centralized | None | 0.951 | 1.74 | 2.28 | 7.8% | 0.968 | 18.9 | 26.1 | 4.9% | N/A |
| Local Only | Full | 0.758 | 3.92 | 5.08 | 17.6% | 0.809 | 49.8 | 68.4 | 12.9% | 0% |
| FedAvg | None | 0.928 | 2.12 | 2.76 | 9.5% | 0.952 | 24.3 | 33.7 | 6.3% | 100% |
| FedProx | None | 0.932 | 2.06 | 2.68 | 9.2% | 0.956 | 23.1 | 32.1 | 6.0% | 100% |
| SCAFFOLD | None | 0.936 | 2.01 | 2.61 | 9.0% | 0.959 | 22.4 | 31.0 | 5.8% | 200% |
| DP-FedAvg | ε = 1 | 0.897 | 2.53 | 3.31 | 11.4% | 0.924 | 30.8 | 42.4 | 8.0% | 100% |
| Compressed-FedAvg | None | 0.916 | 2.29 | 2.98 | 10.3% | 0.944 | 26.2 | 36.2 | 6.8% | 17.4% |
| FedPer | None | 0.924 | 2.18 | 2.84 | 9.8% | 0.949 | 25.0 | 34.5 | 6.5% | 100% |
| HierFAVG | None | 0.930 | 2.09 | 2.72 | 9.4% | 0.954 | 23.7 | 32.8 | 6.1% | 100% |
| DP-SCAFFOLD | ε = 1 | 0.908 | 2.40 | 3.13 | 10.8% | 0.932 | 29.1 | 40.1 | 7.5% | 200% |
| FedCarbon | ε = 1 | 0.921 (95% CI: [0.915, 0.927]) | 2.22 | 2.89 | 10.0% | 0.942 (95% CI: [0.937, 0.947]) | 21.4 kgCO2e/m2 (95% CI: [20.1, 22.7]) | 37.0 | 5.5% | 17.4% |
Note: All experiments were repeated 5 times with different random seeds for model initialization and client data partitioning.
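For readers who want to reproduce the four error measures reported in Table 7, the sketch below computes R2, MAE, RMSE, and MAPE from paired arrays. It assumes the standard textbook definitions, which this section does not spell out, and the function name `regression_metrics` and the three example values are illustrative only.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE, and MAPE (in percent) using the standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
    }

# Made-up ECI values in kgCO2e/m2, for illustration only.
y_true = [300.0, 400.0, 500.0]
y_pred = [310.0, 390.0, 520.0]
m = regression_metrics(y_true, y_pred)
print(m)
```

MAPE is undefined when a true value is zero, so this form suits strictly positive targets such as embodied carbon intensity.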
Table 8. Percentile-based urban embodied carbon benchmarks derived from federated predictions (ECEBD dataset).
| Benchmark Level | Percentile | ECI (kgCO2e/m2) | Interpretation |
|---|---|---|---|
| Low-carbon benchmark | P25 | 285.6 | Efficient residential stock |
| Median benchmark | P50 | 365.2 | Typical urban residential stock |
| High-carbon benchmark | P75 | 452.8 | Carbon-intensive segment |
Table 9. Percentile benchmarks with bootstrap 95% confidence intervals (1000 bootstrap resamples).
| Benchmark Level | Percentile | ECI (kgCO2e/m2) | 95% CI (Bootstrap) | Width of CI |
|---|---|---|---|---|
| Low-carbon | P25 | 285.6 | [278.3, 293.1] | 14.8 |
| Median | P50 | 365.2 | [356.8, 373.9] | 17.1 |
| High-carbon | P75 | 452.8 | [441.2, 464.7] | 23.5 |
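Bootstrap intervals of the kind reported in Table 9 can be obtained by resampling the predicted values with replacement and recomputing each percentile. The sketch below assumes the simple percentile bootstrap with 1000 resamples; the paper does not state which bootstrap variant was used, and the function name, the synthetic normal data, and the sample size of 468 are all stand-ins chosen for illustration.

```python
import numpy as np

def bootstrap_percentile_ci(values, q, n_boot=1000, level=0.95, seed=0):
    """Point estimate and percentile-bootstrap CI for the q-th percentile of values."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Each row is one resample of the data, drawn with replacement.
    resamples = rng.choice(values, size=(n_boot, len(values)), replace=True)
    stats = np.percentile(resamples, q, axis=1)
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(stats, [tail, 100.0 - tail])
    return float(np.percentile(values, q)), float(lo), float(hi)

# Synthetic stand-in for predicted ECI values (not the ECEBD predictions).
eci = np.random.default_rng(42).normal(387.2, 156.8, size=468)
results = {q: bootstrap_percentile_ci(eci, q) for q in (25, 50, 75)}
for q, (point, lo, hi) in results.items():
    print(f"P{q}: {point:.1f}  CI [{lo:.1f}, {hi:.1f}]  width {hi - lo:.1f}")
```

The last column of Table 9 is simply the upper bound minus the lower bound, for example 293.1 − 278.3 = 14.8 for P25.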
Table 10. Boundary crossing analysis under CI-adjusted thresholds.
| Boundary | Point Threshold (kgCO2e/m2) | Lower CI Bound | Upper CI Bound | Buildings in ±CI Zone | % of Test Set | Reclassification Rate |
|---|---|---|---|---|---|---|
| P25 (Low → Median) | 285.6 | 278.3 | 293.1 | 38 | 8.1% | 8.1% |
| P50 (Median → High) | 365.2 | 356.8 | 373.9 | 44 | 9.4% | 9.4% |
| P75 (High → Very High) | 452.8 | 441.2 | 464.7 | 31 | 6.6% | 6.6% |
| Total borderline buildings | | | | 113 | 24.1% | |
‘Buildings in ±CI Zone’ counts buildings whose predicted ECI falls within the confidence interval [Lower CI, Upper CI] around each percentile threshold. These buildings could be assigned to either adjacent category depending on threshold selection. The 24.1% total borderline rate across all three boundaries suggests that municipalities should use benchmark ranges rather than hard thresholds for policy decisions, particularly for buildings inside the CI zones, which are approximately 15–24 kgCO2e/m2 wide. We recommend a traffic-light classification system in which buildings clearly below the P25 lower bound (<278.3) are classified ‘Green’ (low-carbon), buildings clearly above the P75 upper bound (>464.7) are classified ‘Red’ (high-carbon), and buildings in boundary zones receive an ‘Amber’ classification requiring additional assessment.
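The traffic-light rule can be written as a small lookup against the CI bounds in Table 10. The sketch below is one possible reading: the text names classes only for values below 278.3, above 464.7, and inside a CI zone, so the label returned for buildings that fall between zones (`"Unassigned"`) is an assumption made here, as are the function and constant names.

```python
# CI bounds around each percentile boundary, copied from Table 10 (kgCO2e/m2).
CI_ZONES = {
    "P25": (278.3, 293.1),
    "P50": (356.8, 373.9),
    "P75": (441.2, 464.7),
}

def classify(eci):
    """Map a predicted ECI value to a traffic-light label."""
    if eci < CI_ZONES["P25"][0]:
        return "Green"   # clearly below the P25 lower bound
    if eci > CI_ZONES["P75"][1]:
        return "Red"     # clearly above the P75 upper bound
    for lower, upper in CI_ZONES.values():
        if lower <= eci <= upper:
            return "Amber"   # inside a boundary zone, needs further assessment
    # Assumption: the source does not name a class for values between zones.
    return "Unassigned"

labels = {value: classify(value) for value in (250.0, 285.6, 320.0, 365.2, 450.0, 480.0)}
print(labels)
```

A deployment would need an explicit policy for the between-zone cases, which make up most of the stock.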
Table 11. Leave-one-country-out percentile shifts (kgCO2e/m2).
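| Country Excluded | n (Excluded) | P25 Shift | P50 Shift | P75 Shift | Max |
|---|---|---|---|---|---|
| Germany | 312 | −4.2 | −6.8 | −8.1 | 8.1 |
| France | 276 | −2.1 | −3.4 | −5.2 | 5.2 |
| Spain | 198 | +3.8 | +5.1 | +7.6 | 7.6 |
| Italy | 241 | +1.4 | +2.8 | +4.3 | 4.3 |
| Netherlands | 156 | −1.8 | −2.1 | −2.9 | 2.9 |
| Sweden | 134 | −3.6 | −4.2 | −3.8 | 4.2 |
| Poland | 187 | +4.1 | +5.8 | +8.4 | 8.4 |
| Czech Republic | 142 | +2.3 | +3.1 | +4.7 | 4.7 |
| Austria | 128 | −1.2 | −1.8 | −2.4 | 2.4 |
| Denmark | 118 | −2.7 | −3.3 | −3.1 | 3.3 |
| Finland | 112 | −3.1 | −3.8 | −4.2 | 4.2 |
| Belgium | 136 | −0.8 | −1.2 | −1.8 | 1.8 |
| Mean absolute shift | - | 2.6 | 3.6 | - | |
| Max absolute shift | - | 4.2 | 6.8 | - | |
| Shift as % of Benchmark | - | 0.9% | 1.0% | 1.0% | - |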
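A leave-one-country-out analysis like the one in Table 11 recomputes the percentile benchmarks with each country removed and compares them with the full-sample values. The sketch below assumes the shift is defined as the reduced-sample percentile minus the full-sample percentile; that sign convention, the function name, and the six-value toy data are assumptions for illustration.

```python
import numpy as np

def loco_percentile_shifts(eci, country, percentiles=(25, 50, 75)):
    """Percentile shift caused by excluding each country in turn.

    Shift = percentile without the country minus percentile on all data (assumed convention).
    """
    eci = np.asarray(eci, dtype=float)
    country = np.asarray(country)
    full = np.percentile(eci, percentiles)
    shifts = {}
    for name in np.unique(country):
        reduced = np.percentile(eci[country != name], percentiles)
        shifts[str(name)] = reduced - full
    return shifts

# Toy data: three hypothetical countries with two buildings each.
eci = [100, 200, 300, 400, 500, 600]
country = ["A", "A", "B", "B", "C", "C"]
shifts = loco_percentile_shifts(eci, country)
for name, delta in shifts.items():
    print(name, delta, "max |shift| =", np.max(np.abs(delta)))
```

Removing the low-carbon country "A" pushes all three toy benchmarks upward, which is the same pattern a negative or positive row in Table 11 expresses for a real country.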
Table 12. Computational overhead and training time comparison.
| Method | Total Training Time (min) | Per-Round Time (s) | Client GPU Memory (MB) | Client Compute per Round (GFLOPS) | Server Aggregation Time (s) | Total Comm. (MB) |
|---|---|---|---|---|---|---|
| Centralized | 12.3 | - | 2840 | - | - | - |
| Local Only | 8.6 | 2.58 | 1120 | 0.84 | - | 0 |
| FedAvg | 34.2 | 10.26 | 1120 | 0.84 | 0.42 | 436.0 |
| FedProx | 36.8 | 11.04 | 1180 | 0.91 | 0.42 | 436.0 |
| SCAFFOLD | 41.5 | 12.45 | 1340 | 1.12 | 0.58 | 872.0 |
| DP-FedAvg | 38.7 | 11.61 | 1240 | 0.97 | 0.42 | 436.0 |
| Compressed-FedAvg | 31.8 | 9.54 | 1160 | 0.88 | 0.38 | 75.9 |
| FedCarbon | 36.4 | 10.92 | 1280 | 0.96 | 0.51 | 75.9 |
FedCarbon adds modest computational overhead: client compute is 0.96 GFLOPS per round, 14.3% higher than FedAvg, owing to gradient clipping, adaptive noise, and momentum buffers. Its client GPU memory requirement (1280 MB) fits consumer-grade GPUs, which keeps participation feasible for small firms. Total training time (36.4 min for 200 rounds) is 6.4% longer than FedAvg but 12.3% shorter than SCAFFOLD. Server aggregation time (0.51 s) is slightly higher than for FedAvg (0.42 s) because of the attention computation, but it is negligible relative to client training. The dominant practical advantage is the 82.6% communication reduction (75.9 MB vs. 436.0 MB), which is critical for bandwidth-constrained construction industry deployments.
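The percentages in this paragraph follow from the Table 12 entries by simple relative-change arithmetic. The check below recomputes them; the helper `pct_change` and the variable names are introduced here for illustration.

```python
# Values copied from Table 12.
fedavg = {"time_min": 34.2, "gflops": 0.84, "comm_mb": 436.0}
scaffold = {"time_min": 41.5}
fedcarbon = {"time_min": 36.4, "gflops": 0.96, "comm_mb": 75.9}
ROUNDS = 200

def pct_change(new, old):
    """Relative change of new with respect to old, in percent."""
    return 100.0 * (new - old) / old

compute_overhead = pct_change(fedcarbon["gflops"], fedavg["gflops"])            # vs. FedAvg
time_overhead = pct_change(fedcarbon["time_min"], fedavg["time_min"])           # vs. FedAvg
time_saving_vs_scaffold = -pct_change(fedcarbon["time_min"], scaffold["time_min"])
comm_reduction = -pct_change(fedcarbon["comm_mb"], fedavg["comm_mb"])
per_round_s = fedcarbon["time_min"] * 60.0 / ROUNDS                             # seconds per round

print(round(compute_overhead, 1), round(time_overhead, 1),
      round(time_saving_vs_scaffold, 1), round(comm_reduction, 1),
      round(per_round_s, 2))
```

The same per-round calculation reproduces the other entries in the Per-Round Time column, for example 34.2 × 60 / 200 = 10.26 s for FedAvg.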
Table 13. Ablation study results on ECEBD dataset.
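| Configuration | Hierarchical | Attention | Adaptive DP | Momentum | Compression | R2 | MAE |
|---|---|---|---|---|---|---|---|
| Full FedCarbon | | | | | | 0.942 | 21.4 |
| w/o Hierarchical | | | | | | 0.936 | 28.4 |
| w/o Attention | | | | | | 0.938 | 26.8 |
| w/o Adaptive DP | | | | | | 0.931 | 30.1 |
| w/o Momentum | | | | | | 0.935 | 28.9 |
| w/o Compression | | | | | | 0.946 | 22.5 |
| Minimal (FedAvg + DP) | | | | | | 0.924 | 30.8 |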
Table 14. Error decomposition by building type (ECEBD dataset).
| Building Type | Sample Count | R2 | MAE (kgCO2e/m2) | RMSE | MAPE (%) |
|---|---|---|---|---|---|
| Single-family detached | 687 | 0.951 | 18.2 | 24.8 | 4.7 |
| Semi-detached/terraced | 524 | 0.946 | 19.8 | 27.1 | 5.1 |
| Low-rise apartment (≤4 floors) | 612 | 0.938 | 22.4 | 30.6 | 5.8 |
| Mid-rise apartment (5–8 floors) | 389 | 0.931 | 25.1 | 34.2 | 6.5 |
| High-rise apartment (>8 floors) | 128 | 0.912 | 31.7 | 43.2 | 8.2 |
Table 15. Error decomposition by region (ECEBD dataset).
| Region | Sample Count | R2 | MAE (kgCO2e/m2) | RMSE | MAPE (%) |
|---|---|---|---|---|---|
| Region 1 (Northern EU) | 498 | 0.938 | 22.8 | 31.2 | 5.9 |
| Region 2 (Central EU) | 671 | 0.959 | 17.6 | 24.1 | 4.6 |
| Region 3 (Southern EU) | 618 | 0.923 | 26.4 | 36.1 | 6.8 |
| Region 4 (Eastern EU) | 553 | 0.931 | 24.1 | 33.0 | 6.2 |
Table 16. Error decomposition by dominant material category (ECEBD dataset).
| Dominant Structural Material | Sample Count | R2 | MAE (kgCO2e/m2) | MAPE (%) |
|---|---|---|---|---|
| Reinforced concrete | 891 | 0.948 | 20.1 | 5.2 |
| Steel frame | 312 | 0.929 | 26.8 | 6.9 |
| Masonry/brick | 654 | 0.944 | 21.3 | 5.5 |
| Timber frame | 287 | 0.918 | 28.4 | 7.3 |
| Mixed/hybrid | 196 | 0.908 | 32.1 | 8.3 |
Table 17. Variance in FedCarbon performance across 10 independent non-IID partitions (K = 20, R = 4, and α = 0.5).
| Metric | Mean | Std Dev | Min | Max | 95% CI |
|---|---|---|---|---|---|
| R2 (ECEBD) | 0.941 | 0.004 | 0.934 | 0.948 | [0.937, 0.945] |
| MAE (ECEBD) | 21.6 | 0.8 | 20.2 | 23.1 | [20.4, 22.8] |
| R2 (UCI) | 0.920 | 0.005 | 0.912 | 0.929 | [0.914, 0.926] |
Table 18. Component sensitivity analysis (variance in R2 across 10 partitions).
| Configuration | Mean R2 | Std Dev of R2 | Sensitivity Rank |
|---|---|---|---|
| Full FedCarbon | 0.941 | 0.004 | |
| w/o Attention | 0.937 | 0.009 | 1 (Most Sensitive) |
| w/o Adaptive DP | 0.930 | 0.007 | 2 |
| w/o Momentum | 0.934 | 0.006 | 3 |
| w/o Compression | 0.945 | 0.003 | 4 (Least Sensitive) |
Table 19. Comparison with state-of-the-art methods.
| Method | Reference | Year | R2 | Privacy | Federated | Compression | Hierarchical | Domain |
|---|---|---|---|---|---|---|---|---|
| FL-SmartBuilding | Berkani et al. [1] | 2025 | 0.948 | No | Yes | No | No | Energy |
| PFL-Energy | Wang et al. [5] | 2024 | 0.936 | No | Yes | No | No | Energy |
| DP-Thermal | Abbas et al. [4] | 2024 | 0.908 | Yes | Yes | No | No | Comfort |
| AI-Retrofit | Shan et al. [7] | 2025 | 0.941 | No | No | No | No | Retrofit |
| DT-Decarb | Feng et al. [8] | 2025 | 0.933 | No | No | No | No | Carbon |
| FL-Quantum | Hinterstocker et al. [15] | 2025 | 0.921 | Yes | Yes | No | No | Supply |
| QFL-IoT | Qiao et al. [9] | 2024 | 0.916 | Yes | Yes | Yes | No | IoT |
| Smart-PED | Siakas et al. [12] | 2025 | 0.924 | No | Yes | No | Yes | Energy |
| FL-Agriculture | Gupta et al. [18] | 2025 | 0.918 | No | Yes | No | No | Farming |
| VFL-Cyber | Folino et al. [17] | 2024 | 0.912 | Yes | Yes | No | No | Security |
| FedCarbon (Ours) | - | 2025 | 0.942 | Yes | Yes | Yes | Yes | Carbon |
Among the privacy-preserving methods compared, FedCarbon reports the highest R2 (0.942) and also achieves an 82.6% communication reduction. The adaptive noise calibration improves R2 by about 1.5% over uniform noise injection.
Albelwi, N. Federated Learning-Enabled Building Stock Modeling for Privacy-Preserving Embodied Carbon Benchmarking in Residential Construction. Buildings 2026, 16, 1029. https://doi.org/10.3390/buildings16051029