1. Introduction
The building sector constitutes one of the most significant contributors to global energy consumption and greenhouse gas emissions, accounting for approximately 38% of total life-cycle carbon emissions worldwide and up to 50.9% in China [1,2,3]. Public buildings within this sector are of particular concern due to their complex energy systems, extended operating hours, and concentrated equipment loads [4,5]. Compared to residential buildings, the unit-area carbon emission intensity of public buildings is three to five times higher, establishing them as a critical focus for carbon reduction strategies.
Accurate carbon emission prediction plays a pivotal role in green building design; however, existing approaches face significant limitations. Traditional methods, such as life cycle assessment (LCA) and regression-based simplified models, rely heavily on high-dimensional or multi-stage data inputs [6], which are often unavailable during the early design stage [7]. International efforts, such as the European Union’s life-cycle database under the Energy Performance of Buildings Directive (EPBD) framework and ASHRAE’s simulation-based emission tools, have demonstrated progress in large-scale applications [8,9]. Concurrently, recent studies have explored the adoption of recurrent neural networks (e.g., LSTM and GRU). However, these methods generally require detailed and high-quality datasets that are difficult to obtain in practice, particularly during early-stage design or in grassroots construction projects. In China, although the “Building Carbon Emission Calculation Standard” (GB/T 51366-2019) [10] provides a regulatory framework, existing models such as Random Forest and Support Vector Regression still exhibit limited generalization ability under low-dimensional conditions [11].
Although the aforementioned high-dimensional methods provide valuable insights for detailed design stages, their applicability diminishes significantly during the critical early phases of architectural design [12]. During this phase, designers require rapid, simplified tools for comparative scenario analysis rather than exhaustive, data-intensive simulations. Consequently, the focus on low-dimensional prediction is not merely a concession to data scarcity but represents a deliberate methodological choice aligned with the practical constraints and decision-making needs of early-stage design [13]. This approach prioritizes expediency and accessibility, enabling architects to swiftly evaluate the carbon implications of fundamental massing and spatial decisions.
Several studies have aimed to support early-stage carbon prediction. Simplified regression models, for instance, provide computational efficiency by drawing on existing building stock data, yet they often overlook the complex, non-linear relationships between fundamental design parameters and carbon emissions. Likewise, rule-based methods and early benchmarks, while convenient for preliminary reference, offer limited flexibility and fail to reflect the nuanced interplay among a project’s specific area, height, and climatic conditions. These constraints, rooted in linear assumptions and weak generalization, point to a clear research need: an approach that preserves the ease of low-dimensional inputs while leveraging non-linear modeling to deliver accuracy close to that of more detailed, later-phase tools. This study responds to that need by introducing a lightweight multilayer perceptron (MLP) framework designed to balance these competing requirements.
To this end, we developed a carbon emission prediction model adapted to low-data scenarios in early design. The model is built around an MLP architecture that relies on three basic parameters commonly accessible at this stage: floor area, number of above-ground floors, and geographic region. We further constructed two composite variables through feature engineering, layers per unit area (LPA) and height-to-area ratio (HAR), which help quantify spatial compactness and vertical density. To strengthen non-linear representation and prevent overfitting, the network incorporates Swish activation functions, adaptive L2 regularization, and Dropout layers. Additionally, transfer learning is employed, using pre-training on large public datasets followed by fine-tuning on local samples, to improve model adaptability and robustness across varied contexts.
The contributions of this study are summarized as follows. First, it presents a practical application of a multilayer perceptron (MLP) network to carbon emission prediction under low-data conditions, effectively addressing the limitations of conventional linear regression methods. Second, it introduces two composite indicators, height-to-area ratio (HAR) and layers per unit area (LPA), which help quantify spatial compactness and vertical density, thus supporting early-stage low-carbon design decisions. Third, a user-friendly Python-based tool (using Python 3.9) is developed, allowing designers to obtain rapid carbon estimates with minimal input during schematic design. Together, this work highlights the potential of lightweight deep learning models to facilitate carbon-aware building design and supports the transition toward carbon neutrality in the construction sector.
2. Relevant Theories and Technical Route
2.1. Relevant Theories
2.1.1. Life Cycle Carbon Emission Theory
Building carbon emissions exhibit distinct stage-dependent characteristics, typically categorized into material production (Em), transportation (Et), construction (Ec), operation (Eo), and demolition (Ed) [14]. The total emissions can be expressed by the following equation:

$$E_{\mathrm{total}} = E_m + E_t + E_c + E_o + E_d$$
This life-cycle perspective (Figure 1) ensures that emissions from all structural and operational activities are accounted for [15,16,17]. However, during the early design stage, only low-dimensional descriptors—floor area, above-ground floor count, and geographic region—are typically available. Consequently, rather than estimating each stage separately, this study focuses on establishing a nonlinear mapping between these limited parameters and the aggregated Etotal, thereby embedding the implicit stage-level emission patterns within a neural network framework to facilitate rapid and practical prediction.
2.1.2. Multilayer Perceptron Theory
The Multi-Layer Perceptron (MLP) is a class of feed-forward artificial neural networks. As illustrated in Figure 2, its structure consists of an input layer, one or more hidden layers, and an output layer, making it one of the most fundamental and widely used deep learning models. Compared to traditional linear regression, the MLP can capture complex nonlinear relationships between input features through its nonlinear activation functions and multilayer structure. This capability makes it particularly suitable for modeling low-dimensional, small-sample datasets with high feature interactivity [18].
In the context of building carbon emission calculation, the multilayer perceptron demonstrates superior nonlinear modeling and feature interaction capabilities compared to traditional linear regression models. Building carbon emissions are influenced by the superposition of numerous factors—such as floor area, number of floors, and regional climate—which exhibit significant nonlinear interactions. These complex interactions are often inadequately captured by linear models [19]. In contrast, the MLP architecture, leveraging its multilayer structure and nonlinear activation functions, can automatically learn the underlying patterns from limited features and achieve high-precision fitting even with low-dimensional parameter inputs. Furthermore, the MLP offers the flexibility to incorporate manually constructed interaction variables (e.g., layers per unit area, height-to-area ratio) and exhibits strong structural extensibility and data adaptability. These properties make it especially suitable for the rapid prediction of building carbon emissions in scenarios where detailed data are unavailable during the early design stage.
2.2. Technical Route
This study innovatively applies a multilayer perceptron (MLP) neural network to the carbon emission prediction of public buildings. This approach provides a novel solution to the challenge of carbon emission estimation in low-dimensional, small-sample scenarios [20]. Leveraging its unique layered structure and nonlinear activation functions, the MLP effectively captures the complex interactions among limited parameters—such as building area, number of floors, and regional climate—thereby overcoming the expressive limitations inherent in traditional linear models.
In terms of methodological design, a lightweight three-layer MLP network (64-32-16 neurons) was constructed. The Swish activation function was adopted to enhance nonlinear modeling capability, and composite features—such as layers per unit area (LPA) and height-to-area ratio (HAR)—were innovatively introduced. These engineered features significantly enhance the information density of the low-dimensional input data. To address the challenge of limited sample size, a transfer learning strategy is employed. The model is first pre-trained on large-scale public datasets to learn a generalized feature representation and is subsequently fine-tuned on the local data distribution. This strategy enables the model to maintain excellent generalization performance with a dataset of only 150 samples [21].
As outlined in the technology roadmap (Figure 3), this research follows a complete iterative cycle: problem definition → feature engineering → model design → transfer learning → validation analysis → tool development. Finally, the model is encapsulated into a Python-based rapid calculation tool. This tool allows practitioners to obtain a reliable carbon emission estimate by inputting only three basic parameters: floor area, number of floors, and geographical region. It provides practical technological support for green building design and verifies the feasibility of applying the multilayer perceptron approach to carbon emission prediction in low-dimensional building scenarios.
3. Model Design and Methods
3.1. Input Feature Construction and Preprocessing
3.1.1. Data Source
The local dataset was compiled by selecting office buildings from journal publications and government disclosures within the past five years. For each building, floor height, area, geographical region, and total carbon emissions were collected and integrated into a structured dataset. This data collection approach ensures high levels of authenticity, accuracy, and comprehensiveness [22]. All selected office buildings feature reinforced concrete structures. Variation in their carbon emissions primarily stems from differences in height, area, geographical location, and spatial characteristics, which are influenced by building form and construction complexity [23,24]. The construction complexity varies significantly between low-rise and high-rise buildings. These inherent characteristics provide valuable insights for parameter design in this study.
According to the actual data collection, three types of core building parameters, as shown in Table 1 (Core Data Parameter Map), are selected as raw inputs:
3.1.2. Feature Enhancement and Interaction Variables
To enhance the model’s expressive capability, feature interaction design was implemented through the construction of the following derived variables:

$$\mathrm{LPA} = F/A, \qquad \mathrm{HAR} = F^2/A$$

where $A$ is the floor area and $F$ is the number of above-ground floors.
The selection of LPA and HAR as engineered features is grounded in a dual theoretical foundation addressing both physical drivers of building carbon emissions and mathematical limitations of low-dimensional data modeling [25,26,27]. This approach transforms basic early-stage design parameters into informative proxies characterizing building spatial configuration.
The primary rationale stems from the principles of Life Cycle Carbon Emission theory, which posits that a building’s total carbon footprint (Etotal) is an aggregate of emissions from material production, construction, operation, and demolition stages. However, during the early design phase, detailed data for modeling these stages individually are unavailable. The parameters of floor area (A) and number of floors (F) are accessible but insufficient on their own to capture the complex interplay between a building’s form and its life-cycle carbon intensity [28]. Herein lies the innovation of LPA and HAR. Rather than being arbitrary mathematical constructs, they serve as quantifiable proxies for fundamental architectural properties that directly influence emissions across the life cycle. The parameter LPA = F/A quantifies horizontal compactness. A high LPA value indicates a building with a greater concentration of floor area, which typically correlates with intensified energy use per unit area during operation (e.g., from centralized HVAC and elevator systems) but may also imply structural efficiency in material use. Conversely, the parameter HAR = F²/A captures vertical density or slenderness. A high HAR value signifies a tall, narrow building, which is associated with increased embodied carbon from the structural systems required to resist lateral loads, as well as altered operational energy profiles due to a higher surface-area-to-volume ratio. By incorporating LPA and HAR, the model is provided with condensed, physically meaningful indicators that embed critical aspects of the life cycle carbon emission structure, enabling a more nuanced prediction than is possible with A and F alone.
The second pillar of the rationale addresses a fundamental mathematical limitation of traditional regression models when applied to this domain. Standard linear models or simplified formulas often treat parameters like area (A) and floor count (F) as independent variables, operating under the assumption of linearity and additivity [29]. However, the impact of building area on carbon emissions is not constant; it is intrinsically modulated by the building’s height. This interaction effect is a quintessential nonlinear relationship that linear models fail to capture effectively.
The engineered features LPA and HAR are, in essence, predefined nonlinear interaction terms. The MLP neural network, while capable of learning such interactions implicitly, benefits significantly from being guided by these semantically meaningful transformations, especially in a low-dimensional, small-sample scenario. By explicitly providing LPA = F/A and HAR = F²/A, the model is relieved from the burden of discovering these specific functional forms from limited data. This reduces model complexity, accelerates convergence, and enhances generalization [30]. It effectively linearizes a core piece of the underlying nonlinear problem, allowing the MLP to focus on learning higher-order complexities. Thus, this feature engineering is not merely an enhancement but a mathematical prerequisite for achieving high predictive accuracy with a lightweight model when working with constrained input dimensions.
In summary, the theoretical basis for LPA and HAR is twofold: (1) they embed physically significant descriptors of building morphology that are intrinsically linked to lifecycle carbon emissions, and (2) they explicitly introduce critical nonlinear interactions between area and height, thereby overcoming a fundamental shortcoming of traditional linear modeling approaches and providing a robust input structure for the subsequent MLP network.
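To make this transformation concrete, the following minimal sketch derives the two engineered features from a tabular dataset; the column names (area, floors) are assumptions for illustration, not the paper’s actual schema.

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Append the two composite indicators described above."""
    out = df.copy()
    out["LPA"] = out["floors"] / out["area"]        # layers per unit area, F/A
    out["HAR"] = out["floors"] ** 2 / out["area"]   # height-to-area ratio, F^2/A
    return out
```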
Geographical climate coding: the “cold”, “mild”, and “hot summer/cold winter” climate zones are mapped to the numerical variables 0/1/2, which is convenient for modeling; the main source of this information is the city code, as shown in Table 2 (Climate coding table):
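A minimal illustration of this encoding step follows; the zone-to-integer assignment here mirrors the order listed above, but the authoritative mapping is the one given in Table 2.

```python
# Zone-to-code mapping assumed from the listed order "cold/mild/hot summer/cold winter" -> 0/1/2.
CLIMATE_CODE = {"cold": 0, "mild": 1, "hot summer/cold winter": 2}

def encode_climate(zone: str) -> int:
    """Map a climate-zone label to its numeric model input R."""
    return CLIMATE_CODE[zone]
```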
All input features are normalized before training:

$$x' = \frac{x - \mu}{\sigma}$$

where $x$ is the original data, $x'$ is the scaled data, $\mu$ is the sample mean, and $\sigma$ is the standard deviation.
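In practice, this Z-score scaling can be performed with scikit-learn’s StandardScaler, as in the sketch below (an assumed implementation; the paper does not name the library used for this step, and the toy rows are illustrative).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[12000.0, 6], [48000.0, 22], [30000.0, 15]])  # toy [A, F] rows
scaler = StandardScaler()                       # learns per-feature mean and std
X_train_scaled = scaler.fit_transform(X_train)  # fit on training data only
# At inference, reuse the training statistics to avoid leakage:
# X_new_scaled = scaler.transform(X_new)
```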
3.2. Model Structure Design
3.2.1. Network Structure Configuration
In this study, a lightweight multilayer perceptron (MLP) neural network is designed and implemented for predicting carbon emissions in low-dimensional building scenarios [31,32,33]. The model architecture comprises an input layer, three hidden layers, and an output layer. The input layer accepts five feature variables as inputs. The three hidden layers consist of fully connected neurons with sizes set to 64, 32, and 16, respectively, to facilitate the layer-by-layer extraction of higher-level feature representations.
The network architecture was determined using an empirical approach, guided by the specific constraints of the problem. Given the limited training sample size (N = 150), automated hyperparameter search techniques, such as grid search or Bayesian optimization, were deemed unsuitable due to risks of high computational cost and result instability. Consequently, a pragmatic strategy combining manual tuning with preliminary validation was adopted for architecture selection. This strategy primarily prioritizes ensuring the model’s generalization capability within low-dimensional, small-sample scenarios. The selected layered structure [64, 32, 16] aligns with common empirical configurations used in the deep learning community for tabular data, aiming to achieve effective feature learning while maintaining controllable model complexity. In terms of design principles, this architecture embodies the core objective of balancing expressive power with generalization performance. The initial layer, comprising 64 neurons, provides sufficient capacity to nonlinearly map the complex interactions among the input features: building area, number of floors, region, and the structural parameters LPA and HAR. Subsequent hidden layers, with 32 and 16 neurons, progressively abstract features and compress information. This progressively decreasing structure facilitates the extraction of higher-order representations pertinent to carbon emissions from the raw data. Simultaneously, the relatively lightweight three-layer design inherently acts as a form of regularization, proactively mitigating overfitting issues that are prone to occur with small datasets when model complexity is excessive. This design establishes a structural foundation that enhances the model’s robustness.
To enhance nonlinear modeling capability while ensuring stability during training on low-dimensional samples, the Swish activation function is employed in the hidden layers. The Swish function offers smoother gradients compared to the ReLU function, enabling more stable gradient propagation and feature extraction under limited data conditions. The output layer consists of a single neuron, which outputs the predicted value of total building carbon emissions (unit: tCO2) or carbon intensity per unit floor area (unit: tCO2/m2).
- 1. Input layer

The input feature variables are:

$$\mathbf{x} = [A, F, R, \mathrm{LPA}, \mathrm{HAR}]$$

where $A$ is the floor area, $F$ is the number of floors above ground, $R$ is the geographical climate code (numeric), $\mathrm{LPA} = F/A$, and $\mathrm{HAR} = F^2/A$.
- 2. Hidden layer structure (3 layers)

Let the weight matrix of the $l$-th layer be $W^{(l)}$, the bias vector be $b^{(l)}$, and the activation function be $f(\cdot)$ (Swish function). The dimensions of each layer are:

Input layer → Hidden layer 1: 5 → 64;
Hidden layer 1 → Hidden layer 2: 64 → 32;
Hidden layer 2 → Hidden layer 3: 32 → 16.

Activation function:

$$f(x) = x \cdot \sigma(\beta x)$$

where $\beta$ is a learnable scaling parameter and $\sigma(\cdot)$ is the sigmoid function.
- 3. Output layer

$$\hat{y} = W^{(4)} h^{(3)} + b^{(4)}$$

where $\hat{y}$ is the predicted carbon emissions (tCO2).
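The architecture above can be expressed compactly in Keras. The sketch below is a minimal, assumed implementation: the L2 coefficient is illustrative (the paper describes it as adaptive), while the layer sizes, Swish activations, and Dropout rate follow Sections 3.2.1 and 3.2.3.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_mlp(l2_coef: float = 1e-4) -> tf.keras.Model:
    """5 inputs [A, F, R, LPA, HAR] -> 64/32/16 Swish hidden layers -> 1 output (tCO2)."""
    model = tf.keras.Sequential([
        layers.Dense(64, activation="swish", input_shape=(5,),
                     kernel_regularizer=regularizers.l2(l2_coef)),
        layers.Dropout(0.3),
        layers.Dense(32, activation="swish",
                     kernel_regularizer=regularizers.l2(l2_coef)),
        layers.Dropout(0.3),
        layers.Dense(16, activation="swish",
                     kernel_regularizer=regularizers.l2(l2_coef)),
        layers.Dropout(0.3),
        layers.Dense(1),   # predicted total carbon emissions
    ])
    return model
```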
3.2.2. Loss Function and Optimizer
During model training, the Huber loss function is selected as the objective function to enhance training robustness. The Huber loss integrates the advantages of the mean square error (MSE) and the mean absolute error (MAE). Specifically, it behaves similarly to MSE for small errors, ensuring differentiable gradient continuity, while for large errors it approximates MAE, which is less sensitive to outliers. This characteristic reduces the negative impact of extreme samples on model training. In this study, the threshold parameter δ for the Huber loss is set to 1500, a value chosen to align with the data magnitude and balance the penalty imposed on large errors. The threshold δ = 1500 was determined by considering the data statistics: the mean carbon emission is 98,756 tCO2, the standard deviation is 102,345 tCO2, and the data range is substantial (1474 to 665,233 tCO2). This value, approximately 1.5% of the data’s standard deviation, avoids excessive sensitivity to small errors while effectively controlling the influence of outliers. The input features are normalized using the Z-score method described in Section 3.1.2, and the output magnitude must be consistent with this scaling. The value δ = 1500 corresponds to a reasonable quantile within the normalized error distribution.
$$L_\delta(y, \hat{y}) = \begin{cases} \dfrac{1}{2}(y - \hat{y})^2, & |y - \hat{y}| \le \delta \\ \delta\,|y - \hat{y}| - \dfrac{1}{2}\delta^2, & |y - \hat{y}| > \delta \end{cases}$$

where $y$ is the actual value, $\hat{y}$ is the predicted value, and $\delta$ takes the value of 1500 (when $\delta$ < 1000 the model is sensitive to noise, $\delta$ > 2000 ignores important error information, and 1500 is the optimal compromise point).
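A corresponding training configuration might look as follows. Keras provides the Huber objective directly; the Adam optimizer and learning rate here are assumptions, as the paper defers the exact settings to Table 4.

```python
import tensorflow as tf

model = build_mlp()  # from the architecture sketch in Section 3.2.1
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed settings
    loss=tf.keras.losses.Huber(delta=1500.0),                # delta per the text above
    metrics=["mae"],
)
```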
3.2.3. Regularization and Dropout
- 1. L2 regularization: add a weight decay term to the loss function:

$$L_{\mathrm{total}} = L_\delta + \lambda \sum_{l} \lVert W^{(l)} \rVert_2^2$$

where $\lambda$ is the adaptive regularization coefficient (adjusted according to the importance of features).
- 2. Dropout: discard the neuron outputs of hidden layers 1–3 with probability p = 0.3 during training (no dropout is applied at inference).
3.2.4. Complete Training Model Expression
$$h^{(l)} = f\left(W^{(l)} h^{(l-1)} + b^{(l)}\right) \odot m^{(l)}$$

where ⊙ denotes element-by-element multiplication and $m^{(l)}$ is the Dropout mask vector (elements obey the Bernoulli distribution B(1, 0.7)).
3.3. Transfer Learning Strategy Design
3.3.1. Pre-Training Data Sources
To mitigate the challenges associated with training on small samples, this study employs a transfer learning strategy. The model is first pre-trained on a large-scale dataset compiled from publicly available building energy consumption and carbon emission case studies, following the methodological framework advocated by the China Association of Building Energy Efficiency (CABEE). Although this integrated dataset is not directly published by CABEE, its provenance is clearly documented. It primarily originates from the Donghe Building Carbon Emission Calculation Platform, which was jointly developed by Southeast University and China Construction Group. The data, calculated using the life cycle assessment (LCA) methodology, encompass over 10,000 samples covering various public building types across China’s major climate zones. The key characteristics of the pre-training dataset are summarized in Table 3.
This large-scale pre-training enables the model to learn robust, generalized feature representations that capture the relationship between basic building parameters and carbon emissions.
3.3.2. Fine-Tuning Method
Following pre-training, the model is fine-tuned on the smaller, specific local dataset (N = 150). A layer-wise fine-tuning strategy is adopted. The weights of the lower layers ($W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}$) are frozen, as these layers capture universal, low-level features (e.g., fundamental relationships between area, height, and emissions). Only the weights of the higher layers ($W^{(3)}, b^{(3)}, W^{(4)}, b^{(4)}$) are updated during fine-tuning. This approach allows the model to adapt its high-level reasoning to the specific distribution of the local data while preserving the general knowledge acquired during pre-training, effectively reducing overfitting and improving convergence stability.

Fine-tuning: freeze the first two layers’ weights ($W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}$) and optimize only the subsequent parameters:

$$\theta_{\mathrm{tune}} = \{W^{(3)}, b^{(3)}, W^{(4)}, b^{(4)}\}$$
During fine-tuning, the model progressively incorporates the data distribution of local samples into its higher-level parameters, while preserving the generalized features captured in the initial layers during pre-training. This "freeze lower layers, adjust higher layers" strategy offers two key benefits in low-dimensional settings: it substantially shortens training time by updating only a subset of parameters, and it alleviates overfitting and training instability, leading to faster and more reliable convergence. By combining pre-training on large datasets with targeted fine-tuning on local samples, the model retains robust feature extraction capabilities while adapting its final layers to domain-specific characteristics. In practice, this approach effectively addresses common small-data challenges—such as model instability, slow convergence, and training difficulty—and enhances the generalizability and practical utility of the proposed carbon emission prediction model, offering a viable path toward efficient building carbon estimation under low-dimensional constraints.
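A hedged sketch of this “freeze lower layers, adjust higher layers” step is shown below, reusing build_mlp from Section 3.2.1; the weight-loading path and the fine-tuning learning rate are illustrative assumptions.

```python
import tensorflow as tf

model = build_mlp()                      # same architecture as pre-training
# model.load_weights("pretrained.weights.h5")   # hypothetical path to pre-trained weights

dense = [l for l in model.layers if isinstance(l, tf.keras.layers.Dense)]
for layer in dense[:2]:                  # freeze W(1), b(1) and W(2), b(2)
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),   # smaller LR for fine-tuning
              loss=tf.keras.losses.Huber(delta=1500.0))
# model.fit(X_local, y_local, epochs=200, validation_split=0.1111)
```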
3.4. Model Evaluation Methods and Parameters
To comprehensively evaluate the effectiveness and generalization capability of the proposed MLP model for building carbon emission prediction, a rigorous evaluation framework incorporating a multidimensional performance index system is implemented during the model training stage. The evaluation process encompasses three key aspects: the training strategy, the validation mechanism, and the error measurement methodology. This multi-faceted approach aims to ensure the stability, reliability, and interpretability of the model’s performance.
3.4.1. Training and Validation Mechanism
Given the limited local sample size of 150 instances, a 10-fold cross-validation (K = 10) strategy is adopted for model training and performance validation to enhance data utilization efficiency and evaluation robustness. The specific operational workflow is illustrated in Figure 4.
A nested data partitioning strategy, integrating 10-fold cross-validation with an 8:1:1 data split ratio, was meticulously implemented to ensure a robust and unbiased evaluation of the model’s performance on the limited sample data. This approach maximizes data utility and yields a reliable estimate of the model’s generalizability.
This mechanism operates through a two-tiered process. First, an outer 10-fold cross-validation loop is executed. The complete dataset of 150 samples is partitioned into 10 mutually exclusive subsets (or folds) of equal size (15 samples each) using the KFold (n_splits = 10) method. In each iteration, one unique fold is held out as the external test set for the final evaluation of the model’s generalization capability. The remaining nine folds (135 samples) constitute the interim training pool for that iteration. Subsequently, within each outer loop iteration, an internal 8:1:1 split is performed. This split is achieved automatically during model training by setting the validation_split parameter to 0.1111 in Keras’s fit() function. This setting instructs the training routine to reserve approximately 11.11% (1/9) of the 135 samples from the current iteration’s training pool to create an internal validation set. This internal validation set is used for real-time performance monitoring and for triggering the Early Stopping callback during training. The remaining 88.89% (8/9) of the pool serves as the actual training set for updating the model weights. The effective global data allocation is as follows: the training set comprises approximately 120 samples (80% of the total data), the internal validation set about 15 samples (10%), and the external test set about 15 samples (10%), achieving the intended global 8:1:1 split ratio. This method maximizes the utilization of limited sample information while effectively reducing the impact of chance data distribution on the model results, thereby enhancing the representativeness and stability of the assessment. The average error metrics obtained through cross-validation provide a more accurate reflection of the model’s performance on unseen data.
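The following sketch reproduces this two-tier mechanism (outer KFold loop plus Keras’s validation_split); it assumes build_mlp from Section 3.2.1 and numeric arrays X, y holding the 150 samples, with the epoch budget chosen illustratively.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

def evaluate_cv(X: np.ndarray, y: np.ndarray) -> float:
    """Outer 10-fold CV with an internal 1/9 validation split, as described above."""
    fold_mae = []
    for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=42).split(X):
        model = build_mlp()
        model.compile(optimizer="adam", loss=tf.keras.losses.Huber(delta=1500.0),
                      metrics=["mae"])
        early = tf.keras.callbacks.EarlyStopping(patience=30, restore_best_weights=True)
        model.fit(X[train_idx], y[train_idx],
                  validation_split=0.1111,   # 1/9 of the 135-sample pool (~15 samples)
                  epochs=500, callbacks=[early], verbose=0)
        _, mae = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        fold_mae.append(mae)
    return float(np.mean(fold_mae))          # average MAE over the 10 outer folds
```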
3.4.2. Parameter Setting
The hyperparameters for the multilayer perceptron (MLP) model were selected using a pragmatic, manual tuning approach, guided by the dual constraints of small-sample learning (N = 150) and the strategic implementation of large-scale pre-training. This methodology prioritized stability, reproducibility, and efficient knowledge transfer, thereby avoiding computationally expensive automated searches (e.g., grid search or Bayesian optimization). The selection process was fundamentally shaped by the two-stage learning framework: initial pre-training on a large, diverse dataset followed by fine-tuning on the small local sample.
The parameter configurations for the multi-layer perceptron model are summarized in Table 4.
To assess the prediction accuracy and goodness of fit from multiple perspectives, three standard regression evaluation metrics are employed: the mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination (R2). These metrics are defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the average value.
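These three metrics can be computed directly with scikit-learn, as in the brief sketch below.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_report(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Return the three evaluation metrics used in this study."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }
```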
3.4.3. Specific Approach and Optimization Strategy
To ensure robust and unbiased evaluation under conditions of limited sample size and inherent data variability, a comprehensive validation methodology was employed throughout the modeling process. All input features were standardized using Z-score normalization prior to training to mitigate scale-related bias. During cross-validation, model parameters and prediction errors were systematically recorded to analyze performance variations across data partitions. Evaluation metrics were computed strictly on held-out test sets, and the proposed MLP model was explicitly compared against baseline models, including linear regression, random forest, and support vector regression, to objectively demonstrate its predictive advantage. Furthermore, the statistical properties of prediction errors, such as mean error, standard deviation, and skewness, were quantitatively examined to assess model robustness across varying data conditions. This multi-faceted validation framework ensures a thorough assessment across three critical dimensions: predictive accuracy, operational stability, and practical relevance. The systematically gathered performance data also provides a solid empirical basis for subsequent comparative analysis and visual interpretation in the results and discussion sections.
4. Results
4.1. Dataset Construction and Outlier Handling
4.1.1. Dataset Construction and Division
To validate the effectiveness of the multilayer perceptron (MLP) model for building carbon emission prediction, a unified dataset was constructed by integrating the carbon emission sample data compiled in Section 3.1.1 with typical case studies disclosed by the China Association of Building Energy Efficiency (CABEE). The dataset comprises 150 records encompassing floor area, number of above-ground floors, geographic climate code, and carbon emissions. These data were standardized and feature-engineered to form a complete input matrix. To enhance generalization performance, the dataset was partitioned into training, validation, and test sets using an 8:1:1 ratio, ensuring objective and robust model evaluation. The distribution characteristics of the data features are summarized in Table 5. This partitioning strategy accounted for geographical distribution and building size balance to mitigate potential data bias.
4.1.2. Data Cleaning Procedures and Outlier Handling
To ensure model robustness and generalization capability, a systematic data cleaning process was implemented, focusing on outlier identification and handling. Initially, completeness checks were performed on the dataset, confirming that all 150 samples contained complete values for key parameters (area A, number of floors F, and total carbon emissions E). Subsequently, a combined approach utilizing statistical visualization and model-driven analysis was employed to identify data points potentially detrimental to model training. To better illustrate the distribution characteristics and dispersion of core parameters, box plots for each parameter were utilized (Figure 5).
As illustrated in Figure 5, the boxplots graphically depict the median, quartiles, and extremes of the data. Several data points for area (A) and total carbon emissions (E) fall beyond the whiskers (defined as 1.5 times the interquartile range) and reside far from the main distribution zone, classifying them as statistical outliers. This is an expected characteristic of real-world building data, which often includes extreme cases, such as very large public edifices.
However, statistical anomalies do not necessarily constitute harmful noise in the modeling process. To identify samples that pose significant challenges to prediction accuracy, an in-depth error interaction analysis was conducted. Following initial model training, the correlation between predicted values and absolute errors was examined.
Figure 6 reveals that the prediction errors for specific samples far exceed the average level. For instance, points a and b exhibit exceptionally high relative errors despite their low absolute carbon emissions, as the model’s absolute prediction errors appear disproportionately large relative to their own magnitude. This indicates the model struggles to accurately capture the carbon emission patterns of such small-scale buildings. Without intervention, the training process may become dominated by these high-leverage points.
Based on this analysis, a targeted handling strategy was formulated:
Identification and Marking: Samples such as points a and b, characterized by high relative error and low emission values, were clearly marked as high-impact outliers.
Handling Approach: To avoid bias from simple deletion and enhance adaptability to complex data distributions, an algorithmic augmentation strategy was adopted. Specifically, during final MLP model training, lower sample weights were applied to labeled high-impact outliers. This reduced their contribution to the loss function, enabling the model to focus on learning intrinsic patterns from the main data rather than fitting special cases.
Results Validation: After implementing this weighted training strategy, the final model achieved outstanding test set performance (MAE = 4160 tCO2, R2 = 0.966). Crucially, prediction stability for small-scale buildings improved while maintaining overall generalization capability, demonstrating that the strategy successfully balances respect for data diversity with robust model training.
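A minimal sketch of the down-weighting described under “Handling Approach” is given below; the weight value 0.3 and the outlier indices are illustrative assumptions, as the paper does not report the exact weights applied.

```python
import numpy as np

# Hypothetical indices of the flagged high-impact outliers (e.g., points a and b).
outlier_idx = np.array([17, 42])

weights = np.ones(len(y_train))          # y_train: local training targets
weights[outlier_idx] = 0.3               # assumed reduced weight for flagged samples

model.fit(X_train, y_train, sample_weight=weights,  # Keras scales each sample's loss term
          epochs=200, verbose=0)
```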
4.2. Model Training Process
Figure 7 illustrates the trend of the error percentage of the multilayer perceptron model throughout the training process. It can be observed that, as the number of training epochs increases, the error percentage decreases rapidly and gradually stabilizes, indicating effective model convergence.
To further monitor convergence behavior and potential overfitting, the training and validation loss curves are plotted and analyzed across the training epochs. As illustrated in Figure 8, both the training loss (blue solid line) and the validation loss (red dashed line) exhibit a consistent downward trend during initial stages before plateauing. The absence of significant divergence between the two curves, combined with the early stopping mechanism (patience = 30) restoring weights from the epoch with the lowest validation loss, demonstrates that the model achieved stable convergence without severe overfitting. This behavior validates the effectiveness of the regularization strategies (L2 and Dropout) and the selected network architecture in maintaining generalization performance on the limited dataset.
Additionally, as shown in Figure 9 (Actual vs. Predicted Carbon Emissions), the overall difference between actual and predicted carbon emissions for office buildings during training is small, indicating high prediction accuracy.
4.3. Comparative Analysis of Model Performance
4.3.1. Comparison of Model Performance Before and After Fine-Tuning
This section comprehensively compares the performance of the transfer learning approach with training from scratch using only the limited local dataset, while also discussing potential domain shift issues between the pre-training and fine-tuning datasets. The analysis aims to objectively evaluate how fine-tuning impacts model adaptation and generalization capabilities. The Multilayer Perceptron (MLP) model was evaluated under three key scenarios to isolate the effects of transfer learning:
Direct application of pre-trained model: Utilizing weights initialized from large-scale pre-training without fine-tuning.
Fine-tuned model: Following domain-specific adaptation on the local dataset.
Training from scratch on local data only: training the same architecture solely on the 150 local samples. Compared to training from scratch, the fine-tuned model achieved an average absolute error reduction of approximately 49.4% (4160 vs. 8223 tCO2), demonstrating the advantage of transfer learning in leveraging pre-trained knowledge.
Table 6 summarizes the comparative results quantifying the impact of fine-tuning.
The pre-training dataset, sourced from Donghe Software’s (Version V3.5) case repository, encompasses diverse building types (e.g., steel and masonry-concrete structures) across multiple climate zones, yielding a generalized but heterogeneous feature representation. In contrast, the fine-tuning set consists of only 150 reinforced concrete buildings from a specific region, representing a more homogeneous and specialized domain. This discrepancy induces both covariate and label shifts: the input feature distributions, such as structural properties and spatial configurations, differ from those in the pre-training corpus, and the resulting carbon emission profiles exhibit distinct characteristics. Although the pre-trained model demonstrates reasonable generalizability, its initial predictions on the target domain are inaccurate, yielding a high mean absolute error (MAE = 11,506 tCO2e). Fine-tuning addresses this domain gap by adapting the model’s higher-level parameters to the target data distribution, substantially improving predictive accuracy. The process exhibited stable convergence, as indicated by the synchronous decline in training and validation loss (Figure 8), confirming that adaptation was achieved without overfitting despite limited samples. This suggests that pre-training provided a robust foundational prior, which fine-tuning efficiently specialized for the reinforced concrete domain.
In summary, the performance gap between the pre-trained and fine-tuned models stems primarily from domain shift, arising from differences in building typology and climatic representation between the two datasets. The transfer learning strategy, particularly fine-tuning, effectively mitigates this issue by transferring knowledge from a broad, public dataset to a specialized local context. These results highlight the value of domain adaptation in settings with data distribution mismatch, though further validation across larger and more varied datasets remains necessary to generalize the findings. Ultimately, this approach provides a practical pathway to leverage large-scale public data while maintaining relevance in localized, data-constrained scenarios.
4.3.2. Performance Comparison Among Different Models
To comprehensively evaluate the predictive capabilities of the MLP model, we used the same dataset and selected the following three models as benchmarks for comparison:
Linear regression model (LR): a simplified carbon emission formula based on area and number of floors;
Random Forest Regression (RF): a tree-based ensemble model with some nonlinear fitting ability;
Support Vector Regression (SVR): representative of linear fitting in high-dimensional space.
The performance of each model on the test set is comparatively illustrated in Figure 10.
The results demonstrate that the proposed MLP model outperforms all baseline models across all evaluation metrics. Notably, it achieves a 54.7% reduction in Mean Absolute Error (MAE) compared to traditional Linear Regression and attains a coefficient of determination (R2) of 0.966, reflecting its strong nonlinear modeling capacity and generalization ability. The quantitative results are summarized in Table 7.
4.4. Validation of High-Rise Building Adaptation
To validate the model’s applicability to high-carbon-emission building scenarios, buildings exceeding 15 floors above ground were classified as a ‘high-rise group’ for specialized validation. The prediction results for low-rise and high-rise buildings are presented in Figure 11a and Figure 11b, respectively.
A stable fit is observed between predicted and actual values for high-rise buildings, with an R2 value of 0.957, which is higher than the R2 of 0.889 achieved for low-rise buildings. Comparison of Figure 11c,d reveals that the error percentage for high-rise buildings exhibits less fluctuation compared to low-rise buildings, indicating better model adaptation. Furthermore, comparison of Figure 11a,b indicates that the MLP model achieves an average error percentage of 8.1% for the high-rise group, significantly outperforming other models, such as the linear model, in predicting high-rise building emissions. This demonstrates the model’s capability to handle complex building structures through nonlinear enhanced feature combinations (e.g., HAR and LPA), confirming its suitability for estimating emissions from high-emission projects like office complexes and large public buildings.
4.5. Model Interpretability Analysis (SHAP) and Ablation Studies
To elucidate the model’s decision-making logic and the contribution of key variables, the SHAP (SHapley Additive exPlanations) value analysis method was employed to interpret the carbon emission predictions. The relative contribution of each input variable is illustrated in Figure 12.
The analysis indicates that building floor area and number of stories are the primary factors influencing carbon emissions. Geographic characteristics nonlinearly influence emissions by affecting energy consumption patterns and building material choices via climatic differences. The engineered variables, Height-to-Area Ratio (HAR) and Layers per Unit Area (LPA), enhance the model’s ability to characterize building carbon emissions in both planar and spatial dimensions. As shown in Figure 12, these two parameters collectively improve prediction accuracy by 5.8%. Particularly in high-rise scenarios (HAR > 15), prediction errors are controlled within 8.1%. For example, designers can potentially reduce carbon emissions from super-tall buildings by approximately 12% by lowering the HAR value from 20 to 15. This approach of quantifying spatial density into computable parameters overcomes limitations of traditional linear models, advancing carbon emission prediction from a simple area-to-story ratio relationship to a correlation incorporating spatial efficiency and carbon intensity. It thereby provides directly actionable quantitative metrics for low-carbon building design.
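For reference, a SHAP attribution of this kind can be produced as sketched below; KernelExplainer is one generic choice, used here as an assumption since the paper does not specify which explainer variant was applied, and the model/data names follow the earlier sketches.

```python
import shap

feature_names = ["A", "F", "R", "LPA", "HAR"]
background = shap.sample(X_train_scaled, 50)        # small background set for speed
explainer = shap.KernelExplainer(
    lambda x: model.predict(x, verbose=0).ravel(),  # wrap the trained MLP
    background,
)
shap_values = explainer.shap_values(X_test_scaled)
shap.summary_plot(shap_values, X_test_scaled, feature_names=feature_names)
```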
To quantitatively assess the individual importance of each input feature in the model and provide a robust theoretical basis for feature construction, we conducted a systematic ablation study. Three simplified models were established: a model containing only the original features, a model excluding geographic features, and a model integrating only area and layer count features while keeping other conditions constant. Their specific performance is shown in Table 8.
The ablation study quantitatively elucidates the distinct and complementary roles of each feature category in the model’s predictive framework. The most pronounced performance degradation is observed upon the removal of the geographical climate code (R), as evidenced by a substantial increase in MAE (+1650 tCO2) and a marked decrease in R2 (−0.054). This underscores the fundamental role of regional climate as a primary driver of building carbon emissions, primarily governing operational energy consumption patterns for heating and cooling. The removal of the engineered features, Layers per Unit Area (LPA) and Height-to-Area Ratio (HAR), results in a significant though comparatively smaller performance decline (ΔMAE = +760 tCO2, ΔR2 = −0.025). This confirms that these composite variables capture essential, non-redundant information pertaining to building spatial configuration and volumetric density, which are not fully encapsulated by the raw features of area (A) and number of floors (F) alone. The precipitous drop in performance when utilizing only A and F (ΔMAE = +2570 tCO2) highlights the synergistic effect of the feature set; the model’s high accuracy is contingent upon the confluence of basic parameters, geographical context, and morphological indicators.
4.6. Model Output Examples and Visualization
The trained multilayer perceptron model was further developed into a rapid calculation tool using Python to predict carbon emissions of public buildings during early project stages. In the deployment scenario, users can input parameters and obtain outputs as summarized in Table 9.
The interface of the rapid calculation tool is illustrated in Figure 13.
Users select the number of floors, area, and climate zone for the target building in the input panel on the left. Upon clicking ‘Calculate Carbon Emissions,’ the tool displays the corresponding predicted values in the output panel on the right, along with reference suggestions to support decision-making for designers during early building project stages.
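The tool’s core prediction call can be sketched as follows; the function name and the fitted scaler/model objects are hypothetical stand-ins for the tool’s internals, reusing the feature pipeline described in Section 3.1.

```python
def predict_emissions(area_m2: float, floors: int, climate_code: int) -> float:
    """Return predicted total carbon emissions (tCO2) from the three basic inputs."""
    lpa = floors / area_m2
    har = floors ** 2 / area_m2
    x = scaler.transform([[area_m2, floors, climate_code, lpa, har]])  # fitted StandardScaler
    return float(model.predict(x, verbose=0)[0, 0])                    # trained Keras MLP

# Example: a 20-storey, 25,000 m2 office in a cold region (code 0):
# predict_emissions(25000.0, 20, 0)
```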
5. Discussion
This study successfully developed a lightweight MLP model for predicting public building carbon emissions under the significant constraint of low-dimensional, early-stage design data. The model demonstrated superior performance compared to traditional linear and other benchmark models, achieving an MAE of 4160 tCO2 and an R2 of 0.966. The integration of feature-engineered variables (HAR, LPA) and a transfer learning strategy proved effective in enhancing nonlinear modeling capacity and mitigating overfitting on a small dataset (N = 150). Despite these promising results, several limitations must be acknowledged to objectively assess the model’s applicability and to guide future research.
- (1) Oversimplification of Climatic and Regional Representation
A primary limitation stems from the coarse classification of geographical and climatic influences. The model input relies on a simplified climate code (R: 0, 1, 2) that groups vast and diverse regions (e.g., "Cold region" encompassing both Beijing and Shenyang). This approach fails to capture critical intra-regional variations in microclimates, which significantly impact building energy consumption for heating and cooling. Impact on Accuracy: This simplification likely introduces a source of error. For instance, the heating demand and associated carbon emissions for a building of identical size and form would differ between a coastal city like Qingdao and a more continental city like Beijing, even though they share the same climate code in this model. The model’s predictive accuracy could be dampened in areas that are climatically transitional or atypical within their assigned zone. Path for Enhancement: Future work should incorporate more granular, continuous climatic parameters. Utilizing actual meteorological data, such as Heating Degree Days (HDD) and Cooling Degree Days (CDD), or higher-resolution climate zoning would allow the model to learn the nuanced relationship between local climate severity and operational carbon emissions more accurately.
- (2) Challenges in Model Generalizability and Scalability
The model’s performance, while robust on the tested dataset, raises valid concerns regarding its generalizability to broader contexts. This limitation is twofold: ① Geographical and Typological Transferability: The model was trained and validated primarily on a dataset of office buildings from specific Chinese cities. Its performance on other building types (e.g., hospitals, schools) or in entirely different geographical and regulatory contexts (e.g., European or North American building stocks) remains unproven. Building designs, construction standards, and operational patterns vary greatly across regions, and a model trained on one context may not translate effectively to another. ② Temporal Generalizability: The model is a snapshot based on current construction practices and energy systems. As the power grid decarbonizes and building energy efficiency standards evolve, the underlying relationship between building form and operational carbon emissions will change. A model trained on current data may become progressively less accurate without mechanisms for temporal adaptation.
To address these challenges, future research should prioritize external validation on diverse, international datasets. Furthermore, incorporating a mechanism for continuous learning or designing the model to be sensitive to dynamic parameters like the grid carbon intensity factor (as discussed next) would significantly improve its long-term utility and scalability.
- (3) Neglect of Spatial and Temporal Variations in Grid Carbon Factors
Perhaps the most significant limitation for a life-cycle perspective is the treatment of operational carbon emissions. The model implicitly assumes a static, average carbon emission factor for electricity consumption across a broad climate zone. In reality, the carbon intensity of the electrical grid (gCO2eq/kWh) exhibits substantial spatial heterogeneity (even within a single country) and significant temporal variation (by time of day and season). Impact on Prediction Validity: This assumption can lead to substantial inaccuracies. A building’s operational carbon footprint is not just a function of its energy use but also of when and where that energy is consumed. For example, an all-electric building using air conditioning during peak afternoon hours in a grid with high solar penetration will have a lower carbon footprint than the same building consuming the same amount of energy at night when the grid relies more on fossil fuels. Our model, in its current form, cannot capture this crucial dynamic. Towards a More Robust Approach: To enhance the physical realism and accuracy of predictions, future iterations of the model should integrate time-sensitive grid carbon factor data. This could involve using historical average data for different regional grids as a more refined input or, ideally, developing a model that can integrate with smart grid data for real-time or seasonal carbon accounting. This advancement would shift the prediction from a purely architectural form-based estimate to a more comprehensive operational carbon assessment.
- (4) Other Limitations and Future Research Directions
Beyond the core limitations above, other areas warrant attention. The model’s performance, while excellent for a low-dimensional scenario, is ultimately constrained by the limited feature set. Incorporating additional early-stage parameters, such as building shape factor or primary orientation, could further improve accuracy. Furthermore, the practical tool, while a valuable contribution, would benefit from a more user-friendly interface and integration with common architectural design software (e.g., as a plug-in for BIM platforms) to lower the barrier to adoption by practitioners. In conclusion, the proposed MLP model presents an effective and practical solution for a well-defined problem: rapid carbon estimation with minimal inputs. The limitations discussed here are not flaws but rather clear signposts for the next stages of research. By addressing the oversimplification of climate zones, rigorously testing generalizability, and integrating dynamic grid factors, subsequent models can build upon this foundation to achieve even greater accuracy, robustness, and practical impact on sustainable building design globally.
6. Conclusions
To address the challenge of limited parameter dimensions and small sample sizes in early-stage building carbon emission prediction, this study developed a lightweight modeling approach based on a multilayer perceptron (MLP) neural network. The model integrates feature engineering and interpretability analysis to achieve robust prediction performance. The main conclusions are as follows:
The proposed MLP model, trained on 150 samples, demonstrated superior performance with a mean absolute error of 4160 tCO2 and an R2 of 0.966 on the test set, reducing the prediction error by 54.7% compared to traditional linear regression.
The model showed good adaptability to high-rise buildings (>15 floors), maintaining a mean error below 8.1%, which indicates its robustness for large-volume and structurally complex building types.
SHAP analysis confirmed floor area (51.2%) as the dominant predictor, while the novel composite indicators, HAR and LPA, collectively enhanced accuracy by 5.8%, offering quantifiable metrics for guiding spatial design in low-carbon projects.
The research outcomes were implemented into a Python-based rapid calculation tool, providing practitioners with a practical means for quick carbon estimation during preliminary design stages with minimal data input.
Despite the promising results, this study has several limitations that warrant further investigation. Firstly, the model’s development relied on a dataset of 150 samples. While strategies like regularization were employed to mitigate overfitting, the generalizability of the model needs to be more robustly validated with larger and more diverse datasets encompassing a wider range of building types, construction methods, and operational patterns. Secondly, the current model primarily relies on geometric and location parameters. Its practical accuracy could be enhanced by incorporating future building design parameters, such as envelope thermal properties and planned energy system types, once they become available in later design stages. Regarding practical usability, the developed tool lowers the barrier to entry for early carbon assessment. However, its effective integration into real-world design workflows requires consideration. The tool’s value is highest in the very early phases (e.g., schematic design) for quick benchmarking and option comparison. For definitive calculations or certification, designers must still rely on more detailed, high-fidelity simulation tools in later stages. Future work should focus on expanding the database, exploring interoperability with BIM platforms, and validating the tool’s impact on actual design decision-making to fully realize its potential in facilitating low-carbon construction.