Physics-Informed and Explainable Graph Neural Networks for Generalizable Urban Building Energy Modeling

Shan, Rudai; Ning, Hao; Xu, Qianhui; Su, Xuehua; Guo, Mengjin; Jia, Xiaohan

doi:10.3390/app15168854


by Rudai Shan *, Hao Ning, Qianhui Xu, Xuehua Su, Mengjin Guo and Xiaohan Jia

Jangho Architecture College, Northeastern University, Shenyang 110819, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(16), 8854; https://doi.org/10.3390/app15168854

Submission received: 5 July 2025 / Revised: 6 August 2025 / Accepted: 8 August 2025 / Published: 11 August 2025

(This article belongs to the Special Issue AI-Assisted Building Design and Environment Control)


Abstract

Urban building energy prediction is a critical challenge for sustainable city planning and large-scale retrofit prioritization. However, traditional data-driven models struggle to capture real urban environments’ spatial and morphological complexity. In this study, we systematically benchmark a range of graph-based neural networks (GNNs)—including graph convolutional network (GCN), GraphSAGE, and several physics-informed graph attention network (GAT) variants—against conventional artificial neural network (ANN) baselines, using both shape coefficient and energy use intensity (EUI) stratification across three distinct residential districts. Extensive ablation and cross-district generalization experiments reveal that models explicitly incorporating interpretable physical edge features, such as inter-building distance and angular relation, achieve significantly improved prediction accuracy and robustness over standard approaches. Among all models, GraphSAGE demonstrates the best overall performance and generalization capability. At the same time, the effectiveness of specific GAT edge features is found to be district-dependent, reflecting variations in local morphology and spatial logic. Furthermore, explainability analysis shows that the integration of domain-relevant spatial features enhances model interpretability and provides actionable insight for urban retrofit and policy intervention. The results highlight the value of physics-informed GNNs (PINN) as a scalable, transferable, and transparent tool for urban energy modeling, supporting evidence-based decision making in the context of aging residential building upgrades and sustainable urban transformation.

Keywords: urban building energy modeling (UBEM); graph neural networks (GNNs); explainable AI (XAI); physics-informed machine learning; urban building energy retrofit (UBER)

1. Introduction

Decarbonizing urban building stocks has become a critical priority in global climate mitigation efforts [1]. The operations of buildings account for 30% of global final energy consumption and 26% of global energy-related CO₂ emissions [2]. Within dense urban environments, complex morphological interactions between buildings—including shading, ventilation, and radiative exchanges—significantly influence energy demand patterns [3,4]. Accurate modeling of such interactions is essential for supporting large-scale energy retrofit planning and climate adaptation strategies.

In the context of northern Chinese cities, extensive residential districts constructed during the 1980s and 1990s present a significant challenge for urban energy management and retrofit planning. These legacy neighborhoods are distinguished by a high degree of homogeneity in the building typologies (height, floor count, and structural systems) [5]. However, a paucity of detailed sub-metered energy data hinders direct assessment of energy use and the prioritization of retrofit measures on a large scale. The proposed framework facilitates rapid, district-scale evaluation of energy consumption patterns in urban housing clusters by leveraging GIS-derived building attributes and graph-based modeling. This capability is of particular value in supporting data-driven policy design, identifying high-priority areas for intervention, and formulating scalable retrofit strategies without fine-grained monitoring infrastructure.

Urban building energy modeling (UBEM) frameworks have been developed to simulate the energy performance of building clusters and districts [6,7]. Traditional UBEM approaches are predominantly based on deterministic physics-based simulations. While physically rigorous, these methods often require extensive parameterization and calibration, which can be challenging to scale across diverse urban contexts. Moreover, such models are ill-equipped to handle cities with sparse data or rapidly evolving urban morphologies. Recent advances in machine learning, particularly GNNs, offer new opportunities to model urban energy patterns in a data-driven yet physics-informed manner [8,9]. Physics-informed refers to explicitly incorporating physically meaningful constraints or features—such as geometric, thermal, or spatial relationships—into the learning framework, ensuring that model predictions are consistent with established physical principles. By representing building clusters as graphs, GNNs can capture spatial dependencies and inter-building interactions through message passing, providing improved scalability and flexibility over conventional models. Explainable AI (XAI) has emerged as a crucial paradigm for developing machine learning models whose predictions and internal decision-making processes can be readily understood by humans. Explainable refers to the ability of a model to provide transparent, human-interpretable rationales for its outputs, allowing stakeholders to inspect, trust, and validate the system’s behavior. Such interpretability is particularly important for real-world urban energy planning, where accountability and domain expert involvement are required. Building on these advances, integrating GNNs with XAI techniques enables critical insights into the learned spatial representations and inter-building relationships, thereby supporting the development of transparent and trustworthy UBEM frameworks [10,11]. 
In parallel, advances in sensor-based parameter estimation and calibrated knowledge propagation provide new opportunities to enhance the fidelity of UBEM across data-sparse urban environments [12,13]. However, integrating such parameter inference techniques into scalable GNN-based UBEM frameworks remains underexplored.

Despite these promising developments, several key limitations persist. Most existing GNN-based UBEM studies treat inter-building relationships as purely topological or geometric, without explicitly encoding physically meaningful interactions such as shading effects, solar orientation, or radiative exchanges. However, recent physics-based UBEM research has demonstrated that accurate modeling of shading effects can significantly impact energy prediction accuracy [14]. At the same time, precise modeling of inter-building shading and radiative exchanges at the urban scale remains challenging, often requiring simplification strategies to ensure computational tractability even in physics-based approaches [15]. This further underscores the need for GNN-based UBEM models to incorporate physically meaningful interactions in a computationally efficient manner. Furthermore, while attention-based GNNs provide mechanisms for explainability, few UBEM studies have systematically evaluated whether learned attention patterns align with known physical principles, limiting such models’ transparency and practical trustworthiness. Another critical gap concerns the generalization capability of GNN-based UBEM models. Many prior works focus on single-district or homogeneous datasets, raising questions about the scalability and transferability of these approaches across cities with heterogeneous urban morphologies, construction typologies, and climatic contexts [16]. Recent reviews of UBEM and urban climate modeling integration also highlight the challenges of dataset heterogeneity and limited cross-district generalization [17].

To address these challenges, this study proposes a physics-informed and explainable GNN framework that integrates physically meaningful edge attributes—including inter-building distance, angular relation, and simulated shading coefficients—into a scalable GNN architecture for urban building energy modeling. The framework is subjected to rigorous evaluation through the benchmarking of model performance across three heterogeneous urban clusters. This process demonstrates the model’s strong predictive capability and robustness in diverse morphological contexts. Furthermore, systematic explainability analyses verify that the learned attention patterns align with physical domain knowledge, enhancing model transparency and trustworthiness. By bridging physics-based understanding and interpretable AI, the proposed approach advances the state of the art in scalable, transparent, and transferable UBEM, supporting data-informed urban retrofit planning and climate adaptation strategies. It is important to note that this study relies on synthetic EUI generated by simulation rather than measured consumption, which may not fully capture real-world operational patterns. The implications and limitations of this approach are discussed in later sections.

The remainder of this paper is organized as follows. Section 2 reviews recent advances in GNN-based urban building energy modeling and related fields. Section 3 provides a comprehensive account of the study areas and the data preparation process. In addition, it offers a thorough overview of the proposed PINN architecture and training methodology. Section 4 presents experimental results, including district tests and validation, ablation studies, baseline comparisons, and explainability analyses. Section 5 discusses this work’s broader implications, limitations, and future directions. Finally, Section 6 summarizes the key points of the paper.

2. Literature Review

UBEM has emerged as a critical tool for supporting large-scale energy retrofit planning and climate mitigation strategies [18,19]. Early UBEM approaches predominantly relied on deterministic physics-based simulations of individual buildings or clusters [20,21]. While physically rigorous, these methods often require extensive parameterization and calibration, posing scalability challenges for large or data-sparse urban areas. To address these limitations, data-driven UBEM approaches have gained traction. Machine learning models, including regression-based methods, ensemble learners, and deep neural networks, have been applied to predict building energy consumption using readily available morphological [22,23], climatic [24,25], and operational data [26,27].

For example, random forest (RF) [28,29], support vector machine (SVM) [30,31], gradient boosting (GB) [32,33], and decision tree (DT) [31,34] models have all been widely applied for building-level load prediction, energy benchmarking, or occupant behavior inference in UBEM tasks. Other approaches, such as multilayer perceptron (MLP) [35] and K-nearest neighbor (KNN) [26,30], have been used for feature selection, classification, or clustering within urban energy datasets. While these techniques have shown utility in handling urban energy datasets, they typically treat each building as an independent sample and fail to model the complex spatial dependencies and inter-building effects that characterize urban energy systems. Convolutional neural networks (CNNs) [36] and recurrent neural networks (RNNs) [37], including long short-term memory (LSTM) models [30,38,39], have also been explored for capturing spatial patterns or temporal dependencies in building energy data. More broadly, deep learning frameworks have demonstrated the ability to model complex, nonlinear relationships in UBEM. However, while these models excel in pattern recognition, they typically treat buildings as independent samples or sequences, and lack an explicit mechanism for representing inter-building spatial dependencies—an aspect that GNNs naturally address.

Graph-based learning has recently emerged as a promising paradigm for modeling spatial–energy relationships. GNNs have been applied to a variety of urban modeling tasks, including traffic prediction [40], air quality modeling [41], district thermal comfort [42,43], and, more recently, building energy modeling [8,44,45]. By representing building clusters as graphs, GNNs can capture local spatial dependencies through message passing, offering improved performance and scalability compared to conventional models. Several studies have explored integrating morphological features and spatial relationships into GNN architectures to improve energy demand prediction accuracy [44,45,46]. Lu et al. proposed a GCN-based method for estimating design loads in complex buildings, representing the building as a graph of spatial blocks and quantifying the feature contributions using class activation mapping [46]. Liu et al. developed a dynamic spatial–temporal GNN for chiller energy prediction, which captures both spatial and temporal dependencies among building components and provides interpretable insights into feature contributions [47]. More recent works have begun incorporating dynamic factors such as temporal weather variations and occupancy patterns using GNN–LSTM hybrids, demonstrating improvements in occupancy profile prediction accuracy for office buildings [48,49]. GNNs have also been applied to address data sparsity and robustness, supporting missing data imputation and reliable energy prediction in cities with incomplete sensor coverage [50]. Despite recent progress, most GNN-based UBEM methods lack comprehensive integration of physically meaningful inter-building interactions, offer limited physical interpretability, and rarely validate systematic generalization across diverse urban contexts.

In this study, we selected three representative GNN architectures—GCN, GraphSAGE, and GAT—to benchmark and improve UBEM modeling systematically. GCN was chosen as the foundational architecture, capturing spatial dependencies via spectral graph convolutions, and was used as the baseline for graph-based learning. GraphSAGE is included for its inductive capability, which enables generalization to unseen urban contexts and supports scalability for large, heterogeneous graphs. GAT introduces attention mechanisms to enhance model interpretability by adaptively weighting the influence of neighboring nodes and edges during message passing. Together, these models represent the major families of GNNs, varying in complexity, scalability, and explainability. They provide a robust basis for exploring physics-informed, interpretable, and generalizable UBEM frameworks.

Recent studies have increasingly explored incorporating physically meaningful features into data-driven models. Physics-informed machine learning methods, including PINNs [51], have been extensively reviewed in infrastructure design, modeling, and resilience, and have shown promise in integrating domain knowledge with data-driven architectures. Several works have demonstrated that embedding physical constraints—such as energy conservation, solar geometry, or thermal coupling—into model design can significantly enhance prediction accuracy and robustness. Liu et al. [14] developed a novel shadow calculation approach based on the sunlight channel method, which explicitly leverages solar geometry and spatial relationships among buildings to accelerate and improve the accuracy of shadow and solar potential modeling in dense urban environments. This physics-informed algorithm enables highly efficient, accurate energy and solar analysis for complex urban forms, demonstrating the tangible benefits of integrating domain knowledge into urban energy modeling. Pavirani et al. [52] developed a demand response control strategy for residential heating using a physics-informed neural network as the predictive component within a reinforcement learning framework, showing that explicit incorporation of physical knowledge reduced prediction errors and improved both energy cost and occupant comfort compared to black-box neural networks. Their results demonstrated that such explicit representation of physical interactions between zones markedly improved the accuracy and robustness of energy use predictions at scale.

While PINNs have demonstrated considerable success in embedding physical laws (e.g., PDE constraints) within deep learning models for developing component and small-scale energy applications, their practical implementation at the urban scale remains constrained. This is primarily due to their reliance on explicit PDE formulations and the regularity of simulation domains, which pose challenges for heterogeneous, large-scale urban contexts. In contrast, GNNs naturally accommodate irregular urban morphologies and explicitly model inter-building physical relationships through graph-based representations. This renders GNNs particularly well suited for scalable urban building energy modeling, where complex spatial dependencies and variable topologies are prevalent. Recent studies have compared the effectiveness of GNN and PINN approaches in building energy modeling and control. For example, Nagarathinam and Vasan demonstrated that a physics-informed GNN (PI-GNN) not only better captures spatial thermodynamic interactions in open-plan spaces but also achieves superior performance compared to traditional PINNs in terms of model accuracy, computational efficiency, and energy optimization [53]. Halaccli et al. introduced a GNN framework for zone-level urban energy prediction that incorporates physically meaningful node features (geometry, material thermal properties, internal loads) and edge features (adjacent surface area and U-value), enabling the model to explicitly represent and leverage thermal coupling between zones at the urban scale [8].

Explainability has become a key challenge for data-driven UBEM, as most models still operate as black boxes and lack the transparency of traditional physics-based approaches [54]. To address growing demands for interpretable AI in urban energy modeling, attention-based GNNs and explainable graph learning frameworks have emerged as promising solutions [55]. Several recent studies leveraged graph attention mechanisms and explainable graph learning frameworks to quantify the relative importance of spatial neighbors or edge relationships in energy prediction tasks. Miraki et al. [11] introduced an explainable causal GNN that delivers both intrinsic and post hoc explanations for electricity demand forecasts, enhancing transparency across different grid levels. Huang et al. [56] combined active learning with graph recurrent networks to model district heating demand. This enables interpretability by attributing predictions to specific meters and influencing variables, supporting trustworthy decision making in real-world systems. Lin et al. [57] proposed an interpretable UBEM framework based on GNNs and attention mechanisms, emphasizing attention weights to provide transparent, building-level explanations for model predictions. Ruan et al. [58] introduced a dual attention mechanism and employed visualization of convolutional attention weights to enhance the interpretability of the relative importance of different input features in district-scale building energy prediction. Despite these advances, few UBEM studies rigorously evaluate whether model explanations align with underlying physical mechanisms, leaving open questions about their practical utility and domain trustworthiness [14].

The generalization and transferability of UBEM models across heterogeneous urban morphologies and climatic contexts are increasingly recognized as key challenges for large-scale deployment. While transfer learning, domain adaptation, and cross-district validation have shown potential in related fields [16], most GNN-based UBEM studies continue to focus on single-city or homogeneous datasets, limiting their robustness and practical applicability in real-world urban planning. Yu et al. [59] developed a data-driven framework for managing uncertainty due to limited model transferability in urban growth modeling, showing through multi-area, multi-period calibration and clustering that model parameters estimated for one area or period often fail to generalize to others. Their results highlight the value of parameter clustering and scenario development as tools for reflecting and managing uncertainty in cross-domain urban modeling applications. Eggimann et al. (2024) demonstrated the potential and limitations of transferring energy signatures across space and time, underscoring the urgent need for rigorous cross-district generalization testing in UBEM research [16]. Guo et al. [60] developed a probabilistic building characterization framework using copula-based methods to quantify parameter uncertainties and their correlations in district heat demand forecasting. Their results show that accounting for parameter heterogeneity and occupant behavior variability is essential for improving the robustness and transferability of UBEM predictions in real-world, heterogeneous urban environments. Similarly, Garg et al. [50] conducted a large-scale empirical validation of UBEM models for over 247,000 buildings in Chicago, systematically analyzing the sources of bias and generalization error across building types, vintages, and land uses. 
Their results highlight that while city-scale energy estimates can closely match metered data after careful calibration, substantial variance and bias remain at the sub-city and building-specific level, reinforcing the importance of robust, transferable modeling practices. Despite these advances, robust multi-area benchmarking remains underexplored. The present study addresses this gap by systematically evaluating model performance across diverse urban clusters, providing empirical evidence on the robustness and practical applicability of GNN-based UBEM frameworks.

3. Methodology

3.1. Overview of the Framework

The proposed framework integrates parametric urban modeling, PINN, and XAI techniques to enable scalable and interpretable urban building energy modeling (Figure 1). Building clusters are reconstructed via parametric workflows to extract geometric and shading attributes, which are then encoded into graphs with physically meaningful node and edge features. The PINN, equipped with attention mechanisms incorporating distance, angular relation, and shading, is trained to predict building-level EUI. Cross-district generalization tests and attention-based explainability analysis are performed to evaluate model robustness and interpretability.

3.2. Urban Building Graph Construction

This section delineates the methodology of constructing physics-informed graph representations for urban building clusters. The graph construction pipeline comprises three primary stages: the modeling of parametric data, the generation of graph topology, and the encoding of features with physically meaningful attributes.

3.2.1. Data Acquisition and Parametric Modeling

A parametric modeling workflow was employed to generate detailed 3D building models and extract key attributes for each structure. The initial phase of the research involved extracting building footprints from GIS datasets, which were then imported into Grasshopper3D for parametric modeling. Utilizing Grasshopper3D’s visual programming interface, algorithms were implemented to automate the generation of three-dimensional geometries based on inputs such as footprint area, estimated or measured building height, number of floors, and azimuthal orientation. Additional parameters, including the window-to-wall ratio, roof type, and building use, were either extracted from available datasets or assigned using typological standards.

The resulting 3D building geometries were exported as standardized GeoJSON or IDF files and then processed using UrbanOpt. This open-source urban modeling platform supports creating standardized GeoJSON building files and enables batch energy simulation via EnergyPlus. UrbanOpt was employed to structure building attributes, assign location-specific parameters, and facilitate downstream energy modeling and simulation tasks. The combination of a parametric workflow and UrbanOpt preprocessing allowed building-level physical features and simulation-ready models to be generated reproducibly and at scale. These models were then used for downstream graph construction and physics-informed learning.

3.2.2. Graph Construction and Physics-Informed Feature Encoding

Each building cluster was represented as an undirected graph $G = (V, E)$, where nodes $v \in V$ represent individual buildings, and edges $e \in E$ represent spatial adjacency relationships (Figure 2). Edges were established between building pairs with centroid distance below a domain-informed threshold $d_{thresh}$, representing potential inter-building interactions. The primary objective of the graph topology design was to enable the GNN to effectively model local neighborhood interactions while avoiding the introduction of excessive noise from distant or physically irrelevant connections.

Building adjacency was defined based on a centroid distance threshold criterion. Specifically, an edge was created between two buildings $v_{i}$ and $v_{j}$ if the Euclidean distance $d_{i j}$ between their centroids satisfied the following:

$d_{i j} \leq d_{thresh}$

(1)

where $d_{thresh}$ was empirically set to 30 m, consistent with standard urban morphological practices and prior UBEM–GNN studies [57,61]. This threshold was selected to capture immediate spatial interactions relevant to energy exchange and mutual shading effects, while maintaining a manageable graph sparsity level.

This neighborhood-based topology construction approach offers two key advantages:

Scalability: The simple distance-based criterion enables efficient and consistent graph generation across heterogeneous clusters.
Physical interpretability: The resulting graph structure reflects local spatial adjacency patterns, facilitating intuitive interpretation of learned attention weights and their physical meaning in the urban context.
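The distance-threshold rule in Equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `build_edges`, the planar metre coordinates, and the sample centroids are hypothetical and do not come from the study's data or code.

```python
import math
from itertools import combinations

D_THRESH_M = 30.0  # distance threshold from Equation (1), in metres


def build_edges(centroids, d_thresh=D_THRESH_M):
    """Return undirected edges (i, j), i < j, whose centroid distance is <= d_thresh.

    `centroids` is a list of (x, y) tuples in a projected, metre-based
    coordinate system (an assumption of this sketch).
    """
    edges = []
    for i, j in combinations(range(len(centroids)), 2):
        (xi, yi), (xj, yj) = centroids[i], centroids[j]
        if math.hypot(xi - xj, yi - yj) <= d_thresh:
            edges.append((i, j))
    return edges


# Hypothetical centroids (illustrative only, not taken from the study areas).
centroids = [(0.0, 0.0), (20.0, 0.0), (20.0, 25.0), (100.0, 100.0)]
edges = build_edges(centroids)  # → [(0, 1), (1, 2)]
```

The pairwise loop is quadratic in the number of buildings, which is unproblematic for districts of a few hundred buildings; a spatial index would be a natural substitute for much larger graphs.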

It is important to note that in this study, the graph topology was not explicitly designed to encode detailed aerodynamic or radiative transfer processes. Instead, the adjacency graph is a flexible scaffold for message passing, enabling the GNN to learn spatially aware patterns from node and edge features. The physics-informed nature of the model is primarily achieved through the design of these features, rather than through hardwired edge connectivity rules.

Building geometry datasets for the study areas were constructed through a 2.5D urban morphology processing workflow integrating multiple data sources. Node features comprised geometric and morphological attributes: floor area, height, number of floors, azimuth orientation, and a computed shape coefficient:

Area: Building footprint area (m²), derived from 2D cadastral polygons.
Height: Building height, extracted from LiDAR data and cross-validated with official height records.
Floors: Number of floors, estimated from building height and standard floor-to-floor dimensions.
Orientation: Azimuth angle of the main facade (degrees), determined by analyzing dominant facade orientations.
Shape coefficient (S): Defined as

$S = \frac{Area}{Floors \times h}$

(2)

where h is the assumed typical story height (e.g., h = 3 m); this parameter reflects the morphological compactness and thermal characteristics of each building.
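As a small worked illustration of Equation (2), the node feature vector could be assembled as follows. The function names and the example values are hypothetical; only the formula and the default story height of 3 m are taken from the text.

```python
def shape_coefficient(area_m2, floors, story_height_m=3.0):
    """Equation (2): S = Area / (Floors * h), with h the assumed story height."""
    if floors <= 0 or story_height_m <= 0:
        raise ValueError("floors and story_height_m must be positive")
    return area_m2 / (floors * story_height_m)


def node_features(area_m2, height_m, floors, orientation_deg, story_height_m=3.0):
    """Assemble [area, height, floors, orientation, S] for one building."""
    s = shape_coefficient(area_m2, floors, story_height_m)
    return [area_m2, height_m, floors, orientation_deg, s]


# Hypothetical six-story building with a 600 m² footprint facing south (180°).
features = node_features(600.0, 18.0, 6, 180.0)
```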

Edge features captured key physics-based inter-building effects, including (i) centroid-to-centroid distance, (ii) relative azimuthal angle (angular offset), and (iii) a simulated shading coefficient, representing the reduction in direct solar access from j to i as computed from 3D massing analysis [14]. Each edge $e_{i j} \in E$ was associated with a feature vector

$e_{i j} = [d_{i j}, \theta_{i j}, S_{i j}]$

(3)

where $d_{i j}$ is the centroid-to-centroid distance, $\theta_{i j}$ is the relative azimuthal angle, and $S_{i j}$ is the simulated shading coefficient. These features are computed as follows:

Distance ( $d_{i j}$ ): Euclidean distance between the centroids of buildings i and j,

$d_{i j} = \sqrt{{(x_{i} - x_{j})}^{2} + {(y_{i} - y_{j})}^{2}}$

(4)

where $(x_{i}, y_{i})$ and $(x_{j}, y_{j})$ denote the centroid coordinates of buildings i and j, respectively.
Angular relation ( $θ_{i j}$ ): The angle between the vector from building i to j and the main solar direction (assumed south),

$\theta_{i j} = \arccos \left( \frac{({\vec{x}}_{j} - {\vec{x}}_{i}) \cdot \vec{s}}{\lVert {\vec{x}}_{j} - {\vec{x}}_{i} \rVert \cdot \lVert \vec{s} \rVert} \right)$

(5)

where ${\vec{x}}_{i}, {\vec{x}}_{j}$ are the centroid coordinates of buildings i and j, and $\vec{s}$ is a unit vector towards the solar (e.g., south) direction.
Simulated shading coefficient ( $S_{i j}$ ): Reduction in direct solar access from j to i, computed from 3D massing analysis as

$S_{i j} = 1 - \frac{A_{i j}^{sun}}{A_{i}^{\max}}$

(6)

where $A_{i j}^{sun}$ is the effective sunlit facade area of building i considering obstruction by building j, and $A_{i}^{\max}$ is the maximum possible sunlit facade area without obstruction.
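Equations (4) to (6) translate directly into code. The sketch below is illustrative rather than the authors' implementation. It assumes a planar coordinate system with x pointing east and y pointing north (so south is the vector (0, -1)), and it takes the sunlit facade areas as given inputs, since the 3D massing analysis that produces them is outside its scope. All function names are hypothetical.

```python
import math

SOUTH = (0.0, -1.0)  # assumption: x = east, y = north, so south points along -y


def distance(ci, cj):
    """Equation (4): Euclidean distance between two centroids."""
    return math.hypot(ci[0] - cj[0], ci[1] - cj[1])


def angular_relation(ci, cj, s=SOUTH):
    """Equation (5): angle (radians) between the vector i -> j and direction s."""
    vx, vy = cj[0] - ci[0], cj[1] - ci[1]
    norm_v, norm_s = math.hypot(vx, vy), math.hypot(s[0], s[1])
    if norm_v == 0.0 or norm_s == 0.0:
        raise ValueError("coincident centroids or zero solar vector")
    cos_theta = (vx * s[0] + vy * s[1]) / (norm_v * norm_s)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def shading_coefficient(a_sun, a_max):
    """Equation (6): S_ij = 1 - A_ij^sun / A_i^max."""
    if a_max <= 0.0:
        raise ValueError("a_max must be positive")
    return 1.0 - a_sun / a_max


def edge_feature(ci, cj, a_sun, a_max):
    """Equation (3): feature vector [d_ij, theta_ij, S_ij] for edge (i, j)."""
    return [distance(ci, cj), angular_relation(ci, cj), shading_coefficient(a_sun, a_max)]


# Hypothetical pair: building j lies 20 m due south of building i, and
# 300 of a possible 400 m² of i's facade remain sunlit.
feat = edge_feature((0.0, 0.0), (0.0, -20.0), a_sun=300.0, a_max=400.0)
```

Note that the distance is symmetric in i and j, whereas the angle and the shading coefficient are directional, so in practice each undirected edge would carry one feature vector per direction.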

Embedding these physical attributes in the graph explicitly enables the GNN to model local building properties and inter-building energy interactions. The integration of these features supports a physics-aware message passing process. It underpins the model’s interpretability and generalization capacity, as demonstrated in the ablation and explainability experiments (see Section 4).

3.3. Physics-Informed Graph Neural Network Architecture

3.3.1. GNN Model Design and Message Passing

In this study, three representative graph neural network architectures were systematically benchmarked: GCN, GraphSAGE, and GAT. All models were constructed on the same graph structure, with consistent node and edge features as described above. The features of nodes comprise area, height, floors, and orientation. The features of edges comprise centroid distance, angular relation, and, optionally, simulated shading coefficients. The specific model variants and their configuration are detailed as follows:

GCN [62,63,64]: Aggregates neighborhood information using uniform weights (neighborhood averaging), providing a simple yet effective spatial baseline.
GraphSAGE [65]: Learns an aggregation function from sampled neighbors (using the mean aggregator in this work), enabling inductive learning and efficient embedding of unseen nodes or clusters. While GraphSAGE does not inherently exploit physics-based edge features, it benefits from the structured graph topology and informative node features, making it well suited for cross-district generalization and transfer to novel building clusters without retraining.
GAT [42,66]: Employs an attention mechanism to adaptively weight neighbor contributions, explicitly incorporating edge attributes to enhance physical interpretability and support improved accuracy, particularly in sparse or heterogeneous graphs.

A two-layer GAT architecture was selected on the basis that it offers adequate model capacity for learning spatial dependencies in building graphs of moderate size, while also mitigating the risk of overfitting and ensuring efficient training. ReLU activation functions were used due to their widespread adoption and empirical effectiveness in similar regression tasks. These choices were validated by preliminary experiments showing that deeper or alternative architectures did not yield significant performance gains.

Implementation is undertaken using PyTorch Geometric’s GATConv operator, which supports edge attributes in attention computation (Figure 3). For each connected node pair $(i, j)$, the node features $x_{i}$ and $x_{j}$ are first projected via a shared weight matrix $W$, concatenated with the edge attributes $\mathrm{edge\_attr}_{i j}$ (comprising inter-building distance and relative orientation angle), and passed through a shared feedforward neural network to compute the raw attention score $e_{i j}$. This mechanism enables the model to encode spatial relationships in the attention weights explicitly:

$e_{i j} = a \left( \left[ W x_{i} \, \Vert \, W x_{j} \, \Vert \, \mathrm{edge\_attr}_{i j} \right] \right)$

(7)

The attention scores are then normalized across all neighbors of node $i$ via the softmax function to obtain attention coefficients:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})} \tag{8}$$

The final updated feature of node $i$ is computed as a weighted aggregation of its neighbors’ projected features:

$$x_{i}' = \sum_{j \in N(i)} \alpha_{ij} \cdot (W x_{j}) \tag{9}$$

This mechanism allows the model to learn physically interpretable inter-building influence patterns, as $\alpha_{ij}$ directly reflects the relative importance of neighboring buildings based on geometric and physical interactions. The physics-informed edge features are implemented to incorporate physical interdependencies into the message passing process: inter-building distance, angular offset relative to solar direction, and shading similarity coefficient $S_{ij}$ derived from urban morphology simulation. These edge attributes are passed into the GATConv layers, enabling the attention mechanism to learn directional, distance-sensitive energy interactions between buildings.
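Equations (7)–(9) can be traced by hand for a single node with the following minimal sketch. It is illustrative only: it uses scalar node features, a scalar projection `W`, and a linear scoring vector `a` followed by a LeakyReLU, all of which are assumptions made for readability. The function name `edge_aware_attention` is hypothetical, and the sketch does not reproduce the internals of GATConv.

```python
import math

def leaky_relu(v, slope=0.2):
    # Nonlinearity applied to the raw score (an assumption of this sketch).
    return v if v > 0 else slope * v

def edge_aware_attention(x_i, neighbors, W, a):
    """neighbors: list of (x_j, edge_attr) with scalar x_j and a list of edge attributes.
    W: scalar projection; a: weights over the concatenation [W*x_i, W*x_j, *edge_attr]."""
    scores = []
    for x_j, edge_attr in neighbors:
        z = [W * x_i, W * x_j, *edge_attr]            # concatenation in Eq. (7)
        scores.append(leaky_relu(sum(w * v for w, v in zip(a, z))))
    m = max(scores)                                    # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]                 # softmax, Eq. (8)
    x_new = sum(al * W * x_j for al, (x_j, _) in zip(alphas, neighbors))  # Eq. (9)
    return alphas, x_new

# Toy neighborhood: each neighbor carries [distance, angle] on made-up normalized scales.
neighbors = [(1.0, [0.1, 0.0]), (2.0, [0.5, 0.3]), (0.5, [0.9, 0.8])]
alphas, x_new = edge_aware_attention(1.5, neighbors, W=0.8, a=[0.3, 0.3, -1.0, 0.5])
print(alphas, x_new)
```

Because the coefficients are a softmax over the neighborhood, they sum to one, and the updated feature is a convex combination of the projected neighbor features.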

3.3.2. Energy Labeling and Model Training

All GAT-based models were implemented using PyTorch (v2.5.1) and PyTorch Geometric (v2.6.1), and trained using the Adam optimizer with a fixed learning rate of 0.01. Training was conducted in full-graph mode due to the moderate graph sizes (typically 200–250 buildings per district). The dataset from each district was randomly split into 80% training and 20% test nodes. Early stopping with a patience of 30 epochs was applied based on test loss monitoring to prevent overfitting and improve training efficiency.
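The patience-based stopping rule described above can be sketched as follows. The helper `early_stopping_epoch` is hypothetical and takes a precomputed sequence of monitored losses in place of the actual training loop. The patience of 30 comes from the text; the loss values are synthetic.

```python
def early_stopping_epoch(losses, patience=30):
    """Return (best_epoch, best_loss, stop_epoch) for a sequence of monitored losses."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch      # improvement: remember this epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, best_loss, epoch      # no improvement for `patience` epochs
    return best_epoch, best_loss, len(losses) - 1

# Synthetic curve: the loss improves for 50 epochs, then plateaus at a worse value.
losses = [1.0 / (e + 1) for e in range(50)] + [0.5] * 100
best_epoch, best_loss, stop_epoch = early_stopping_epoch(losses)
print(best_epoch, stop_epoch)
```

In a real training loop the model weights from `best_epoch` would typically be restored after stopping.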

The target variable for training is the simulated annual EUI, derived from parametric models built using UrbanOpt and EnergyPlus. Each model configuration was run 20 times to enhance statistical robustness, and the averaged performance metrics were reported. These include root mean squared error (RMSE) and mean absolute error (MAE) on the test set.

A spatial-proximity-based graph was constructed for each district to model inter-building relationships. Building centroids were extracted from Rhino using Elefront, and KNN graphs were built using spatial coordinates. Each edge was assigned a feature vector consisting of Euclidean distance and relative orientation angle between building pairs. These edge attributes capture spatial interactions such as proximity and alignment, which are critical for energy-related dependencies like mutual shading or thermal exposure.
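The graph construction step can be illustrated with a small pure-Python sketch. The function name `knn_edges`, the value of `k`, and the definition of the relative orientation angle (absolute orientation difference folded into the range 0 to 90 degrees) are assumptions for illustration; the text does not specify them.

```python
import math

def knn_edges(centroids, orientations, k=2):
    """Build directed kNN edges with [distance, relative orientation] attributes."""
    edges, attrs = [], []
    for i, (xi, yi) in enumerate(centroids):
        # Euclidean distance from building i to every other building, nearest first.
        dists = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(centroids) if j != i
        )
        for d, j in dists[:k]:
            rel = abs(orientations[i] - orientations[j]) % 180.0
            rel = min(rel, 180.0 - rel)              # fold into [0, 90] degrees
            edges.append((i, j))
            attrs.append([d, rel])
    return edges, attrs

# Toy district: four centroids (metres) and building orientations (degrees).
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 10.0)]
orientations = [0.0, 90.0, 170.0, 10.0]
edges, attrs = knn_edges(centroids, orientations, k=2)
print(edges[:2], attrs[:2])
```

The resulting `edges` and `attrs` lists correspond to the edge index and edge attribute arrays that a graph learning library would consume.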

The resulting graph is defined by a node feature matrix $X$ and edge index $E$ with accompanying edge attributes, forming the input for downstream GAT models. Node features include area, height, floor count, and orientation. Continuous features were used directly without z-score standardization, and no categorical encoding was needed. To account for morphological variability across urban areas, all experiments were stratified based on the shape coefficient of buildings. This ensured consistent and comparable data partitions across districts during model training and evaluation.
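One possible way to realize a stratified 80/20 split is sketched below. The quantile binning, the number of bins, and the function name `stratified_split` are assumptions; the text states only that partitions were stratified by shape coefficient.

```python
import random

def stratified_split(values, n_bins=4, test_frac=0.2, seed=0):
    """Split indices into train/test within quantile bins of `values`."""
    rng = random.Random(seed)
    order = sorted(range(len(values)), key=lambda i: values[i])
    train, test = [], []
    for b in range(n_bins):
        lo = b * len(order) // n_bins
        hi = (b + 1) * len(order) // n_bins
        bin_idx = order[lo:hi]                 # indices falling in this quantile bin
        rng.shuffle(bin_idx)
        n_test = round(test_frac * len(bin_idx))
        test.extend(bin_idx[:n_test])
        train.extend(bin_idx[n_test:])
    return sorted(train), sorted(test)

# Synthetic shape coefficients for 200 buildings.
shape_coeff = [0.01 * i for i in range(200)]
train_idx, test_idx = stratified_split(shape_coeff)
print(len(train_idx), len(test_idx))
```

Each bin contributes the same share of test nodes, so the test set covers the full range of the stratification variable.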

Each image was adaptively cropped to cover all buildings within a given land parcel, with image sizes varying from 600 to 800 pixels in both dimensions to accommodate differences in building density and footprint. This approach ensured that each image provided complete, non-overlapping building masks suitable for annotation and model training.

3.4. Explainable AI and Model Interpretation

3.4.1. Physics-Aware Attention Analysis

Recent advances in XAI have enabled more transparent and interpretable machine learning models by providing insights into the decision-making process of complex architectures such as GNNs [10,67,68,69]. In the context of urban building energy modeling, XAI techniques are increasingly critical for building stakeholder trust and for validating that learned representations are physically meaningful [22,23,54,70].

To interpret the physical relevance of the learned graph attention weights, we designed an explainability analysis focusing on the relationship between attention coefficients and simulated inter-building shading effects. Specifically, we extracted the attention weights $\alpha_{ij}$ assigned to each edge $(i, j)$ during message passing for each trained GAT model. These attention weights were then compared against the simulated shading coefficients $S_{ij}$ in the corresponding edge attributes. The analysis involved plotting $\alpha_{ij}$ against $S_{ij}$ for all edges in the graph, enabling a direct examination of whether the learned attention patterns correlate with physically meaningful shading interactions between buildings.

This interpretability analysis provides insight into whether the model is leveraging the physics-informed edge attributes as intended, thereby enhancing the trustworthiness and explanatory power of the proposed GNN architecture. The outcome, obtained through the model’s attention mechanism, supports the model’s capacity to discriminate between edges and thus to learn a nuanced and physically coherent representation of inter-building energy influence.
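Alongside the scatter plot, the attention-shading relationship could be summarized with a correlation coefficient. The sketch below computes a Pearson correlation in pure Python. This summary statistic is an illustrative addition rather than a reported part of the analysis, and the attention and shading values are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-edge attention weights and shading coefficients.
alpha = [0.10, 0.20, 0.30, 0.40]
shading = [0.05, 0.25, 0.20, 0.50]
r = pearson(alpha, shading)
print(round(r, 3))
```

A value near zero would suggest that attention is not tracking shading, whereas a strong positive or negative value would indicate a systematic relationship worth inspecting in the plot.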

3.4.2. Spatial Pattern Interpretation

In addition to analyzing individual attention scores, broader spatial trends in energy demand predictions were analyzed to understand how the GNN model captures urban morphological effects. The model successfully recovers spatial gradients of energy use that align with established urban energy drivers, such as solar access, building density, and building height [71,72,73]. Specifically, lower EUI predictions are observed for buildings with high solar exposure and minimal mutual shading, while dense central blocks exhibit higher EUI values due to increased shading and reduced ventilation [74,75]. Compared to baseline models, the physics-informed GNN better distinguishes such spatial heterogeneity.

Such spatially explicit predictions provide interpretable insight for targeting energy retrofits and urban energy planning [26,76].

3.5. Ablation and Benchmark Experiments

3.5.1. Comparison with Baseline Models

To further contextualize the performance of the proposed GAT-based architecture, we compared it against two additional baseline models commonly used in urban energy modeling:

ANN: A fully connected feedforward neural network trained on node features only, without considering inter-building relationships.
GCN: A standard GCNConv model utilizing node features and graph structure without edge attributes.

All baseline models were trained using the same set of node features, including geometric and physical attributes, ensuring a fair comparison. The ANN baseline was implemented using the scikit-learn MLPRegressor. The ANN consists of two fully connected hidden layers with 32 and 16 units, each followed by the ReLU activation function. The model was trained with the Adam optimizer, a maximum of 1000 iterations, and mean squared error loss. No explicit early stopping or dropout regularization was applied. All input features were standardized using z-score normalization before model training, which is critical for neural network convergence and performance. The GCN baseline employs two GCNConv layers of the same dimension.
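The z-score normalization applied to the ANN inputs can be sketched as follows. The helper `zscore_columns` is hypothetical, and the use of the population standard deviation is an assumption. In practice the means and standard deviations would be estimated on the training nodes only and then reused for the test nodes.

```python
import statistics

def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit (population) std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Toy node features: [area in m^2, floors].
features = [[100.0, 6.0], [200.0, 7.0], [300.0, 8.0]]
scaled = zscore_columns(features)
print(scaled)
```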

Training was conducted using the Adam optimizer with a learning rate of 0.001 and mean squared error loss. Early stopping based on validation loss with a patience of 20 epochs was applied. All models were evaluated under the same cross-district generalization protocol, where training and test sets came from disjoint urban regions.

3.5.2. Ablation Study on Physics-Aware Attention

Classical feature importance techniques are a prevalent approach for assessing variable contributions in tabular or vector-based machine learning models. Prominent examples include SHAP and permutation feature importance. However, applying these methods to GNNs, particularly those that employ relational edge-based features, is not straightforward. In the present context, edge attributes represent inter-building relationships that are not easily interpretable as independent input variables, and GAT’s message passing mechanisms further complicate direct attribution using SHAP. Consequently, ablation, defined as the systematic removal or modification of specific edge features, offers a more interpretable and targeted approach for evaluating each attribute’s physical relevance and contribution to the model’s performance in a relational setting. An ablation study was therefore conducted to assess the contribution of the proposed physics-informed edge attributes to model performance. The study systematically evaluated the model under different configurations of edge attributes in the GAT architecture.

Specifically, three model variants were compared:

GAT baseline (no edge attributes): Standard GATConv model without any edge attributes, relying solely on node features.
GAT + geometric attributes: GATConv model incorporating geometric edge attributes (distance and angular relation $\theta_{ij}$).
GAT + full physics-informed attributes: GATConv model incorporating distance, angular relation $\theta_{ij}$, and simulated shading coefficient $S_{ij}$ as edge attributes.

Each model variant was trained and evaluated on the cross-district generalization task previously described. The objective was to ascertain the impact of each type of edge attribute on model performance and generalization capability. This study confirms that the GNN effectively leverages physically meaningful inter-building interactions when equipped with appropriate edge attributes, thereby supporting the rationale for the proposed physics-informed architecture.
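The ablation configurations amount to selecting different column subsets of the edge attribute matrix before it is passed to the model. A minimal sketch is given below. The column order in `EDGE_COLUMNS` and the helper `select_edge_attrs` are hypothetical, and the sample values are made up.

```python
# Assumed column layout of the full edge attribute matrix (hypothetical).
EDGE_COLUMNS = {"distance": 0, "angle": 1, "shading": 2}

# Edge attributes used by each ablation variant.
VARIANTS = {
    "GAT baseline": [],
    "GAT + geometric attributes": ["distance", "angle"],
    "GAT + full physics-informed attributes": ["distance", "angle", "shading"],
}

def select_edge_attrs(edge_attr, names):
    """Keep only the named columns of a per-edge attribute list."""
    cols = [EDGE_COLUMNS[n] for n in names]
    return [[row[c] for c in cols] for row in edge_attr]

# Two toy edges: [distance (m), angle (deg), shading coefficient].
edge_attr = [[12.0, 30.0, 0.4], [25.0, 75.0, 0.1]]
for name, cols in VARIANTS.items():
    print(name, select_edge_attrs(edge_attr, cols))
```

Keeping the graph topology and node features fixed while varying only these columns isolates the contribution of each edge attribute.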

All model performance was quantitatively assessed using RMSE and MAE. The metrics are computed as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_{i} - \hat{y}_{i}\right)^{2}} \tag{10}$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y_{i} - \hat{y}_{i} \right| \tag{11}$$

where $y_{i}$ denotes the observed value, $\hat{y}_{i}$ is the model prediction, and $N$ is the total number of test samples.
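Both metrics translate directly into code. The following sketch evaluates them on three made-up EUI values to show the arithmetic; the function names are illustrative.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, Eq. (10)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error, Eq. (11)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

# Made-up observed and predicted EUI values.
y_true = [100.0, 120.0, 140.0]
y_pred = [110.0, 120.0, 125.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

Here the errors are -10, 0, and 15, so the MAE is 25/3 and the RMSE is the square root of 325/3. The RMSE is the larger of the two because it weights the largest error more heavily.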

3.6. Districts Test and Comparison

The proposed framework was applied to three representative old residential neighborhoods in Shenyang, China: Shenhe District, Huanggu District, and Tiexi District (Figure 4). All three clusters consist primarily of mid-rise residential buildings constructed during the 1980s–1990s, featuring similar building typologies, construction eras, and urban forms. This homogeneous selection allows for a focused evaluation of the generalization ability of the physics-informed GNN model within old residential settings, minimizing confounding factors due to differences in building type or vintage.

District 1 (Shenhe; Old City Core): Located in the city center, this district consists of mid-rise residential buildings (6–8 floors) built mainly during the 1980s–1990s. Aging envelopes, limited insulation, and varied retrofitting histories characterize the buildings.
District 2 (Huanggu; Mixed Layout): A typical old residential neighborhood with similar building height and age distribution to District 1. Buildings mostly feature original construction materials and natural ventilation.
District 3 (Tiexi; Industrial Legacy): This district comprises clusters of mid-rise old residential buildings developed during the same period. It reflects the urban morphology and building characteristics common in industrial-adjacent residential areas in Shenyang.

Table 1 presents comprehensive statistics for the three selected residential districts. Despite all clusters being predominantly residential and built during the 1980s–1990s, their urban form and building characteristics exhibit significant diversity, shaped by differing planning regimes and historical development: District 1 represents the historical center of Shenyang, featuring the rectangular Fangcheng city wall and an orthogonal street network. Buildings here are highly regular in both area (mean $3867\ \mathrm{m}^{2}$, std $2131\ \mathrm{m}^{2}$) and height (mean 7.5 floors, std 0.8). The orientation is almost perfectly aligned (mean and mode $\approx 0^{\circ}$), indicating strict planning and uniform parcel subdivision. The shape coefficient is nearly zero for all buildings, confirming the prevalence of elongated, rectangular blocks. EUI shows moderate variability. District 2, in contrast, comprises a patchwork of multiple residential estates, resulting in the most extensive total area ($1.3 \times 10^{6}\ \mathrm{m}^{2}$) and highest building count (225). Building areas and heights are more dispersed, and the orientation exhibits significant heterogeneity (mean $98.2^{\circ}$, std $47.2^{\circ}$), with both north–south and non-orthogonal layouts present. The shape coefficient is higher and more variable (mean $98.2$, std $47.2$), and the EUI distribution is also wider, reflecting a more diverse design logic and looser planning constraints. District 3 comprises mid-rise residential clusters originally developed to serve adjacent industrial zones. Its statistics (mean area $5026\ \mathrm{m}^{2}$, floors $7.2$, orientation mean $81.0^{\circ}$) are intermediate between D1 and D2. While less regular than D1, its block layout and energy use are more homogeneous than in D2. These districts span a spectrum from high regularity and planning uniformity (D1) to high morphological diversity (D2), with D3 as an intermediate case. This variation provides a robust basis for cross-district generalization testing of GNN models, particularly in learning spatial effects associated with distance, orientation, and geometric features.

To ensure the reliability and representativeness of building attribute distributions used for GNN modeling, all district datasets underwent rigorous data cleaning to remove outliers and anomalous values (e.g., abnormally large floor areas, atypical shape coefficients). Figure 5, Figure 6 and Figure 7 illustrate the statistical distributions (boxplots and histograms) of key attributes before and after cleaning for each district. In District 1 (Figure 5), which represents the highly regular urban core around the historical Fangcheng city wall, the cleaning process primarily eliminated a small number of outliers in building area and EUI. Most buildings are highly uniform in area, height, and orientation, with the boxplots revealing minimal spread and no significant outliers after cleaning. The orientation is tightly clustered near $0^{\circ}$, and the shape coefficient remains close to zero, reflecting a strict, orthogonal urban layout. EUI values are distributed around the mean with moderate dispersion. District 2 (Figure 6), formed by merging multiple estates with heterogeneous layouts, initially exhibits wide variability and numerous outliers in area, EUI, and especially the shape coefficient. Data cleaning removes buildings with very large areas and extreme shape coefficients, resulting in a more symmetric and compact distribution across all features. The orientation in D2 remains widely spread even after cleaning, consistent with the mixed layout logic of the district. District 3 (Figure 7) is an intermediate case: before cleaning, there are extreme values in area and shape coefficient, and a few high outliers in EUI. The cleaning process significantly reduces the spread, yielding tighter boxplots and more concentrated histograms, but the variability in orientation and shape coefficient remains higher than in D1, though lower than in D2. Overall, data cleaning improves the consistency and representativeness of the samples in all three districts, ensuring that subsequent GNN model training and testing are robust to extreme values. The resulting attribute distributions (e.g., shape coefficient and EUI) are comparable across districts, preserving intrinsic differences in urban form and spatial heterogeneity central to the experimental design.
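The cleaning rule is not specified in the text. One common choice, shown here purely as an assumption, is the 1.5 × IQR fence used by boxplots. The helper `iqr_filter` is hypothetical and the building areas are made up.

```python
import statistics

def iqr_filter(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR] (an assumed cleaning rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

# Made-up building areas (m^2) with one abnormally large footprint.
areas = [3000, 3200, 3500, 3800, 4000, 4200, 4500, 25000]
cleaned = iqr_filter(areas)
print(cleaned)
```

Applied per attribute and per district, a rule of this kind removes extreme values while leaving the bulk of each distribution intact.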

4. Results and Analysis

4.1. Comparison of Baseline and Graph-Based Models

To comprehensively assess the predictive capability and computational efficiency of neural network architectures for urban building energy modeling, we systematically benchmarked a conventional feedforward ANN alongside three classes of graph-based models—GCN, GraphSAGE, and several variants of the GAT—using shape coefficient stratification on the District 3 dataset. Table 2 presents the mean and standard deviation of RMSE, MAE, and training time for each model, aggregated over 20 independent runs after removal of outlier results. This performance trend is also visually confirmed in Figure 8, where GraphSAGE shows the lowest median error and smallest interquartile range across RMSE and MAE distributions.
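The "mean ± standard deviation" entries can be produced from per-run metrics as in the sketch below. The run values are made up, the use of the sample standard deviation is an assumption, and the rule for discarding outlier runs is not specified in the text, so it is omitted here.

```python
import statistics

def summarize_runs(metric_runs):
    """Return (mean, sample std) of a metric over repeated runs."""
    return statistics.fmean(metric_runs), statistics.stdev(metric_runs)

# Made-up RMSE values from three repeated runs.
rmse_runs = [15.0, 16.0, 17.0]
mean_rmse, std_rmse = summarize_runs(rmse_runs)
summary = f"{mean_rmse:.2f} ± {std_rmse:.2f}"
print(summary)
```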

The ANN baseline, which relies solely on individual building attributes and does not exploit spatial or relational information, converged the fastest but produced markedly larger prediction errors than the best graph-based model. Specifically, the ANN yielded an RMSE of 29.06 ± 10.58 and MAE of 23.11 ± 8.80, with an average training time of 3.74 ± 0.33 s. This points to the limitations of attribute-only models in capturing the spatial and morphological dependencies that drive urban-scale energy use patterns.

In contrast, the strongest graph-based model exhibited substantial gains in predictive performance, supporting the importance of explicitly modeling spatial context and neighborhood structure. GraphSAGE achieved the best overall accuracy (RMSE 16.22 ± 4.00, MAE 12.63 ± 2.52), outperforming both GCN (RMSE 41.46 ± 8.22, MAE 33.45 ± 6.97) and all GAT variants, while maintaining moderate computational cost (8.87 ± 0.86 s). Relative to the ANN, GraphSAGE reduced the average prediction error by roughly 45% (RMSE 16.22 vs. 29.06; MAE 12.63 vs. 23.11), underscoring the value of more expressive neighborhood aggregation in capturing inter-building effects. The GCN model, by contrast, produced the highest errors of all models, exceeding even the ANN baseline, at a similar computational cost. The GAT variants, designed to integrate physics-aware edge attributes, exhibited varied performance: the baseline GAT (no edge features) and GAT with combined distance and angle performed best among them (e.g., GAT_distance + angle: RMSE = 28.81 ± 7.52, MAE = 22.60 ± 6.51), but none surpassed GraphSAGE in either accuracy or stability.

For the GAT models, the physics-informed GAT (distance + angle) variant—incorporating both inter-building distance and angular relation as edge features—yielded the strongest performance among the attention-based architectures (RMSE 28.81 ± 7.52, MAE 22.60 ± 6.51), though it did not surpass GraphSAGE. Other GAT variants, such as those using only distance or angle, exhibited slightly higher errors and similar or longer runtimes (up to 23.25 ± 4.11 s), with GAT (baseline) showing moderate performance (RMSE 29.53 ± 8.22, MAE 23.73 ± 7.23). Across all GAT configurations, standard deviations were notably larger than for GraphSAGE, reflecting a degree of instability and sensitivity to random initialization.

While incorporating spatial relationships and physics-aware edge attributes leads to notable gains in prediction accuracy, these improvements come at the expense of higher computational cost, particularly for GAT-based models. As shown in Table 2, GAT variants typically require approximately two to three times longer training time than GraphSAGE, and up to six times longer than the ANN baseline. This increased computational burden is primarily due to the additional complexity of calculating attention coefficients and handling more expressive edge representations. Such trade-offs between model accuracy, interpretability, and computational efficiency must be carefully balanced in practical large-scale urban applications. For scenarios where rapid deployment and scalability are paramount, models like GraphSAGE may be preferred, achieving strong accuracy with relatively moderate runtime. In contrast, GAT-based models may be reserved for applications where interpretability and fine-grained spatial reasoning are of particular value, and higher computational resources are available.
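The runtime ratios quoted above follow from the mean training times reported in this section, as the short calculation below shows. The GAT figure is the slowest variant mentioned in the text (23.25 s); the variable names are illustrative.

```python
# Mean training times in seconds, taken from the text of Section 4.1.
t_ann = 3.74
t_sage = 8.87
t_gat_max = 23.25   # slowest GAT variant reported

ratio_vs_sage = t_gat_max / t_sage   # about 2.6
ratio_vs_ann = t_gat_max / t_ann     # about 6.2
print(round(ratio_vs_sage, 2), round(ratio_vs_ann, 2))
```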

These results confirm that graph-based neural networks—particularly GraphSAGE and GAT variants with physics-aware edge attributes—deliver substantial and robust improvements over conventional attribute-only models for urban energy modeling. Explicit integration of spatial relationships and morphological features enables these models to capture neighborhood effects and generalize across diverse urban forms. This provides a compelling case for adopting graph neural frameworks in practical retrofit analysis and energy policy development.

4.2. Ablation Study on Physics-Aware Edge Features

To systematically assess the contribution of physics-informed edge attributes in the GAT framework, we conducted an ablation study on District 3 using six model configurations: (1) GCN and GraphSAGE as graph-based baselines with no explicit edge features and (2) four GAT variants, comprising one without edge features and three incorporating either distance, angle, or both as edge attributes. Each configuration was evaluated under shape-coefficient and EUI-based stratification, with the results summarized in Table 3.

The findings reveal several clear trends. As expected, the GCN baseline, which aggregates information without any edge feature modulation, yielded the highest errors (RMSE = 41.46 ± 8.22; MAE = 33.45 ± 6.97 under shape stratification) and performed similarly under EUI stratification. GraphSAGE, benefiting from more expressive neighborhood aggregation, outperformed all other baselines (RMSE = 16.22 ± 4.00; MAE = 12.63 ± 2.52), demonstrating that even without explicit edge features, more flexible message passing architectures are better suited to capturing inter-building effects.

Among the GAT variants, the baseline model without edge attributes achieved moderate errors (RMSE = 29.53 ± 8.22; MAE = 23.73 ± 7.23 for shape; RMSE = 28.37 ± 9.31; MAE = 23.12 ± 7.89 for EUI), substantially better than GCN, but was not as robust as GraphSAGE. This indicates that the attention mechanism can capture some local structure but lacks domain-specific sensitivity.

Introducing physics-aware edge attributes improved model accuracy and robustness in several, but not all, cases. The GAT (angle) variant, which uses only inter-building angular relations, delivered lower average errors and reduced variance relative to the baseline GAT under EUI stratification (RMSE = 25.67 ± 8.84; MAE = 20.65 ± 7.53), although its errors under shape stratification were slightly higher (RMSE = 31.99 ± 9.20; MAE = 25.85 ± 8.16). The EUI-stratified result is consistent with the physical intuition that orientation is a key driver of energy interactions, especially for solar access and mutual shading.

The distance-only variant (GAT (distance)) did not achieve further gains (RMSE = 32.90 ± 7.93; MAE = 26.51 ± 7.18 for shape) and sometimes introduced noise or instability, as reflected in the higher variance for EUI stratification. This suggests that distance as an edge feature may be less informative when the urban form is irregular or inter-building effects are not solely governed by proximity.

Combining both edge features (GAT (distance + angle)) produced the best GAT performance under shape stratification (RMSE = 28.81 ± 7.52; MAE = 22.60 ± 6.51), outperforming the attention-only baseline and both single-feature variants. This advantage did not carry over to EUI stratification (RMSE = 31.29 ± 10.26; MAE = 25.70 ± 9.51), where errors and variance exceeded those of the baseline GAT. Incorporating multiple, physically grounded edge attributes within the message-passing mechanism can therefore help, but the benefit depends on the stratification scheme.

In summary, the ablation study indicates that careful integration of domain-specific edge features (especially building orientation and, to a lesser extent, distance) can enhance the predictive power and robustness of GAT-based urban energy models. The observed performance gains support the value of embedding interpretable, physics-aware structures into GNN architectures for practical building energy prediction and targeted retrofit analysis.

To visually complement the numerical results in Table 3, Figure 9 presents boxplots of RMSE and MAE for each graph-based model variant under the two stratification schemes. GraphSAGE exhibits the lowest median errors and narrowest interquartile range under shape-coefficient stratification, confirming its strong predictive stability. Among the GAT variants, incorporating spatial edge attributes (distance and angle) yields modest improvements over the baseline GAT, though these variants still lag behind GraphSAGE. In contrast, Figure 9b shows that under EUI-based stratification, performance gaps between models are less pronounced, with higher variance across all methods. These visual trends reinforce the conclusion that physics-aware edge attributes can enhance model expressiveness, but their effectiveness may vary depending on the stratification strategy and data distribution.

4.3. Cross-District Generalization Analysis

To rigorously assess graph-based neural models’ generalization ability and transferability in urban energy prediction, we conducted a comprehensive cross-district evaluation covering three distinct residential districts (D1, D2, D3). Table 4 and Table 5 summarize the RMSE and MAE of each model under two representative stratification schemes: shape-coefficient stratification and EUI stratification. Each configuration was trained and tested independently on each district using 20 repeated runs with outliers removed to ensure robust statistics.

Across both stratification strategies, several consistent trends are observed:

First, GraphSAGE achieves the best overall predictive accuracy in all three districts, regardless of the stratification method. For example, under shape-coefficient stratification, GraphSAGE attains an RMSE of 37.97 ± 11.08 in D1, 23.57 ± 6.05 in D2, and 16.22 ± 4.00 in D3, substantially outperforming GCN and GAT variants. This demonstrates the superior capability of GraphSAGE in extracting and propagating neighborhood information in diverse urban forms, likely due to its flexible aggregation mechanism that balances local and global contexts.

Second, all other graph-based models outperform the GCN baseline by a large margin, especially in more heterogeneous or morphologically complex districts (D2, D3). In particular, GAT models, especially those with physical edge attributes (distance, angle, or both), tend to achieve lower errors and higher robustness than the vanilla GAT or GCN. This effect is most pronounced in D2 and D3, where spatial relations and urban morphology influence energy use variability.

Third, no single edge feature consistently dominates among the GAT variants across all districts. For example, in D3, the combined distance + angle GAT delivers the best overall GAT performance (RMSE 28.81 ± 7.52). In contrast, in D2 and D1, the advantage of including both edge features is less clear, and the angle-only or baseline GAT may even match or surpass the more complex variant. This suggests that the effectiveness of specific physical edge attributes is district-dependent—i.e., different urban morphologies and spatial patterns require different relational encodings for optimal learning.

Fourth, shape-coefficient stratification generally produces lower RMSE and MAE values for all models compared to EUI stratification, and the performance gap between the best and worst models is slightly amplified under shape-based grouping. This is expected, as the shape coefficient directly encodes key morphological variables (e.g., compactness, surface-to-volume ratio) that strongly correlate with energy demand and spatial adjacency, thus improving the signal-to-noise ratio for graph-based learning.

Finally, these results underscore the critical importance of spatial and physical context in generalizable urban energy modeling. The superior and stable performance of GraphSAGE and physics-aware GAT models—especially as districts become larger and more morphologically complex—highlights the value of embedding real-world physical knowledge into the edge construction and message-passing process. At the same time, the non-monotonic ranking of different GAT edge designs across districts suggests a need for future research into adaptive or district-specific graph construction strategies.

In summary, cross-district experiments confirm that physically-informed GNNs substantially improve generalization and robustness over standard GCNs and that the choice of edge feature should be tailored to the local urban context. These insights are vital for large-scale, transferable deployment of GNN-based methods in urban retrofit, planning, and energy policy.

Figure 10 and Figure 11 show the RMSE and MAE distributions for all models across the three districts using boxplots. Each figure illustrates variability in performance under shape-coefficient and EUI stratification, respectively, highlighting differences between models and districts. These plots reveal that GraphSAGE achieves the lowest mean error across districts and exhibits the lowest performance variance, reinforcing its robustness. By contrast, GCN shows wider interquartile ranges and more frequent outliers. The GAT variants fall between these two extremes, with their performance being influenced by the inclusion of physical edge attributes. These visualizations emphasize the statistical stability and generalizability of physics-aware GNNs under repeated training and testing conditions.

4.4. Explainable AI and Model Interpretation

The cross-district generalization analysis demonstrates the quantitative superiority of graph-based neural networks for urban energy prediction. Furthermore, it provides valuable insights into the physical mechanisms underlying these gains, thereby enhancing the interpretability of GNN-based frameworks for practical deployment.

Firstly, the robust and consistent performance of GraphSAGE across all districts, irrespective of their spatial regularity or heterogeneity, can be directly attributed to its flexible aggregation of local neighborhood information. In the meticulously planned and organized District 1, where building orientation, area, and shape are uniform, the aggregation mechanism captures the repetitive and predictable spatial interactions, thereby explaining the relatively high but stable accuracy. As district morphology becomes more complex (D2 and D3), GraphSAGE demonstrates its aptitude for leveraging variable edge and node contexts adaptively. This robustness can be interpreted as a form of implicit physical reasoning: the model is effectively learning, for each district, which spatial relations (neighbor connectivity, local density) matter most for predicting energy use. Secondly, the variable ranking and performance of GAT variants across districts directly reveal how physical edge attributes interact with urban morphology to influence model behavior. Within the conventional grid of District 1, where orientations and adjacencies remain essentially constant, incorporating angular or distance-based edge attributes yields minimal enhancement over the fundamental GAT model. This phenomenon can be attributed to the absence of significant diversity that could be exploited. Conversely, District 2, characterized by its diverse building orientations, block sizes, and relaxed planning, demonstrates enhanced improvement when physical edge attributes, particularly angles, are incorporated. This finding underscores the critical nature of incorporating directionality metrics (e.g., solar access, shading, street exposure) when assessing spatial disorder in built environments. District 3, as an intermediate case, benefits most from combined (distance + angle) edge features, reflecting its moderate degree of regularity and morphological complexity. 
Thirdly, the effect of stratification itself is physically interpretable. Shape-coefficient stratification, a grouping based on block compactness and geometry, reduces prediction errors compared with EUI-based grouping, and the reduction is most evident for graph-based models. This can be attributed to the shape coefficient’s ability to capture hard physical constraints, such as surface-to-volume ratio, solar exposure, and compactness, which are closely associated with energy performance and which the graph structure exploits during message passing. Stratifying by physically meaningful criteria therefore improves both the learning and the explainability of the resulting model.
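As an illustration of shape-coefficient stratification, the sketch below computes a shape coefficient for simple rectangular blocks and assigns buildings to rank-based groups. It assumes the common definition of the shape coefficient as the envelope area exposed to outdoor air (walls plus roof) divided by the enclosed volume; the building dimensions, the three-group split, and the function names are hypothetical and may differ from the definition and grouping used in the study.

```python
def shape_coefficient(width, depth, height):
    """Exposed envelope area (walls + roof) over enclosed volume, in 1/m.
    Assumes a rectangular block; the ground-contact floor is excluded."""
    walls = 2.0 * (width + depth) * height
    roof = width * depth
    volume = width * depth * height
    return (walls + roof) / volume


def stratify(values, n_groups=3):
    """Assign each value to a rank-based group (0 = lowest values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(values)
    return groups


# Hypothetical (width, depth, height) triples in metres.
buildings = [(10, 10, 10), (20, 10, 30), (40, 12, 18),
             (8, 8, 6), (15, 10, 9), (60, 12, 54)]
coeffs = [shape_coefficient(*b) for b in buildings]
groups = stratify(coeffs)  # 0 = most compact, 2 = least compact
```

Rank-based groups keep the strata balanced in size, so each stratum contributes a comparable number of buildings to training and evaluation.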

When considered collectively, the findings demonstrate that graph-based neural networks, particularly those informed by real-world physical features, are not merely black boxes. Their performance can be interpreted systematically in light of district-level spatial statistics and planning logic. For instance, the shift in the effectiveness of GAT edge attributes across districts maps directly onto changes in block orientation, density, and planning regularity, while the superior generalization of GraphSAGE is explained by its ability to sense the appropriate neighborhood scale under varying urban morphologies. From an engineering and policy perspective, these results provide actionable interpretability: the models highlight which types of spatial features (compactness, orientation, adjacency) are most predictive in different urban contexts. This facilitates the development of graph construction strategies and feature selection protocols tailored to specific district morphologies, and it helps practitioners understand why a given GNN model is effective (or not) in a particular case. In summary, the explainability of the proposed GNN framework is grounded not just in output metrics but in its transparent mapping between learned representations and interpretable physical attributes of the urban environment. In this way, the framework connects advanced AI methods with domain-informed, evidence-based urban energy planning.

5. Discussion

5.1. Interpretation of Explainability Results

The explainability analysis in Section 4 demonstrates that the superior performance of graph-based neural networks is not accidental but is rooted in their ability to capture and utilize physically meaningful relationships between buildings. Models such as GraphSAGE use flexible, data-driven aggregation to adaptively weight neighboring buildings, while GAT variants can be further tuned with explicit spatial features such as distance and angle. This design lets the models reflect real-world mechanisms that underpin urban energy consumption, including solar exposure, shading, and morphological adjacency. The variations in model performance across districts can be attributed to the distinct urban morphology of each district, which further supports the hypothesis that model interpretability is grounded in both the learned representations and the physical properties of the urban environment. This interpretability supports practical adoption, as it enables decision-makers to understand not only which model performs best but also why.
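The explicit spatial edge features mentioned above can be sketched as follows. This minimal example derives a distance and an angle for a directed edge between two building centroids. It assumes centroids given as (east, north) coordinates in metres and an angle expressed as a compass bearing measured clockwise from north; the function name and the angle convention are illustrative assumptions, not the study's documented feature definition.

```python
import math


def edge_features(src, dst):
    """Distance (m) and bearing (degrees clockwise from north) of the
    directed edge src -> dst. Points are (x_east, y_north) in metres."""
    dx = dst[0] - src[0]
    dy = dst[1] - src[1]
    distance = math.hypot(dx, dy)
    # atan2(dx, dy) measures the angle from the north axis toward east.
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return distance, bearing


# Hypothetical centroids: a neighbor to the north-east and one due west.
d_ne, b_ne = edge_features((0.0, 0.0), (30.0, 40.0))
d_w, b_w = edge_features((0.0, 0.0), (-10.0, 0.0))
```

A directed bearing distinguishes, for example, a neighbor to the south (a potential source of shading in the northern hemisphere) from one to the north, which a distance-only feature cannot do.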

5.2. Cross-District Generalization Insights

The cross-district experiments provide compelling evidence for the transferability and robustness of GNN-based energy prediction models. GraphSAGE performed consistently well irrespective of district regularity, confirming its capacity to generalize across urban typologies. The variable gains of the different GAT variants further demonstrate that model sensitivity to edge features is context-dependent: physical attributes such as angle are more predictive in morphologically diverse districts, while combined features (distance + angle) excel where moderate regularity exists. The stratification results show that the most physically interpretable groupings (e.g., by shape coefficient) yield higher accuracy and more stable, explainable models. These findings suggest that, for practical, city-scale applications, the selection of GNN architecture and stratification method should be tailored to the physical reality of the urban fabric.

5.3. Case Implications for Old Residential District Retrofit

In the context of aging residential districts, characterized by outdated construction, suboptimal orientation, and variable block layouts, the explainable performance of GNNs holds direct engineering significance. By identifying the spatial drivers of energy inefficiency (e.g., poor orientation or excessive compactness), the models support more targeted retrofit strategies and a better use of limited resources. The interpretability of feature importance (e.g., which edge attributes dominate in a given district) supports the transparent prioritization of interventions. Furthermore, the capacity of the models to generalize across districts of differing morphological types suggests that a single, well-calibrated GNN framework has the potential to serve as a robust tool for retrofit planning across diverse urban settings, ranging from grid-like historic cores to fragmented estate districts.

5.4. Limitations and Future Work

Despite these encouraging outcomes, several limitations must be acknowledged. Firstly, the analysis is based on a limited number of districts, and the underlying building energy data may not capture all operational and behavioral factors influencing real-world consumption. Specifically, this study relies on synthetic EUI values generated via physics-based simulation for training and evaluation. While this approach facilitates controlled benchmarking and supports generalization analysis, it is a notable limitation in northern China, where district heating is supplied centrally and electrical submetering does not accurately reflect heating or cooling loads. Consequently, real-world energy use patterns, particularly those associated with heating, may not be accurately reflected in the simulation outputs. Building envelope calibration has been performed in related research, using measured indoor temperature data to estimate infiltration and insulation parameters [5]. However, such calibration is not the primary focus of this study, which centers on methodological development. In addition, the present study does not explicitly address uncertainty quantification for the simulated EUI or the model predictions, as uncertainty estimation is inherently limited in deterministic physics-based simulations. Rigorous uncertainty quantification is nonetheless essential for real-world UBEM applications using measured energy data, and it must account for model errors, parameter variability, and measurement noise. Future research should prioritize integrating uncertainty estimation methods to enhance the robustness and reliability of model outcomes in practical deployment.

Secondly, the current framework explores only a subset of possible edge attributes. Additional physical or contextual features, such as building envelope properties (e.g., insulation, infiltration), HVAC system type and efficiency, operational schedules, and detailed occupancy patterns, are recognized as highly relevant for improving both the accuracy and interpretability of urban energy modeling. However, acquiring such detailed physical information at the metropolitan scale remains challenging due to data availability, privacy, and standardization barriers. The physics-based features incorporated in this study, primarily geometric, morphological, and shading-related attributes, are most relevant to capturing urban-scale phenomena such as solar access and heat island effects, rather than building-level HVAC or operational behavior. Furthermore, the current framework does not explicitly extract or represent detailed building topology (e.g., adjacency, spatial configuration, or 3D relationships), which may further influence shading patterns and inter-building energy interactions. Future research should incorporate finer-grained topological features to enhance the physical realism and predictive accuracy of UBEM models, and should investigate approaches for extracting or inferring these physical features at scale, such as remote sensing, IoT sensor networks, or advanced parameter inference techniques [20]. Recent studies have demonstrated that integrating envelope calibration from in situ measurements [77], occupancy detection from smart meters [78,79], and HVAC operational modeling [80] can substantially improve model fidelity. Liu et al. developed a physics-based modeling framework to generate realistic U-value distributions for large residential building stocks, showing that neglecting this uncertainty can lead to substantial underestimation of heating/cooling demand and reduced variability across the building stock [77]. Their findings underscore the necessity of incorporating parameter uncertainty into UBEM simulations to improve the reliability of energy evaluation and urban planning. Mosteiro-Romero et al. showed that integrating campus-scale Wi-Fi data enables dynamic, activity-based occupant modeling for urban energy applications, illustrating the potential of large-scale IoT data to improve the realism of UBEM frameworks [81]. Research has also indicated that system identification and frequency-domain analysis can yield precise and efficient building energy models for online control and optimization, particularly where traditional physics-based or data-driven models encounter challenges regarding scalability or generalization [80]. However, these approaches remain challenging to generalize across large, heterogeneous urban areas.

Thirdly, the sensitivity of GAT variants to initialization and hyperparameters remains an open challenge, especially in smaller or highly irregular districts. Future research is recommended to explore the development of adaptive graph construction strategies and automated feature selection, integrating these strategies with simulation-based or sensor-derived energy datasets. This integration aims to enhance the system’s robustness and generalization capabilities. Furthermore, extending the methodology to encompass temporal dynamics or transfer learning across cities signifies a valuable direction for expanding the application to more extensive policy and planning contexts.

Finally, as graph-based models become increasingly expressive and complex, computational efficiency emerges as a key consideration for real-world deployment, particularly in large-scale urban scenarios. The elevated resource demands of attention-based GNNs (e.g., GAT variants) may limit their practical applications. To address this issue, it is recommended that future research explore optimization strategies such as graph sparsification, dynamic subgraph sampling, and parallel or distributed training. These approaches have the potential to significantly reduce computational overhead, thereby enabling the adoption of advanced GNN architectures for city-scale energy modeling and real-time policy support.
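One of the optimization strategies named above, graph sparsification, can be sketched in its simplest form by keeping only each building's k nearest neighbors as edges. This is an illustrative assumption about how sparsification might be applied, not a procedure used in this study; the coordinates and the function name are hypothetical, and the brute-force neighbor search is only suitable for small examples (a spatial index would be needed at city scale).

```python
import math


def knn_sparsify(coords, k):
    """Keep, for every node, only the edges to its k nearest neighbors.
    Returns undirected edges as a set of (low_index, high_index) pairs."""
    edges = set()
    for i, p in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(p, coords[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Hypothetical centroids (metres) of four buildings along a street.
coords = [(0.0, 0.0), (10.0, 0.0), (25.0, 0.0), (100.0, 0.0)]
sparse_edges = knn_sparsify(coords, k=1)
full_edge_count = len(coords) * (len(coords) - 1) // 2
```

In this toy case the sparsified graph keeps three of the six possible edges. For attention-based models, whose cost grows with the number of edges, this kind of pruning directly reduces the work per layer.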

6. Conclusions and Future Work

The present study systematically benchmarked conventional and graph-based neural network architectures for urban building energy modeling across three representative residential districts with diverse spatial morphologies. Integrating domain-informed edge attributes, namely inter-building distance and orientation, into graph attention frameworks substantially enhanced predictive accuracy, generalization, and robustness compared to traditional attribute-only models and basic graph convolutions. These gains are attributed to the physics-aware design of the GNNs.

Experiments across the districts showed that the GraphSAGE model consistently delivers the best performance across urban forms, whereas the optimal design of GAT edge features depends on the local spatial context. The findings demonstrate that incorporating real-world physical relationships into graph construction and message passing is essential for capturing the multifaceted drivers of energy use in heterogeneous urban environments. Furthermore, the clear correspondence between model performance and urban morphological characteristics offers valuable interpretability, facilitating transparent and data-driven decision-making on urban retrofitting and energy policy.

The present study highlights several practical implications for urban energy management and policy creation. Firstly, graph-based frameworks—particularly those incorporating interpretable and physically meaningful features—provide a scalable, robust, and transferable foundation for large-scale analysis of building energy performance in diverse city contexts. This facilitates the systematic capture of spatial interactions by city planners and engineers, the identification of high-impact zones, and the prioritization of interventions across entire urban districts. Secondly, the demonstrated generalization ability of GNN models indicates that a single, well-calibrated architecture can be reliably deployed across different cities or neighborhoods. This would streamline the workflow for energy retrofit planning, policy assessment, and scenario analysis in aging residential areas. In practice, this suggests that data-driven, physics-aware GNNs have the potential to enable more targeted, efficient, and evidence-based decision-making processes for sustainable urban transformation.

Nevertheless, this study also highlights several open challenges that warrant further investigation. Chief among these is the need for broader geographic and climatic validation, covering more cities, climate zones, and more complex building typologies. Additional challenges are the integration of richer contextual and operational factors and the further development of adaptive graph construction techniques that capture urban complexity more effectively. Incorporating additional physical factors (e.g., wind exposure, thermal mass, occupancy schedules) into the definition of edge or node attributes can enhance model accuracy, but extracting such data at the urban scale remains difficult. Furthermore, incorporating temporal dynamics, such as seasonal and operational variations, remains a critical direction for comprehensive UBEM. Whilst the present study focuses on static energy modeling, future work will explore integrating time-series data via hybrid frameworks to enable spatiotemporal urban energy prediction. Future research should also place greater emphasis on temporal and multimodal data sources, automated model selection, and transfer learning methods. These strategies hold significant promise for enhancing the scalability, robustness, and practical utility of GNN-based frameworks in real-world urban energy management. Finally, as graph-based models grow more complex, future work should develop computational optimization strategies, such as graph sparsification and dynamic sampling, to ensure that advanced GNN architectures remain scalable and practical in large-scale, real-world urban energy applications.

This study provides substantial empirical and methodological evidence for adopting physics-informed GNNs in urban building energy modeling. This paves the way for more interpretable, accurate, and generalizable tools for sustainable city planning and building retrofit. The proposed framework facilitates rapid, district-scale evaluation of energy consumption patterns in legacy urban residential areas where detailed monitoring data are unavailable. This capability is of particular value in supporting data-driven policy formulation, prioritization of retrofit investments, and the development of scalable retrofit strategies in large, morphologically homogeneous urban neighborhoods. The provision of actionable insights for policymakers and practitioners is a direct contribution of this work to accelerating urban decarbonization and effectively implementing building retrofit programs in data-sparse environments.

Author Contributions

Conceptualization, R.S.; methodology, R.S.; software, H.N.; validation, Q.X., X.S. and M.G.; formal analysis, R.S.; investigation, H.N. and X.J.; resources, R.S.; data curation, Q.X. and X.S.; writing—original draft preparation, R.S.; writing—review and editing, R.S.; visualization, H.N.; supervision, R.S.; project administration, R.S.; funding acquisition, R.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financially supported by the Liaoning Provincial Natural Science Foundation Joint Fund (grant number: 2023-MSBA-102) and the Fundamental Research Funds for the Central Universities (grant number: N2311003).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ANN	Artificial neural network;
CNN	Convolutional neural network;
DT	Decision tree;
EUI	Energy use intensity ( $kWh / (m^{2} \cdot a)$ );
GB	Gradient boosting;
GNN	Graph neural network;
GCN	Graph convolutional network;
GAT	Graph attention network;
KNN	K-nearest neighbor;
LiDAR	Light detection and ranging;
LSTM	Long short-term memory;
MAE	Mean absolute error;
MLP	Multilayer perceptron;
MSE	Mean squared error;
PINN	Physics-informed neural network;
RF	Random forest;
RNN	Recurrent neural network;
RMSE	Root mean square error;
$R^{2}$	Coefficient of determination;
SVM	Support vector machine;
UBEM	Urban building energy modeling;
UBER	Urban building energy retrofit;
XAI	Explainable artificial intelligence.

References

Wang, C.; Song, J.; Shi, D.; Reyna, J.L.; Horsey, H.; Feron, S.; Zhou, Y.; Ouyang, Z.; Li, Y.; Jackson, R.B. Impacts of climate change, population growth, and power sector decarbonization on urban building energy use. Nat. Commun. 2023, 14, 6434.
International Energy Agency. Buildings-Energy System-IEA. 2025. Available online: https://www.iea.org/energy-system/buildings (accessed on 15 June 2025).
Sharston, R.; Singh, M. Urban morphology, urban heat island (UHI) and building energy consumption: A critical review of methods and relationships among influential parameters. Build. Serv. Eng. Res. Technol. 2025, 46, 561–584.
Dab’at, A.A.; Alqadi, S. The impact of urban morphology on energy demand of a residential building in a Mediterranean climate. Energy Build. 2024, 325, 114989.
Shan, R.; Lai, W.; Tang, H.; Leng, X.; Gu, W. Residential Building Renovation Considering Energy, Carbon Emissions, and Cost: An Approach Integrating Machine Learning and Evolutionary Generation. Appl. Sci. 2025, 15, 1830.
Salvalai, G.; Zhu, Y.; Sesana, M.M. From building energy modeling to urban building energy modeling: A review of recent research trend and simulation tools. Energy Build. 2024, 319, 114500.
Perwez, U.; Rasool, M.H.; Aziz, I.; Zia, U. UBEM-SER: Role of sufficiency, efficiency and renewable in the decarbonization of commercial building stock at city scale. Sustain. Cities Soc. 2025, 122, 106214.
Halaçlı, E.G.; Canlı, İ.; İşeri, O.K.; Yavuz, F.; Akgül, Ç.M.; Kalkan, S.; Dino, I.G. A Novel Graph Neural Network for Zone-Level Urban-Scale Building Energy Use Estimation. In Proceedings of the 10th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, Istanbul, Turkey, 15–16 November 2023; pp. 169–176.
Lei, B.; Liu, P.; Milojevic-Dupont, N.; Biljecki, F. Predicting building characteristics at urban scale using graph neural networks and street-level context. Comput. Environ. Urban Syst. 2024, 111, 102129.
Nandan, M.; Mitra, S.; De, D. GraphXAI: A survey of graph neural networks (GNNs) for explainable AI (XAI). Neural Comput. Appl. 2025, 37, 10949–11000.
Miraki, A.; Parviainen, P.; Arghandeh, R. Electricity demand forecasting at distribution and household levels using explainable causal graph neural network. Energy AI 2024, 16, 100368.
Kastner, P.; Dogan, T. Towards Auto-Calibrated UBEM Using Readily Available, Underutilized Urban Data: A Case Study for Ithaca, NY. Energy Build. 2024, 317, 114286.
Faure, X.; Lebrun, R.; Pasichnyi, O. Impact of time resolution on estimation of energy savings using a copula-based calibration in UBEM. Energy Build. 2024, 311, 114134.
Liu, Z.; Zhou, X.; Shen, X.; Sun, H.; Yan, D. A novel acceleration approach to shadow calculation based on sunlight channel for urban building energy modeling. Energy Build. 2024, 315, 114244.
Garreau, E.; Berthou, T.; Duplessis, B.; Partenay, V.; Marchio, D. Solar shading and multi-zone thermal simulation: Parsimonious modelling at urban scale. Energy Build. 2021, 249, 111176.
Eggimann, S.; Fiorentini, M. Transferring energy signatures across space and time to assess their viability for rapid urban energy demand estimation. Energy Build. 2024, 316, 114348.
Yu, Q.; Ketzler, G.; Mills, G.; Leuchner, M. Exploring the integration of urban climate models and urban building energy models through shared databases: A review. Theor. Appl. Climatol. 2025, 156, 266.
Johari, F.; Lindberg, O.; Ramadhani, U.H.; Shadram, F.; Munkhammar, J.; Widén, J. Analysis of large-scale energy retrofit of residential buildings and their impact on the electricity grid using a validated UBEM. Appl. Energy 2024, 361, 122937.
Thrampoulidis, E.; Hug, G.; Orehounig, K. Approximating optimal building retrofit solutions for large-scale retrofit analysis. Appl. Energy 2023, 333, 120566.
Kamel, E. A systematic literature review of physics-based urban building energy modeling (UBEM) tools, data sources, and challenges for energy conservation. Energies 2022, 15, 8649.
Ferrando, M.; Causone, F.; Hong, T.; Chen, Y. Urban building energy modeling (UBEM) tools: A state-of-the-art review of bottom-up physics-based approaches. Sustain. Cities Soc. 2020, 62, 102408.
Eshraghi, P.; Talami, R.; Dehnavi, A.N.; Mirdamadi, M.; Zomorodian, Z.S. Adopting Explainable-AI to investigate the impact of urban morphology design on energy and environmental performance in dry-arid climates. Adv. Build. Energy Res. 2025, 19, 497–531.
Li, Z.; Ma, J.; Jiang, F.; Zhang, S.; Tan, Y. Assessing the impacts of urban morphological factors on urban building energy modeling based on spatial proximity analysis and explainable machine learning. J. Build. Eng. 2024, 85, 108675.
Worthy, A.; Ashayeri, M.; Abbasabadi, N. Leveraging Earth Observational Data Products and Machine Learning to Enhance Urban Building Energy Modeling (UBEM) with Microclimate Effects. Sustain. Cities Soc. 2025, 130, 106544.
Mondal, N.; Anand, P.; Khan, A.; Deb, C.; Cheong, D.; Sekhar, C.; Niyogi, D.; Santamouris, M. Systematic review of the efficacy of data-driven urban building energy models during extreme heat in cities: Current trends and future outlook. Build. Simul. 2024, 17, 695–722.
Ali, U.; Bano, S.; Shamsi, M.H.; Sood, D.; Hoare, C.; Zuo, W.; Hewitt, N.J.; O’Donnell, J. Urban building energy performance prediction and retrofit analysis using data-driven machine learning approach. Energy Build. 2024, 303, 113768.
El-Maraghy, M.; Metawie, M.; Safaan, M.; Eldin, A.S.; Hamdy, A.; El Sharkawy, M.; Abdelaty, A.; Azab, S.; Marzouk, M. Predicting energy consumption of mosque buildings during the operation stage using deep learning approach. Energy Build. 2024, 303, 113829.
Ma, J.; Cheng, J.C. Identifying the influential features on the regional energy use intensity of residential buildings based on Random Forests. Appl. Energy 2016, 183, 193–201.
Wang, C.; et al. An innovative method to predict the thermal parameters of construction assemblies for urban building energy models. Build. Environ. 2022, 224, 109541.
Wang, W.; Lin, Q.; Chen, J.; Li, X.; Sun, Y.; Xu, X. Urban building energy prediction at neighborhood scale. Energy Build. 2021, 251, 111307.
Gao, G.; Yang, S. Construction and Research of a Data-Driven Energy Consumption Evaluation Model for Urban Building Operation. IEEE Access 2023, 11, 139439–139456.
Sauer, J.; Mariani, V.C.; dos Santos Coelho, L.; Ribeiro, M.H.D.M.; Rampazzo, M. Extreme gradient boosting model based on improved Jaya optimizer applied to forecasting energy consumption in residential buildings. Evol. Syst. 2022, 13, 577–588.
Amiri, S.S.; Mueller, M.; Hoque, S. Investigating the application of a commercial and residential energy consumption prediction model for urban Planning scenarios with Machine Learning and Shapley Additive explanation methods. Energy Build. 2023, 287, 112965.
Nyawa, S.; et al. Transparent machine learning models for predicting decisions to undertake energy retrofits in residential buildings. Ann. Oper. Res. 2023, 1–29.
Xu, Y.; Li, F.; Asgari, A.; Momeni, S.; Eghbalian, A.; Talebzadeh, M.; Paksaz, A.; Bakhtiarvand, S.K.; Shahabi, S. Prediction and optimization of heating and cooling loads in a residential building based on multi-layer perceptron neural network and different optimization algorithms. Energy 2022, 240, 122692.
Geng, X.; Cai, S.; Gou, Z. Assessing BIPV potential in dense urban areas using CNN models. Appl. Energy 2025, 377, 124716.
Koschwitz, D.; Spinnräker, E.; Frisch, J.; van Treeck, C. Long-term urban heating load predictions based on optimized retrofit orders: A cross-scenario analysis. Energy Build. 2020, 208, 109637.
Li, G.; Zhao, X.; Fan, C.; Fang, X.; Li, F.; Wu, Y. Assessment of long short-term memory and its modifications for enhanced short-term building energy predictions. J. Build. Eng. 2021, 43, 103182.
Pan, X.; Xu, Y.; Hong, T. Surrogate modelling for urban building energy simulation based on the bidirectional long short-term memory model. J. Build. Perform. Simul. 2024, 1–19.
Roy, A.; Roy, K.K.; Ahsan Ali, A.; Amin, M.A.; Rahman, A.M. SST-GNN: Simplified spatio-temporal traffic forecasting model using graph neural network. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Virtual, 11–14 May 2021; Springer: Cham, Switzerland, 2021; pp. 90–102.
Mandal, S.; Thakur, M. A city-based PM2.5 forecasting framework using Spatially Attentive Cluster-based Graph Neural Network model. J. Clean. Prod. 2023, 405, 137036.
Zheng, L.; Lu, W. Urban micro-scale street thermal comfort prediction using a ‘graph attention network’ model. Build. Environ. 2024, 262, 111780.
Yu, Y.; Li, P.; Huang, D.; Sharma, A. Street-level temperature estimation using graph neural networks: Performance, feature embedding and interpretability. Urban Clim. 2024, 56, 102003.
Cheng, X.; Hu, Y.; Huang, J.; Wang, S.; Zhao, T.; Dai, E. Urban building energy modeling: A time-series building energy consumption use simulation prediction tool based on graph neural network. In Computing in Civil Engineering 2021; American Society of Civil Engineers: Reston, VA, USA, 2021; pp. 188–195.
Hu, Y.; Cheng, X.; Wang, S.; Chen, J.; Zhao, T.; Dai, E. Times series forecasting for urban building energy consumption based on graph convolutional network. Appl. Energy 2022, 307, 118231.
Lu, J.; Zhang, C.; Li, J.; Zhao, Y.; Qiu, W.; Li, T.; Zhou, K.; He, J. Graph convolutional networks-based method for estimating design loads of complex buildings in the preliminary design stage. Appl. Energy 2022, 322, 119478.
Liu, Q.; Cheng, X.; Shi, J.; Ma, Y.; Peng, P. Modeling and predicting energy consumption of chiller based on dynamic spatial-temporal graph neural network. J. Build. Eng. 2024, 91, 109657.
Xie, Y.; Stravoravdis, S. Generating occupancy profiles for building simulations using a hybrid GNN and LSTM framework. Energies 2023, 16, 4638.
Jia, X.; Song, H.; Nan, X.; Cai, X. Prediction of short-term heat load of office buildings based on GNN-LSTM modeling. In Proceedings of the 2024 6th International Conference on Frontier Technologies of Information and Computer (ICFTIC), Qingdao, China, 13–15 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1145–1148.
Garg, A.; Correa, S.; Li, F.; Chowdhury, S.; New, J.; Bacabac, K.; Kunkel, C.; Baird, D. Empirical Validation of UBEM: An Assessment of Bias in Urban Building Energy Modeling for Chicago; Technical Report; Oak Ridge National Laboratory (ORNL): Oak Ridge, TN, USA, 2024. Available online: https://www.osti.gov/biblio/2301660 (accessed on 15 June 2025).
Chew, A.W.Z.; He, R.; Zhang, L. Physics Informed Machine Learning (PIML) for design, management and resilience-development of urban infrastructures: A review. Arch. Comput. Methods Eng. 2025, 32, 399–439.
Pavirani, F.; Gokhale, G.; Claessens, B.; Develder, C. Demand response for residential building heating: Effective Monte Carlo Tree Search control based on physics-informed neural networks. Energy Build. 2024, 311, 114161.
Nagarathinam, S.; Vasan, A. PhyGICS–A Physics-informed Graph Neural Network-based Intelligent HVAC Controller for Open-plan Spaces. In Proceedings of the 15th ACM International Conference on Future and Sustainable Energy Systems, Singapore, 4–7 June 2024; pp. 203–214.
Ang, Y.Q.; Yan, M.; Ma, N. On interpretable and explainable machine learning for urban building energy modeling. In Proceedings of the 15th ACM International Conference on Future and Sustainable Energy Systems, Singapore, 4–7 June 2024; pp. 683–686.
Liu, P.; Zhang, Y.; Biljecki, F. Explainable spatially explicit geospatial artificial intelligence in urban analytics. Environ. Plan. B Urban Anal. City Sci. 2024, 51, 1104–1123.
Huang, Y.; Zhao, Y.; Wang, Z.; Liu, X.; Liu, H.; Fu, Y. Explainable district heat load forecasting with active deep learning. Appl. Energy 2023, 350, 121753.
Lin, D.; Xu, X.; Liu, K.; Wu, T.; Wang, X.; Zhang, R. Interpretable data-driven urban building energy modeling considering inter-building effect. Build. Environ. 2025, 274, 112688.
Ruan, Y.; Ma, Y.; Xu, T.; Yao, Y.; Meng, H.; Qian, F.; Wang, C.; Liu, W. Interpretable Multi-Feature District Building Energy Consumption Prediction Model: Based on Dual Attention Mechanism and Spatiotemporal Graph Convolutional Network. SSRN Electron. J. 2024.
Yu, J.; Hagen-Zanker, A.; Santitissadeekorn, N.; Hughes, S. A data-driven framework to manage uncertainty due to limited transferability in urban growth models. Comput. Environ. Urban Syst. 2022, 98, 101892.
Guo, R.; Shamsi, M.H.; Sharifi, M.; Saelens, D. Exploring uncertainty in district heat demand through a probabilistic building characterization approach. Appl. Energy 2025, 377, 124411.
Wu, Z.; Li, M.; Liu, W.; Cheng, J.C.; Wang, Z.; Kwok, H.H.; Huang, C.; Hou, F. Developing surrogate models for the early-stage design of residential blocks using graph neural networks. Build. Simul. 2025, 18, 679–698.
Yang, C.; Li, S.; Gou, Z. Spatiotemporal prediction of urban building rooftop photovoltaic potential based on GCN-LSTM. Energy Build. 2025, 334, 115522.
Lu, J.; Zheng, Z.; Zhang, C.; Zhao, Y.; Feng, C.; Choudhary, R. Graph convolutional networks-based method for uncertainty quantification of building design loads. Build. Simul. 2025, 18, 321–337.
Cai, J.; Yang, H.; Song, C.; Xu, K. A novel graph convolutional network-based interpretable method for chiller energy consumption prediction considering the spatiotemporal coupling between variables. Energy 2024, 312, 133639.
Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1024–1034. [Google Scholar] [CrossRef]
Jia, Y.; Wang, J.; Hosseini, M.R.; Shou, W.; Wu, P.; Mao, C. Temporal graph attention network for building thermal load prediction. Energy Build. 2024, 321, 113507. [Google Scholar] [CrossRef]
Roscher, R.; Bohn, B.; Duarte, M.F.; Garcke, J. Explainable machine learning for scientific insights and discoveries. IEEE Access 2020, 8, 42200–42216. [Google Scholar] [CrossRef]
Belle, V.; Papantonis, I. Principles and practice of explainable machine learning. Front. Big Data 2021, 4, 688969. [Google Scholar] [CrossRef]
Agarwal, C.; Queen, O.; Lakkaraju, H.; Zitnik, M. Evaluating explainability for graph neural networks. Sci. Data 2023, 10, 144. [Google Scholar] [CrossRef]
Li, Z.; Ma, J.; Jiang, F. Exploring the effects of 2D/3D building factors on urban energy consumption using explainable machine learning. J. Build. Eng. 2024, 97, 110827. [Google Scholar] [CrossRef]
Wang, P.; Liu, Z.; Zhang, L. Sustainability of compact cities: A review of Inter-Building Effect on building energy and solar energy use. Sustain. Cities Soc. 2021, 72, 103035. [Google Scholar] [CrossRef]
Ni, H.; Wang, D.; Zhao, W.; Jiang, W.; Mingze, E.; Huang, C.; Yao, J. Enhancing rooftop solar energy potential evaluation in high-density cities: A Deep Learning and GIS based approach. Energy Build. 2024, 309, 113743. [Google Scholar] [CrossRef]
Lan, H.; Gou, Z.; Hou, C. Understanding the relationship between urban morphology and solar potential in mixed-use neighborhoods using machine learning algorithms. Sustain. Cities Soc. 2022, 87, 104225. [Google Scholar] [CrossRef]
Yue, Y.; Yan, Z.; Ni, P.; Lei, F.; Qin, G. Promoting solar energy utilization: Prediction, analysis and evaluation of solar radiation on building surfaces at city scale. Energy Build. 2024, 319, 114561. [Google Scholar] [CrossRef]
Ren, H.; Xu, C.; Ma, Z.; Sun, Y. A novel 3D-geographic information system and deep learning integrated approach for high-accuracy building rooftop solar energy potential characterization of high-density cities. Appl. Energy 2022, 306, 117985. [Google Scholar] [CrossRef]
Wenninger, S.; Karnebogen, P.; Lehmann, S.; Menzinger, T.; Reckstadt, M. Evidence for residential building retrofitting practices using explainable AI and socio-demographic data. Energy Rep. 2022, 8, 13514–13528. [Google Scholar] [CrossRef]
Liu, Z.; Zhou, X.; Tian, W.; Liu, X.; Yan, D. Impacts of uncertainty in building envelope thermal transmittance on heating/cooling demand in the urban context. Energy Build. 2022, 273, 112363. [Google Scholar] [CrossRef]
Doma, A.; Ouf, M. Modelling occupant behaviour for urban scale simulation: Review of available approaches and tools. Build. Simul. 2023, 16, 169–184. [Google Scholar] [CrossRef]
Banfi, A.; Ferrando, M.; Li, P.; Shi, X.; Causone, F. Integrating Occupant Behaviour into Urban-Building Energy Modelling: A Review of Current Practices and Challenges. Energies 2024, 17, 4400. [Google Scholar] [CrossRef]
Li, X.; Wen, J. Building energy consumption on-line forecasting using physics based system identification. Energy Build. 2014, 82, 1–12. [Google Scholar] [CrossRef]
Mosteiro-Romero, M.; Miller, C.; Quintana, M.; Chong, A.; Stouffs, R. Leveraging campus-scale Wi-Fi data for activity-based occupant modeling in urban energy applications. J. Phys. Conf. Ser. 2023, 2600, 132008. [Google Scholar] [CrossRef]

Figure 1. Physics-informed and explainable GNN-based urban energy modeling framework.

Figure 2. Workflow for graph-based representation of urban building clusters, from footprint extraction and morphological feature generation to construction of the spatial graph.

Figure 3. Structure of the GAT attention mechanism.

Figure 4. Location and schematic map of the three selected old residential districts in Shenyang, China.

Figure 5. Comparison of building attribute distributions in District 1 before and after data cleaning: (a) boxplot before cleaning, (b) boxplot after cleaning, (c) histogram before cleaning, (d) histogram after cleaning. Data cleaning removes outliers and extreme values (e.g., overly large floor area, atypical shape coefficients), ensuring sample representativeness for urban GNN modeling.

Figure 6. Comparison of building attribute distributions in District 2 before and after data cleaning: (a) boxplot before cleaning, (b) boxplot after cleaning, (c) histogram before cleaning, (d) histogram after cleaning.

Figure 7. Comparison of building attribute distributions in District 3 before and after data cleaning: (a) boxplot before cleaning, (b) boxplot after cleaning, (c) histogram before cleaning, (d) histogram after cleaning.

Figure 8. Boxplots of RMSE and MAE across baseline and graph-based models.

Figure 9. Boxplot comparisons of different GNN models under (a) shape-coefficient stratification and (b) EUI stratification in District 3.

Figure 10. Cross-district comparison of RMSE and MAE across Districts 1–3 under shape-coefficient stratification.

Figure 11. Cross-district comparison of RMSE and MAE across Districts 1–3 under EUI stratification.

Table 1. Descriptive statistics for the three study districts (mean ± std [min–max] unless stated otherwise).

Variable	District 1	District 2	District 3
N	136	225	200
Total Area (m²)	525,926	1,299,668	1,005,192
Floors	7.5 ± 0.8 [6–9]	7.5 ± 1.0 [5–10]	7.2 ± 0.8 [6–10]
Height (m)	22.5 ± 2.5 [18–27]	22.4 ± 3.1 [15–30]	21.7 ± 2.3 [18–30]
Area (m²)	3867.1 ± 2131.3 [718–9242]	5776.3 ± 2033.4 [2107–9952]	5026.0 ± 1663.0 [1896–8900]
Shape Coefficient	0.0 ± 0.0	98.2 ± 47.2	81.0 ± 54.5
EUI (kWh/m²)	173.1 ± 94.7 [34–422]	258.6 ± 87.5 [100–535]	232.1 ± 74.6 [102–424]
Orientation	Mean: 0.0°; Mode: 0.0°	Mean: 98.2°; Mode: 31.4°	Mean: 81.0°; Mode: 90.1°

Table 2. Comparison of baseline and graph-based models (District 3, shape-coefficient stratification, 20 runs, outliers removed).

Model	RMSE (Mean ± Std)	MAE (Mean ± Std)	Time (s)
ANN (Baseline)	29.06 ± 10.58	23.11 ± 8.80	3.74 ± 0.33
GCN	41.46 ± 8.22	33.45 ± 6.97	9.05 ± 0.84
GraphSAGE	16.22 ± 4.00	12.63 ± 2.52	8.87 ± 0.86
GAT (baseline)	29.53 ± 8.22	23.73 ± 7.23	17.44 ± 1.37
GAT (distance)	32.90 ± 7.93	26.51 ± 7.18	23.25 ± 4.11
GAT (angle)	31.99 ± 9.20	25.85 ± 8.16	15.73 ± 0.72
GAT (distance + angle)	28.81 ± 7.52	22.60 ± 6.51	18.63 ± 1.70
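
To make the gaps in Table 2 easier to read, the short sketch below expresses each model's mean RMSE as a percentage change relative to the ANN baseline. It uses only the mean values reported in Table 2; the helper name `relative_change` and the dictionary layout are illustrative assumptions, not part of the original study's code.

```python
# Mean RMSE values copied from Table 2
# (District 3, shape-coefficient stratification, 20 runs).
rmse_mean = {
    "ANN (Baseline)": 29.06,
    "GCN": 41.46,
    "GraphSAGE": 16.22,
    "GAT (baseline)": 29.53,
    "GAT (distance)": 32.90,
    "GAT (angle)": 31.99,
    "GAT (distance + angle)": 28.81,
}


def relative_change(value, reference):
    """Percentage change of `value` relative to `reference` (negative = lower error).

    Hypothetical helper introduced for illustration only.
    """
    return 100.0 * (value - reference) / reference


baseline = rmse_mean["ANN (Baseline)"]
rel = {name: relative_change(v, baseline) for name, v in rmse_mean.items()}

# Model with the lowest mean RMSE in the table.
best_model = min(rmse_mean, key=rmse_mean.get)

# Print models from largest error reduction to largest error increase.
for name, pct in sorted(rel.items(), key=lambda kv: kv[1]):
    print(f"{name:<24s} {pct:+6.1f}% vs ANN")
```

These percentages compare means only. Several standard deviations in Table 2 are large relative to the differences between ANN and the GAT variants, so small gaps among those models should be read with caution.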

Table 3. Ablation study on physics-aware edge features under shape-coefficient and EUI stratification (District 3, 20 runs).

Model	Edge Attributes	RMSE (shape coeff.)	MAE (shape coeff.)	RMSE (EUI)	MAE (EUI)
GCN (baseline)	None	41.46 ± 8.22	33.45 ± 6.97	42.31 ± 7.84	33.92 ± 7.10
GraphSAGE	None	16.22 ± 4.00	12.63 ± 2.52	21.14 ± 8.92	16.37 ± 6.89
GAT (baseline)	None	29.53 ± 8.22	23.73 ± 7.23	28.37 ± 9.31	23.12 ± 7.89
GAT (distance)	Distance ( $d_{ij}$ )	32.90 ± 7.93	26.51 ± 7.18	29.81 ± 10.07	23.91 ± 9.41
GAT (angle)	Angle ( $\theta_{ij}$ )	31.99 ± 9.20	25.85 ± 8.16	25.67 ± 8.84	20.65 ± 7.53
GAT (distance + angle)	Distance + Angle	28.81 ± 7.52	22.60 ± 6.51	31.29 ± 10.26	25.70 ± 9.51

Table 4. Cross-district ablation study on physics-aware edge features under shape-coefficient stratification (20 runs, outliers removed).

Model	Edge Attr.	District 1		District 2		District 3
Model	Edge Attr.	RMSE	MAE	RMSE	MAE	RMSE	MAE
GCN (baseline)	None	56.89 ± 15.27	44.09 ± 11.46	47.10 ± 11.52	37.85 ± 8.91	41.46 ± 8.22	33.45 ± 6.97
GraphSAGE	None	37.97 ± 11.08	28.73 ± 8.66	23.57 ± 6.05	17.98 ± 4.26	16.22 ± 4.00	12.63 ± 2.52
GAT (baseline)	None	43.03 ± 12.09	32.00 ± 10.62	25.93 ± 7.82	19.80 ± 5.01	29.53 ± 8.22	23.73 ± 7.23
GAT (distance)	Distance ( $d_{ij}$ )	43.62 ± 13.55	33.10 ± 10.19	28.71 ± 6.31	21.75 ± 4.87	32.90 ± 7.93	26.51 ± 7.18
GAT (angle)	Angle ( $\theta_{ij}$ )	42.39 ± 12.04	32.02 ± 9.11	26.91 ± 8.08	20.07 ± 5.43	31.99 ± 9.20	25.85 ± 8.16
GAT (dist+angle)	Dist.+Angle	42.79 ± 13.16	31.23 ± 10.33	28.56 ± 6.95	21.49 ± 4.78	28.81 ± 7.52	22.60 ± 6.51

Table 5. Cross-district ablation study on physics-aware edge features under EUI stratification (20 runs, outliers removed).

Model	Edge Attr.	District 1		District 2		District 3
Model	Edge Attr.	RMSE	MAE	RMSE	MAE	RMSE	MAE
GCN (baseline)	None	55.37 ± 15.30	43.09 ± 10.41	51.39 ± 13.87	40.60 ± 10.76	42.31 ± 7.84	33.92 ± 7.10
GraphSAGE	None	34.07 ± 10.08	26.79 ± 8.16	24.17 ± 5.13	18.32 ± 3.91	21.14 ± 8.92	16.37 ± 6.89
GAT (baseline)	None	45.31 ± 12.90	33.98 ± 10.23	27.02 ± 8.48	20.93 ± 5.30	28.37 ± 9.31	23.12 ± 7.89
GAT (distance)	Distance ( $d_{ij}$ )	41.79 ± 13.22	32.15 ± 10.77	29.31 ± 6.86	22.11 ± 5.12	29.81 ± 10.07	23.91 ± 9.41
GAT (angle)	Angle ( $\theta_{ij}$ )	38.64 ± 11.88	29.56 ± 8.68	27.98 ± 7.12	21.06 ± 4.97	25.67 ± 8.84	20.65 ± 7.53
GAT (dist+angle)	Dist.+Angle	39.14 ± 12.12	29.49 ± 8.84	29.80 ± 8.24	22.56 ± 5.27	31.29 ± 10.26	25.70 ± 9.51
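
The district dependence of the GAT edge features can be read directly from Tables 4 and 5. As a minimal sketch, the snippet below picks, for each district and stratification scheme, the GAT variant with the lowest mean RMSE. The numbers are the means reported in the two tables; the data layout and the function `best_variant_per_district` are assumptions made for illustration and are not taken from the authors' implementation.

```python
# Mean RMSE of the GAT variants, copied from Table 4 (shape-coefficient
# stratification) and Table 5 (EUI stratification).
# Tuple order: (District 1, District 2, District 3).
gat_rmse = {
    "shape": {
        "GAT (baseline)": (43.03, 25.93, 29.53),
        "GAT (distance)": (43.62, 28.71, 32.90),
        "GAT (angle)": (42.39, 26.91, 31.99),
        "GAT (dist+angle)": (42.79, 28.56, 28.81),
    },
    "EUI": {
        "GAT (baseline)": (45.31, 27.02, 28.37),
        "GAT (distance)": (41.79, 29.31, 29.81),
        "GAT (angle)": (38.64, 27.98, 25.67),
        "GAT (dist+angle)": (39.14, 29.80, 31.29),
    },
}

districts = ("District 1", "District 2", "District 3")


def best_variant_per_district(table):
    """Return {district: variant with the lowest mean RMSE} for one table.

    Hypothetical helper introduced for illustration only.
    """
    best = {}
    for idx, district in enumerate(districts):
        best[district] = min(table, key=lambda model: table[model][idx])
    return best


best = {strat: best_variant_per_district(tbl) for strat, tbl in gat_rmse.items()}

for strat, per_district in best.items():
    for district, model in per_district.items():
        print(f"{strat:<5s} {district}: {model}")
```

The ranking is based on mean RMSE alone. Because the reported standard deviations overlap across variants in every district, the selected variant indicates a tendency rather than a statistically established ordering.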

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
