1. Introduction
Urban drainage infrastructure constitutes one of the most safety-critical and climate-vulnerable classes of built assets in the contemporary city. Defined here as the network of pipes, channels, retention basins, pump stations, and associated hydraulic control structures designed to convey stormwater and wastewater, urban drainage systems must simultaneously absorb the hydrological variability imposed by climate change and the increased impervious surface fractions associated with ongoing urbanisation. The consequences of drainage system failure range from localised basement flooding and traffic disruption to large-scale urban inundation, with associated impacts on property, public health, and economic productivity. In China, flood disasters have imposed substantial economic burdens over recent decades: direct economic losses from flood events between 1990 and 2018 are estimated to have exceeded CNY 4 trillion, reflecting the increasing scale of flood risk in rapidly urbanising regions [1].
Conventional approaches to drainage system management rely on periodic inspection regimes, physics-based hydrodynamic simulation (e.g., SWMM, InfoWorks ICM), and reactive maintenance triggered by observed failures. While these approaches provide engineering rigour, they face fundamental limitations in the context of real-time operational decision support. Full hydrodynamic models require detailed calibration, are computationally prohibitive for operational time horizons, and cannot readily assimilate real-time sensor data for state estimation. Inspection-based management is inherently retrospective and fails to anticipate failure modes associated with compound events, such as simultaneous pump failure and extreme rainfall. The gap between the technical capability of current tools and the operational demands of resilience-oriented asset management is, therefore, substantial [
2, 3]. This study addresses this gap by developing an explainable digital twin framework for real-time prediction of storm-driven water level responses in urban drainage systems. Additional background on conventional hydrodynamic modelling and operational constraints is provided in Supplementary Material S3 [4, 5, 6, 7, 8].
The digital transformation of built asset management, reflected in the use of digital twins, Internet of Things (IoT) sensing, and data-driven machine learning models, provides a practical pathway for addressing this gap. A digital twin, in the context of built infrastructure, denotes a continuously updated, data-enriched virtual representation of a physical asset that supports simulation, monitoring, and operational decision-making across the asset lifecycle [
9]. Recent studies have demonstrated the value of digital twins and real-time control frameworks for urban drainage, stormwater, and wastewater systems, particularly through sensor integration, online state estimation, and operational prediction [
10,
11,
12,
13]. The emergence of deep learning architectures capable of learning complex temporal patterns from high-frequency sensor data has made data-driven surrogate modelling of drainage system hydraulics computationally tractable for real-time applications [
14,
15]. However, many existing drainage prediction models rely mainly on rainfall and sensor observations, while the static characteristics of drainage assets, such as pipe geometry, hydraulic capacity, upstream contributing area, and network connectivity, are not fully incorporated. This limits their ability to explain why specific locations are more vulnerable than others under the same rainfall forcing.
Model interpretability remains a key challenge. Deep learning models can improve hydraulic prediction, but their outputs are difficult to interpret in operational settings. For drainage management, a forecast is more useful when operators can also identify the factors driving the predicted surcharge risk [
16]. This is particularly important when model outputs are used to support pump operation, emergency response, inspection planning, or rehabilitation prioritisation. SHapley Additive exPlanations (SHAP) can attribute model outputs to input features [17]. However, its use in drainage digital twins remains limited, especially for node-level and asset-level interpretation.
This paper addresses these challenges by presenting an explainable digital twin framework for real-time prediction of urban drainage system responses to extreme rainfall. The principal contributions of this study are as follows: (i) a multi-source data fusion architecture that integrates dynamic sensor streams with static built-asset descriptors to enable hydraulic-capacity-aware forecasting in urban drainage networks; (ii) a hybrid deep learning architecture that couples a temporal convolutional network (TCN) with a long short-term memory (LSTM) network, designed to capture both local event-driven dynamics and long-range hydraulic memory effects relevant to pipe network surcharge propagation; (iii) an integrated SHAP explainability module that provides asset-level attribution of forecast outputs, supporting drainage asset prioritisation and rehabilitation planning; and (iv) a case study deployment in the Yangtze River Delta demonstrating the operational viability and performance improvement of the proposed framework in real-world drainage management contexts, including decision support informed by supervisory control and data acquisition (SCADA) data.
The remainder of this paper is organised as follows.
Section 2 reviews the state of research in data-driven drainage modelling, digital twin frameworks for built assets, and explainable artificial intelligence, identifying the specific knowledge gaps this work addresses.
Section 3 presents the proposed framework architecture and its underlying technical design rationale.
Section 4 describes the research methodology, including the case study site, data sources, and experimental evaluation protocol.
Section 5 discusses the results, their implications, and synthesis with the existing literature.
Section 6 concludes with practical recommendations and directions for future research.
2. Background
2.1. Data-Driven Modelling of Urban Drainage Systems
The application of machine learning to urban drainage hydraulics has progressed substantially over the past decade. Early work employed artificial neural networks (ANNs) for rainfall–runoff modelling and flood inundation mapping, establishing that data-driven models could approximate physics-based simulation outputs at a fraction of the computational cost [
18]. Recurrent neural network architectures, particularly LSTMs, subsequently demonstrated superior performance in capturing the sequential dependencies inherent in hydrological time series, with demonstrated advantages for multi-step-ahead forecasting of catchment discharge [
19,
20].
More recent work has explored convolutional architectures for hydraulic forecasting. TCNs, which employ dilated causal convolutions to capture multi-scale temporal patterns, have shown competitive or superior performance compared to LSTM models on several hydrological benchmarking datasets [
21,
22]. Hybrid TCN-LSTM models that combine the local feature extraction capability of TCNs with the sequential memory of LSTMs have been proposed for traffic flow and energy forecasting tasks, but their systematic application to drainage hydraulics remains largely unexplored [
23]. The majority of published drainage forecasting models rely exclusively on meteorological and sensor inputs, without incorporating the static asset characteristics that fundamentally condition hydraulic behaviour [
24].
2.2. Digital Twin Frameworks for Built Asset Management
The concept of the digital twin, originating in aerospace manufacturing, has been progressively adapted to built environment applications including buildings, bridges, and urban infrastructure [
25,
26]. In the drainage context, digital twin implementations have typically combined physically based simulation engines (e.g., SWMM or MIKE FLOOD) with real-time data assimilation pipelines to update model states from sensor observations [
27]. This hybrid approach improves accuracy but still suffers from a high computational cost. In real-time drainage applications, the computational burden of the simulation engine can restrict model updating and scenario testing, particularly for full-network models with fine spatial and temporal resolution [
10,
28,
29].
Fully data-driven digital twin components for drainage systems have been proposed as computationally efficient surrogates, particularly for operational flood forecasting applications [
18]. However, existing implementations rarely integrate built-asset descriptors from GIS or asset management systems into the predictive model architecture. This omission is significant: pipe condition, connectivity, hydraulic capacity, and catchment morphology are primary determinants of where and when surcharging occurs under any given rainfall forcing. Failing to condition forecasts on these attributes limits the spatial transferability of trained models and reduces their physical interpretability [
16,
30].
2.3. Explainable AI in Infrastructure Management
The deployment of black-box machine learning models in critical infrastructure decision contexts raises legitimate concerns regarding accountability, regulatory compliance, and operator trust [
31,
32]. Explainable artificial intelligence (XAI) methods provide post hoc or inherently interpretable tools for attributing model outputs to input features. SHAP, grounded in cooperative game theory, offers theoretically consistent attribution of individual predictions to contributing features and has been applied to flood risk assessment, structural health monitoring, and energy system fault detection [
17,
33]. However, SHAP-based explainability has not been systematically integrated with multi-source drainage forecasting frameworks to attribute forecast uncertainty to physical asset properties, which would increase operational utility for infrastructure managers [
34]. Integrating SHAP into this framework improves the interpretability of model predictions.
2.4. Knowledge Gaps and Research Problem Statement
The review above highlights several interrelated gaps in the current literature. Although data-driven approaches for urban drainage forecasting have progressed substantially, the performance and stability of hybrid convolution–recurrent architectures have not been systematically evaluated for drainage hydraulics. In particular, the incremental predictive value of integrating static built-asset descriptors with dynamic sensor observations has rarely been quantified within a controlled ablation framework. Moreover, existing drainage digital twin implementations typically adopt one of two paradigms: computationally intensive physics-based simulation engines or sensor-driven surrogate models. The former remain constrained by runtimes that restrict operational deployment, whereas the latter often lack explicit conditioning on the infrastructure characteristics that govern hydraulic response under rainfall forcing.
While XAI methods have gained traction in environmental modelling, their application in urban drainage has largely focused on catchment-scale interpretation. Feature attribution at the node level, explicitly linking forecast outputs to physical infrastructure attributes, remains underdeveloped. This limits the operational interpretability and decision-support value of data-driven models in safety-critical drainage management. Against this background, the present study develops a data-driven digital twin framework designed to integrate multi-source observations with infrastructure descriptors, generate multi-horizon hydraulic forecasts, and provide physically interpretable attribution of prediction outcomes. The framework is designed to support real-time decision-making in urban drainage systems subject to rainfall-driven surcharge risk. A structured comparison is provided in
Supplementary Material S4 [17, 18, 19, 22, 24, 25, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48] and Table S1.
The research problem addressed in this paper is, therefore, as follows: how can a data-driven digital twin framework be constructed that (a) uses the complementary information content of dynamic sensor streams and static asset descriptors, (b) generates accurate multi-horizon hydraulic forecasts in near-real-time, and (c) provides interpretable attribution of forecast outputs to physical drivers, sufficient to support proactive, evidence-based operational intervention in urban drainage systems?
The specific objectives are to (i) develop and implement a multi-source hybrid TCN–LSTM architecture for drainage hydraulic prediction; (ii) quantify the incremental predictive contribution of built-asset descriptors relative to sensor-only baselines through controlled ablation experiments; (iii) integrate SHAP-based attribution to identify dominant physical drivers of surcharge risk at the subcatchment and node scales; and (iv) validate the complete framework in a representative urban drainage system in the Yangtze River Delta.
3. Proposed Framework
3.1. Scope and Design Rationale
The framework focuses on real-time prediction of hydraulic state variables in urban drainage systems (UDSs) under rainfall conditions, with prediction horizons of 15, 30, and 60 min. The target prediction variables are (1) surcharge depth at designated monitoring nodes and (2) a binary flooding risk indicator, set when the predicted water level exceeds the manhole rim elevation. The system is designed to operate on commodity server hardware with an inference latency below 500 milliseconds per forecast cycle, compatible with standard 5 min SCADA update cycles. Surface inundation mapping is explicitly outside this study’s scope and would require additional topographic modelling.
3.2. Framework Architecture Overview
The framework comprises four integrated modules (Figure 1): (M1) multi-source data ingestion and preprocessing; (M2) feature engineering producing dynamic and static input tensors; (M3) the TCN–LSTM predictive model with multi-horizon output heads; and (M4) SHAP-based explainability and attribution. During inference, modules M1–M4 are executed sequentially within a single forecast cycle triggered by incoming sensor data, forming a consistent data–model–interpretation pipeline in which each hydraulic prediction is delivered together with its SHAP attribution.
3.3. Multi-Source Data Fusion Architecture (M1–M2)
The data ingestion module (M1) interfaces with two primary data streams. The dynamic stream comprises high-frequency time-series data sampled at 5 min intervals: water level and flow rate at monitoring nodes, pump operational status, SCADA control signals, and rainfall depth from a dense rain gauge network supplemented by radar quantitative precipitation estimates. The static stream comprises built-asset descriptors extracted from GIS and as-built databases: pipe diameter, length, material age, full-bore capacity, invert levels, manhole rim elevation, contributing catchment area, impervious surface fraction, terrain slope, road-surface gradient, and a set of network connectivity indicators (upstream pipe count, flow path length to outlet).
The feature engineering layer (M2) constructs the input tensor passed to the predictive model. Dynamic features are organised into a three-dimensional tensor representing nodes, time steps, and feature variables. The tensor has the shape (N, T, D_d), where N is the number of monitoring nodes, T is the lookback window, and D_d is the number of dynamic features per node. In this study, T was fixed at 12 time steps, corresponding to a 60 min lookback window at the 5 min sampling resolution. Static features are concatenated to the final hidden state of the temporal encoder, rather than being repeated along the time axis, to avoid temporal distortion of time-invariant attributes. This concatenation strategy treats static asset descriptors as context vectors that modulate the temporal prediction, consistent with conditional modelling approaches used in analogous infrastructure domains [49].
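The windowing and late-fusion steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `make_samples` and `fuse`, the `target_col` argument, and the list-based data layout are all hypothetical. Only the 5 min step, the 12-step lookback, the three horizons, and the concatenation of static descriptors to the final hidden state come from the text.

```python
from typing import List, Sequence, Tuple

STEP_MIN = 5                  # sampling interval stated in the text
LOOKBACK_STEPS = 12           # T = 12 steps, i.e. a 60 min lookback
HORIZONS_MIN = (15, 30, 60)   # forecast horizons stated in the text


def make_samples(series: Sequence[Sequence[float]],
                 target_col: int = 0) -> List[Tuple[list, list]]:
    """Slice one node's (time, D_d) record into (window, targets) pairs.

    series[t] holds the D_d dynamic features at step t. target_col is a
    hypothetical index of the water-level column.
    """
    horizon_steps = [h // STEP_MIN for h in HORIZONS_MIN]  # [3, 6, 12]
    max_h = max(horizon_steps)
    samples = []
    for end in range(LOOKBACK_STEPS, len(series) - max_h + 1):
        # The window covers steps end-12 .. end-1 (inclusive).
        window = [list(row) for row in series[end - LOOKBACK_STEPS:end]]
        # Each target lies h steps after the last step of the window.
        targets = [series[end - 1 + h][target_col] for h in horizon_steps]
        samples.append((window, targets))
    return samples


def fuse(hidden_state: Sequence[float], static: Sequence[float]) -> list:
    """Late fusion: append static descriptors to the final hidden state."""
    return list(hidden_state) + list(static)
```

Stacking the windows of all N nodes would give the (N, T, D_d) tensor; the static vector is attached once per node after temporal encoding instead of being tiled across the 12 time steps.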
3.4. TCN-LSTM Predictive Architecture (M3)
The predictive model employs a two-stage temporal architecture. In the first stage, a TCN encoder processes the lookback window tensor to extract multi-scale temporal features. As illustrated in Figure 2c, the TCN uses dilated causal convolutions with dilation factors of 1, 2, 4, and 8, a kernel size of 3, and 64 filters per layer, yielding a receptive field of 31 time steps at the deepest layer. Residual connections are applied across each dilation block to mitigate vanishing gradient issues during training. The TCN output is a feature tensor preserving the temporal dimension, allowing the subsequent LSTM to operate on learned temporal abstractions rather than raw sensor values.
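The stated receptive field can be checked with a short sketch. It assumes one convolution per dilation level, which is the reading under which a kernel size of 3 and dilations of 1, 2, 4, and 8 give 1 + (3 − 1)(1 + 2 + 4 + 8) = 31 steps; a design with two convolutions per residual block would give a larger value. The all-ones weights and the function names are illustrative only, and residual connections are omitted.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of stacked dilated convolutions, one per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


def causal_dilated_conv(x: Sequence[float], weights: Sequence[float],
                        dilation: int) -> List[float]:
    """y[t] = sum_j weights[j] * x[t - j * dilation], zero-padded on the left.

    Only the current and earlier samples are read, so the layer is causal.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def stack(x: Sequence[float], kernel_size: int = 3,
          dilations: Sequence[int] = (1, 2, 4, 8)) -> List[float]:
    """Apply one all-ones causal layer per dilation (no residuals, no bias)."""
    y = list(x)
    for d in dilations:
        y = causal_dilated_conv(y, [1.0] * kernel_size, d)
    return y


# Impulse test: a spike at t = 0 should influence exactly as many outputs
# as the receptive field is wide, and none before it.
impulse = [1.0] + [0.0] * 63
response = stack(impulse)
influenced = sum(1 for v in response if v != 0.0)
```

Here both `receptive_field(3, (1, 2, 4, 8))` and `influenced` evaluate to 31, matching the value reported in the text.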
In the second stage, a two-layer LSTM with 128 hidden units processes the TCN output sequence to model long-range hydraulic dependencies, such as the propagation delay of surcharge waves through the pipe network. The LSTM final hidden state is concatenated with the static asset descriptor vector to form a combined representation of dimension 128 + D_s, where D_s is the number of static features (nominally 12 after principal component analysis preprocessing). Three parallel fully connected output heads produce forecasts for the 15, 30, and 60 min horizons, respectively, allowing each head to specialise as forecast difficulty grows with lead time. Together, these components form an integrated prediction framework (Figure 2), capturing both short-term dynamics and longer-term hydraulic dependencies in UDSs.
Overall, the hybrid TCN–LSTM architecture integrates multi-scale temporal feature extraction with long-range hydraulic dependency modelling within a unified framework. Dilated causal convolutions capture short-term rainfall-driven dynamics, while the LSTM layers encode delayed routing and storage effects across the network. Conditioning the temporal representation on static asset descriptors ensures that forecasts remain infrastructure-aware rather than sensor-driven alone. The multi-horizon design further supports operational early warning at 15–60 min lead times. This architecture therefore balances predictive accuracy, computational efficiency, and physical interpretability, consistent with the resilience-oriented objectives of the proposed digital twin framework.
3.5. SHAP Explainability Module (M4)
The explainability module applies TreeSHAP to a gradient boosting surrogate model to approximate the prediction behaviour of the trained TCN-LSTM model. The surrogate is used only to generate computationally efficient SHAP attributions and is not intended to replace the main prediction model. The surrogate was trained using the TCN-LSTM outputs as target values and evaluated on the validation subset, achieving strong agreement with the original model. SHAP values were then computed for the independent test subset. The resulting attributions were analysed at both the node and catchment scales to identify dominant input features associated with the predicted surcharge risk. Because the stability of SHAP values across different storm events, node groups, and random initialisation seeds was not systematically tested in this study, the attribution results are interpreted as aggregated indicative patterns rather than causal explanations.
3.6. Research Hypotheses
The framework design is guided by three testable hypotheses.
H1. Multi-source fusion of dynamic sensor streams with static asset descriptors will improve forecast accuracy relative to sensor-only models, as measured by root-mean-square error (RMSE) of surcharge depth prediction.
H2. The hybrid TCN-LSTM architecture will outperform single-component LSTM-only and TCN-only baselines by leveraging complementary temporal feature extraction capabilities.
H3. SHAP attributions will consistently identify a small subset of physical asset descriptors (specifically pipe hydraulic capacity ratio and contributing catchment area) as dominant drivers of surcharge risk, consistent with established hydraulic principles.
4. Research Methodology
4.1. Assumptions and Scope Constraints
The following modelling assumptions are adopted. Sensor data quality is assumed sufficient following automated outlier detection and gap-filling; no explicit provision is made for catastrophic sensor failure affecting more than 20% of active monitoring points simultaneously. The static asset descriptor database is assumed to reflect the as-built state of the network; ongoing asset deterioration or rehabilitation during the study period is not modelled. The forecast framework is scoped to pipe network hydraulics in a separate sewer system, where localised misconnections may occur, and does not model surface flow pathways or groundwater interactions explicitly. These effects are implicitly captured through the data-driven modelling approach. These assumptions are considered reasonable for the operational timeframes addressed (15–60 min).
4.2. Study Area
The case study is located in an urban district of the Yangtze River Delta, Anhui Province, China, comprising a drainage catchment area of approximately 48 km² served by a predominantly separate sewer network. The area is characterised by high impervious surface fractions (mean 0.72), flat terrain (average slope 0.3%), and frequent short-duration convective rainfall events during the May–September monsoon period. The network comprises approximately 2400 pipe segments, 1850 manholes, and 8 pump stations, with pipe diameters ranging from 300 mm to 1000 mm. The key characteristics of the case study drainage system are summarised in Table 1.
4.3. Data Sources
Sensor data were provided by the district urban management bureau and cover a 12-month observation period (November 2021 to October 2022). The monitoring network includes 86 water level sensors, 34 flow meters, and 68 rain gauges distributed across the catchment at an average density of one gauge per 0.7 km². Data are transmitted via 4G telemetry at 5 min intervals to a centralised SCADA platform. Pump operational logs and SCADA control signals are available for all 8 pump stations at the same temporal resolution. The complete dataset comprises approximately 9.1 million individual sensor records prior to quality control processing.
Asset descriptor data were extracted from the district geographic information system and as-built record drawings, supplemented by field survey records from the most recent CCTV inspection programme (completed 2022). Catchment morphological attributes (contributing area, imperviousness, slope) were derived from a 1-m-resolution LiDAR digital terrain model acquired in 2021. Network connectivity indicators were computed programmatically from the pipe network topology using graph-theoretic algorithms. All asset descriptors were validated against hydraulic model parameters used in the district’s existing hydrodynamic model developed in Autodesk InfoWorks ICM 2024.
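As an illustration of how the two connectivity indicators named in Section 3.3 (upstream pipe count and flow path length to outlet) might be derived from network topology, the sketch below uses a hypothetical four-pipe tree. The pipe list, node labels, and function names are invented for the example, and the text does not state which graph algorithms were actually used; a depth-first search and a simple downstream walk are assumed here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network: (upstream_node, downstream_node, length_m).
PIPES: List[Tuple[str, str, float]] = [
    ("A", "C", 120.0),
    ("B", "C", 80.0),
    ("C", "D", 200.0),
    ("D", "OUT", 150.0),
]


def build_maps(pipes):
    """Index pipes by downstream node and by upstream node."""
    incoming: Dict[str, List[Tuple[str, float]]] = {}
    outgoing: Dict[str, Tuple[str, float]] = {}
    for up, down, length in pipes:
        incoming.setdefault(down, []).append((up, length))
        outgoing[up] = (down, length)  # assumes one outgoing pipe per node
    return incoming, outgoing


def upstream_pipe_count(node: str, incoming) -> int:
    """Count every pipe that drains to `node`, via depth-first search."""
    count, pending, seen = 0, [node], {node}
    while pending:
        current = pending.pop()
        for up, _length in incoming.get(current, []):
            count += 1
            if up not in seen:
                seen.add(up)
                pending.append(up)
    return count


def path_length_to_outlet(node: str, outgoing, outlet: str = "OUT") -> float:
    """Sum pipe lengths along the downstream path from `node` to the outlet."""
    total, visited = 0.0, set()
    while node != outlet:
        if node in visited or node not in outgoing:
            raise ValueError(f"no downstream path from {node!r} to {outlet!r}")
        visited.add(node)
        node, length = outgoing[node]
        total += length
    return total


incoming, outgoing = build_maps(PIPES)
```

For this toy network, node D has three upstream pipes and node A lies 470 m from the outlet. A looped network would need a shortest-path search in place of the single downstream walk.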
4.4. Data Preprocessing and Quality Control
Sensor data preprocessing followed a four-stage protocol. Stage 1 applied range-based outlier flagging using physically plausible bounds derived from the hydraulic model (e.g., water level constrained between invert and rim elevation plus 0.5 m allowance for pressurised conditions). Stage 2 identified and imputed short gaps of up to three consecutive missing observations, equivalent to 15 min at the 5 min sampling resolution, using linear interpolation. This threshold was selected because short telemetry interruptions of this duration are common in SCADA records and are unlikely to remove a complete rainfall–response cycle in the study network. Longer gaps were not interpolated because they may obscure rapid wet-weather hydraulic changes and introduce artificial smoothing during surcharge events. These longer gaps were therefore excluded from model training and evaluation. Stage 3 applied cross-sensor consistency checking for level and flow measurements at the same location. Stage 4 aligned all time series to a common 5 min UTC timestamp grid. After quality control, 94.7% of sensor-days were retained for analysis (
Supplementary Material S1) [24]. The overall framework architecture and data processing procedures are presented in Figure 1 and Figure 2.
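Stages 1 and 2 of the protocol can be sketched as follows. This is a simplified illustration, not the authors' code: missing values are represented as `None`, the bounds passed to `flag_out_of_range` stand in for the invert level and the rim elevation plus the 0.5 m allowance, and the function names are hypothetical. Gaps longer than three samples, and gaps at either end of the record, are left unfilled so that they can be excluded later.

```python
from typing import List, Optional, Sequence

MAX_GAP = 3  # up to three consecutive 5 min samples (15 min), per the text


def flag_out_of_range(values: Sequence[Optional[float]],
                      lower: float, upper: float) -> List[Optional[float]]:
    """Stage 1: replace physically implausible readings with None."""
    return [v if v is not None and lower <= v <= upper else None
            for v in values]


def fill_short_gaps(values: Sequence[Optional[float]],
                    max_gap: int = MAX_GAP) -> List[Optional[float]]:
    """Stage 2: linearly interpolate interior gaps of at most max_gap samples."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        gap = i - start
        # Fill only if the gap is short and bounded by valid data on both sides.
        if gap <= max_gap and start > 0 and i < n:
            left, right = out[start - 1], out[i]
            for k in range(gap):
                out[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return out
```

For example, `fill_short_gaps([1.0, None, None, 4.0])` yields `[1.0, 2.0, 3.0, 4.0]`, whereas a run of four missing values is returned unchanged.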
4.5. Experimental Design and Model Configurations
The dataset comprises a 12-month observation period (November 2021 to October 2022) and was partitioned into training, validation, and test subsets using an 8:1:1 ratio in chronological order. The first 80% of the time series was used for model training, the subsequent 10% for hyperparameter validation, and the final 10% for test evaluation. This temporal partitioning preserves the temporal structure of monitored hydraulic responses and avoids information leakage. Although the network is predominantly separate, potential rainfall-derived infiltration and inflow (RDII), local misconnections, and cross-connections may still affect wet-weather system behaviour. The test subset includes rainfall events with peak 1 h intensity exceeding 30 mm/h, representing conditions of greatest operational relevance.
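A chronological 8:1:1 partition of this kind can be expressed in a few lines. The sketch below is illustrative only: the function name is hypothetical, and it splits by sample index, assuming samples are already sorted by timestamp.

```python
from typing import Sequence, Tuple


def chronological_split(n_samples: int,
                        ratios: Sequence[int] = (8, 1, 1)
                        ) -> Tuple[range, range, range]:
    """Return index ranges for train, validation, and test, in time order.

    No shuffling is applied, so every validation and test sample is later
    in time than every training sample.
    """
    total = sum(ratios)
    train_end = n_samples * ratios[0] // total
    val_end = n_samples * (ratios[0] + ratios[1]) // total
    return (range(0, train_end),
            range(train_end, val_end),
            range(val_end, n_samples))


# One year of 5 min steps (365 days x 288 steps per day).
train_idx, val_idx, test_idx = chronological_split(365 * 288)
```

When lookback windows are built on top of this split, windows that straddle a boundary would normally be dropped so that no validation or test target appears inside a training window; the text does not state how boundary windows were handled.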
All models were trained using the Adam optimiser with an initial learning rate of 1 × 10⁻³ and a batch size of 64. Hyperparameters were selected based on validation-set performance rather than test-set results. The tested ranges included learning rates of 1 × 10⁻⁴ to 1 × 10⁻³, batch sizes of 32 and 64, LSTM hidden units of 64 and 128, TCN kernel sizes of 2 and 3, and dropout rates of 0.1 to 0.3. The final configuration was selected as the model with the lowest validation MAE while maintaining stable validation loss. Early stopping was applied with a patience of 15 epochs to reduce overfitting. Model performance was then reported only on the independent chronological test subset.
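The early-stopping rule can be illustrated independently of any deep learning library. In the sketch below, the class and function names and the `min_delta` tolerance are assumptions made for the example; only the patience of 15 epochs is taken from the text.

```python
from typing import Optional, Sequence


class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 15, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta      # hypothetical improvement tolerance
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def stop_epoch(val_losses: Sequence[float],
               patience: int = 15) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            return epoch
    return None
```

In a training loop, the model weights from the epoch with the best validation loss would typically be restored after stopping; whether this was done in the study is not stated.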
The models were implemented in Python 3.10 using PyTorch 2.0.1 for deep learning model construction and training, scikit-learn 1.3.0 for baseline models and preprocessing, pandas 2.0.3 and NumPy 1.24.4 for data handling, and the SHAP 0.42.1 Python package for post hoc feature attribution. All input variables were normalised using parameters estimated from the training set, which were then applied unchanged to the validation and test sets. The same chronological train–validation–test split was used for all model configurations to ensure a consistent comparison. The six model configurations used for the ablation study are summarised in Table 2. Training was repeated using fixed random seeds to improve reproducibility, and the reported results correspond to the final selected configuration evaluated on the independent test subset.
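The leakage-free normalisation described above can be sketched as follows. The text does not specify the scaling method, so z-score standardisation is assumed here purely for illustration, and the function names and sample values are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def fit_scaler(train_column: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and standard deviation from training data only."""
    mu = fmean(train_column)
    sigma = pstdev(train_column)
    if sigma == 0.0:
        sigma = 1.0  # guard against constant features
    return mu, sigma


def transform(column: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Apply training-set parameters unchanged to any subset."""
    return [(v - mu) / sigma for v in column]


# Illustrative values only: parameters come from the training subset and
# are reused for later (validation or test) observations.
train_levels = [1.0, 2.0, 3.0]
test_levels = [2.0, 5.0]
mu, sigma = fit_scaler(train_levels)
test_scaled = transform(test_levels, mu, sigma)
```

Because the parameters are frozen after fitting, a test value outside the training range (5.0 above) maps to a large standardised value instead of shifting the scaling itself.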
4.6. SHAP Explainability Setup
SHAP values were computed for the full framework (C6) on all test events, using the gradient boosting surrogate described in Section 3.5. The surrogate was retrained on C6 predictions for the training period prior to each test evaluation, ensuring temporal consistency. Feature importance rankings were computed as the mean absolute SHAP value across all test predictions for each input feature, aggregated separately for dynamic sensor features and static asset features to enable direct comparison of their relative attribution magnitudes.
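The ranking and grouping step can be sketched on a small matrix of attribution values. The numbers and feature names below are invented for illustration and are not results from the study; in practice the rows would be the SHAP values returned for each test prediction.

```python
from typing import Dict, List, Sequence


def mean_abs_shap(shap_rows: Sequence[Sequence[float]],
                  feature_names: Sequence[str]) -> Dict[str, float]:
    """Mean absolute attribution per feature across all predictions."""
    n = len(shap_rows)
    return {name: sum(abs(row[j]) for row in shap_rows) / n
            for j, name in enumerate(feature_names)}


def group_totals(importance: Dict[str, float],
                 groups: Dict[str, List[str]]) -> Dict[str, float]:
    """Sum per-feature importance within each feature group."""
    return {group: sum(importance[name] for name in names)
            for group, names in groups.items()}


# Illustrative values only (two predictions, three hypothetical features).
names = ["rain_15min", "level_lag", "capacity_ratio"]
rows = [[0.2, -0.1, 0.4],
        [-0.4, 0.3, 0.0]]
importance = mean_abs_shap(rows, names)
ranking = sorted(importance, key=importance.get, reverse=True)
totals = group_totals(importance, {
    "dynamic": ["rain_15min", "level_lag"],
    "static": ["capacity_ratio"],
})
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so a feature that pushes predictions in both directions still ranks as influential.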
5. Results and Discussion
5.1. Forecast Performance: Architecture Effect (H2)
Table 3 summarises the predictive performance of conventional machine learning (ML) baselines and deep learning (DL) architectures evaluated on the test dataset. Water level forecasting was selected as the primary evaluation target due to its strong hydraulic interpretability and lower measurement noise relative to flow observations. Among conventional ML models, the Decision Tree achieved the lowest MSE (80.03), outperforming Linear Regression (92.43) and Random Forest (88.08). However, all ML baselines showed limited generalisation under transient wet-weather fluctuations, as reflected by the relatively higher RMSE and lower R² values.
The DL architectures demonstrated consistently improved performance. The LSTM model achieved an RMSE of 0.081 m and R² of 0.8275, indicating improved representation of temporal dependencies compared to ML baselines. The TCN model further improved performance, reducing the RMSE to 0.067 m and increasing R² to 0.9143, highlighting the advantage of multi-scale temporal feature extraction through dilated convolutions. The proposed hybrid TCN–LSTM architecture achieved the best overall performance, with an RMSE of 0.061 m, MAE of 0.044 m, MSE of 37.21, and R² of 0.9318. Relative to the LSTM baseline, the hybrid model reduced the RMSE from 0.081 m to 0.061 m, a 24.7% reduction. The performance gain over the standalone TCN model was more moderate but consistent, particularly during peak and recession periods, suggesting enhanced robustness in capturing both rapid fluctuations and longer-term hydraulic dynamics.
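For reference, the error metrics and the relative reduction quoted above can be computed as in the following sketch. The metric definitions are the standard ones; the function names are illustrative, and the only values taken from the text are the two RMSE figures used in the final line.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# RMSE of the LSTM baseline versus the hybrid model, as reported in the text.
lstm_to_hybrid = round(relative_reduction(0.081, 0.061), 1)
```

With the reported values, `lstm_to_hybrid` evaluates to 24.7, in agreement with the figure stated above.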
Figure 3 presents a representative segment of the test results at 5 min resolution for the LSTM, TCN, and hybrid TCN-LSTM architectures. Clear differences are observed during rainfall-driven peak events and subsequent recession periods. The LSTM model captures the overall trend in water level variation, but it shows a clear phase lag and reduced peak magnitude during rapid surcharge transitions. In comparison, the TCN model responds more quickly to short-term fluctuations, suggesting that dilated convolution more effectively captures multi-scale temporal features. The proposed hybrid TCN-LSTM architecture further enhances prediction stability during peak water level events while maintaining smooth recession tracking, indicating complementary strengths of convolutional feature extraction and recurrent memory mechanisms.
Performance differences were most evident during rapid water-level changes and delayed recession. Although Table 3 reports the overall test-set performance rather than horizon-specific metrics, Figure 3 shows that the hybrid TCN–LSTM model produced more stable predictions during peak and post-peak periods. These findings support H2 and indicate improved representation of multi-scale temporal dynamics in urban water level prediction.
5.2. Forecast Performance: Asset Feature Fusion Effect (H1)
The inclusion of built-asset descriptors produced systematic performance improvements across all forecast horizons. For the 15 min horizon, the RMSE decreased from 0.063 m (sensor-only TCN–LSTM) to 0.052 m (asset-informed C6), a reduction of approximately 17.5%. At the 60 min horizon, the reduction increased to 22%. These improvements are interpreted as enhanced predictive robustness, particularly under near-capacity conditions where measurement uncertainty may increase due to hydraulic sensitivity and potential sensor noise (Supplementary Material S2). The improvement was spatially heterogeneous. Nodes with a pipe full-bore capacity ratio exceeding 0.85 exhibited the largest performance gains, with local RMSE decreases reaching 26%. These nodes are typically associated with near-capacity operation and increased likelihood of manhole surcharge conditions, where small variations in hydraulic state can lead to rapid changes in water level. In such conditions, asset descriptors (e.g., capacity ratio and upstream contributing area) provide additional structural context that helps stabilise predictions and constrain variability in model outputs. In contrast, low-stress nodes showed marginal gains, indicating that asset descriptors are most beneficial under near-capacity conditions, where system behaviour is more sensitive to structural constraints.
Figure 3. Comparison of measured and predicted water levels using LSTM, TCN, and TCN–LSTM architectures over a representative three-day test period. The y-axis represents water level in millimetres, and the x-axis represents the time-step index, with each step corresponding to 5 min. Vertical dashed lines indicate daily boundaries at 288-step intervals.
5.3. SHAP Attribution Analysis (H3)
The aggregated SHAP results identified three static asset features as major contributors to surcharge prediction, namely, pipe full-bore capacity ratio, upstream contributing catchment area, and impervious surface fraction (Figure 4). These patterns are physically consistent with drainage system behaviour. Pipes operating close to full-bore capacity have limited hydraulic buffer, while larger impervious contributing areas can generate faster and higher peak inflows during intense rainfall. These findings support the physical plausibility of the model interpretation, although further stability testing across additional storms, node groups, and random seeds is required.
Among the dynamic features, antecedent water level 30 min before event onset and peak 15 min rainfall intensity were the two highest-ranked inputs at the 15 min horizon (Figure 4). At the 60 min horizon, the attribution of rainfall intensity decreased, while the attribution of network connectivity indicators, such as upstream pipe segment count, increased. This shift suggests that short-horizon predictions are mainly influenced by local wet-weather conditions, whereas longer-horizon predictions are more affected by routing and redistribution through the drainage network. These horizon-dependent patterns indicate that additional upstream flow metering could help reduce uncertainty in 60 min predictions.
Overall, the SHAP results provide a physically reasonable interpretation of the prediction outputs. Static asset features explain where the surcharge risk is structurally higher, while dynamic rainfall and water-level features explain short-term variations during storm events. The results should therefore be interpreted as aggregated attribution patterns within the present case study, rather than as causal explanations or universally stable feature rankings.
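A common way to aggregate per-sample SHAP values into a feature ranking is the mean absolute attribution per feature; whether this exact aggregation was used here is not stated, so the sketch below is an assumption. The function name `rank_features`, the feature labels, and all numerical values are hypothetical and chosen only to mimic the horizon-dependent shift described above.

```python
def rank_features(shap_rows, feature_names):
    """Rank features by mean absolute attribution across samples."""
    if not shap_rows:
        raise ValueError("at least one row of attributions is required")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length must match number of features")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    means = {name: total / len(shap_rows)
             for name, total in zip(feature_names, totals)}
    # Largest mean absolute attribution first.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


features = ["capacity_ratio", "peak_rain_15min", "upstream_segment_count"]

# Illustrative attributions only: one row per sample, one column per feature.
rows_15min = [[0.30, -0.50, 0.05], [0.20, 0.40, -0.05]]
rows_60min = [[0.30, -0.10, 0.25], [0.20, 0.10, -0.15]]

ranking_15 = [name for name, _ in rank_features(rows_15min, features)]
ranking_60 = [name for name, _ in rank_features(rows_60min, features)]
print(ranking_15)
print(ranking_60)
```

In this toy case the rainfall feature ranks first at the short horizon and last at the long horizon, while the connectivity feature moves up. Because absolute values are averaged, the ranking reflects magnitude of influence, not its direction.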
5.4. Computational Performance
The mean inference time across the 86-node monitoring network was 210 ms per forecast cycle on a standard server (Intel i7 CPU, 32 GB RAM). The latency was measured over 100 repeated runs and is reported as the average runtime, including data preprocessing and model inference, with low variability across runs. The reported inference time is compatible with the 5 min SCADA update cycle and satisfies real-time operational requirements (<500 ms). This demonstrates a substantial computational improvement compared to conventional physics-based simulation approaches, enabling near-real-time prediction and rapid evaluation of multiple rainfall scenarios on standard hardware. Inference time is expected to scale approximately linearly with network size under similar settings.
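The latency protocol described above (repeated runs, averaged, compared against a budget) can be sketched as follows. This is an illustrative harness, not the benchmarking code used in the study: `dummy_forecast_cycle` is a hypothetical stand-in for preprocessing plus model inference, and `projected_latency_ms` simply encodes the linear-scaling expectation stated in the text, using the reported 210 ms for 86 nodes as the reference point.

```python
import statistics
import time

BUDGET_MS = 500.0  # real-time requirement quoted in the text


def measure_latency_ms(forecast_cycle, runs=100):
    """Time repeated calls; return (mean, standard deviation) in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        forecast_cycle()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def projected_latency_ms(n_nodes, ref_ms=210.0, ref_nodes=86):
    """Extrapolate latency assuming strictly linear scaling with nodes."""
    return ref_ms * n_nodes / ref_nodes


def dummy_forecast_cycle():
    # Placeholder workload; replace with preprocessing and inference.
    sum(i * i for i in range(10_000))


mean_ms, std_ms = measure_latency_ms(dummy_forecast_cycle)
print(f"mean={mean_ms:.3f} ms, std={std_ms:.3f} ms, "
      f"within budget: {mean_ms < BUDGET_MS}")
print(f"projected for 172 nodes: {projected_latency_ms(172):.0f} ms")
```

Reporting the standard deviation alongside the mean makes the "low variability" statement checkable. Under the strictly linear assumption, the 500 ms budget would be reached at roughly 200 nodes; this is an extrapolation from a single reference point, not a measured result.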
5.5. Synthesis with Prior Literature
The 18–22% RMSE reduction from asset feature fusion corroborates the argument advanced in prior work [19,24] that static infrastructure attributes carry independent predictive information not recoverable from sensor observations alone. The TCN–LSTM architecture advantage aligns with findings in traffic and energy forecasting [18], while the drainage-specific validation reported here addresses a gap highlighted in systematic reviews of deep learning for hydrological forecasting [15]. The integrated SHAP attribution methodology extends existing XAI applications in hydraulic engineering by delivering node-level, multi-horizon attribution that is directly actionable for asset managers at a level of spatial granularity not previously reported in the drainage XAI literature [27,28].
5.6. Limitations and Future Research Directions
Although the proposed explainable DT demonstrates clear performance gains and operational interpretability within this case study, several limitations should be recognised when interpreting the results and considering broader deployment. In particular, under near-capacity conditions, increased hydraulic sensitivity may introduce higher measurement uncertainty. Performance improvements should therefore be interpreted as enhanced predictive robustness rather than solely reduced error. A primary constraint concerns generalisability. The framework is calibrated and validated on a single urban drainage network characterised by high imperviousness, relatively flat terrain, and dense monitoring infrastructure. Under such conditions, the model can effectively learn rainfall–runoff response patterns and routing dynamics. However, drainage systems in steeper catchments, tidal environments, or systems with different structural conditions may show different hydraulic behaviours. Because the TCN-LSTM architecture implicitly encodes system-specific temporal memory and connectivity effects, direct transfer to morphologically distinct networks may not yield comparable accuracy. In this sense, the current study demonstrates feasibility rather than universal scalability.
In addition, static built-asset descriptors are treated as time-invariant contextual variables. In practice, sediment deposition, pipe roughness evolution, structural deterioration, and rehabilitation interventions gradually alter hydraulic capacity. The present framework assumes that GIS and inspection-derived attributes remain representative over the modelling period. Although sensor data undergo systematic quality control, persistent measurement bias and correlated telemetry failures are not explicitly modelled. The validation of explainability results also remains limited. The SHAP analysis was based on a gradient boosting surrogate trained to approximate the TCN-LSTM predictions, so the attributions describe the surrogate and reflect the TCN-LSTM only to the extent that the surrogate reproduces its behaviour. The stability of SHAP attributions across storm types, node groups, and random seeds was also not systematically tested. Therefore, the SHAP results should be interpreted as aggregated patterns within this case study, rather than as causal explanations or universally stable feature rankings. These assumptions and limitations are acceptable for short-term operational prediction but constrain longer-term asset degradation analysis and broader interpretation of the explanation results.
Future research should therefore focus on improving transferability and uncertainty representation. Transfer learning approaches could be explored to adapt a trained model to new drainage systems using limited local data. Asset-related variables, such as hydraulic capacity ratio and connectivity metrics, may support cross-site generalisation. At the same time, extending the framework to probabilistic forecasting would provide prediction intervals rather than single point estimates, allowing risk-based warning thresholds. Further integration with pump control or storage optimisation algorithms would also enhance practical value. Linking short-term forecasts with operational decision modules could support a shift from prediction toward proactive flood risk reduction, supporting more resilient urban drainage management under appropriate system conditions.
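As one possible route to the probabilistic extension mentioned above, quantile forecasts are commonly trained and scored with the pinball (quantile) loss, and the resulting intervals checked for empirical coverage. The sketch below illustrates both quantities; it is not part of the present framework, and the function names and toy values are hypothetical.

```python
def pinball_loss(observed, predicted_quantile, q):
    """Mean pinball loss of a predicted q-quantile, with 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(observed) != len(predicted_quantile) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(observed, predicted_quantile):
        diff = y - y_hat
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1.0) * diff)
    return total / len(observed)


def interval_coverage(observed, lower, upper):
    """Fraction of observations falling inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper)
                 if lo <= y <= hi)
    return inside / len(observed)


# Illustrative water levels in metres, not data from the case study.
levels = [1.00, 1.40, 2.10, 1.70]
lower = [0.90, 1.20, 1.60, 1.50]
upper = [1.20, 1.60, 2.00, 1.90]

coverage = interval_coverage(levels, lower, upper)
upper_loss = pinball_loss(levels, upper, q=0.95)
print(coverage, round(upper_loss, 4))
```

In the toy data, three of the four observations fall inside the interval. A warning threshold could then be compared against the upper quantile instead of the point estimate, which is what makes risk-based thresholds possible.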
6. Conclusions
This paper has presented an explainable DT framework for near-real-time prediction of urban drainage hydraulic responses to extreme rainfall, validated through a case study in the Yangtze River Delta. The proposed framework demonstrates improved predictive accuracy and interpretability, offering practical value for real-time drainage management and infrastructure planning. Rather than relying on sensor observations alone, the framework integrates dynamic monitoring and SCADA data with static built-asset descriptors, allowing predictions to reflect both short-term hydraulic variation and infrastructure conditions. The hybrid TCN–LSTM architecture captured rainfall-driven water-level changes and delayed hydraulic responses, while the SHAP module provided interpretable attribution of prediction outputs to key physical and operational factors. These findings indicate that explainable prediction models can support not only real-time warning but also inspection planning and rehabilitation prioritisation.
For drainage authorities, the framework can support earlier recognition of surcharge-prone locations within standard SCADA update cycles. The attribution results help link predicted surcharge risk to pipe capacity, upstream contributing area, imperviousness, and recent rainfall conditions. This provides a more transparent basis for operational response and asset planning than prediction accuracy alone.
At the societal level, the framework has practical relevance for cities exposed to short-duration extreme rainfall and increasing drainage pressure. In dense and rapidly urbanising regions such as the Yangtze River Delta, improved warning and response capacity can help reduce disruption to transport, property, and public services. The framework demonstrates the feasibility of explainable drainage prediction in the present case study, but its transferability to other drainage systems requires recalibration and independent validation using local sensor, GIS, and as-built data.
Future research should extend the framework by linking pipe network prediction with surface inundation modelling, testing transfer learning across drainage systems, developing uncertainty-aware interpretation methods, and integrating prediction outputs with real-time pump control or storage optimisation. These developments would further support the use of explainable digital twins as practical tools for resilient urban drainage management.