1. Introduction
Urban flooding, often reported as waterlogging, is increasingly a routine hazard rather than an exceptional event. Three long-running processes are colliding: intensifying short-duration rainfall, rapid urban expansion that hardens land surfaces, and drainage systems that are difficult to upgrade at the pace of development. From a climate perspective, assessments by the IPCC [
1] indicate that heavy precipitation is expected to intensify with warming, and the signal is particularly relevant at shorter durations that place immediate stress on stormwater systems. In parallel, urbanization increases imperviousness and accelerates runoff generation, while the spatial concentration of people and assets amplifies consequences even when inundation depths are modest. This combination makes pluvial flooding a central concern for cities pursuing both safety and functional continuity [
2].
The problem is especially sharp in mountainous megacities. Complex terrain can accelerate runoff, funnel flow into valley bottoms and transport corridors, and create localized convergence zones where conveyance becomes fragile [
3]. In cities like Chongqing, steep slopes, dissected landforms, and dense built-up districts form a coupled natural–built drainage system in which small bottlenecks can repeatedly produce waterlogging [
4,
5]. Empirically, recent work in Chongqing has highlighted how urbanization processes and land-surface change can alter flood exposure and flood-prone patterns, reinforcing the need for approaches that are both spatially explicit and scalable to the metropolitan level [
4]. Yet, city-scale decision-making often must proceed without full access to detailed pipe-network data, maintenance logs, pump operations, or event-resolved hydraulic boundary conditions.
Methodologically, three traditions dominate contemporary urban flood assessment, and each carries a distinctive failure mode when transferred to terrain-constrained megacities. Process-based 1D/2D hydrodynamic modeling represents flow routing and inundation physics with high fidelity when detailed pipe-network data and calibration inputs are available [
6,
7]. Its principal constraint at the municipal scale is informational: up-to-date pipe geometries, inlet densities, and pump-station records are rarely public, and model recalibration typically lags infrastructure retrofitting. Data-driven machine learning using reported waterlogging records as labels [
8,
9] is scalable and flexible but susceptible to label sparsity, reporting-density bias, severe class imbalance, spatial autocorrelation that inflates naïvely cross-validated performance [
10], and opacity unless paired with post hoc interpreters. Multi-criteria composite indexing (MCDA) is data-light and directly communicable to planners, but its characteristic failure mode is drift into visually compelling maps that are never confronted with independent ground truth and whose implicit weighting is driven by statistical dispersion rather than process importance [
11]. None of the three traditions alone resolves the joint constraints that mountainous megacities impose.
Recent work has therefore moved toward hybrid frameworks that couple elements of two or three traditions. Physics-guided machine learning uses hydrodynamic simulation outputs as training labels for surrogate classifiers [
7]; hydrodynamic–AHP–ML coupling integrates process-based flood surfaces with expert-weighted exposure layers and tree-based models [
12]; Gaussian-process surrogate models learn over low-fidelity hydrodynamic solvers to enable rapid climate-scenario exploration [
13]; and deep-learning super-resolution recovers sub-block-scale inundation from coarse-grid simulation [
14]. While conceptually compelling, these strategies still rest on calibrated hydrodynamic priors and detailed pipe-network information that are not publicly available city-wide for Chongqing. The remaining practical gap is therefore a municipality-scale, independently validated, explainable framework that (i) does not require pipe-network data; (ii) quantifies rather than assumes the relationship between capacity proxies and operational waterlogging performance; and (iii) converts ranking outputs into targetable engineering priorities.
Between these poles, composite index approaches remain widely used because they offer a pragmatic mechanism for screening and zoning in data-limited contexts. The core logic is straightforward: flood risk emerges from the joint configuration of hazard predisposition, exposure, and vulnerability or sensitivity. Such frameworks align with mainstream disaster-risk thinking and have been operationalized in many urban studies by integrating precipitation indicators, terrain controls, land-cover conditions, and exposure proxies. The question becomes not whether to integrate indicators, but how to weight them transparently and aggregate them consistently. Multi-criteria decision analysis (MCDA) provides a common toolkit for this purpose, including objective weighting schemes such as CRITIC (which uses contrast and inter-criterion conflict) and aggregation methods such as TOPSIS (which ranks alternatives by their distance to an ideal solution) [
11]. These methods are attractive because they can be implemented with heterogeneous datasets, are computationally light, and generate interpretable spatial products.
However, mapping risk is only half of the governance problem. Urban systems also differ in their capacity to cope with, respond to, and recover from flood disruptions [
15,
16,
17]. This is where the literature on resilience becomes essential. Classic resilience framing emphasizes the ability to withstand disturbance, maintain function, and recover, often decomposed into robustness, redundancy, resourcefulness, and rapidity [
16,
18,
19]. In urban studies, resilience has been defined more broadly as the capacity of an urban system and its actors to maintain or rapidly restore desired functions in the face of shocks and stresses [
16,
19,
20,
21], a definition that highlights multi-dimensionality and the possibility that capacity can vary independently of hazard [
22,
23]. In other words, risk and resilience are not mirror images. A place can be both high-risk and resource-rich; conversely, a low-risk place can be capacity-poor [
19]. This conceptual separation matters because index-based practice often drifts into treating capacity proxies as if they were direct evidence of flood avoidance.
Recent policy agendas reinforce the need to keep this distinction crisp [
24,
25]. In China, urban stormwater management has been strongly shaped by the Sponge City concept, emphasizing distributed retention, infiltration, storage, and nature-based measures as complements to conventional conveyance [
26]. While the strategy is well established at the policy level, implementation priorities remain uneven and heavily constrained by budgets and institutional capacity [
27,
28]. Therefore, a practical planning question arises: When decision-makers cannot retrofit everywhere at once, how can they identify the most consequential hotspots, and how can they distinguish (i) locations that are hazard-prone because of terrain–hydrology predisposition from (ii) locations that become chronic waterlogging points because of built-network bottlenecks, operational failures, or micro-topographic traps?
Answering that question requires two further ingredients: validation and interpretability. First, hotspot maps, whether produced by MCDA or machine learning, are only useful if they align with observed waterlogging patterns [
29]. Without independent validation using flood or waterlogging point data, risk surfaces can become persuasive graphics rather than actionable evidence [
4]. Second, governance and engineering teams need interpretable drivers rather than black-box scores [
30]. Explainable machine learning is increasingly used to bridge this gap. Gradient-boosted decision tree models, such as LightGBM, are well-suited to susceptibility modeling because they handle non-linearity and interaction effects with limited parametric assumptions [
8,
31]. SHAP (Shapley Additive Explanations) enables consistent, feature-level attribution of predictions, helping translate model behavior into plausible mechanisms and thresholds that are communicable to planners and engineers [
32]. This matters for waterlogging because the process is rarely linear: rainfall extremes may trigger high impacts only where runoff potential is high; slope may create both fast-routing hazards and downstream convergence; and centrality variables may act as proxies for exposure intensity and reporting bias simultaneously [
23,
33].
Despite rapid progress, there remains a specific gap for mountainous megacities like Chongqing [
4,
5,
21]. Many city-scale studies focus on hazard or exposure components in isolation, present composite indices without rigorous validation against independent waterlogging observations, or use machine learning without sufficiently unpacking why capacity-related variables sometimes correlate positively with waterlogging occurrence. This last pattern is not merely a technical curiosity: it reflects a deeper measurement pitfall in which capacity proxies capture urban centrality and service concentration rather than drainage-system robustness. Empirically separating “structural advantage in services” from “operational vulnerability in drainage performance” is critical if resilience metrics are to support flood governance rather than inadvertently mislead it [
16,
34].
Against this backdrop, the present study advances three specific contributions. Methodologically, we decouple capacity (service accessibility) from risk by formulating resilience as a stabilized adaptation-to-risk ratio, and empirically test against two independent waterlogging point sets whether this transformation better describes waterlogging geography in a mountainous megacity than the conventional inverse relationship. To our knowledge, this is the first deliberate test of whether accessibility-based capacity proxies behave as risk-reducing covariates or as urban-centrality markers. In terms of application, we deliver the first city-wide (22,500 grid cells of 500 m × 500 m), externally validated risk and resilience surface for Chongqing using exclusively open data, reproducible from publicly available sources. Regarding validation, we combine two independent point sets (117 historical points for 2015–2021 and 70 points for 2022, which together yield 167 unique positive grid cells) with ROC/PR metrics, top-k capture curves, a 500-iteration Monte Carlo weight-perturbation sensitivity analysis, and residual spatial-autocorrelation diagnostics, thereby converting academic discrimination scores into operational inspection budgets.
2. Study Area and Data
2.1. Study Area
Chongqing is a mountainous municipality located in Southwest China (approximately 106–110° E, 28–32° N), characterized by pronounced terrain gradients and complex river–valley systems. The region spans from low-elevation river corridors to high-elevation mountain areas, producing strong spatial contrasts in runoff generation, drainage convergence, and flood susceptibility. Major waterways, including the Yangtze River and the Jialing River, structure the municipal drainage network and concentrate population and assets along flood-prone corridors [
4,
35,
36].
Figure 1 illustrates the municipal boundary, topographic background, and the delineation of the central urban area, which serves as the focal area for waterlogging validation and driver identification.
The central city area covers the municipality’s core built-up districts, including Yuzhong, Jiangbei, Yubei, Shapingba, Jiulongpo, Dadukou, Nan’an, Banan, and Beibei, where high development intensity intersects with constrained terrain and dense drainage infrastructure. In such settings, pluvial flooding and waterlogging are not solely controlled by rainfall intensity; they also depend on slope breaks, flow accumulation pathways, impervious surfaces, and the spatial mismatch between drainage capacity and rapidly changing exposure.
Climatically, Chongqing is governed by a humid subtropical monsoon regime, with rainfall concentrated in the warm season. The precipitation heatmap (
Figure 2) shows clear seasonality, with sustained high rainfall typically occurring from late spring to early autumn. This seasonal concentration increases the likelihood of short-duration extremes and successive rainfall events that can overwhelm drainage systems and trigger waterlogging, particularly in low-lying built-up pockets and convergent catchments. Given these coupled hydro-climatic and geomorphic conditions, Chongqing provides a representative setting for evaluating multi-component flood risk and resilience under strong terrain constraints and rapidly evolving urban exposure.
2.2. Data Source
This study integrates multi-source geospatial datasets to support two linked tasks: (i) a municipality-wide flood risk and resilience assessment on a uniform grid, and (ii) a central city area analysis for validating model outputs against observed waterlogging locations and diagnosing key drivers of waterlogging occurrence. All datasets were harmonized to a 500 m × 500 m fishnet grid, enabling consistent spatial aggregation across hydrometeorological, topographic–hydrological, land-cover, and socio-economic layers. For each grid cell, indicators were summarized using mean-based statistics, producing a structured indicator database for subsequent risk/resilience evaluation and machine-learning analysis.
Daily precipitation was sourced from the ChinaMet 0.01° gridded daily precipitation product (2020–2024, warm season May–October), which integrates rain-gauge observations with satellite products and is validated at the national scale [
3]. Daily fields were clipped to the Chongqing municipal boundary, buffered outward by 8 km to avoid rolling-window boundary artifacts, and provider-flagged values were removed. Following the ETCCDI convention, eight extreme-precipitation indices were derived for the June–September target window, with May–October data read in to preserve rolling-sum behavior at the window boundaries: Rx1day, Rx3day, and Rx5day as annual maxima of 1-, 3-, and 5-day running totals; R20 mm, R50 mm, and R100 mm as annual counts of days exceeding 20, 50, and 100 mm, respectively; and P95 and P99 as the 95th and 99th percentiles of June–September daily precipitation. Per-year indices were composited across 2020–2024 by multi-year mean (used in
Table 1); sensitivity checks against median and maximum composites did not alter the top-decile ranking of Hazard indicators. Because the input is a pre-validated gridded product rather than station records, station-level kriging cross-validation does not apply. We cross-inspected the Rx1day spatial pattern against Chongqing Climate Center bulletins for 2015, 2018, and 2020 and found consistent major-event footprints.
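The index definitions above can be illustrated with a minimal sketch for a single grid cell and a single season. It assumes a plain NumPy array of daily totals in mm, uses the ETCCDI threshold convention (days at or above the threshold), and omits the May–October padding and the multi-year compositing; the function names and the toy series are hypothetical and are not taken from the study's code.

```python
import numpy as np

def rolling_sum(x, window):
    """Running totals over `window` consecutive days (complete windows only)."""
    return np.convolve(x, np.ones(window), mode="valid")

def extreme_indices(daily_mm):
    """ETCCDI-style indices for one cell and one season of daily totals (mm)."""
    daily_mm = np.asarray(daily_mm, dtype=float)
    return {
        "Rx1day": daily_mm.max(),
        "Rx3day": rolling_sum(daily_mm, 3).max(),
        "Rx5day": rolling_sum(daily_mm, 5).max(),
        # Assumption: ">=" follows the ETCCDI definition of RNNmm
        "R20mm": int((daily_mm >= 20).sum()),
        "R50mm": int((daily_mm >= 50).sum()),
        "R100mm": int((daily_mm >= 100).sum()),
        "P95": float(np.percentile(daily_mm, 95)),
        "P99": float(np.percentile(daily_mm, 99)),
    }

# Toy series (hypothetical values, not ChinaMet data)
series = [0, 5, 30, 60, 10, 0, 0, 120, 2, 0]
idx = extreme_indices(series)
```

In a full workflow the same function would be applied per cell and per year, and the per-year values averaged across 2020–2024.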
Two derived datasets warrant explicit disclosure. The 250 m gridded mean-annual-runoff layer was provider-validated against national runoff stations and is used here as a screening-level hazard proxy at 500 m zonal-mean aggregation; event-scale validation against local gauges is flagged as future work. Service-accessibility indicators (fire stations, Class-A tertiary hospitals, public shelters) were computed from AMap POI records after duplicate removal and address cross-matching, using the Gaussian-enhanced two-step floating catchment area method (Gauss-2SFCA), which has been validated against gravity-model benchmarks in public-service and disaster-response applications [
37]. We emphasize that accessibility is a
structural proxy for response capacity, not a direct measurement of drainage-system robustness. This proxy–construct gap is examined empirically in
Section 5.2.
The indicator system follows a commonly used framework that separates risk formation into hazard, exposure, sensitivity, and adaptation components, while additionally deriving composite indices such as vulnerability, risk, and resilience. Hazard indicators describe rainfall extremes and runoff-generating conditions such as multi-day precipitation maxima, percentile precipitation, runoff potential, and terrain-driven convergence. Exposure captures the spatial concentration of people and human activities, proxied by population and nighttime light intensity. Sensitivity is represented by land-cover composition and surface characteristics that modulate inundation likelihood, such as construction land proportion and other land-cover shares. Adaptation reflects the capacity to cope with flood impacts through accessibility to critical services and infrastructure support.
Table 1 summarizes the full set of indicators used in this study.
A key adaptation component is accessibility, quantified using the Gaussian two-step floating catchment area (Gaussian 2SFCA) approach. In brief, the method first estimates service supply-to-demand ratios within travel-time catchments and then aggregates these ratios to each grid cell with a distance-decay kernel, producing accessibility scores that reflect both proximity and competition for services. In this study, accessibility indicators were computed for multiple service types relevant to flood response (e.g., Class-A tertiary hospital facilities, fire stations, and emergency shelters), then summarized at the grid level to represent the spatial distribution of adaptive capacity. This design allows adaptation to be interpreted as an operational, place-based capacity rather than an abstract attribute.
Two independent waterlogging point sets are used for external validation and driver modeling. Set A (“historical”) comprises 117 points compiled from the Central City Drainage and Waterlogging Control Special Plan (Revision 2022), issued by the Chongqing Municipal Commission of Housing and Urban–Rural Development. This plan integrates the following provenance streams over 2015–2021: municipal engineering-inspection records submitted by district drainage authorities and consolidated by the municipal emergency-management office, and post-event field surveys following the major rain events of 2015, 2018, and 2020. Set B (“2022”) comprises 70 points released with the plan’s 2022 update, reflecting events recorded during the 2021–2022 warm seasons. Both sets are multi-year aggregates of recurrent waterlogging locations, not single-event snapshots. Each record specifies the textual location of a chronic waterlogging occurrence but does not report inundation depth, duration, or annual recurrence frequency, which were not publicly disclosed for operational reasons; this limitation is acknowledged in
Section 6.
The phenomena represented are predominantly surface ponding combined with drainage-system overload; pure fluvial inundation along the Yangtze and Jialing corridors is excluded by the plan’s scope and is governed by a separate flood-defense plan. The predictive target of both the composite index and the LightGBM classifier is therefore best characterized as chronic pluvial waterlogging locations, that is, recurrent street-level ponding reflecting short-duration rainfall extremes, terrain-driven convergence, runoff potential, imperviousness, and local drainage bottlenecks, rather than event-specific inundation depth or fluvial stage.
For grid-level validation, the 187 field-reported points were spatially joined to the 500 m fishnet and deduplicated at the cell level, producing 167 unique positive grid cells (106 from Set A, 65 from Set B, with four cells shared). This grid-level count defines the binary label used throughout
Section 4. Given the resulting prevalence of ≈0.74% across the 22,500-cell central-city grid, validation emphasizes ranking-based diagnostics (ROC-AUC, PR-AUC, top-k capture) rather than accuracy, which is uninformative at this rarity. Reporting-density considerations are addressed in
Section 4.4.
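The cell-level deduplication described above can be sketched as follows. The example assumes projected coordinates in metres and a hypothetical 150 × 150 row-major fishnet (22,500 cells); the origin, layout, and point coordinates are illustrative only and do not reproduce the study's grid.

```python
import numpy as np

def points_to_cell_ids(x, y, x0, y0, cell=500.0, ncols=150):
    """Map projected point coordinates (m) to row-major fishnet cell ids."""
    col = np.floor((np.asarray(x, dtype=float) - x0) / cell).astype(int)
    row = np.floor((np.asarray(y, dtype=float) - y0) / cell).astype(int)
    return row * ncols + col

# Hypothetical points: two Set A points fall in the same cell,
# and one cell is shared between Set A and Set B.
set_a = points_to_cell_ids([120, 480, 900], [50, 60, 700], x0=0, y0=0)
set_b = points_to_cell_ids([950, 2600], [720, 100], x0=0, y0=0)

positive_cells = np.union1d(set_a, set_b)   # unique positive cells
shared_cells = np.intersect1d(set_a, set_b) # cells present in both sets

labels = np.zeros(150 * 150, dtype=int)
labels[positive_cells] = 1
prevalence = labels.mean()
```

The same inclusion–exclusion logic gives the reported count: 106 + 65 − 4 = 167 positive cells, or 167 / 22,500 ≈ 0.74%.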
3. Methods
3.1. Indicator Construction and Framework
This study adopts a uniform 500 m × 500 m fishnet as the analytical unit to support a city-wide, comparable assessment of flood risk and resilience. All input layers were spatially aligned to the grid, and each indicator was aggregated to grid cells using a consistent rule: zonal means for raster-derived variables, and density or distance-based statistics for vector-derived variables. To reduce scale-induced bias, indicators were processed under a unified coordinate reference system and clipped to the administrative boundary of Chongqing Municipality. The same framework was then applied to the central city area as a focused sub-region for validation and mechanism interpretation.
Figure 3 demonstrates the overall research framework.
The 500 m fishnet was selected as a compromise among three constraints. First, the native resolutions of the required inputs (250 m for the gridded runoff layer, approximately 1 km for gridded population, and approximately 500 m for VIIRS-like nighttime light) prevent physically meaningful downscaling below 250 m without spurious artifacts. Second, Chongqing’s central-city super-block scale averages 200–400 m in the dense core and 600–1200 m in peripheral zones, so a 500 m cell captures at least one super-block of urban fabric. Third, at sub-500 m resolutions, the waterlogging positivity rate falls below 0.3%, destabilizing spatially grouped cross-validation. Because any aggregation is subject to the modifiable areal unit problem [
38], we acknowledge this as a residual limitation; a full multi-resolution replication at 250 m and 1 km is flagged as the highest-priority future-work extension. Within the 500 m framework, we report a closely related robustness diagnostic, a 500-iteration Monte Carlo weight perturbation (
Section 4.4) that directly tests the stability of the top-k ranking to weighting uncertainty.
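A weight-perturbation diagnostic of this kind can be sketched as below. This is an illustration under stated assumptions, not the study's procedure: the perturbation scheme (multiplicative uniform noise of ±20% followed by renormalization), the weighted-sum score standing in for the full TOPSIS aggregation, and the function name `topk_stability` are all hypothetical.

```python
import numpy as np

def topk_stability(X, weights, k, n_iter=500, spread=0.2, seed=0):
    """Mean overlap of the top-k cells between baseline and perturbed scores."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    base_top = set(np.argsort(X @ weights)[-k:].tolist())
    overlaps = []
    for _ in range(n_iter):
        # Assumed scheme: multiplicative uniform noise, then renormalize to sum 1
        w = weights * rng.uniform(1 - spread, 1 + spread, size=weights.size)
        w /= w.sum()
        top = set(np.argsort(X @ w)[-k:].tolist())
        overlaps.append(len(base_top & top) / k)
    return float(np.mean(overlaps))

# Synthetic indicator matrix (1000 cells x 4 indicators), not study data
rng = np.random.default_rng(1)
X = rng.random((1000, 4))
w0 = np.array([0.4, 0.3, 0.2, 0.1])
stability = topk_stability(X, w0, k=100)
```

A value close to 1 indicates that the top-k screening list is largely insensitive to plausible weighting error; lower values flag rankings that depend heavily on the exact weights.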
Indicators were organized into four conceptual components that jointly describe the flood risk–resilience system: Hazard, Exposure, Sensitivity, and Adaptation. Hazard captures hydro-climatic and terrain-related forcing (e.g., precipitation extremes, runoff potential, topographic wetness, slope, elevation, proximity to rivers). Exposure represents the concentration of people and assets (e.g., population density proxies, nighttime light intensity). Sensitivity reflects the susceptibility of the built and natural environment to inundation under a given forcing, while adaptation measures the capacity of emergency response and services to reduce impacts and support recovery. In particular, accessibility variables in the adaptation component were computed using a Gaussian-enhanced two-step floating catchment area (Gauss-2SFCA) approach, which models distance-decayed service availability and better reflects real-world spatial attenuation than binary catchments.
For Gauss-2SFCA, the first step calculates a supply-to-demand ratio $\rho_j$ for each facility $j$ (e.g., hospitals, fire stations, shelters):
$$\rho_j = \frac{s_j}{\sum_{k \in \{d_{kj} \le d_0\}} G(d_{kj})\, p_k}$$
where $s_j$ is the service supply of facility $j$, $p_k$ is the demand at grid $k$, $d_{kj}$ is the travel distance or time between $k$ and $j$, $k \in \{d_{kj} \le d_0\}$ denotes grids within the catchment (threshold $d_0$) of facility $j$, and $G(\cdot)$ is a Gaussian decay function. The second step sums the decayed ratios from all reachable facilities for each grid $i$:
$$a_i = \sum_{j \in \{d_{ij} \le d_0\}} G(d_{ij})\, \rho_j$$
where $j \in \{d_{ij} \le d_0\}$ indicates facilities within the catchment of grid $i$. The Gaussian decay is specified as
$$G(d) = \exp\!\left(-\frac{d^{2}}{2\beta^{2}}\right)$$
where $\beta$ controls the decay rate. The resulting accessibility scores are then used as adaptation indicators and integrated with the other components to compute composite indices.
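The two steps can be illustrated with a small NumPy sketch. It assumes a precomputed grid-to-facility travel-cost matrix, a Gaussian kernel with a bandwidth parameter `beta`, and a hard catchment cut-off `d0`; these parameter names and the toy values are hypothetical, and the study's exact kernel parameterization may differ.

```python
import numpy as np

def gaussian_decay(d, beta, d0):
    """Gaussian weight inside the catchment threshold d0, zero outside."""
    return np.where(d <= d0, np.exp(-d**2 / (2.0 * beta**2)), 0.0)

def gauss_2sfca(supply, demand, dist, beta, d0):
    """Accessibility per grid; dist[i, j] is the cost from grid i to facility j."""
    supply = np.asarray(supply, dtype=float)
    demand = np.asarray(demand, dtype=float)
    G = gaussian_decay(np.asarray(dist, dtype=float), beta, d0)
    # Step 1: supply-to-demand ratio for each facility
    weighted_demand = G.T @ demand
    ratio = np.divide(supply, weighted_demand,
                      out=np.zeros_like(supply), where=weighted_demand > 0)
    # Step 2: sum of decayed ratios over reachable facilities for each grid
    return G @ ratio

# Toy example: 3 grid cells, 2 facilities (hypothetical values)
dist = np.array([[1.0, 5.0], [2.0, 2.0], [5.0, 1.0]])
demand = np.array([100.0, 200.0, 100.0])
supply = np.array([10.0, 20.0])
acc = gauss_2sfca(supply, demand, dist, beta=2.0, d0=3.0)
```

A useful sanity check is that demand-weighted accessibility sums to total supply whenever every facility has at least one grid in its catchment, which holds in this toy case.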
3.2. CRITIC–Entropy Fusion and TOPSIS Weighting
To ensure comparability across indicators with different units and directions, all indicators were normalized to [0, 1]. For a “positive” indicator (higher is worse for risk components, or higher is better for adaptation depending on the component definition), min–max scaling was applied. For “negative” indicators, a direction correction was performed so that larger normalized values consistently represent a stronger contribution to the intended latent construct within that component.
Objective weights were then computed by combining CRITIC and Entropy weighting to reduce reliance on a single statistical criterion. CRITIC assigns a higher weight to indicators with larger variability and lower redundancy [
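11]. Let $x_{ij}$ be the normalized value of indicator $j$ in grid $i$, $\sigma_j$ be the standard deviation of indicator $j$, and $r_{jk}$ be the Pearson correlation between indicators $j$ and $k$. CRITIC defines the information content of indicator $j$ as
$$I_j = \sigma_j \sum_{k=1}^{m} \left(1 - r_{jk}\right), \qquad w_j^{\mathrm{CRITIC}} = \frac{I_j}{\sum_{k=1}^{m} I_k}$$
where $m$ is the number of indicators in the component. Entropy weighting quantifies the dispersion of each indicator based on information entropy [39]. With $p_{ij} = x_{ij} / \sum_{l=1}^{n} x_{lj}$ and $n$ the number of grid cells, the entropy and weight of indicator $j$ are
$$e_j = -\frac{1}{\ln n} \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad w_j^{\mathrm{Entropy}} = \frac{1 - e_j}{\sum_{k=1}^{m} \left(1 - e_k\right)}$$
The two sets of weights were fused by averaging (consistent with the code implementation):
$$w_j = \tfrac{1}{2}\left(w_j^{\mathrm{CRITIC}} + w_j^{\mathrm{Entropy}}\right)$$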
Objective weighting via CRITIC–Entropy fusion was adopted because it is fully reproducible, avoids expert-panel bias where consensus priors were unavailable, and preserves within-component comparability. We acknowledge the critique that statistical dispersion is not a substitute for process importance [
16,
29] as a highly variable indicator is not necessarily a causally dominant one. Three design choices mitigate this. First, CRITIC–Entropy is applied within each of the four components separately, rather than across the entire indicator set, preventing structurally different constructs from competing for weight in a single statistical pool. Second, a 500-iteration Monte Carlo weight-perturbation analysis (
Section 4.4) diagnoses both margin-level sensitivity and order-level stability of the resulting surface. Third, the two waterlogging point sets used for external validation were not used in any weighting or normalization step, so circularity is prevented by design. We surface in the main text (
Section 4.1) that three highly skewed hazard indicators receive disproportionate entropy weight because entropy amplifies distributions with many near-zero values, as a caveat previously hidden in
Appendix A.
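The within-component weighting can be sketched as follows, assuming an indicator matrix already min–max normalized to [0, 1] with non-constant columns. The function names and the small constant `eps` guarding the logarithm are illustrative choices, not details confirmed by the study's implementation.

```python
import numpy as np

def critic_weights(X):
    """CRITIC: contrast (std) times conflict (sum of 1 - correlation)."""
    sigma = X.std(axis=0, ddof=1)
    r = np.corrcoef(X, rowvar=False)
    info = sigma * (1.0 - r).sum(axis=0)
    return info / info.sum()

def entropy_weights(X, eps=1e-12):
    """Entropy weighting: lower entropy (more dispersion) gives more weight."""
    p = X / (X.sum(axis=0) + eps)
    e = -(p * np.log(p + eps)).sum(axis=0) / np.log(X.shape[0])
    d = 1.0 - e
    return d / d.sum()

def fused_weights(X):
    """Simple average of the two objective weight vectors."""
    return 0.5 * (critic_weights(X) + entropy_weights(X))

# Synthetic component matrix: 200 cells x 4 indicators (not study data)
rng = np.random.default_rng(0)
X = rng.random((200, 4))
X[:, 3] = X[:, 3] ** 6   # a skewed indicator with many near-zero values
w = fused_weights(X)
```

In this synthetic case the skewed fourth column typically receives a larger entropy weight than the near-uniform columns, which mirrors the caveat about highly skewed hazard indicators.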
For each component (hazard, exposure, sensitivity, adaptation), a weighted decision matrix was constructed and synthesized using TOPSIS [
40]. TOPSIS ranks alternatives by their relative closeness to the ideal solution. For grid $i$, the distance to the positive ideal $D_i^{+}$ and negative ideal $D_i^{-}$ yields the closeness score:
$$C_i^{(c)} = \frac{D_i^{-}}{D_i^{+} + D_i^{-}}$$
where $c$ is the component index. This procedure produces the component indices $H_i$ (hazard), $E_i$ (exposure), $S_i$ (sensitivity), and $A_i$ (adaptation).
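A compact sketch of the closeness calculation for one component is given below. It assumes direction-corrected indicators in [0, 1] (larger means a stronger contribution), Euclidean distances, and at least two distinct rows so the denominator is non-zero; the function name is hypothetical.

```python
import numpy as np

def topsis_closeness(X, w):
    """Relative closeness to the positive ideal for each grid cell."""
    V = np.asarray(X, dtype=float) * np.asarray(w, dtype=float)  # weighted matrix
    ideal_pos = V.max(axis=0)
    ideal_neg = V.min(axis=0)
    d_pos = np.sqrt(((V - ideal_pos) ** 2).sum(axis=1))
    d_neg = np.sqrt(((V - ideal_neg) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)

# Toy component with two indicators and three cells
X_toy = np.array([[1.0, 1.0], [0.0, 0.0], [0.5, 0.5]])
closeness = topsis_closeness(X_toy, [0.5, 0.5])
```

The best cell scores 1, the worst scores 0, and the midpoint cell scores 0.5, which makes the index directly readable as a within-component ranking.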
Following the system logic implemented in the code, vulnerability $V_i$ emphasizes exposure and sensitivity while treating higher adaptation as risk-reducing, with component weights set in the configuration and applied consistently across the study. Flood risk $R_i$ is then modeled as a multiplicative interaction between the hazard index $H_i$ and vulnerability to capture compounding effects:
$$R_i = H_i \times V_i$$
Finally, resilience is derived as a monotonic transformation of adaptation relative to risk. The main implementation uses a stabilized ratio (log form) to improve numerical behavior and interpretability, in which a small constant $\varepsilon$ prevents division by zero. This formulation makes resilience high when adaptation capacity is strong and risk pressure is comparatively low.
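One way the three composite indices could be chained is sketched below. Only the multiplicative hazard–vulnerability step is taken from the description above. The linear vulnerability form with a $(1 - A)$ term, the weight values, and the `log1p` form of the stabilized ratio are assumptions made purely for illustration and should not be read as the study's exact formulas.

```python
import numpy as np

def composite_indices(H, E, S, A, w_e=0.4, w_s=0.4, w_a=0.2, eps=1e-6):
    """Return (vulnerability, risk, resilience) for arrays of component indices."""
    H, E, S, A = (np.asarray(v, dtype=float) for v in (H, E, S, A))
    # ASSUMPTION: linear vulnerability; (1 - A) makes higher adaptation risk-reducing
    V = w_e * E + w_s * S + w_a * (1.0 - A)
    # From the text: risk is the product of hazard and vulnerability
    R = H * V
    # ASSUMPTION: log(1 + ratio) as the "stabilized ratio (log form)"
    resilience = np.log1p(A / (R + eps))
    return V, R, resilience

# Two hypothetical cells with identical capacity but different hazard
V, R, res = composite_indices(H=[0.8, 0.2], E=[0.5, 0.5], S=[0.5, 0.5], A=[0.5, 0.5])
```

With equal adaptation, the lower-hazard cell receives the higher resilience score, which is the intended "capacity per unit risk" reading.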
3.3. Validation and Driver Identification with LightGBM–SHAP
To test whether the grid-based indices capture real-world flood-prone patterns, waterlogging points within the central city area were extracted from the Central City Drainage and Waterlogging Control Special Plan (Revision 2022). The original records are text-based; place names and descriptions were converted to coordinates via geocoding, after which points were spatially matched to grid cells. A grid cell was labeled as an “event cell” if it contained at least one waterlogging point, producing a binary label for validation and modeling.
Validation focused on whether event cells exhibit systematically higher hazard, vulnerability, and risk scores than non-event cells. The workflow computes distributional contrasts and classification-oriented diagnostics, including ROC/PR performance, top-k capture curves, and calibration checks. These outputs quantify how well the risk surface can “retrieve” known waterlogging locations under different screening intensities, offering a practical interpretation aligned with risk management.
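The top-k capture diagnostic can be expressed in a few lines. The sketch below assumes a score vector and a binary label vector of equal length and reports, for each inspection budget k, the share of known event cells that fall among the k highest-scoring cells; the function name and toy data are hypothetical.

```python
import numpy as np

def topk_capture(scores, labels, ks):
    """Share of positive cells captured within the k highest-scoring cells."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    hits = np.cumsum(np.asarray(labels)[order])
    return {k: hits[k - 1] / hits[-1] for k in ks}

# Toy example: 10 cells, 3 event cells
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
capture = topk_capture(scores, labels, ks=[1, 3, 10])
```

Here inspecting the top 3 cells captures two of the three event cells, which is the kind of statement that translates a ranking into an inspection budget.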
To explain why waterlogging concentrates in certain cells, a LightGBM classifier [
31] was trained within the central-city area to predict the binary label $y$ from the selected indicators. The label is extremely imbalanced, with 167 positive cells out of 22,500, a prevalence ≈ 0.74%, which decisively shapes both model design and evaluation. Because the downstream use is ranking-based screening rather than hard classification, preserving the predicted-probability rank structure is more important than any particular accuracy threshold. We therefore adopted LightGBM’s native scale_pos_weight parameter, set to $n_{-}/n_{+}$, which re-weights minority-class gradient contributions without injecting synthetic samples. We did not adopt SMOTE or related resampling corrections, because recent evidence shows they systematically distort probability calibration in low-prevalence risk-prediction settings, bending calibration curves away from the identity line and corrupting rank-order fidelity precisely in the high-probability region that governs top-k screening [
41]. A systematic benchmark against SMOTE variants and focal loss is flagged as future work. To reduce over-optimistic generalization caused by spatial autocorrelation in features [
10], spatial groups were constructed via MiniBatch K-means on grid centroids, and group-aware K-fold cross-validation ensured that training and validation folds were spatially separated. Performance is reported via cross-validated ROC-AUC, PR-AUC, and log loss, with the final model interpreted through SHAP:
$$f(x_i) = \phi_0 + \sum_{j=1}^{M} \phi_{ij}$$
where $\phi_0$ is the baseline (expected) model output, $M$ is the number of features, and $\phi_{ij}$ is the SHAP contribution of feature $j$ to the prediction at grid $i$. Dependence plots and interaction analyses are used to diagnose non-linear thresholds and coupled mechanisms. These interpretations are reported as mechanistic evidence linking hydro-topographic forcing, exposure concentration, and adaptation accessibility to observed waterlogging patterns.
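The spatially grouped cross-validation design can be sketched with scikit-learn. The example assumes a hypothetical 150 × 150 fishnet, 30 spatial clusters, and 5 folds, none of which are reported settings; labels are random placeholders. The resulting class ratio is the value that would be passed as `scale_pos_weight` to a LightGBM classifier, which is not trained here.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.model_selection import GroupKFold

# Hypothetical fishnet centroids (500 m spacing) and placeholder rare labels
xx, yy = np.meshgrid(np.arange(150) * 500.0, np.arange(150) * 500.0)
centroids = np.column_stack([xx.ravel(), yy.ravel()])
rng = np.random.default_rng(0)
y = np.zeros(len(centroids), dtype=int)
y[rng.choice(len(centroids), size=167, replace=False)] = 1

# Spatial groups from K-means on centroids (cluster count is an assumption)
groups = MiniBatchKMeans(n_clusters=30, n_init=3, random_state=0).fit_predict(centroids)

# Class ratio n_negative / n_positive used to re-weight the minority class
scale_pos_weight = (y == 0).sum() / (y == 1).sum()

# Group-aware folds: no spatial cluster appears in both training and validation
folds = list(GroupKFold(n_splits=5).split(centroids, y, groups))
for train_idx, val_idx in folds:
    assert set(groups[train_idx]).isdisjoint(groups[val_idx])
```

Keeping whole clusters together in a fold is what prevents neighboring, near-identical cells from leaking between training and validation.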
Final LightGBM hyperparameters (num_leaves = 63, learning_rate = 0.03, n_estimators up to 3000 with 150-round early stopping, min_child_samples = 50, subsample = 0.8, colsample_bytree = 0.8, reg_lambda = 1.0) were selected by targeted grid search under the same spatial-group K-fold cross-validation, optimizing mean PR-AUC across folds. Choices were deliberately conservative: tree depth and leaf count were capped to reduce overfitting risk on a dataset with only 167 positive cells, and a low learning rate with many shallow trees preserves feature-level SHAP attribution stability. We did not conduct a systematic benchmark against alternative gradient-boosting libraries (XGBoost, CatBoost) or deep-learning classifiers for two reasons. First, the central argument of the paper, namely the dissociation between accessibility-based capacity proxies and operational waterlogging performance, is derived from SHAP-level feature–target structure and is stable under library substitution when the feature representation is held constant. Second, the principal source of residual error in this application is data limitation (reporting-density bias in the label, absence of pipe-network attributes in features) rather than model capacity. Systematic cross-library benchmarking and Bayesian hyperparameter optimization are flagged as future methodological extensions.
5. Discussion
5.1. The Spatial Logic of Flood Risk Formation in Chongqing
This study provides convergent evidence that a composite, grid-based index can capture the first-order geography of pluvial waterlogging risk in Chongqing, particularly within the central city where exposure is highly concentrated. The risk surface shows strong discrimination against two independent point sets (historical and 2022), with ROC-AUC values of 0.834 and 0.873, respectively. Importantly, the same validation table also shows consistently low PR-AUC values, which is expected under extremely low event prevalence and reinforces that the map should be interpreted primarily as a ranking and screening tool rather than a literal probability surface. This pattern is consistent with well-known evaluation theory: when positives are rare, ROC-AUC can remain high even when precision is modest, whereas PR-AUC more directly reflects the challenge of correctly identifying scarce events [
45].
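The relationship between prevalence and precision can be made concrete with a short calculation. The sketch below uses a hypothetical operating point (true-positive rate 0.8, false-positive rate 0.2) and two assumed prevalence levels; none of these numbers comes from the validation table.

```python
def precision_at(tpr, fpr, prevalence):
    """Precision implied by one ROC operating point and a class prevalence."""
    true_pos = tpr * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, identical in both cases; only prevalence changes.
balanced = precision_at(tpr=0.8, fpr=0.2, prevalence=0.5)
rare = precision_at(tpr=0.8, fpr=0.2, prevalence=0.01)
print(f"precision at 50% prevalence: {balanced:.3f}")
print(f"precision at 1% prevalence:  {rare:.3f}")
```

The ROC operating point is the same in both cases, yet precision falls from 0.8 to roughly 0.04 when positives make up only 1% of cells, which is why a high ROC-AUC and a low PR-AUC can coexist.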
Mechanistically, the component maps of hazard, exposure, sensitivity, and adaptive capacity in
Figure 4 suggest a structured “risk production chain” consistent with mainstream risk framing: hazard provides the physical forcing background, exposure loads assets and people onto that background, sensitivity translates urban surface characteristics into runoff propensity, and adaptive capacity modulates the severity and recoverability of impacts [
1]. In Chongqing, terrain and hydro-climatic constraints create a persistent predisposition template: strong relief, pronounced valley systems, and spatially heterogeneous precipitation extremes jointly condition a region where rapid runoff generation and flow convergence are likely. Urbanization then intensifies this template by expanding impervious cover, reshaping micro-topography, and increasing the density of receptors, which elevates both impact potential and the likelihood that waterlogging is reported.
At the central city scale, corridor-like hotspots and clustered waterlogging points imply that risk is shaped not only by natural topographic convergence but also by constructed drainage pathways and bottlenecks. The built environment can re-route overland flow along roads, rail corridors, and underpasses, and can generate localized ponding at slope breaks or where stormwater inlets are sparse or blocked. The key implication is that urban waterlogging in Chongqing should be understood as a systemic outcome produced by the intersection of (i) intense short-duration rainfall, (ii) terrain-constrained convergence, and (iii) high-density urban development that both amplifies runoff and increases exposure, rather than as a set of isolated “bad spots”.
5.2. Why Risk and Resilience Co-Exist
A potentially counterintuitive but substantively important result is that resilience, operationalized here as “capacity per unit risk”, does not predict the absence of waterlogging. In fact, observed waterlogging points align with higher resilience values, as shown by the low ROC-AUC for “1 − resilience” (0.245 and 0.296 for the two sets). This reveals a common conceptual and empirical pitfall: capacity indicators and performance outcomes are not interchangeable [
46].
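The below-random values can be read directly as evidence of a reversed association. As a brief worked step, assuming continuous scores without ties, write R for the resilience score and let R₊ and R₋ denote its value at a randomly drawn waterlogged and non-waterlogged cell:

```latex
\mathrm{AUC}(1 - R)
  = P\bigl(1 - R_{+} > 1 - R_{-}\bigr)
  = P\bigl(R_{+} < R_{-}\bigr)
  = 1 - \mathrm{AUC}(R)
\quad\Longrightarrow\quad
\mathrm{AUC}(R) = 1 - 0.245 = 0.755
\quad\text{and}\quad
\mathrm{AUC}(R) = 1 - 0.296 = 0.704 .
```

Under this assumption, the resilience score itself ranks waterlogged cells above non-waterlogged cells in roughly 70–76% of random pairs for the two point sets.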
Two structural mechanisms explain the observed co-existence. First, in dense urban cores, the proxies used for adaptive capacity, such as service accessibility, facility density, and infrastructure presence, are intrinsically high because central areas are well served, well connected, and historically prioritized for investment. Meanwhile, the same cores also concentrate exposure and runoff generation. The result is a “risk–resource co-location” pattern: cities accumulate both hazards and capacities [
46,
47]. In other words, the resilience index captures a structural advantage in resources and services, but waterlogging points represent operational and micro-scale failures that are only weakly reflected by city-scale capacity proxies.
Second, resilience in this framework is primarily a “coping and recovery capacity” construct, whereas pluvial waterlogging is often governed by “avoidance and conveyance performance” at fine scales: storm sewer capacity, inlet spacing, debris blockage, local subsidence, construction disturbance, micro-topographic depressions, and the timing of peak rainfall relative to network loading. This mismatch naturally produces situations where a place is “resourced” yet still “frequently waterlogged”. The model evidence is consistent with this interpretation: predictors such as fire-station accessibility can increase predicted waterlogging likelihood because they operate as signals of urban centrality, not as direct protective factors.
Conceptually, this distinction is well established in the broader literature. Resilience is about the ability to absorb disturbance, maintain function, and recover, not necessarily about eliminating disturbance occurrence [
47,
48]. For resilience to work as a more direct “waterlogging avoidance” indicator, it likely requires drainage-system robustness variables such as pipe capacity, inlet density, pump station coverage, storage facilities, maintenance and blockage records, and observed inundation duration, in addition to general service accessibility.
5.3. Driving Factor Insights for Compound Triggers and Urban Centrality
The explainable machine-learning results provide a process-consistent interpretation of why waterlogging concentrates where it does, and they add nuance that is difficult to derive from component maps alone. Under spatial cross-validation, the model shows strong ranking ability (OOF ROC-AUC around 0.920; AP about 0.074), despite the extremely imbalanced dataset (167 positive grids out of 22,500; prevalence of about 0.74%). This matters because it suggests that the model captures stable spatial regularities rather than overfitting to idiosyncratic point locations.
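To put the reported AP in context, it helps to compare it with the value a random ranking would achieve, which is approximately the prevalence. The sketch below first defines minimal rank-based versions of the two metrics (plain-Python stand-ins, not the authors' evaluation code) and then computes the lift implied by the figures quoted above. The toy scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Rank-based (Mann-Whitney) ROC-AUC; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values observed at the rank of each positive."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy out-of-fold scores and labels (hypothetical values).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
toy_ap = average_precision(toy_scores, toy_labels)

# Figures quoted in the text: 167 positives among 22,500 grids, AP about 0.074.
prevalence = 167 / 22500
lift = 0.074 / prevalence
print(f"prevalence = {prevalence:.4f}, AP lift over random = {lift:.1f}x")
```

Read this way, an AP of about 0.074 corresponds to roughly a tenfold improvement over a random ranking at this prevalence, even though the absolute value looks small.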
Mechanistically, the SHAP-based interpretation [
32] supports a “compound trigger” perspective: short-duration extreme precipitation acts as the initiating forcing, but its effect is amplified where runoff potential and convergence are high. This is exactly what one would expect in a city like Chongqing, where relief is strong, and drainage pathways are tightly constrained: intense rainfall becomes most consequential in landscapes that rapidly generate surface flow and route it into valley outlets or artificial low points. The non-linear contributions of slope and topographic wetness indicators further suggest multiple susceptible regimes, including low-gradient built-up pockets prone to ponding and steep settings where fast runoff overloads downstream conveyance or concentrates at topographic constrictions.
A second major insight concerns urban centrality variables. Accessibility-related predictors are statistically informative, but their directionality should be interpreted carefully. Higher accessibility (e.g., fire-station access) can correlate with higher waterlogging likelihood because infrastructure and services concentrate precisely where imperviousness, runoff generation, traffic disruption, and reporting probability are also high. This illustrates a key methodological lesson for resilience analytics: without careful causal framing, many proxies for urbanization intensity can appear as risk enhancers in occurrence models. In practice, this does not imply that emergency services cause flooding; it implies that centrality, density, and infrastructure concentration co-locate with both resources and waterlogging stressors.
Finally, the model’s utility is operational rather than purely explanatory. The results demonstrate strong spatial concentration of predicted risk: prioritizing a small fraction of grids captures a large share of observed waterlogging points. This supports a pragmatic policy stance: even when probability calibration is imperfect, the ranked scores can guide efficient inspection, diagnosis, and retrofit targeting.
5.4. Implications for Zoned Governance and Engineering Intervention
The strongest governance message from this study is that precision targeting is feasible. The top-k capture analysis shows that the top 10% highest-scoring grids capture 147 points (88.02%) and the top 20% capture 95.21%. This is not a trivial performance statistic; it is effectively a rule for targeting resources under a constrained budget. A city can generate a ranked “high-risk list,” then progressively narrow from district-level zoning to corridor-level diagnosis and site-level remediation, combining map-based screening with on-the-ground verification such as micro-topography checks, inlet condition audits, CCTV inspection, and localized hydraulic capacity testing.
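The top-k capture statistic is simple to compute from any vector of grid scores and a matching vector of event counts or binary labels. The following is a minimal sketch; the function name `top_k_capture` and the toy data are illustrative, and the figure of 159 points for the top 20% is back-calculated from the reported 95.21% of 167 points rather than stated in the text.

```python
def top_k_capture(scores, labels, fraction):
    """Return (captured, share): events falling in the top `fraction` of cells."""
    n_top = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    captured = sum(labels[i] for i in order[:n_top])
    return captured, captured / sum(labels)


# Toy example with 10 cells and 3 waterlogged cells (hypothetical values).
toy_scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4, 0.5, 0.05]
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
top20 = top_k_capture(toy_scores, toy_labels, 0.2)
top40 = top_k_capture(toy_scores, toy_labels, 0.4)
print(top20, top40)

# Consistency check on the reported shares (167 observed points in total).
share_top10 = round(100 * 147 / 167, 2)
share_top20 = round(100 * 159 / 167, 2)   # 159 is inferred, not reported
print(share_top10, share_top20)
```

Sweeping `fraction` over a range of budgets yields a capture curve, which makes the trade-off between inspection effort and coverage explicit for planners.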
Given the inferred driver structure, interventions should be differentiated by dominant mechanism rather than applied uniformly. Where extreme rainfall and high runoff potential coincide, the priority is to increase peak buffering and redundancy so that forcing does not exceed system capacity—through distributed storage, detention, and strategically placed overflow routes. In high-convergence corridors and confluence nodes, the emphasis should be on relieving bottlenecks and preventing backwater effects by improving key conveyance links and ensuring continuity of flow paths. In high-centrality, high-exposure urban cores, engineering measures should be coupled with operational management: real-time traffic organization, rapid inlet clearing, temporary drainage deployment during forecasted extremes, and targeted sponge retrofits that reduce recurrent ponding and protect critical urban functions.
Crucially, this zoned approach also resolves an apparent tension: some areas are “high resilience” in the sense of service provision and recovery capacity, yet they remain priority zones for drainage-performance improvement because repeated waterlogging imposes cumulative economic and social costs. Therefore, a resilience-informed governance strategy needs two layers: (i) system-level coping and recovery capacity building (emergency access, shelters, warnings), and (ii) micro-scale drainage robustness improvements that directly reduce waterlogging occurrence and duration.
5.5. Assumptions, Proxy Limitations, and Transferability
The framework rests on four falsifiable assumptions. Aggregation-invariance, the assumption that top-k rankings at 500 m are qualitatively preserved at finer and coarser grids, is untested here and constitutes the principal MAUP caveat [
38]. The weight-perturbation Monte Carlo (
Section 4.4) offers an indirect check (ranking stability is high), but only multi-resolution replication can close the question. Rainfall stationarity, the assumption implicit in the 2020–2024 composite that warm-season rainfall is statistically stationary, becomes tenuous under climate change [
3]. Integrating non-stationary CMIP6 scenarios is a priority extension. Capacity–performance substitution, implicit in any resilience index constructed as 1 − risk, is the assumption this paper empirically falsifies; closure requires direct drainage-system attributes (
Section 5.2). Label completeness is the most operationally consequential: reporting-density bias (
Section 4.4) threatens calibrated absolute-risk estimation, and closure requires Bayesian under-reporting correction [
44] with multi-stream event fusion.
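The idea behind a weight-perturbation check can be sketched in a few lines. The version below is an illustration only, not the procedure of Section 4.4: it assumes a weighted-sum composite, uniform multiplicative noise of ±20% on each weight, renormalization to unit sum, and Spearman rank correlation (no-ties formula) as the stability measure. All function names, the noise level, and the toy indicators are hypothetical.

```python
import random


def ranks(values):
    """Rank positions (0 = smallest); ties are broken by original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank
    return out


def spearman(a, b):
    """Spearman rank correlation using the classic no-ties formula."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def composite(indicators, weights):
    """Weighted-sum composite score for each grid cell."""
    return [sum(w * x for w, x in zip(weights, row)) for row in indicators]


def ranking_stability(indicators, weights, rel_noise=0.2, n_draws=200, seed=0):
    """Mean rank correlation between baseline and weight-perturbed composites."""
    rng = random.Random(seed)
    base = composite(indicators, weights)
    rhos = []
    for _ in range(n_draws):
        noisy = [w * (1 + rng.uniform(-rel_noise, rel_noise)) for w in weights]
        total = sum(noisy)
        noisy = [w / total for w in noisy]          # renormalise to unit sum
        rhos.append(spearman(base, composite(indicators, noisy)))
    return sum(rhos) / len(rhos)


# Toy indicators for 30 cells and 3 components (hypothetical values).
toy = [[i, (i * 7) % 13, (i * 3) % 5] for i in range(30)]
stability = ranking_stability(toy, [0.5, 0.3, 0.2])
print(f"mean Spearman rho under weight perturbation: {stability:.3f}")
```

A mean correlation close to 1 would indicate that the ranking is insensitive to plausible weight changes, which, as noted above, is an indirect check and not a substitute for multi-resolution replication.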
Transferability to other mountainous megacities follows a tiered logic. Terrain-controlled indicators (ELEmean, SLOPEmean, FLOWACCmean, TWImean, SINKDEPmean, dist2riv, rivden) transfer directly. Rainfall extreme indicators (Rx1day through P99) transfer in methodological form but require locally appropriate ETCCDI thresholds. Exposure and adaptation structures (population proxy, nighttime light, POI accessibility) transfer in concept but require locally re-curated POI sources and re-calibrated Gauss-2SFCA parameters. The weighting scheme itself must be re-estimated on the destination dataset rather than transplanted numerically. The framework is therefore offered as a transferable protocol, not a transferable set of numerical weights.
6. Conclusions
This study proposes and tests a city-wide framework for assessing urban flood risk and resilience in a terrain-constrained megacity, using Chongqing as a representative case. By integrating multi-source datasets into a uniform 500 m × 500 m fishnet, the framework supports consistent comparison of flood-related conditions across the municipality and provides a practical bridge between climate-related hazard signals, built-environment characteristics, and capacity proxies relevant to emergency response and recovery. The indicator system is organized around hazard, exposure, sensitivity, and adaptation, which allows risk formation and capacity distribution to be mapped jointly rather than treated as a single composite outcome.
A central result is that the composite risk surface aligns with observed waterlogging locations better than single-factor representations. Validated against two independent waterlogging point sets, the risk index achieves ROC-AUC values of 0.834 for historical points and 0.873 for the 2022 set, and it shows statistically significant separation between waterlogging and non-waterlogging samples. This matters because it suggests that observed waterlogging in Chongqing is not well explained by hydrologic or topographic hazard alone. Instead, hotspots reflect coupled effects in which hydroclimatic forcing and terrain convergence interact with exposure concentration and land-surface susceptibility, producing corridor-like and nodal patterns that are consistent with systemic urban structure rather than isolated anomalies.
Equally important, resilience should not be interpreted as the inverse of risk [
20,
49]. Many observed waterlogging points occur in areas classified as having medium-to-high capacity, and the resilience-related score performs below random in discriminating waterlogging points. This is not a failure of the method; it is an empirical signal that dense urban cores can be simultaneously high-risk and high-capacity [
50]. In practice, core districts often have better service accessibility and infrastructure provision, yet they also experience higher runoff loads, higher imperviousness, and far higher exposure. Treating “capacity” as though it guarantees safety can therefore mislead flood governance. The more defensible interpretation is that resilience indicators reflect response and recovery capacity, while risk indicators reflect the likelihood and potential impact of overload and failure. Planning decisions need both maps because they answer different questions: where failures are likely to occur, and where the system can absorb and recover from them.
To move from mapping to explanation, we further tested the information content of the assembled features using a LightGBM model evaluated with spatial cross-validation in a highly imbalanced setting. The model shows strong ranking performance, which supports the view that multi-source indicators capture consistent signals related to waterlogging occurrence. More importantly for governance, the model output concentrates risk into a small spatial fraction: the top 10% of grids capture 88.02% of observed points, and the top 20% capture 95.21%. This kind of concentration is operationally valuable because municipal resources are limited. It implies that targeted inspection, monitoring, and drainage retrofits can be prioritized in a small set of high-scoring areas while still covering most historically affected locations. At the same time, calibration analysis indicates that scores should be treated as relative risk rankings rather than literal probabilities, a common issue for rare events and heterogeneous urban systems.
Future work should integrate five methodological extensions following directly from the diagnostics reported: (1) multi-resolution MAUP replication at 250 m and 1 km to test aggregation-invariance; (2) non-stationary rainfall integration via CMIP6-downscaled precipitation scenarios; (3) direct drainage-system attributes (pipe capacity, inlet spacing, pump-station coverage, blockage records) replacing accessibility proxies in the adaptation component, converting the resilience indicator from a structural-availability measure into a hydraulic-performance measure, and closing the capacity–performance gap diagnosed in
Section 5.2; (4) Bayesian spatial under-reporting correction [
16] fused with geocoded social-media reports, supporting calibrated absolute-risk estimation; and (5) residual spatial-autocorrelation absorption via eigenvector spatial filters [
10] or Gaussian-process residual layers, targeting the 0.53 residual Moran’s I diagnosed in
Section 4.4. Complementary lower-priority extensions include systematic cross-library benchmarking (XGBoost, CatBoost) with Bayesian hyperparameter optimization, and calibration-preserving imbalance corrections.
In conclusion, the proposed risk–resilience assessment provides a validated, interpretable, and operational basis for urban flood governance in complex terrain-constrained settings. By explicitly separating capacity from risk, validating against observed events, and translating results into spatial prioritization logic, the study supports more targeted and defensible decisions on drainage configuration, flood-risk monitoring, and resilience planning.