Fusing Historical Records and Physics-Informed Priors for Urban Waterlogging Susceptibility Assessment: A Framework Integrating Machine Learning, Fuzzy Evaluation, and Decision Analysis

Chen, Guangyao; Guan, Wenxin; Xu, Jiaming; Koh, Chan Ghee; Xu, Zhao

doi:10.3390/app151910604

by Guangyao Chen ¹, Wenxin Guan ¹, Jiaming Xu ¹, Chan Ghee Koh ² and Zhao Xu ¹,*

¹ School of Civil Engineering, Southeast University, Nanjing 211189, China

² Department of Civil and Environmental Engineering, National University of Singapore, Singapore 119077, Singapore

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(19), 10604; https://doi.org/10.3390/app151910604

Submission received: 3 September 2025 / Revised: 28 September 2025 / Accepted: 29 September 2025 / Published: 30 September 2025

(This article belongs to the Topic Resilient Civil Infrastructure, 2nd Edition)

Featured Application

What are the main findings?

This study proposes a dual-source sample enhancement strategy that integrates Physics-Informed Priors (PI-Priors) with Historical Waterlogging Records (HWR), incorporating extreme rainfall scenarios and applying a joint filtering mechanism based on membership, credibility, and impact degrees. This approach systematically extracts high-quality samples and embeds extreme-scenario information into the modeling process.
To address the heterogeneity of polygon-based waterlogging risk distributions, a dimension-reduction sampling framework is introduced based on Three-Way Decision (TWD) theory. It combines a Multi-dimensional Connection Cloud Model (MCCM) with the CRITIC-TOPSIS method, which couples CRITIC (Criteria Importance Through Intercriteria Correlation) weighting with TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) ranking, to quantify membership degrees of overlapping risk levels while also assessing credibility and social influence scores, supporting robust point-based sampling in spatially complex environments.
A MaxEnt (Maximum Entropy) modeling framework—a statistical learning approach rooted in information entropy theory—is developed by integrating variables from natural conditions, social capital, infrastructure, and the built environment. The contributions and directional effects of each factor are quantified, achieving a balance between interpretability and scalability. This framework offers a transferable tool for diverse urban settings and targeted flood mitigation planning.

Abstract

Urban Waterlogging Susceptibility Assessment (UWSA) is vital for resilient urban planning and disaster preparedness. Conventional methods depend heavily on Historical Waterlogging Records (HWR), which are limited by their reliance on extreme rainfall events and prone to human omissions, resulting in spatial bias and incomplete coverage. While hydrodynamic models can simulate waterlogging scenarios, their large-scale application is restricted by the lack of accessible underground drainage data. Recently released flood control plans and risk maps provide valuable physics-informed priors (PI-Priors) that can supplement HWR for susceptibility modeling. This study introduces a dual-source integration framework that fuses HWR with PI-Priors to improve UWSA performance. PI-Priors rasters were vectorized to delineate two-dimensional waterlogging zones, and based on the Three-Way Decision (TWD) theory, a Multi-dimensional Connection Cloud Model (MCCM) with CRITIC-TOPSIS was employed to build an index system incorporating membership degree, credibility, and impact scores. High-quality samples were extracted and combined with HWR to create an enhanced dataset. A Maximum Entropy (MaxEnt) model was then applied with 20 variables spanning natural conditions, social capital, infrastructure, and built environment. The results demonstrate that this framework increases sample adequacy, reduces spatial bias, and substantially improves the accuracy and generalizability of UWSA under extreme rainfall.

Keywords:

urban waterlogging; susceptibility assessment; physics-informed priors; connection cloud model; MaxEnt model; three-way decisions

1. Introduction

With the accelerating pace of urbanization and the increasing frequency of extreme rainfall events driven by climate change, urban flooding has emerged as one of the most frequent and destructive types of urban disasters worldwide. It poses direct threats to public safety and critical infrastructure, while also severely disrupting urban functionality and causing substantial economic losses. Among urban water-related hazards, fluvial flooding and urban pluvial flooding (waterlogging) are often interrelated in their mechanisms and spatial distribution [1]. However, they differ significantly in terms of driving factors, triggering processes, spatial extent, and management strategies [2]. Fluvial floods arise from watershed-scale runoff and manifest as river overflow or levee breaches, with wide spatial impact, long duration, and severe regional consequences [3]. In contrast, urban waterlogging is localized and sudden. It is caused by short-term heavy rainfall exceeding the design threshold of the drainage system and thus surpassing the local drainage capacity, or by the accumulation of water in low-lying areas through slope differences and the convergence effect [4]. Although typically short-lived, such events can paralyze transportation, disrupt infrastructure, and endanger residents [5]. Waterlogging is also highly sensitive to micro-topography, land use, and drainage networks [6], resulting in fragmented distribution and requiring fine-scale spatial analysis for accurate risk assessment.

In China, waterlogging has become a major meteorological-hydrological hazard due to rapid urban growth and more frequent extreme weather [7]. The Global Natural Disaster Assessment Report identifies China among the countries most severely affected [8], particularly in the Yangtze River Basin and South China, where short-duration, high-intensity rainfall frequently overwhelms drainage systems [9]. The Zhengzhou rainstorm of 20 July 2021 exemplified this threat, causing 398 deaths and economic losses of over 120 billion RMB, and ranking among the most severe global urban water disasters [10]. Similar events in Guangzhou (2020) [11], Beijing (2023), and cities such as Nanjing, Shenzhen, and Wuhan [12] highlight systemic vulnerabilities and the limits of emergency response. These cases underscore the urgent need for fine-scale susceptibility mapping and proactive risk governance to enhance resilience.

It is important to clarify that this study focuses specifically on waterlogging susceptibility, defined as the inherent tendency of a spatial unit to experience water accumulation under specific physical, meteorological, and anthropogenic conditions. Unlike vulnerability or exposure, susceptibility pertains to the likelihood of hazard occurrence without necessarily involving damage or population presence [13,14]. Vulnerability relates to the potential for harm within a system [15], while exposure concerns the value or sensitivity of elements in hazard-prone areas [16]. These concepts, though often conflated, serve distinct roles within disaster risk analysis and must be clearly distinguished to ensure conceptual clarity and analytical precision. The UWSA conducted here represents a critical component of the broader risk evaluation chain, providing spatially explicit insights that support drainage system optimization, urban resilience planning, and anticipatory risk management. Current UWSA approaches can be broadly categorized into four types, each offering unique advantages and limitations in data requirements, spatial resolution, interpretability, and modeling accuracy, and they hold strong potential for integrated application.

A widely adopted approach to UWSA is statistical analysis based on HWR data [17]. Drawing on field surveys, public reports, or remote sensing, these methods use spatial regression, weighted overlays, or geostatistical techniques [18]. Their strengths include easy data acquisition and straightforward implementation, making them effective for detecting large-scale patterns and conducting preliminary zoning, and they offer direct evidence of historical events. However, their heavy reliance on HWR limits their applicability in newly developed urban areas or under climate change scenarios. For example, some regions have repeatedly experienced heavy precipitation while others have remained largely unaffected, so past records may be missing or unrepresentative. Moreover, under extreme rainfall, HWR may fail to capture areas that are prone to waterlogging because of hidden physical weaknesses (such as undersized drainage pipes and concave-down overpasses). Models based only on historical samples thus lack robustness for anticipating intensified rainfall.

Another mainstream method is Physics-based Hydrodynamic Simulation (PB-HydroSim), which combines one-dimensional drainage networks with two-dimensional surface runoff models (e.g., InfoWorks ICM, MIKE URBAN) to replicate interactions such as pipe surcharging, overflow, and overland flow [19]. These models are physically grounded and widely used for high-resolution analysis at the site or sub-district level [20]. Nevertheless, they face notable challenges: their sensitivity to detailed input parameters, including pipe diameter, slope, and node configuration, and the limited availability of drainage infrastructure data, which are often confidential and inaccessible in many cities.

To address these limitations, integrated multi-source modeling approaches have emerged. By incorporating topography, land use, socio-economic factors, and infrastructure data, and employing algorithms such as Analytic Hierarchy Process [21], fuzzy comprehensive evaluation [22], Maximum Entropy [23], random forests [24], and support vector machines [25], these methods offer flexibility and adaptability in data-scarce environments. They enable susceptibility mapping even without detailed inundation records or drainage maps. Integrating such outputs with HWR improves training quality and model stability. However, their effectiveness depends on the representativeness and completeness of training data, which is often very sensitive and costly, and some models lack physical interpretability, which may affect their credibility and stability.

With advances in artificial intelligence, deep learning models have recently entered urban waterlogging research [26]. Architectures such as Convolutional Neural Networks [27], Recurrent Neural Networks [28], and U-Net [29] integrate high-dimensional inputs from remote sensing, street-level imagery, radar precipitation, and ground observations. These models excel in feature extraction and non-linear pattern recognition, particularly in image and time-series analysis. However, they require large volumes of high-quality, multimodal datasets (including rainfall–flood time series, in situ waterlogging photos, and surveillance videos), which are often difficult to obtain, scattered across departments, and restricted by privacy or security concerns. Consequently, their application remains largely experimental and limited to small-scale pilots.

In summary, each of the UWSA approaches reviewed above has limitations, yet their complementary strengths in data applicability, precision, and interpretability highlight the potential for methodological integration. Tailoring or combining approaches according to local conditions and data availability can improve risk identification and urban water management. In addition, systematically processing and analyzing more abundant multi-source data can further enhance the effectiveness of an integrated method.

Recently, municipal agencies have begun releasing risk maps to enhance public awareness and preparedness. For example, in 2024, Beijing agencies issued a waterlogging risk map integrating HWR, remote sensing, and topography [30]. Developed through international and domestic models under multiple scenarios, the map identified representative risk zones for planning and preparedness. Despite increasing availability, the scientific use of these maps remains limited. Beyond communication, they can serve as supplementary data for susceptibility modeling by supporting training, validating outputs, and aiding data fusion. Yet their polygon-based representation introduces spatial heterogeneity and boundary uncertainty, complicating point-level sampling and classification.

Addressing these issues requires a framework that can account for the coexistence of multiple risk levels and resolve local spatial heterogeneity in a consistent and quantifiable way. To this end, the present study proposes an integrated strategy that combines HWR data, PB-HydroSim outputs, and multi-source data-driven evaluation methods. By incorporating uncertainty analysis and the MaxEnt machine learning algorithm, the study carries out a spatial UWSA for Hefei’s central urban area. The resulting framework is designed to be transferable and adaptive, leveraging the empirical accuracy of HWR data, the scenario flexibility of PB-HydroSim, and the comprehensive spatial coverage of multi-dimensional indicators. The key contributions of this study are as follows:

(1) This study proposes a dual-source sample enhancement strategy that integrates PI-Priors with HWR, incorporating extreme rainfall scenarios and applying a joint filtering mechanism based on membership, credibility, and impact degrees. This approach systematically extracts high-quality samples and embeds extreme-scenario information into the modeling process.

(2) To address the heterogeneity of polygon-based waterlogging risk distributions, a dimension-reduction sampling framework is introduced based on TWD theory. It integrates an MCCM and the CRITIC-TOPSIS method to quantify membership degrees of overlapping risk levels, while also assessing credibility and social influence scores to support robust point-based sampling in spatially complex environments.

(3) A MaxEnt modeling framework is developed by integrating variables from natural conditions, social capital, infrastructure, and the built environment. The contributions and directional effects of each factor are quantified, achieving a balance between interpretability and scalability. This framework offers a transferable tool for diverse urban settings and targeted flood mitigation planning.

2. Research Area and Data Collection

2.1. Overview of the Study Area

Hefei, the capital of Anhui Province in eastern China, is the province’s political, economic, and cultural center. Its central urban area comprises four core districts—Yaohai, Luyang, Shushan, and Baohe—and extends into parts of surrounding counties such as Changfeng, Feidong, and Feixi (Figure 1).

This zone has high population density, concentrated economic activity, and well-developed infrastructure. Urban expansion has blurred the boundaries between core districts and neighboring counties, with suburban towns like Taohua and Shangpai in Feixi now highly urbanized and functionally integrated with central Hefei. Major industrial zones, including the Hefei Economic and Technological Development Zone, also straddle Feixi and Changfeng, reinforcing spatial and economic connectivity.

These areas share drainage networks and public infrastructure, with strong interdependence in transportation, industry, and commuting. Accordingly, this study defines Hefei’s central urban area as including both the four districts and adjacent rapidly urbanized zones (Figure 1). The widespread replacement of natural surfaces by impervious materials has substantially heightened the risk of waterlogging under extreme rainfall.

2.2. UWSA Indicator System

Previous studies have identified a wide range of natural and anthropogenic indicators for assessing urban waterlogging, consistently showing strong links with natural conditions, social capital, infrastructure, and the built environment. Commonly used variables include rainfall intensity, elevation, slope, surface roughness, terrain undulation, population density, and GDP. Infrastructure factors, however, are often simplified, with transportation typically represented by basic measures such as road density [31], distance to overpasses [23], or distance to major roads [32].

Yet transportation infrastructure is highly vulnerable during waterlogging, and disruptions can severely affect mobility, safety, and economic activity. To address these gaps, this study develops an indicator system (Table 1) that incorporates additional transportation-related variables, including distance to underpasses, concave-down overpasses, and major stormwater pipes. These indicators provide a more detailed representation of exposure mechanisms and improve the precision of susceptibility assessment.

2.3. Indicator Data Collection

Data for all indicators in Table 1 were collected from multi-source datasets and standardized to a 100 m × 100 m grid resolution in ArcGIS Pro. Zonal statistics were used to generate spatial distributions (Figure 2). In addition, 139 waterlogging-prone locations were obtained from the Comprehensive Urban Drainage and Waterlogging Prevention Plan of Hefei (2013–2020), hereafter “Plan 2020,” and used as labeled samples for susceptibility assessment.

For natural condition variables, $N_{2}$–$N_{4}$ were derived from $N_{1}$ through spatial analysis in ArcGIS Pro. In the built environment dimension, $B_{1}$ was extracted from the Annual Land Cover Dataset of China [69] and processed using nearest-neighbor analysis. Infrastructure indicators $I_{1}$–$I_{4}$ were also generated through nearest-neighbor analysis.

PI-Priors from Plan 2020 were further incorporated (Figure 3). Raster outputs of PB-HydroSim simulations for Hefei’s seven drainage zones were vectorized through a semi-automated process, with manual correction to address overlapping pixel colors. The simulations were conducted with MIKE FLOOD, which couples a 1D drainage network with a 2D hydrodynamic model. A risk index $R$ was then calculated as

$$R = d \times (v + 0.5) + \rho \tag{1}$$

where $d$ (m) is water depth, $v$ (m/s) is flow velocity, and $\rho$ is the depth-related hazard coefficient. Specifically, $\rho$ is set to 0.5 when $d < 0.15$ m and to 1.0 when $d \geq 0.15$ m. Based on $R$, waterlogging risk was categorized into three levels: High risk ($R \geq 1.25$), Medium risk ($0.75 \leq R \leq 1.25$), and Low risk ($R \leq 0.75$).
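As a minimal sketch of Equation (1) and the three-level classification, the Python snippet below evaluates $R$ for a single cell. The function names are illustrative and do not come from Plan 2020. Because the stated class boundaries overlap at 0.75 and 1.25, the snippet assumes that $R = 0.75$ counts as low risk and $R = 1.25$ as high risk.

```python
def hazard_coefficient(d):
    """Depth-related coefficient rho: 0.5 below 0.15 m, otherwise 1.0."""
    return 0.5 if d < 0.15 else 1.0


def risk_index(d, v):
    """Equation (1): R = d * (v + 0.5) + rho, with d in m and v in m/s."""
    return d * (v + 0.5) + hazard_coefficient(d)


def risk_level(r):
    """Map R to a class label.

    Assumption: the shared boundary values are resolved as
    R = 1.25 -> "high" and R = 0.75 -> "low".
    """
    if r >= 1.25:
        return "high"
    if r > 0.75:
        return "medium"
    return "low"


# Hypothetical cells: (depth in m, velocity in m/s)
for d, v in [(0.10, 1.0), (0.15, 0.0), (0.50, 1.0)]:
    r = risk_index(d, v)
    print(f"d={d:.2f} m, v={v:.1f} m/s -> R={r:.3f} ({risk_level(r)})")
```

Because $\rho$ jumps to 1.0 at $d = 0.15$ m, any cell with $d \geq 0.15$ m has $R \geq 1.075$ under this formula, so it can never fall in the low-risk class.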

3. Materials and Methods

In the previous sections, we introduced the datasets and acquisition process for UWSA. Building on this foundation, the proposed framework integrates HWR and PI-Priors through four core modules: indicator selection and analysis, construction of the supplementary dataset from PI-Priors, UWSA model development, and generation and interpretation of susceptibility probability maps (Figure 4).

3.1. Dominant Factor Identification and Correlation Analysis

Given the large number of UWSA indicators, this section applied multiple diagnostic techniques to detect multicollinearity, as shown in Figure 5. Figure 5a presents the Variance Inflation Factors (VIF) and Condition Indices for all variables. Typically, VIF values above 10 or Condition Indices above 30 indicate severe multicollinearity. Indicators $B_{5}$ and $B_{7}$ exceed these thresholds, suggesting redundancy and the risk of biased estimates. Figure 5b displays a scree plot of eigenvalues derived from the standardized covariance matrix. The sharp decline followed by a plateau, along with several eigenvalues below 1, suggests underlying dimensional dependencies and shared variance among variables.

Figure 5c shows the Pearson correlation heatmap for the UWSA indicators. A coefficient above 0.8 generally indicates a strong correlation. While most variables exhibit low to moderate correlations, a high correlation is evident between $B_{5}$ and $B_{7}$. This is because the density of water bodies (rivers and lakes) expressed by $B_{5}$ overlaps to an extremely high degree with the information in $B_{7}$, which describes the total land area available for human production activities. Consequently, the land use variable was excluded, yielding a final set of 19 UWSA indicators.
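The VIF screening described above can be sketched in a few lines of Python. The snippet uses the standard identity that the VIF of indicator $j$ equals the $j$-th diagonal element of the inverse correlation matrix. The data are synthetic and purely illustrative, not the Hefei indicators, and the function name is an assumption.

```python
import numpy as np


def vif_from_correlation(X):
    """Return one VIF per column of X (rows = grid cells, columns = indicators).

    Uses VIF_j = [R^{-1}]_{jj}, where R is the Pearson correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Synthetic example (not the study data): column c is almost a copy of column a.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)

vifs = vif_from_correlation(np.column_stack([a, b, c]))
flagged = [name for name, v in zip("abc", vifs) if v > 10]  # common rule of thumb
print(dict(zip("abc", np.round(vifs, 2))), "flagged:", flagged)
```

In this toy case the two nearly identical columns are flagged together, which mirrors the situation where a pair of indicators jointly exceeds the threshold and one of the pair is dropped.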

3.2. Supplementary Waterlogging Dataset Construction Using PI-Priors

Constructing a supplementary dataset from PI-Priors is a key step linking simulation outputs with probabilistic modeling. The main challenge is extracting representative and spatially balanced samples from continuous risk surfaces. HWR data are sparse and temporally constrained, while PB-HydroSim outputs provide broader spatial coverage but are prone to oversampling and mixed classification within the same unit. Integrating these outputs into a representative dataset is, therefore, essential for enhancing model generalizability and spatial robustness.

Three-Way Decision (TWD) theory [70] offers a rigorous framework for resolving these problems. Originating in rough-set research, TWD partitions any decision space into an acceptance region, a rejection region, and a deferment region. Each region is defined by two thresholds that balance evidence in favor of, against, or undecided about a given hypothesis. In the context of PI-Priors, these thresholds translate naturally into three evaluation dimensions:

Membership measures the probability that a grid cell truly belongs to a high-risk class.
Impact captures the potential consequences should waterlogging occur, derived from socio-economic and infrastructural exposure.
Credibility quantifies the internal consistency of simulation output within a cell, reflecting model stability.

Applying TWD lets us treat membership, impact, and credibility as orthogonal evidential sources. Cells that exceed upper thresholds on all three metrics enter the acceptance region and become definitive training points. Those that fall below lower thresholds enter the rejection region and are discarded. Cells with conflicting signals form a deferment region; for these, we apply CRITIC–TOPSIS weighting to decide whether their overall evidential profile justifies inclusion.
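The routing logic can be illustrated with a short Python sketch. The threshold values, the assumption that all three scores are scaled to $[0, 1]$, and the reading that rejection requires all three scores to fall below the lower threshold are illustrative choices, not values reported in this study.

```python
def three_way_region(membership, impact, credibility, lower=0.3, upper=0.7):
    """Route one grid cell to a TWD region.

    Assumptions (hypothetical): scores lie in [0, 1]; `lower` and `upper`
    are placeholder thresholds shared by the three metrics.
    """
    scores = (membership, impact, credibility)
    if all(s > upper for s in scores):
        return "accept"   # definitive training point
    if all(s < lower for s in scores):
        return "reject"   # discarded
    return "defer"        # conflicting evidence, passed on for further ranking


print(three_way_region(0.90, 0.80, 0.75))  # accept
print(three_way_region(0.10, 0.20, 0.10))  # reject
print(three_way_region(0.90, 0.20, 0.50))  # defer
```

In practice each metric could carry its own threshold pair; a single shared pair is used here only to keep the example short.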

The procedure can be understood as a dimensionality reduction process that transforms a two-dimensional continuous surface into a set of discrete representative points, while preserving essential spatial meaning and physical interpretability. By applying the TWD framework to PI-Priors, we are able to derive a well-balanced sample set that integrates high-risk relevance, spatial representativeness, and simulation credibility. To this end, this study follows the workflow illustrated in Figure 6 to construct an enhanced dataset based on PI-Priors. The following sections will provide a detailed explanation of the theoretical foundations and technical procedures involved in this process.

3.2.1. Membership Degree Quantification Based on the 2-D Connection Cloud Model

As shown in Figure 7, during the construction of the enhanced dataset based on PI-Priors, a single fishnet grid cell may contain multiple coexisting waterlogging risk levels. These risk levels differ in their proportional area coverage, and their boundaries often exhibit a degree of spatial ambiguity. Traditional hard classification methods, such as the maximum area method or the average area method, fail to capture this transitional and uncertain nature effectively.

To address this issue, this study applies the Connection Cloud Model (CCM) to quantify the waterlogging membership within each grid cell. CCM [71] extends the Normal Cloud Model (NCM) [72] by integrating set pair analysis and connection number theory [73]. Unlike NCM, which assumes normally distributed inputs and struggles to represent boundary ambiguity between classification levels [71], CCM offers greater flexibility in handling real-world indicator distributions and better captures the continuity and compatibility between adjacent risk levels. These advantages make it particularly suitable for complex systems where level boundaries are inherently fuzzy, and it has been widely applied in various uncertainty analysis and decision-making tasks [74,75,76].

In this study, CCM converts the proportion of each risk level within a grid cell into a conceptual membership degree. This approach preserves categorical distinctions while accounting for spatial heterogeneity and uncertainty, offering a more realistic representation of urban waterlogging patterns. The resulting membership degree serves as a more physically meaningful input for subsequent sample selection and model training. To handle multiple indicators simultaneously, MCCM generalizes CCM to higher dimensions. It reduces computational complexity and improves precision. Its definition is as follows: Suppose the task involves $m$ levels and $n$ indicators. For the $j$-th indicator at level $i$, the MCCM is composed of two clusters of cloud droplets with an expected value of $Ex_{ij}$. Let $C$ be a qualitative concept in an $n$-dimensional quantitative space $Q$, and let $x \in Q$ be a random realization of $C$. If $x$ follows a normal distribution $x \sim N(Ex, \sigma^{2})$, and $\sigma \sim N(En, He^{2})$, then the overall quantitative characteristics of $C$ are defined as follows:

$$\mu_{i}(x) = \exp\left(-\frac{9}{2} \sum_{j=1}^{n} \left|\frac{x_{ij} - Ex_{ij}}{3\sigma_{ij}}\right|^{\lambda_{ij}}\right) \tag{2}$$

where $\mu_{i}(x)$ denotes the membership degree between sample $x$ and level $i$. For indicator $j$ at level $i$, the parameters $Ex_{ij}$, $En_{ij}$, $He_{ij}$, $\lambda_{ij}$, and $\sigma_{ij} \sim N(En_{ij}, He_{ij}^{2})$ correspond to expected value, entropy, hyper-entropy, cloud order, and a random variable, respectively.
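To make Equation (2) concrete, the following Python sketch evaluates the membership degree of one sample with respect to one level. It treats $\sigma_{ij}$ as a fixed input rather than drawing it from $N(En_{ij}, He_{ij}^{2})$, which is a simplifying assumption, and the function name and the numbers are illustrative.

```python
import numpy as np


def mccm_membership(x, ex, sigma, lam):
    """Equation (2) for a single level i.

    x, ex, sigma, lam: length-n sequences (one entry per indicator j).
    Assumption: sigma is supplied directly instead of being sampled.
    """
    x, ex, sigma, lam = (np.asarray(v, dtype=float) for v in (x, ex, sigma, lam))
    terms = np.abs((x - ex) / (3.0 * sigma)) ** lam
    return float(np.exp(-4.5 * terms.sum()))


# A sample located exactly at the expected value has full membership.
print(mccm_membership([0.2, 0.8], [0.2, 0.8], [0.1, 0.1], [2, 2]))

# Membership decays as the sample moves away from the expected value.
print(mccm_membership([0.3, 0.8], [0.2, 0.8], [0.1, 0.1], [2, 2]))
```

With cloud order $\lambda_{ij} = 2$, each term reduces to $(x_{ij} - Ex_{ij})^{2} / (2\sigma_{ij}^{2})$, so the expression coincides with the Gaussian form of the normal cloud model. Other values of $\lambda_{ij}$ change how quickly membership falls off toward the level boundaries.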

A sample $x = (x_{1}, x_{2}, \dots, x_{n})$ in an $n$-dimensional space, paired with its membership degree $\mu_{i}(x)$, forms a single cloud droplet. Repeating this process across all indicators and levels yields an $n$-dimensional CCM $(x, \mu)$. In previous studies, the calculation of $Ex_{ij}$, $En_{ij}$, $He_{ij}$, and $\lambda_{ij}$ relies on predefined evaluation criteria for each indicator. Therefore, establishing a well-defined evaluation criterion for determining whether a grid cell is waterlogging-prone is a necessary first step.

Numerous studies suggest that road surface water exceeding 0.25 to 0.35 m can impair vehicle control. For instance, Yin recommends avoiding roads submerged beyond this depth, which matches typical exhaust pipe height [77]. Similarly, Coles et al. and Green et al. proposed a 0.25 m threshold for emergency vehicle access in the UK [78,79]. Moreover, the Chinese Code for Design of Urban Road Engineering (CJJ37-2012) stipulates that curbstone heights range from 0.1 to 0.2 m. When water depth is below this range, waterlogging is generally confined to the roadway without affecting sidewalks or green belts. As shown in Equation (1), when a grid cell is classified as low or no risk ($R \leq 0.75$), the water depth $d$ remains below 0.15 m, because any cell with $d \geq 0.15$ m has $\rho = 1.0$ and therefore $R > 1$. Accordingly, we adopt 0.15 m as the threshold to distinguish between waterlogging-prone and safe areas. Table 2 summarizes the initial classification criteria based on PI-Priors.

Table 2 defines $\xi$ as the Safety-Hazard Demarcation Index. For the $k$-th analysis grid, the Safety Factor $S_{k}$ is computed based on the proportion of No-risk ($P_{k}^{N}$) and Low-risk ($P_{k}^{L}$) areas, while the Hazard Factor $H_{k}$ is derived from the proportion of Medium-risk ($P_{k}^{M}$) and High-risk ($P_{k}^{H}$) areas. When computing $S_{k}(P_{k}^{N}, P_{k}^{L})$ and $H_{k}(P_{k}^{M}, P_{k}^{H})$, it is essential to recognize that different risk levels contribute unequally to waterlogging susceptibility. A simple sum of area proportions cannot accurately reflect their relative significance. For example, although both cases $P_{k}^{N} = 1, P_{k}^{L} = 0$ and $P_{k}^{N} = 0, P_{k}^{L} = 1$ yield the same total $S_{k} = 1$, the former clearly indicates a lower risk likelihood. To account for such differences, weighting factors should be introduced to modify the contribution of each risk level. The modified Safety Factor $\hat{S}_{k}$ and Hazard Factor $\hat{H}_{k}$ are calculated as follows:

$$\begin{cases} \hat{S}_{k} = w_{N} P_{k}^{N} + w_{L} P_{k}^{L}, & 0 < w_{L} < w_{N} < 1 \\ \hat{H}_{k} = w_{M} P_{k}^{M} + w_{H} P_{k}^{H}, & 0 < w_{M} < w_{H} < 1 \end{cases} \tag{3}$$

where $w_{N} + w_{L} = 1$ and $w_{M} + w_{H} = 1$. These weights ensure that higher risk levels exert greater influence in the hazard computation. The introduction of weights also changes the value ranges of $\hat{S}_{k}$ and $\hat{H}_{k}$, which can be determined through basic linear programming (Figure 8) based on the conditions listed in Table 2.
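A small Python sketch of Equation (3) shows how the weights separate the two cases from the example above. The weight values $w_{N} = w_{H} = 0.6$ are placeholders chosen only to satisfy the stated constraints; they are not the values used in this study.

```python
def modified_factors(p_n, p_l, p_m, p_h, w_n=0.6, w_h=0.6):
    """Equation (3): weighted Safety and Hazard Factors for one grid cell.

    p_n, p_l, p_m, p_h: area proportions of No-, Low-, Medium-, High-risk zones.
    Assumption: w_n and w_h are hypothetical; w_l and w_m follow from
    w_n + w_l = 1 and w_m + w_h = 1.
    """
    # 0 < w_l < w_n < 1 with w_n + w_l = 1 implies 0.5 < w_n < 1 (same for w_h).
    if not (0.5 < w_n < 1.0 and 0.5 < w_h < 1.0):
        raise ValueError("weights must satisfy 0.5 < w_n < 1 and 0.5 < w_h < 1")
    w_l, w_m = 1.0 - w_n, 1.0 - w_h
    s_hat = w_n * p_n + w_l * p_l
    h_hat = w_m * p_m + w_h * p_h
    return s_hat, h_hat


# Both cells have an unweighted Safety Factor of 1, but the weighted values differ.
print(modified_factors(1.0, 0.0, 0.0, 0.0))  # entirely no-risk
print(modified_factors(0.0, 1.0, 0.0, 0.0))  # entirely low-risk
```

A cell that is entirely no-risk now scores higher on safety than a cell that is entirely low-risk, which is the distinction the unweighted sum could not express.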

Figure 8c illustrates the modified ranges of $\hat{S}_{k}$ and $\hat{H}_{k}$. These ranges intersect, creating an ambiguous and fuzzy zone in which grid-cell membership cannot be assigned with confidence. Therefore, it is necessary to revise the classification criteria defined in Table 2, and the updated version is presented in Table 3.

Based on Table 3, the numerical characteristics $(Ex_{ij}, En_{ij}, He_{ij}, \lambda_{ij})$ required to construct the MCCM can be calculated. As shown in Figure 8c, the expected value for the waterlogging-prone area, denoted as $Ex_{W-P}$, is clearly $(0, w_{H})$ because in this case $\hat{S}_{k}$ reaches its minimum, while $\hat{H}_{k}$ reaches its maximum. Similarly, the expected value for the waterlogging-safe area, $Ex_{W-S}$, is $(w_{N}, 0)$. For the fuzzy area, the center point represents the most ambiguous classification, and its expected value is calculated as $Ex_{W-F} = \left(\frac{(w_{N} + w_{L})\,\xi}{2}, \frac{(w_{M} + w_{H})(1 - \xi)}{2}\right)$. The remaining parameters are calculated as follows:

$$En_{ij} = \frac{a_{ij}}{3} \tag{4}$$

$$He_{ij} = \beta \cdot En_{ij} \tag{5}$$

$$\lambda_{ij} = \frac{\ln\left(\frac{\ln 4}{9}\right)}{\ln\left|\frac{I_{ij} - Ex_{ij}}{3 En_{ij}}\right|} \tag{6}$$

where $\beta$ is the MCCM fuzziness coefficient, fixed at 0.01 in this study. $I_{ij}$ denotes the upper ($I_{ij}^{\max}$) or lower ($I_{ij}^{\min}$) bound of indicator $j$ in level $i$. The width $a_{ij}$ of each cloud is defined by the left ($a_{ij}^{L}$) and right ($a_{ij}^{R}$) half-widths. In this study, it is calculated as follows:

$$\begin{cases} a_{ij}^{L} = Ex_{ij} - I_{(i-1)j}^{\min} \\ a_{ij}^{R} = I_{(i+1)j}^{\max} - Ex_{ij} \end{cases} \tag{7}$$

Once the parameters $(w_{N}, w_{L}, w_{M}, w_{H}, \xi)$ are specified, all numerical characteristics can be determined based on Equations (4)–(7) and used to construct the MCCM. For each sample $x$, the corresponding membership vector $\mu(x) = [\mu^{wp}, \mu^{wf}, \mu^{ws}]$ is calculated, and each grid cell is finally labeled by the maximum-membership rule.
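The role of the cloud order in Equation (6) can be checked numerically. Setting the one-dimensional form of Equation (2) equal to 0.5 at a level bound $I_{ij}$, with $\sigma_{ij}$ replaced by $En_{ij}$, gives $\left|\frac{I_{ij} - Ex_{ij}}{3 En_{ij}}\right|^{\lambda_{ij}} = \frac{\ln 4}{9}$, which is exactly Equation (6). The Python sketch below computes $En$, $He$, and $\lambda$ for one indicator and confirms that membership at the bound equals 0.5. The numerical inputs are hypothetical and the helper names are assumptions.

```python
import math


def cloud_parameters(ex, bound, width, beta=0.01):
    """Equations (4)-(6) for one indicator at one level.

    ex: expected value; bound: level bound I_ij; width: half-width a_ij.
    beta = 0.01 follows the text; the other inputs are illustrative.
    """
    en = width / 3.0                                   # Eq. (4)
    he = beta * en                                     # Eq. (5)
    ratio = abs((bound - ex) / (3.0 * en))
    if ratio in (0.0, 1.0):
        raise ValueError("Eq. (6) is undefined when |I - Ex| is 0 or equals 3*En")
    lam = math.log(math.log(4.0) / 9.0) / math.log(ratio)  # Eq. (6)
    return en, he, lam


def membership_1d(x, ex, en, lam):
    """One-dimensional Equation (2) with sigma fixed at En (a simplification)."""
    return math.exp(-4.5 * abs((x - ex) / (3.0 * en)) ** lam)


en, he, lam = cloud_parameters(ex=0.5, bound=0.7, width=0.4)
print(en, he, lam)
print(membership_1d(0.7, 0.5, en, lam))  # membership at the level bound
```

This is why adjacent levels overlap smoothly: a sample sitting on a shared bound is assigned roughly equal membership to the levels on either side.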

3.2.2. Impact and Credibility Score Quantification Based on the CRITIC-TOPSIS Method

Based on the MCCM, the membership matrix $U \in \mathbb{R}^{K \times 3}$ can be computed for the initial sample set $X$, enabling a preliminary classification of the PI-Priors spatial outputs. Using TWD theory, each sample $i$ is routed to an acceptance, rejection, or fuzzy region. Samples with $\max(U_{i}) = \mu_{i}^{wp}$ enter the acceptance region, those with $\max(U_{i}) = \mu_{i}^{ws}$ enter the rejection region, and ambiguous cases fall into the fuzzy region for further screening. To extract the most representative waterlogging-prone points with strong potential relevance and physical plausibility, we analyze a filtered subset $X_{sel}$ drawn from the acceptance and fuzzy regions. The CRITIC-TOPSIS procedure then ranks these samples by two complementary criteria:

The Impact matrix $X_{imp}$ includes four indicators: population density, GDP, building density, and road density (see Table 1). These indicators reflect the potential urban exposure and losses if waterlogging occurs in the area.
The Credibility matrix $X_{cre}$ includes four spatial proximity indicators: distance to water system, distance to underpass, distance to concave-down overpasses, and distance to HWR points. These indicators assess the spatial and physical reliability of the PI-Priors by estimating how likely the identified risk is to exist in reality.
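The initial three-way routing by maximum membership can be sketched as follows. This is an illustration under stated assumptions: each row of the membership matrix is ordered as $(\mu^{wp}, \mu^{wf}, \mu^{ws})$, exact ties are treated as ambiguous and sent to the fuzzy region, and the function names are hypothetical.

```python
REGIONS = ("acceptance", "fuzzy", "rejection")  # same order as (wp, wf, ws)

def route_sample(mu):
    """Route one membership vector (mu_wp, mu_wf, mu_ws) to a TWD region."""
    if len(mu) != 3:
        raise ValueError("expected three membership degrees")
    top = max(mu)
    winners = [k for k, value in enumerate(mu) if value == top]
    if len(winners) > 1:
        return "fuzzy"  # assumption: a tie is an ambiguous case
    return REGIONS[winners[0]]

def select_candidates(U):
    """Indices of samples kept for ranking (acceptance and fuzzy regions)."""
    return [i for i, mu in enumerate(U) if route_sample(mu) != "rejection"]

# Hypothetical membership matrix with four samples
U = [
    (0.7, 0.2, 0.1),  # largest is mu_wp -> acceptance
    (0.1, 0.3, 0.6),  # largest is mu_ws -> rejection
    (0.2, 0.5, 0.3),  # largest is mu_wf -> fuzzy
    (0.4, 0.4, 0.2),  # tie -> fuzzy
]
regions = [route_sample(mu) for mu in U]
X_sel = select_candidates(U)
```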

The following procedures are then applied to both the impact matrix $X_{imp}$ and the credibility matrix $X_{cre}$.

Step 1. Indicator Direction Alignment and Normalization Mapping

Let the indicator matrix be denoted as $X = [x_{ij}]_{m \times n}$, where $x_{ij}$ represents the value of the $j$-th indicator for the $i$-th sample $(i = 1, 2, \dots, m;\ j = 1, 2, \dots, n)$. To harmonize indicator direction and scale, we first apply Z-score standardization and then map the standardized values through the standard normal cumulative distribution function $\Phi$, producing the normalized matrix $\hat{X} = [\hat{x}_{ij}]_{m \times n}$:

$$\hat{x}_{ij} = \begin{cases} \Phi\left(\frac{x_{ij} - E[x_{j}]}{\sqrt{Var[x_{j}]}}\right), & \text{if } x_{j} \text{ is a benefit-type indicator} \\ \Phi\left(-\frac{x_{ij} - E[x_{j}]}{\sqrt{Var[x_{j}]}}\right), & \text{if } x_{j} \text{ is a cost-type indicator} \end{cases} \tag{8}$$

where $E[x_{j}]$ and $Var[x_{j}]$ denote the mean and variance of the $j$-th indicator, respectively. The normalized value satisfies $\hat{x}_{ij} \in [0, 1]$, with values closer to 1 indicating higher importance or contribution to risk.
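A minimal sketch of Equation (8) using only the Python standard library is given below. The function name, the toy matrix, and the use of the population standard deviation are assumptions made for illustration; the text does not state which variance estimator is used.

```python
from statistics import NormalDist, fmean, pstdev

def normalize_indicators(X, is_benefit):
    """Map each indicator column to [0, 1] via Z-score plus normal CDF.

    X          : list of rows, one row per sample
    is_benefit : is_benefit[j] is True for benefit-type indicators and
                 False for cost-type indicators (sign is flipped)
    """
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    m, n = len(X), len(X[0])
    X_hat = [[0.0] * n for _ in range(m)]
    for j in range(n):
        column = [row[j] for row in X]
        mean = fmean(column)
        sd = pstdev(column)  # assumption: population standard deviation
        if sd == 0:
            raise ValueError(f"indicator {j} is constant and cannot be standardized")
        sign = 1.0 if is_benefit[j] else -1.0
        for i in range(m):
            X_hat[i][j] = phi(sign * (column[i] - mean) / sd)
    return X_hat

# Toy matrix: column 0 is benefit-type, column 1 is cost-type
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
X_hat = normalize_indicators(X, is_benefit=[True, False])
```

A sample at the column mean maps to 0.5 in either direction, while the sign flip makes small values of a cost-type indicator (for example, a short distance) score close to 1.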

Step 2. Indicator Weight Calculation Based on the CRITIC Method

To assign objective weights to each indicator, we adopt the CRITIC (Criteria Importance Through Intercriteria Correlation) method [80]. CRITIC balances an indicator’s discriminative power, captured by its standard deviation, with its redundancy, captured by Pearson correlations, thereby reducing subjective bias in weighting. For the normalized matrix $\hat{X}$, the standard deviation of each indicator is calculated as $\sigma_{j} = Std[\hat{x}_{j}]$, and the Pearson correlation matrix $\rho$ is constructed as follows:

$$\rho = \begin{bmatrix} \rho_{11} & \rho_{12} & \dots & \rho_{1n} \\ \rho_{21} & \rho_{22} & \dots & \rho_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \dots & \rho_{nn} \end{bmatrix}, \quad \rho_{jk} = \frac{\sum_{i=1}^{m} (\hat{x}_{ij} - \bar{x}_{j})(\hat{x}_{ik} - \bar{x}_{k})}{\sqrt{\sum_{i=1}^{m} (\hat{x}_{ij} - \bar{x}_{j})^{2}} \cdot \sqrt{\sum_{i=1}^{m} (\hat{x}_{ik} - \bar{x}_{k})^{2}}} \tag{9}$$

where $\bar{x}_{j}$ is the mean of the $j$-th normalized indicator. Finally, the information content of indicator $j$, denoted as $R_{j}$, is computed based on its variability and correlation with other indicators. The weight $w_{j}$ is then derived by normalizing the information content across all indicators:

$$\begin{cases} R_{j} = \sigma_{j} \sum_{k=1}^{n} (1 - |\rho_{jk}|), & j = 1, \dots, n \\ w_{j} = \frac{R_{j}}{\sum_{j=1}^{n} R_{j}}, & w = [w_{1}, w_{2}, \dots, w_{n}] \end{cases} \tag{10}$$
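The CRITIC weighting of Equations (9) and (10) can be sketched in plain Python as follows. The function names and the toy matrix are hypothetical, and the sample standard deviation is an assumption, since the text does not specify the estimator.

```python
import math
from statistics import fmean, stdev

def pearson(a, b):
    """Pearson correlation between two equally long sequences (Equation 9)."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("constant indicator: correlation is undefined")
    return cov / math.sqrt(va * vb)

def critic_weights(X_hat):
    """CRITIC weights from a normalized matrix (rows = samples)."""
    n = len(X_hat[0])
    cols = [[row[j] for row in X_hat] for j in range(n)]
    sigma = [stdev(c) for c in cols]  # assumption: sample standard deviation
    info = []
    for j in range(n):
        # Conflict term: low correlation with other indicators adds information
        conflict = sum(1.0 - abs(pearson(cols[j], cols[k])) for k in range(n))
        info.append(sigma[j] * conflict)  # R_j in Equation (10)
    total = sum(info)
    return [r / total for r in info]

# Toy data: columns 0 and 1 are identical (fully redundant), column 2 is not
X_hat = [
    [0.1, 0.1, 0.5],
    [0.2, 0.2, 0.1],
    [0.3, 0.3, 0.9],
    [0.4, 0.4, 0.3],
]
w = critic_weights(X_hat)
```

In this toy case the two redundant columns receive equal and smaller weights, while the third column, which varies more and correlates weakly with the others, receives the largest weight.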

Step 3. Composite Scoring Using the TOPSIS Method

After obtaining the indicator weights $w$, the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) method [81] is applied to evaluate and rank each sample based on a composite score. This method calculates the relative closeness of each sample to the ideal solution by comparing its distance to both the positive ideal solution and the negative ideal solution [82]. To implement this process, the first step is to determine the positive ideal solution $A^{+}$ and the negative ideal solution $A^{-}$ for each indicator:

$$\begin{cases} A_{j}^{+} = \max_{1 \leq i \leq m} \hat{x}_{ij}, & A^{+} = [A_{j}^{+}]_{1 \times n} \\ A_{j}^{-} = \min_{1 \leq i \leq m} \hat{x}_{ij}, & A^{-} = [A_{j}^{-}]_{1 \times n} \end{cases} \tag{11}$$

Next, for each sample, the Euclidean distances to $A^{+}$ and $A^{-}$ in the weighted indicator space are computed as follows:

$$\begin{cases} D_{i}^{+} = \sqrt{\sum_{j=1}^{n} w_{j} (\hat{x}_{ij} - A_{j}^{+})^{2}} \\ D_{i}^{-} = \sqrt{\sum_{j=1}^{n} w_{j} (\hat{x}_{ij} - A_{j}^{-})^{2}} \end{cases} \tag{12}$$

Finally, the TOPSIS composite score $S_{i}$ for sample $i$ is calculated as

$$S_{i} = \frac{D_{i}^{-}}{D_{i}^{+} + D_{i}^{-}} \tag{13}$$

where $S_{i} \in [0, 1]$, with higher values indicating closer proximity to the ideal high-risk condition. In this study, TOPSIS scores are calculated separately for two dimensions: $S_{i}^{(X_{imp})}$ for impact (potential social consequences) and $S_{i}^{(X_{cre})}$ for credibility (spatial and physical reliability). These scores are then used to identify and prioritize high-risk potential waterlogging locations.
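Equations (11)–(13) can be sketched as a single scoring function. The name `topsis_scores`, the toy inputs, and the fallback value of 0.5 for the degenerate case where all samples coincide are assumptions for illustration only.

```python
import math

def topsis_scores(X_hat, w):
    """Relative closeness S_i of each sample to the positive ideal solution."""
    n = len(w)
    a_pos = [max(row[j] for row in X_hat) for j in range(n)]  # Equation (11)
    a_neg = [min(row[j] for row in X_hat) for j in range(n)]
    scores = []
    for row in X_hat:
        # Weighted Euclidean distances, Equation (12)
        d_pos = math.sqrt(sum(w[j] * (row[j] - a_pos[j]) ** 2 for j in range(n)))
        d_neg = math.sqrt(sum(w[j] * (row[j] - a_neg[j]) ** 2 for j in range(n)))
        denom = d_pos + d_neg
        # Assumption: if every sample is identical, return a neutral 0.5
        scores.append(d_neg / denom if denom > 0 else 0.5)  # Equation (13)
    return scores

# Toy normalized matrix with equal weights
X_hat = [[0.9, 0.8], [0.1, 0.2], [0.5, 0.5]]
S = topsis_scores(X_hat, w=[0.5, 0.5])
```

In practice the function would be called twice, once on the normalized impact matrix and once on the normalized credibility matrix, each with its own CRITIC weights.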

3.2.3. Selection of Supplementary Waterlogging Points Based on TWD Theory

Applying MCCM together with CRITIC-TOPSIS yields, for every grid cell, (i) a membership vector $\mu_{i}$ and (ii) two composite scores: impact $S_{i}^{(X_{imp})}$ and credibility $S_{i}^{(X_{cre})}$. We use TWD to route cells into acceptance, rejection, or fuzzy regions (Figure 9). Cells with strong evidence (high membership) enter the acceptance region; weak-evidence cases remain fuzzy; and clearly negative cases go to rejection.

To reflect gradual transitions, the acceptance region is split into strong-acceptance and weak-acceptance; the fuzzy region is split into positive-leaning and negative-leaning. Strong-acceptance cells are directly added to the qualified supplementary set $\Psi^{+}$. Weak-acceptance and positive-leaning fuzzy cells are retained only if both composite scores reach their thresholds ($S_{i}^{(X_{imp})} \geq \varepsilon_{I}^{+}$ and $S_{i}^{(X_{cre})} \geq \varepsilon_{C}^{+}$ for weak-acceptance cells). For positive-leaning fuzzy cells, the stricter thresholds $\varepsilon_{I}^{0}$ and $\varepsilon_{C}^{0}$ apply instead. Negative-leaning fuzzy and rejection cells are discarded. This rule preserves informative transitional samples while controlling false positives, producing a balanced, credible set of supplementary waterlogging points for downstream modeling.
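The retention rule can be written as a small decision function. This sketch takes the sub-region label as an input, because the membership criteria that separate the sub-regions are defined in Figure 9 and are not reproduced here. The label strings and function name are hypothetical; the threshold values 0.3 and 0.6 are those reported in Section 4.1.

```python
# (impact threshold, credibility threshold) per sub-region that needs screening
THRESHOLDS = {
    "weak_acceptance": (0.3, 0.3),  # (eps_I_plus, eps_C_plus)
    "fuzzy_positive": (0.6, 0.6),   # (eps_I_0, eps_C_0), stricter
}

def is_qualified(subregion, s_imp, s_cre, thresholds=THRESHOLDS):
    """Decide whether a grid cell joins the supplementary set Psi+."""
    if subregion == "strong_acceptance":
        return True  # added directly, no score screening
    if subregion in thresholds:
        eps_i, eps_c = thresholds[subregion]
        return s_imp >= eps_i and s_cre >= eps_c  # both scores must pass
    if subregion in ("fuzzy_negative", "rejection"):
        return False  # discarded
    raise ValueError(f"unknown sub-region: {subregion}")

# Hypothetical cells: (sub-region, impact score, credibility score)
cells = [
    ("strong_acceptance", 0.10, 0.10),
    ("weak_acceptance", 0.35, 0.40),
    ("weak_acceptance", 0.35, 0.20),
    ("fuzzy_positive", 0.35, 0.40),
    ("fuzzy_positive", 0.70, 0.65),
    ("rejection", 0.90, 0.90),
]
psi_plus = [i for i, c in enumerate(cells) if is_qualified(*c)]
```

The same pair of scores (0.35, 0.40) passes in the weak-acceptance region but fails in the positive-leaning fuzzy region, which is the intended effect of the stricter fuzzy thresholds.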

3.3. MaxEnt-Based Modeling for UWSA

After completing indicator selection and supplementary-dataset construction (Section 3.1 and Section 3.2), this section develops a spatial UWSA model using MaxEnt. Rooted in information-entropy theory, MaxEnt originated in statistical mechanics and was later adapted for ecological niche modeling [83]. It excels at spatial prediction when data are incomplete because it infers the probability distribution with the highest entropy, producing the least-biased estimates consistent with observed information. MaxEnt does not require negative samples and can accommodate diverse predictor types, which makes it particularly suitable for environmental-hazard analyses such as urban waterlogging.

3.3.1. MaxEnt Principles and Methods

Mathematically, let $x$ denote the input vector and $y_{1}, y_{2}, \dots, y_{n}$ the output classes. MaxEnt defines the conditional probability of output $y$ given input $x$ as

$$p(y \mid x) = \frac{1}{Z(x)} \exp\left(\sum_{i=1}^{k} w_{i} f_{i}(x, y)\right) \tag{14}$$

where $f_{i}(x, y)$ is the $i$-th feature function, $w_{i}$ its learned weight, and $Z(x)$ a normalization factor ensuring that the probabilities sum to one. This modeling approach allows prior knowledge to be effectively integrated with spatial features. The objective of MaxEnt is to maximize the regularized log-likelihood function over the training set $D$, expressed as

$$\max_{w} \sum_{(x, y) \in D} \log p(y \mid x) - C \cdot R(w) \tag{15}$$

with $C$ the regularization coefficient and $R(w)$ the penalty term that controls overfitting.
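Equations (14) and (15) can be illustrated with a few lines of Python. This is a didactic sketch, not the internals of the MaxEnt software: the feature function, the data, and the choice of a squared L2 norm for $R(w)$ are assumptions, since the text leaves the penalty unspecified.

```python
import math

def maxent_probabilities(x, classes, features, weights):
    """p(y | x) for every class y, following Equation (14)."""
    scores = [sum(w * f(x, y) for w, f in zip(weights, features)) for y in classes]
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    z = sum(exps)        # normalization factor (shift cancels in the ratio)
    return {y: e / z for y, e in zip(classes, exps)}

def regularized_log_likelihood(data, classes, features, weights, C=1.0):
    """Objective of Equation (15) with an assumed squared-L2 penalty."""
    log_lik = sum(
        math.log(maxent_probabilities(x, classes, features, weights)[y])
        for x, y in data
    )
    penalty = sum(w * w for w in weights)  # assumption: R(w) = ||w||^2
    return log_lik - C * penalty

# One hypothetical feature: fires with the first input value for class 1
features = [lambda x, y: x[0] if y == 1 else 0.0]
classes = [0, 1]
p = maxent_probabilities((2.0,), classes, features, weights=[1.0])
data = [((2.0,), 1), ((1.0,), 0)]
objective_at_zero = regularized_log_likelihood(data, classes, features, [0.0])
```

With all weights at zero the model is uniform over classes, which is the maximum-entropy starting point; training moves the weights only as far as the data and the penalty allow.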

3.3.2. Parameter Setting and Model Construction of MaxEnt

Previous studies confirm that MaxEnt can deliver objective spatial predictions and quantify each indicator’s contribution [84]. In this work, we use the MaxEnt v3.4.4 software distributed by the American Museum of Natural History to estimate waterlogging susceptibility. The HWR dataset is combined with $\Psi^{+}$ to form the merged sample set. Geographic coordinates are entered as species-occurrence points, and the UWSA indicators serve as environmental variables. The merged samples are randomly split into training and testing datasets at a ratio of 75% to 25%. Indicator importance is examined with the built-in jackknife test [85], while all other parameters remain at their default values. Model stability is assessed through ten bootstrap replicates [86].

Model performance is evaluated with the Receiver Operating Characteristic (ROC) curve. The area under this curve (AUC) ranges from 0 to 1, where 0.5 denotes random prediction, values below 0.6 indicate poor performance, 0.6~0.8 moderate, 0.8~0.9 good, and above 0.9 excellent [87]. Outputs are exported in ASCII format and visualized in ArcGIS Pro using the Jenks natural-breaks method [88] to map the distribution of waterlogging susceptibility levels across the central urban area of Hefei.
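For readers who want to reproduce the AUC reading outside the MaxEnt interface, the rank-based definition can be sketched as below. The function names are hypothetical, the second score list stands in for whatever comparison points are used (an assumption here), and the band boundaries follow the scale quoted above.

```python
def auc_score(pos_scores, neg_scores):
    """Probability that a random positive outranks a random comparison point.

    Ties count as half a win, which matches the area under the ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def auc_band(auc):
    """Qualitative band for an AUC value, using the scale cited in the text."""
    if auc > 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.6:
        return "moderate"
    return "poor"

# Hypothetical susceptibility scores
auc = auc_score([0.9, 0.3], [0.5, 0.1])
```

Under this scale, an AUC between 0.8 and 0.9 falls in the "good" band.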

3.4. Spatial Autocorrelation Analysis Method

To evaluate whether supplementary waterlogging points show spatial dependence, this study applies spatial autocorrelation analysis after model construction. Spatial autocorrelation measures the clustering or dispersion of susceptibility across space and serves both as a validation of model outputs and as support for spatial decision-making. In this study, both global and local indices are used. The global Moran’s I index assesses the overall spatial association across the study area, computed as follows:

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{ij} (x_{i} - \bar{x})(x_{j} - \bar{x})}{\left(\sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_{ij}\right) \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}} \tag{16}$$

where $n$ is the total number of spatial grid cells, $x_{i}$ represents the attribute value of cell $i$, $\bar{x}$ is the mean attribute value, and $\alpha_{ij}$ is the spatial weight based on distance. The value of Moran’s I ranges from $-1$ to $+1$, where positive values indicate positive spatial correlation (that is, clustering), negative values indicate negative spatial correlation (that is, dispersion), and values close to zero suggest a random spatial pattern.
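Equation (16) translates directly into code. The sketch below uses a hypothetical four-cell transect with binary neighbor weights, which is a simplifying assumption; the study itself uses distance-based weights.

```python
def global_morans_i(values, weights):
    """Global Moran's I for attribute values and a full n x n weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(weights[i][j] for i in range(n) for j in range(n))  # sum of weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    ss = sum(d * d for d in dev)  # total squared deviation
    return n * cross / (s0 * ss)

# Four cells in a row; each cell neighbors the adjacent cell(s)
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = global_morans_i([1.0, 1.0, 0.0, 0.0], W)    # like values adjacent
alternating = global_morans_i([1.0, 0.0, 1.0, 0.0], W)  # unlike values adjacent
```

The clustered arrangement gives a positive index and the alternating arrangement gives the minimum value of −1 for this layout, matching the interpretation given above.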

To identify local clusters and outliers in waterlogging susceptibility, the Local Indicators of Spatial Association (LISA), specifically the local Moran’s I index, is further applied as follows:

$$I_{i} = (x_{i} - \bar{x}) \sum_{j=1}^{n} \alpha_{ij} (x_{j} - \bar{x}) \tag{17}$$

LISA can reveal patterns such as high-high clusters (hotspots), low-low clusters (coldspots), and spatial outliers (high-low or low-high). The statistical significance of these local patterns is evaluated using Monte Carlo randomization and pseudo p-value estimation.
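A compact sketch of the local index, its quadrant label, and a conditional-permutation pseudo p-value is given below. It follows the unstandardized form of Equation (17) and assumes a zero diagonal in the weight matrix; the two-sided test on absolute values, the function names, and the toy data are illustrative assumptions rather than the settings used in the study.

```python
import random

def _deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def local_morans_i(values, weights):
    """Local Moran's I for every cell (self-weight ignored)."""
    dev = _deviations(values)
    n = len(values)
    return [
        dev[i] * sum(weights[i][j] * dev[j] for j in range(n) if j != i)
        for i in range(n)
    ]

def lisa_quadrant(values, weights, i):
    """Label such as 'High-High' from the cell's deviation and its spatial lag."""
    dev = _deviations(values)
    lag = sum(weights[i][j] * dev[j] for j in range(len(values)) if j != i)
    own = "High" if dev[i] > 0 else "Low"
    nbr = "High" if lag > 0 else "Low"
    return f"{own}-{nbr}"

def pseudo_p_values(values, weights, n_perm=999, seed=0):
    """Monte Carlo pseudo p-values: shuffle the other cells around cell i."""
    rng = random.Random(seed)
    dev = _deviations(values)
    n = len(values)
    observed = local_morans_i(values, weights)
    p_values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        pool = [dev[j] for j in others]
        extreme = 0
        for _ in range(n_perm):
            rng.shuffle(pool)
            sim = dev[i] * sum(weights[i][j] * d for j, d in zip(others, pool))
            if abs(sim) >= abs(observed[i]):
                extreme += 1
        p_values.append((extreme + 1) / (n_perm + 1))
    return p_values

W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
vals = [1.0, 1.0, 0.0, 0.0]
local_i = local_morans_i(vals, W)
p_vals = pseudo_p_values(vals, W, n_perm=99)
```

With only four cells the pseudo p-values are not meaningful; the example is sized to show the mechanics, not to support inference.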

4. Results

4.1. Results of the Supplementary Dataset Based on PI-Priors

According to the procedure described in Section 3.2, the MCCM for waterlogging risk is constructed using the parameter set $(w_{N}, w_{L}, w_{M}, w_{H}, \xi) = (0.6, 0.4, 0.2, 0.8, 0.75)$, as illustrated in Figure 10. By inputting all grid cell samples into the MCCM-CRITIC-TOPSIS framework, the membership vector $\mu_{i}$, the impact score $S_{i}^{(X_{imp})}$, and the credibility score $S_{i}^{(X_{cre})}$ are calculated for each grid cell $x_{i}$. Then, based on the sample selection logic presented in Figure 9, thresholds are set to identify qualified supplementary waterlogging points using the PI-Priors.

For the threshold parameters $(\tau_{\mu}^{+}, \tau_{\mu}^{0}, \varepsilon_{I}^{+}, \varepsilon_{C}^{+}, \varepsilon_{I}^{0}, \varepsilon_{C}^{0})$ shown in Figure 9, this study adopts an empirical and conservative approach. Specifically, $\tau_{\mu}^{0}$ is set to 2, $\varepsilon_{I}^{+} = \varepsilon_{C}^{+} = 0.3$, and $\varepsilon_{I}^{0} = \varepsilon_{C}^{0} = 0.6$. The value of $\tau_{\mu}^{0}$ reflects the assumption that samples in the positive-leaning fuzzy region $\Omega_{+}^{0}$, although classified within the fuzzy region $\Omega^{0}$, exhibit a stronger tendency toward the acceptance region $\Omega^{+}$. The inequalities $\varepsilon_{I}^{+} < \varepsilon_{I}^{0}$ and $\varepsilon_{C}^{+} < \varepsilon_{C}^{0}$ indicate that samples in the positive-leaning fuzzy region $\Omega_{+}^{0}$ must satisfy stricter criteria for both impact and credibility than those in the weak-acceptance region $\Omega_{-}^{+}$ in order to be classified as qualified supplementary waterlogging points. Compared with the empirical selection of the above thresholds, a more general strategy is recommended for setting the parameter $\tau_{\mu}^{+}$. This approach is described as follows:

$$\tau_{\mu}^{+} = \max_{x \in \Omega^{+}} \{\mu^{f}(x)\} \tag{18}$$

Based on Equation (18), the parameter $\tau_{\mu}^{+} = 0.5959$ was obtained. Following the logic rule outlined in Figure 9, a qualified supplementary waterlogging dataset $\Psi^{+}$ comprising 984 additional waterlogging points was generated (Figure 11a). Qualified supplementary waterlogging points are concentrated in low-lying and hydrologically vulnerable areas, while non-qualified points are more scattered, illustrating how terrain and drainage infrastructure filter valid waterlogging events. HWR data (Figure 11b) have limited spatial coverage: they are mainly concentrated in the central urban area and relatively sparse elsewhere. In contrast, $\Psi^{+}$ offers higher spatial resolution and broader geographic coverage. It is particularly valuable in peripheral and peri-urban zones lacking HWR data, where it identifies additional areas of potential risk.
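The data-driven choice of $\tau_{\mu}^{+}$ in Equation (18) amounts to a single maximum over the acceptance region. The sketch below reads $\mu^{f}$ as the fuzzy-class membership $\mu^{wf}$, which is an interpretation rather than something the text states explicitly; the function name and the toy memberships are hypothetical.

```python
def tau_mu_plus(U):
    """Largest fuzzy membership among samples routed to the acceptance region.

    U : rows of (mu_wp, mu_wf, mu_ws); a sample is in the acceptance region
        when mu_wp is strictly the largest of the three.
    """
    acceptance = [mu for mu in U if mu[0] > mu[1] and mu[0] > mu[2]]
    if not acceptance:
        raise ValueError("no sample falls in the acceptance region")
    return max(mu[1] for mu in acceptance)  # assumption: mu^f means mu_wf

# Hypothetical memberships: the first two rows are acceptance-region samples
U = [
    (0.7, 0.2, 0.1),
    (0.6, 0.5, 0.1),
    (0.2, 0.9, 0.1),
    (0.1, 0.2, 0.7),
]
tau_plus = tau_mu_plus(U)
```

How the resulting threshold separates strong from weak acceptance is specified by the decision logic in Figure 9 and is not reproduced here.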

4.2. Results of Model Accuracy Validation and Dominant Factor Analysis

4.2.1. Model Accuracy Validation

In the MaxEnt modeling process, the inputs include species occurrence points and environmental variables. In this study, the environmental variables refer to the 19 UWSA evaluation indicators, while the species occurrence points correspond to the union of HWR and $\Psi^{+}$ shown in Figure 11. The key parameter settings for the MaxEnt are listed in Table 4. After multiple rounds of experimental iteration, the ROC curve and omission curve for the waterlogging susceptibility prediction were obtained, as shown in Figure 12.

As presented in Figure 12a, the average AUC values for the training and testing datasets over 10 runs are 0.884 and 0.864, respectively, both significantly higher than the simulated value of 0.5 that represents a random distribution. Furthermore, Figure 12b demonstrates a high level of agreement between the predicted omission rate and the omission rates of both the testing and training samples. Taken together, these MaxEnt performance indicators suggest that the model has strong accuracy. It effectively explains the spatial distribution of current waterlogging points and provides reliable predictions for potential waterlogging-prone areas within the city.

4.2.2. Dominant Factor Analysis

To examine variable contributions to waterlogging susceptibility, Figure 13 shows response curves from ten MaxEnt runs. Most indicators ($N_{1}$, $S_{1}$, $I_{2}$, $I_{3}$, $I_{5}$, $I_{6}$, $B_{1}$, $B_{2}$, $B_{3}$, $B_{4}$, $B_{6}$) display limited variation across runs, indicating stable and reliable predictions of waterlogging-prone areas. The curves also reveal both positive and negative effects. Roughness, slope, distance to stormwater pipes, and road density are positively correlated with waterlogging probability, while elevation, population density, and distances to concave-down overpasses and underpasses are negatively correlated. These patterns align with hydrological mechanisms, as areas near concave-down overpasses and underpasses are more prone to water accumulation ($I_{2}$, $I_{3}$). Other indicators, including precipitation, green space, impervious surface, and GDP, show no consistent directional trends.

To further quantify the contribution of each variable, Table 5 and Figure 14 present the percentage contributions and jackknife test results of regularized training gain, respectively. According to Table 5, the top five contributing indicators are $I_{6}$ (18.9%), $B_{4}$ (13.9%), $I_{4}$ (13.5%), $S_{1}$ (9.5%), and $I_{5}$ (8.7%), totaling 64.5%. This highlights the dominant role of built environment and drainage infrastructure variables.

Permutation importance further reveals model sensitivity to each variable. The top five variables by permutation importance are $I_{3}$ (22.2%), $N_{1}$ (12.2%), $N_{5}$ (12.0%), $I_{6}$ (11.3%), and $I_{4}$ (7.6%). Compared with contribution rates, permutation importance emphasizes how strongly the model responds to changes in each variable. It highlights variables that may not contribute significantly during model training but have a substantial influence on the final prediction results. For instance, although elevation contributes only 3.2%, its permutation importance reaches 12.2%, suggesting it may have a significant effect through interactions with other variables. Similarly, distance to the underpass contributes only 4.2% but ranks first in permutation importance, indicating its high sensitivity in prediction and underscoring the need for special attention in future urban drainage planning and risk monitoring. By jointly considering both contribution rate and permutation importance, we gain a more comprehensive understanding of the role each variable plays in the model. This dual perspective helps identify key intervention points for urban waterlogging mitigation and supports data-informed decision-making in planning and risk management.

In addition, Figure 14 presents jackknife test results, confirming that most variables show low training gain when used alone ($Gain^{o}$), indicating the model primarily relies on variable combinations. However, variables like $S_{1}$, $I_{4}$, $I_{5}$, and $I_{6}$ demonstrate strong independent predictive power, each with individual training gains above 0.2. Among them, $I_{4}$ produces the highest gain as a single variable (0.374), but its removal results in only a small gain reduction ($\Delta Gain = Gain^{a} - Gain^{w} = 0.047$). Similar patterns are observed for $S_{1}$ and $I_{5}$. These results highlight the strong independent explanatory power of these variables and underscore the direct influence of drainage infrastructure distribution, pipe density, and population clustering on urban waterlogging risk. It is also worth noting that some variables exhibit substantial gain reduction ($\Delta Gain$) when excluded, notably $I_{6}$, $N_{1}$, and $I_{3}$. For example, the removal of road density ($I_{6}$) causes a substantial decline in model performance, and its $Gain^{o}$ is also relatively high. This suggests that the road network not only explains spatial variation in waterlogging independently but also interacts strongly with other variables. Therefore, it can be considered a key factor.

Moreover, elevation and precipitation show only moderate gain when used individually, but cause notable model degradation when excluded. This implies that these two variables influence susceptibility primarily through interactions with urbanization-related indicators. Additionally, while $I_{3}$ does not perform prominently in the single-variable scenario, it ranks among the top three in terms of $\Delta Gain$, revealing a “low-contribution, high-sensitivity” profile. This suggests that underpasses should receive particular attention in the design of urban drainage systems.

4.3. Results of Susceptibility Assessment and Identification of Waterlogging Prone Areas

Based on the .asc files from ten MaxEnt runs and visualization in ArcGIS Pro, both continuous and classified susceptibility maps are shown in Figure 15. To assess the effect of incorporating PI-Priors, Figure 15 also compares results from the HWR-only strategy. Jenks’ classified maps (Figure 15b,d) were derived from continuous outputs (Figure 15a,c) using the Jenks natural breaks method with five susceptibility levels.

HWR-only predictions (Figure 15a,b) highlight high-risk zones concentrated in the old urban core, with peripheral areas showing near-zero susceptibility. In contrast, incorporating PI-Priors (Figure 15c,d) significantly changes the spatial distribution, identifying additional high-risk areas in newly developed zones, major road corridors, and dense residential districts. This effect is especially evident in the southwestern quadrant (117.15~117.25° E, 31.70~31.80° N), where medium-to-high-susceptibility zones expand beyond locations near HWR, indicating potential future risks.

To further validate the feasibility and effectiveness of incorporating PI-Priors, 127 verified real-waterlogging points were compiled from recent Hefei drainage planning documents (2021–2035) and official media reports of extreme rainfall events from 2020 to 2024, including the 18 July 2020 rainstorm event (Figure 16a). Their distribution in Figure 15b,d enables comparative evaluation of HWR-only and PI-Priors-aided predictions (Figure 16b–e).

For validation points, prediction probabilities closer to 1 indicate better outcomes. As shown in Figure 16b,c, the PI-Priors-Aided MaxEnt outperforms the HWR-Only model, with a median probability of 0.7914. Figure 16d further shows that 102 of 127 validation samples fall within the [0.5, 1.0] range, nearly twice the number (53) under the HWR-Only strategy. Using Jenks’ classification, samples in the Medium-high and High levels are considered priority sites for monitoring and intervention. Under this criterion, the PI-Priors-Aided strategy correctly classified 108 of 127 validation samples, achieving an accuracy of 85.04%, compared with 51.97% for the HWR-Only model (Figure 16e). These results confirm that PI-Priors substantially enhance the accuracy and robustness of UWSA.
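The validation criterion above (a point counts as correctly identified when it falls in the Medium-high or High class) can be sketched as follows. The class breaks, the three lower level names, and the probabilities are hypothetical; in the study the breaks come from the Jenks classification of the MaxEnt output.

```python
from bisect import bisect_left

# Five ordered levels; only "Medium-high" and "High" are named in the text
LEVELS = ("Low", "Medium-low", "Medium", "Medium-high", "High")
PRIORITY = ("Medium-high", "High")

def susceptibility_level(p, breaks):
    """Class of probability p given four ascending upper class bounds."""
    if len(breaks) != len(LEVELS) - 1:
        raise ValueError("five levels need exactly four breaks")
    # A value equal to a break stays in the lower class
    return LEVELS[bisect_left(breaks, p)]

def priority_hit_rate(probs, breaks):
    """Count and percentage of validation points in the two top classes."""
    hits = sum(susceptibility_level(p, breaks) in PRIORITY for p in probs)
    return hits, 100.0 * hits / len(probs)

# Hypothetical breaks and predicted probabilities at four validation points
breaks = [0.1, 0.3, 0.5, 0.7]
hits, rate = priority_hit_rate([0.9, 0.6, 0.2, 0.55], breaks)
```

Applied to the reported counts, 108 hits out of 127 validation points correspond to 100 × 108 / 127 ≈ 85.04%.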

Urban waterlogging typically shows strong spatial and temporal patterns. While meteorological triggers vary, long-term distribution is shaped by stable factors such as elevation, depressions, drainage density, and land use. Repeated accumulation at specific sites reflects structural vulnerabilities embedded in the urban landscape. The superior performance of PI-Priors-Aided modeling stems from its capacity to capture this spatial determinism. Although rainfall events are uncertain in timing and intensity, susceptibility patterns are historically formed and structurally persistent. Thus, disasters are not random occurrences but manifestations of enduring vulnerabilities. Incorporating PI-Priors, therefore, not only improves prediction accuracy but also aligns with a theoretical understanding of disaster formation as a product of urban spatial inertia and structural rigidity.

4.4. Results of Spatial Autocorrelation Analysis

Building on the identification of high-susceptibility zones in Section 4.3, this section examines whether these patterns show spatial autocorrelation. Global Moran’s I was first calculated to test overall spatial dependence. As shown in Figure 17, the Moran’s I values are 0.68 and 0.74 for the HWR-Only and PI-Priors-Aided strategies, respectively, with p-values < 0.01. These results indicate strong, non-random clustering of waterlogging susceptibility. The higher Moran’s I under the PI-Priors-Aided strategy suggests greater spatial coherence, further supported by the denser alignment of sample points in the Moran scatterplot. This provides empirical evidence that physically based variables enhance spatial explanatory power and robustness of susceptibility mapping.

To complement the global analysis, Local Indicators of Spatial Association (LISA) were applied to reveal fine-scale clustering (High-High, Low-Low, High-Low, Low-High). In Figure 18a (HWR-Only), High-High clusters are concentrated in the urban core along the Nanfei River, while Low-Low clusters dominate peripheral areas. After incorporating PI-Priors (Figure 18b), clusters become more continuous, with new High-High zones emerging in the southwestern industrial area. Notably, risks near rivers and lakes are underestimated by the HWR-Only model, as historical records are concentrated on roadways and residential sites. During events such as the 18 July 2020 Hefei rainstorm, however, rapid overflow of the Nanfei River inundated nearby low-lying areas—dynamics better captured by the PI-Priors-Aided model.

Overall, LISA analysis supports the global Moran’s I results and highlights the spatial heterogeneity of waterlogging risks. Incorporating PI-Priors improves detection of clusters near rivers and lakes, compensating for HWR limitations and providing a more physically grounded basis for urban waterlogging susceptibility modeling.

To make these patterns actionable, each output can be read as a decision cue. The continuous susceptibility surface and its class map indicate where the city is structurally prone; agencies can start with high-probability, high-credibility cells for pre-season desilting, additional inlets, and pump staging. LISA clusters move priorities from cells to corridors and neighborhoods, helping to keep lifeline routes open and plan detours. The TWD triage (accept/defer/reject) mirrors municipal practice under uncertainty: accepted cells enter maintenance and pre-deployment lists; deferred cells receive field checks or a few temporary sensors; rejected cells are archived unless corroborated. Finally, factor importance and response curves point to where pipe upsizing, local storage, and small nature-based retrofits yield the greatest marginal benefit. Read together, these elements turn the susceptibility layer into a practical order of work rather than a static map.

5. Conclusions

This study proposed a MaxEnt-based framework that integrates HWR with PI-Priors under TWD theory to assess urban waterlogging susceptibility in Hefei. The framework vectorized raster-based simulations, applied MCCM with CRITIC-TOPSIS to calculate membership, impact, and credibility, and designed a sample selection mechanism that fuses HWR with PI-Priors for enhanced modeling. Comparative analysis then identified key contributing factors and evaluated model performance under HWR-Only and PI-Priors-Aided strategies. The main conclusions are as follows:

(1) An automated procedure was developed for extracting representative waterlogging points from physically simulated rasters using TWD. Empirical validation in Hefei showed that this approach captures physically plausible, high-risk samples that complement historical records and expand spatial coverage.

(2) The MaxEnt model identified road density (18.9%), impervious surface (13.9%), distance to stormwater drainage pipes (13.5%), population density (9.5%), and drainage pipe density (8.7%) as the five most influential factors. Susceptibility was positively associated with roughness, slope, distance to drainage pipes, and road density, while areas near concave-down overpasses and underpasses were especially vulnerable.

(3) The PI-Priors-Aided strategy markedly improved model performance, achieving 85.04% prediction accuracy on the validation dataset versus 51.97% for HWR-Only. It also enhanced the detection of waterlogging clusters near rivers and lakes, aligning predictions more closely with hydrological processes.

While the proposed framework improves accuracy and spatial robustness, it relies primarily on structured inputs such as historical records and physical simulations. Future work will incorporate big data techniques to leverage unstructured sources, including social media, online news, and other urban data streams, combined with remote sensing and geospatial layers. Coupling these crowdsourced observations with general AI models and Multi-Context Processing (MCP) will enable richer, continuously updated datasets and more generalizable susceptibility frameworks. This direction may support intelligent, real-time waterlogging risk identification and provide stronger evidence for drainage planning and emergency response.

Author Contributions

Conceptualization, G.C. and Z.X.; methodology, G.C.; software, G.C.; validation, G.C., J.X. and W.G.; formal analysis, G.C.; investigation, Z.X.; resources, Z.X.; data curation, G.C.; writing—original draft preparation, G.C.; writing—review and editing, Z.X.; visualization, W.G.; supervision, C.G.K.; project administration, Z.X.; funding acquisition, Z.X. and G.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been partially supported by the National Natural Science Foundation of China (72071043), National Key Research and Development Program of China (2022YFC3803600), SEU Innovation Capability Enhancement Plan for Doctoral Students (CXJH_SEU 24094), and the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX23_0286). The authors also thank the reviewers for their thorough reviews and suggestions that helped to improve this paper.

Data Availability Statement

The dataset is available upon request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

UWSA	Urban Waterlogging Susceptibility Assessment
HWR	Historical Waterlogging Records
PI-Priors	Physics-Informed Priors
PB-HydroSim	Physics-based Hydrodynamic Simulation
TWD	Three-Way Decision
MCCM	Multi-dimensional Connection Cloud Model
NCM	Normal Cloud Model
CRITIC	Criteria Importance Through Intercriteria Correlation
TOPSIS	Technique for Order Preference by Similarity to Ideal Solution

References

Yan, W.; Ren, H.; Luo, X.; Li, S. Comparison of fluvial and pluvial flood risk curves in urban cities derived from a large ensemble climate simulation dataset: A case study in Nagoya, Japan. J. Hydrol. 2020, 584, 124706.
Yang, H.; Liu, L.; Zhang, J.; Guo, K.; Mudashiru, R.B.; Sabtu, N.; Abustan, I.; Balogun, W. Flood hazard mapping methods: A review. J. Hydrol. 2021, 603, 126846.
Zanuttigh, B.; Nicholls, R.J.; Vanderlinden, J.-P.; Thompson, R.C.; Burcharth, H.F. Coastal Risk Management in a Changing Climate; Butterworth-Heinemann: Oxford, UK, 2014; ISBN 978-0-12-397331-3.
Jiang, J.; Qin, C.-Z.; Yu, J.; Cheng, C.; Liu, J.; Huang, J.; Hong, S.; Shen, J.; Yang, H.; Pirasteh, S.; et al. Enhancing flexibility and efficiency for urban waterlogging response scenarios simulation: An open-ended approach involving user participation. Int. J. Digit. Earth 2025, 18, 2468420.
Dharmarathne, G.; Waduge, A.O.; Bogahawaththa, M.; Rathnayake, U.; Meddage, D.P.P. Adapting cities to the surge: A comprehensive review of climate-induced urban flooding. Results Eng. 2024, 22, 102123.
Zhang, Z.; Jian, X.; Chen, Y.; Huang, Z.; Liu, J.; Yang, L.; Yao, K.; Su, H.; Torrenti, J.; Wen, Z.; et al. Urban waterlogging prediction and risk analysis based on rainfall time series features: A case study of Shenzhen. Front. Environ. Sci. 2023, 11, 1131954.
Wang, M.; Liang, H.; Zhu, Z.; Wu, H.; Xu, F.; Koo, K.; Brownjohn, J.; Jiang, Y.; Zevenbergen, C.; Ma, Y. Urban pluvial flooding and stormwater management: A contemporary review of China’s challenges and “sponge cities” strategy. Environ. Sci. Policy 2018, 80, 132–143.
Jiang, Y.; Wang, J.; Shen, X.; Dai, K.; Chen, Y.; Wang, D.; Zhang, L.; Guo, H.; Ma, J.; Gao, W. Flood risk assessment of Wuhan, China, using a multi-criteria analysis model with the improved AHP-Entropy method. Environ. Sci. Pollut. Res. 2023, 30, 96001–96018.
Hu, Y.; Zhang, H.; Hou, Y.; Liu, P.; Zeng, B.; Huang, G.; Chen, W.; Wei, K.; Ouyang, C.; Duan, H.; et al. Reflections on the Catastrophic 2020 Yangtze River Basin Flooding in Southern China. Innovation 2020, 1, 100038.
Zhao, X.; Li, H.; Cai, Q.; Pan, Y.; Qi, Y. Managing Extreme Rainfall and Flooding Events: A Case Study of the 20 July 2021 Zhengzhou Flood in China. Climate 2023, 11, 228.
Yan, W.; Ren, H.; Luo, X.; Li, S. Multi-Dimensional Urban Flooding Impact Assessment Leveraging Social Media Data: A Case Study of the 2020 Guangzhou Rainstorm. Water 2023, 15, 4296.
Zhao, D.; Xu, H.; Li, Y.; Yu, Y.; Duan, Y.; Xu, X.; Chen, L. Locally opposite responses of the 2023 Beijing–Tianjin–Hebei extreme rainfall event to global anthropogenic warming. Npj Clim. Atmos. Sci. 2024, 7, 38.
Domínguez-Cuesta, M.J. Susceptibility. In Encyclopedia of Natural Hazards; Bobrowsky, P.T., Ed.; Springer Netherlands: Dordrecht, The Netherlands, 2013; p. 988. ISBN 978-1-4020-4399-4.
Lacasse, S. Risk Assessment. In Encyclopedia of Natural Hazards; Bobrowsky, P.T., Ed.; Springer Netherlands: Dordrecht, The Netherlands, 2013; pp. 862–863. ISBN 978-1-4020-4399-4.
Cutter, S.L. Vulnerability. In Encyclopedia of Natural Hazards; Bobrowsky, P.T., Ed.; Springer Netherlands: Dordrecht, The Netherlands, 2013; pp. 1088–1090. ISBN 978-1-4020-4399-4.
Porter, K. A Beginner’s Guide to Fragility, Vulnerability, and Risk. In Encyclopedia of Earthquake Engineering; Beer, M., Kougioumtzoglou, I.A., Patelli, E., Au, I.S.-K., Eds.; Springer Berlin Heidelberg: Berlin, Heidelberg, 2021; pp. 235–260. ISBN 978-3-642-36197-5.
Xu, L.; Ma, A. Coarse-to-fine waterlogging probability assessment based on remote sensing image and social media data. Geo-Spat. Inf. Sci. 2021, 24, 279–301.
Antwi-Agyakwa, K.T.; Afenyo, M.K.; Angnuureng, D.B. Know to Predict, Forecast to Warn: A Review of Flood Risk Prediction Tools. Water 2023, 15, 427.
Zhang, M.; Xu, M.; Wang, Z.; Lai, C. Assessment of the vulnerability of road networks to urban waterlogging based on a coupled hydrodynamic model. J. Hydrol. 2021, 603, 127105.
Yao, Y.; Dai, P.; Wang, H.; Han, Q.; Li, J.; Song, H.; Gu, Z.; Wang, L.; Guo, Y.; Li, Q.; et al. Method for analyzing urban waterlogging mechanisms based on a 1D–2D water environment dynamic bidirectional coupling model. J. Environ. Manag. 2024, 360, 121024. [Google Scholar] [CrossRef]
Das, B.; Ray, T.K.; Boral, E. Identification of urban waterlogging risk zones using Analytical Hierarchy Process (AHP): A case of Agartala city. Environ. Monit. Assess. 2025, 197, 322. [Google Scholar] [CrossRef]
Yuan, D.; Xue, H.; Du, M.; Pang, Y.; Wang, J.; Wang, C.; Song, X.; Wang, S.; Kou, Y. Urban waterlogging resilience assessment based on combination weight and cloud model: A case study of Haikou. Environ. Impact Asses. 2025, 111, 107728. [Google Scholar] [CrossRef]
Li, H.; Wang, Q.; Li, M.; Zang, X.; Wang, Y. Identification of urban waterlogging indicators and risk assessment based on MaxEnt Model: A case study of Tianjin Downtown. Ecol. Indic. 2024, 158, 111354. [Google Scholar] [CrossRef]
Wang, Z.; Lai, C.; Chen, X.; Yang, B.; Zhao, S.; Bai, X.; Jeon, G.; Han, K.; Yoon, H.; Song, W.; et al. Flood hazard risk assessment model based on random forest. J. Hydrol. 2015, 527, 1130–1141. [Google Scholar] [CrossRef]
Youssef, A.M.; Pourghasemi, H.R.; Mahdi, A.M.; Matar, S.S. Flood vulnerability mapping and urban sprawl suitability using FR, LR, and SVM models. Environ. Sci. Pollut. Res. 2023, 30, 16081–16105. [Google Scholar] [CrossRef]
Zeng, B.; Huang, G.; Chen, W. Research progress and prospects of urban flooding simulation: From traditional numerical models to deep learning approaches. Environ. Model. Softw. 2025, 183, 106213. [Google Scholar] [CrossRef]
Yoshimura, N.; Hiura, T.; Yu, X.; Wu, K.; Yang, Y.; Liu, Q. WaRENet: A Novel Urban Waterlogging Risk Evaluation Network. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 1–28. [Google Scholar] [CrossRef]
Zhao, C.; Liu, C.; Li, W.; Tang, Y.; Yang, F.; Xu, Y.; Quan, L.; Hu, C. Simulation of Urban Flood Process Based on a Hybrid LSTM-SWMM Model. Water Resour. Manag. 2023, 37, 5171–5187. [Google Scholar] [CrossRef]
Zhang, Z.; Jian, X.; Chen, Y.; Huang, Z.; Liu, J.; Yang, L.; Yao, K.; Su, H.; Torrenti, J.; Wen, Z.; et al. Real-Time Waterlogging Monitoring on Urban Roads Using Edge Computing. Water Resour. Manag. 2025, 39, 5273–5287. [Google Scholar] [CrossRef]
Beijing Water Authority. Waterlogging Risk Map for Suburban New Towns. 2024. Available online: https://swj.beijing.gov.cn/swdt/tzgg/202406/t20240627_3730100.html (accessed on 3 June 2025).
Huang, F.; Zhu, D.; Zhang, Y.; Zhang, J.; Wang, N.; Dong, Z. Urban Flooding Disaster Risk Assessment Utilizing the MaxEnt Model and Game Theory: A Case Study of Changchun, China. Sustainability 2024, 16, 8696.
Yan, M.; Yang, J.; Ni, X.; Liu, K.; Wang, Y.; Xu, F. Urban waterlogging susceptibility assessment based on hybrid ensemble machine learning models: A case study in the metropolitan area in Beijing, China. J. Hydrol. 2024, 630, 130695.
Zhou, S.; Xu, Z.; Zhang, Q.; Yu, P.; Jiang, M.; Li, J.; Yang, M. Rainstorm-induced flood risk assessment in developed urban area using a data-driven approach with watershed units. Sci. Total Environ. 2024, 946, 174135.
Qi, X.; Khu, S.-T.; Yu, P.; Liu, Y.; Wang, M. Integrating machine learning with the Minimum Cumulative Resistance Model to assess the impact of urban land use on road waterlogging risk. J. Hydrol. 2025, 654, 132842.
Wang, M.; Zhang, Y.; Bakhshipour, A.E.; Liu, M.; Rao, Q.; Lu, Z. Designing coupled LID–GREI urban drainage systems: Resilience assessment and decision-making framework. Sci. Total Environ. 2022, 834, 155267.
Tang, X.; Wu, Z.; Liu, W.; Tian, J.; Liu, L. Exploring effective ways to increase reliable positive samples for machine learning-based urban waterlogging susceptibility assessments. J. Environ. Manag. 2023, 344, 118682.
Zhang, Q.; Wu, Z.; Zhang, H.; Dalla Fontana, G.; Tarolli, P. Identifying dominant factors of waterlogging events in metropolitan coastal cities: The case study of Guangzhou, China. J. Environ. Manag. 2020, 271, 110951.
Wang, M.; Liu, M.; Zhang, D.; Zhang, Y.; Su, J.; Zhou, S.; Bakhshipour, A.E.; Tan, S.K. Assessing hydrological performance for optimized integrated grey-green infrastructure in response to climate change based on shared socio-economic pathways. Sustain. Cities Soc. 2023, 91, 104436.
Abd-Elaty, I.; Kuriqi, A.; Pugliese, L.; Zelenakova, M.; El Shinawi, A. Mitigation of urban waterlogging from flash floods hazards in vulnerable watersheds. J. Hydrol. Reg. Stud. 2023, 47, 101429.
Wu, M.; Wei, X.; Ge, W.; Chen, G.; Zheng, D.; Zhao, Y.; Chen, M.; Xin, Y. Analyzing the spatial scale effects of urban elements on urban flooding based on multiscale geographically weighted regression. J. Hydrol. 2024, 645, 132178.
Zhang, Y.; Wang, M.; Zhang, D.; Lu, Z.; Bakhshipour, A.E.; Liu, M.; Jiang, Z.; Li, J.; Tan, S.K. Multi-stage planning of LID-GREI urban drainage systems in response to land-use changes. Sci. Total Environ. 2023, 859, 160214.
Alfieri, L.; Bisselink, B.; Dottori, F.; Naumann, G.; de Roo, A.; Salamon, P.; Wyser, K.; Feyen, L. Global projections of river flood risk in a warmer world. Earth’s Future 2017, 5, 171–182.
Zhang, W.; Villarini, G.; Vecchi, G.A.; Smith, J.A. Urbanization exacerbated the rainfall and flooding caused by hurricane Harvey in Houston. Nature 2018, 563, 384–388.
Sun, S.; Zhai, J.; Li, Y.; Huang, D.; Wang, G. Urban waterlogging risk assessment in well-developed region of Eastern China. Phys. Chem. Earth Parts A/B/C 2020, 115, 102824.
Liao, X.; Xu, W.; Zhang, J.; Qiao, Y.; Meng, C. Analysis of affected population vulnerability to rainstorms and its induced floods at county level: A case study of Zhejiang Province, China. Int. J. Disaster Risk Reduct. 2022, 75, 102976.
Mishra, K.; Sinha, R. Flood risk assessment in the Kosi megafan using multi-criteria decision analysis: A hydro-geomorphic approach. Geomorphology 2020, 350, 106861.
Wang, M.; Fu, X.; Zhang, D.; Lou, S.; Li, J.; Chen, F.; Li, S.; Tan, S.K. Urban agglomeration waterlogging hazard exposure assessment based on an integrated Naive Bayes classifier and complex network analysis. Nat. Hazards 2023, 118, 2173–2197.
Zhou, X.; Bai, Z.; Yang, Y. Linking trends in urban extreme rainfall to urban flooding in China. Int. J. Clim. 2017, 37, 4586–4593.
Yan, Y.; Zhang, H.; Zhang, N.; Feng, C. Development of an Integrated Urban Flood Model and Its Application in a Concave-Down Overpass Area. Remote Sens. 2024, 16, 1650.
Lin, J.; He, P.; Yang, L.; He, X.; Lu, S.; Liu, D. Predicting future urban waterlogging-prone areas by coupling the maximum entropy and FLUS model. Sustain. Cities Soc. 2022, 80, 103812.
Liu, Y.; Zhao, W.; Wei, Y.; Sebastian, F.S.M.; Wang, M. Urban waterlogging control: A novel method to urban drainage pipes reconstruction, systematic and automated. J. Clean. Prod. 2023, 418, 137950.
Ding, Y.; Wang, H.; Liu, Y.; Lei, X. Urban waterlogging structure risk assessment and enhancement. J. Environ. Manag. 2024, 352, 120074.
Li, L.; Zhang, Z.; Qi, X.; Zhao, X.; Hu, W.; Cai, R. Spatiotemporal Urban Waterlogging Risk Assessment Incorporating Human and Vehicle Distribution. Water 2023, 15, 3452.
Song, Y.; Guo, L.; Wang, C.; Zhu, J.; Li, Z. Urban road waterlogging multi-level assessment integrated flood models and road network models. Transp. Res. Part D Transp. Environ. 2024, 133, 104305.
Han, F.; Yu, J.; Zhou, G.; Li, S.; Sun, T. A comparative study on urban waterlogging susceptibility assessment based on multiple data-driven models. J. Environ. Manag. 2024, 360, 121166.
Zou, B.; Nie, Y.; Liu, R.; Wang, M.; Li, J.; Fan, C.; Zhou, X. Assessing the Impact of Urban Morphologies on Waterlogging Risk Using a Spatial Weight Naive Bayes Model and Local Climate Zones Classification. Water 2025, 16, 2464.
Wang, X.; Zhang, Z.; Hu, W.; Zhao, X.; Qi, X.; Cai, R. Vulnerability Assessment and Future Prediction of Urban Waterlogging—A Case Study of Fuzhou. Water 2023, 15, 4025.
Li, J.; Zhang, H.; Zhang, X.; Wang, W. Establishment and Application of a Specialized Physical Examination Indicator System for Urban Waterlogging Risk in China. Sustainability 2023, 15, 4998.
Wang, M.; Li, Y.; Yuan, H.; Zhou, S.; Wang, Y.; Adnan Ikram, R.M.; Li, J. An XGBoost-SHAP approach to quantifying morphological impact on urban flooding susceptibility. Ecol. Indic. 2023, 156, 111137.
Duan, C.; Zhang, J.; Chen, Y.; Lang, Q.; Zhang, Y.; Wu, C.; Zhang, Z. Comprehensive Risk Assessment of Urban Waterlogging Disaster Based on MCDA-GIS Integration: The Case Study of Changchun, China. Remote Sens. 2022, 14, 3101.
Ma, F.; Ao, Y.; Wang, X.; He, H.; Liu, Q.; Yang, D.; Gou, H. Assessing and enhancing urban road network resilience under rainstorm waterlogging disasters. Transp. Res. Part D Transp. Environ. 2023, 123, 103928.
Liu, Y.; Chen, B.; Duan, C.; Wang, H. Economic loss of urban waterlogging based on an integrated drainage model and network environ analyses. Resour. Conserv. Recycl. 2023, 192, 106923.
Li, G.; Shao, W.; Su, X.; Li, Y.; Zhang, Y.; Song, T. Urban Flood Hazard Assessment Based on Machine Learning Model. Water Resour. Manag. 2025, 39, 1953–1970.
Yao, L.; Chen, L.; Wei, W.; Sun, R. Potential reduction in urban runoff by green spaces in Beijing: A scenario analysis. Urban For. Urban Green. 2015, 14, 300–308.
Wang, Y.; Zhai, J.; Gao, G.; Liu, Q.; Song, L. Risk assessment of rainstorm disasters in the Guangdong–Hong Kong–Macao greater Bay area of China during 1990–2018. Geomat. Nat. Hazards Risk 2022, 13, 267–288.
Shi, Q.; Liu, M.; Marinoni, A.; Liu, X. UGS-1m: Fine-grained urban green space mapping of 31 major cities in China based on the deep learning framework. Earth Syst. Sci. Data 2023, 15, 555–577.
Wang, Z.; Ma, C.; Zhang, Y.; Hu, B.; Xu, S.; Dai, Z. Assessment of urban flooding vulnerability based on AHP-PSR model: A case study in Jining City, China. Geocarto Int. 2023, 38, 2252777.
Ding, Y.; Wang, H.; Liu, Y.; Chai, B.; Bin, C. The spatial overlay effect of urban waterlogging risk and land use value. Sci. Total Environ. 2024, 947, 174290.
Yang, J.; Huang, X. The 30m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 2021, 13, 3907–3925.
Yao, Y. Three-way decision and granular computing. Int. J. Approx. Reason. 2018, 103, 107–123.
Wang, M.; Wang, X.; Liu, Q.; Shen, F.; Jin, J.; Zhang, H.; Hu, Y.; Wang, Q.; Qian, Z.; Liu, P. A novel multi-dimensional cloud model coupled with connection numbers theory for evaluation of slope stability. Appl. Math. Model. 2020, 77, 426–438.
Li, D.; Liu, C.; Gan, W. A new cognitive model: Cloud model. Int. J. Intell. Syst. 2009, 24, 357–375.
Xiang, W.; Yang, X.; Babuna, P.; Bian, D. Development, Application and Challenges of Set Pair Analysis in Environmental Science from 1989 to 2020: A Bibliometric Review. Sustainability 2022, 14, 153.
Chen, G.; Wang, S.; Ran, Y.; Cao, X.; Fang, Z.; Xu, Z. Intelligent monitoring and quantitative evaluation of fire risk in subway construction: Integration of multi-source data fusion, FTA, and deep learning. J. Clean. Prod. 2024, 478, 143832.
Chen, G.; Liang, Y.; Li, S.; Xu, Z. A novel gradient descent optimizer based on fractional order scheduler and its application in deep neural networks. Appl. Math. Model. 2024, 128, 26–57.
Huang, Z.; Chen, G.; Jiang, Z. Assessing L2 writing formality using syntactic complexity indices: A fuzzy evaluation approach. Assess. Writ. 2025, 66, 100973.
Yin, J.; Yu, D.; Lin, N.; Wilby, R.L. Evaluating the cascading impacts of sea level rise and coastal flooding on emergency response spatial accessibility in Lower Manhattan, New York City. J. Hydrol. 2017, 555, 648–658.
Coles, D.; Yu, D.; Wilby, R.L.; Green, D.; Herring, Z. Beyond ‘flood hotspots’: Modelling emergency service accessibility during flooding in York, UK. J. Hydrol. 2017, 546, 419–436.
Green, D.; Yu, D.; Pattison, I.; Wilby, R.; Bosher, L.; Patel, R.; Thompson, P.; Trowell, K.; Draycon, J.; Halse, M.; et al. City-scale accessibility of emergency responders operating during flood events. Nat. Hazards Earth Syst. Sci. 2017, 17, 1–16.
Diakoulaki, D.; Mavrotas, G.; Papayannakis, L. Determining objective weights in multiple criteria problems: The CRITIC method. Comput. Oper. Res. 1995, 22, 763–770.
Deng, H.; Yeh, C.-H.; Willis, R.J. Inter-company comparison using modified TOPSIS with objective weights. Comput. Oper. Res. 2000, 27, 963–973.
Yan, W.; Ren, H.; Luo, X.; Li, S. A modified TOPSIS with a different ranking index. Eur. J. Oper. Res. 2017, 260, 152–160.
Phillips, S.J.; Anderson, R.P.; Schapire, R.E. Maximum entropy modeling of species geographic distributions. Ecol. Model. 2006, 190, 231–259.
Elith, J.; Phillips, S.J.; Hastie, T.; Dudík, M.; Chee, Y.E.; Yates, C.J. A statistical explanation of MaxEnt for ecologists. Divers. Distrib. 2011, 17, 43–57.
Mahatara, D.; Acharya, A.; Dhakal, B.; Sharma, D.; Ulak, S.; Paudel, P. Maxent modelling for habitat suitability of vulnerable tree Dalbergia latifolia in Nepal. Silva Fenn. 2021, 55, 1–17.
Pearson, R.G.; Raxworthy, C.J.; Nakamura, M.; Townsend Peterson, A. Predicting species distributions from small numbers of occurrence records: A test case using cryptic geckos in Madagascar. J. Biogeogr. 2007, 34, 102–117.
Wu, L.; Liu, Y.; Zhang, J.; Zhang, B.; Wang, Z.; Tong, J.; Li, M.; Zhang, A.; Diakoulaki, D.; Mavrotas, G.; et al. Demand and supply of cultural ecosystem services: Use of geotagged photos to map the aesthetic value of landscapes in Hokkaido. Ecosyst. Serv. 2017, 24, 68–78.
Chen, J.; Yang, S.T.; Li, H.W.; Zhang, B.; Lv, J.R. Research on Geographical Environment Unit Division Based on the Method of Natural Breaks (Jenks). Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, 4, 47–50.

Figure 1. Location and topographic map of the study area in downtown Hefei, Anhui Province, China.

Figure 2. Spatial data distribution of UWSA indicators and waterlogging susceptibility risk.

Figure 3. Raster data and vectorization workflow of waterlogging risk in major drainage zones of Hefei City.

Figure 4. Urban waterlogging susceptibility assessment framework.

Figure 5. Multicollinearity diagnostics based on VIF, Condition Index, eigenvalues, and Pearson correlation: (a) VIF and Condition Index for each variable; VIF < 10 and Condition Index < 30 indicate that most variables exhibit minimal collinearity; (b) scree plot of eigenvalues; the eigenvalues of the first six components are greater than 1, indicating that there is almost no redundant information among most predictors; (c) Pearson correlation heatmap; color reflects the correlation within the group.

Figure 6. The methodological workflow for constructing supplementary datasets.

Figure 7. Spatial heterogeneity of waterlogging risk within mesh grid cells.

Figure 8. The intervals of the correction factors ${\hat{S}}_{k}$ and ${\hat{H}}_{k}$ are determined using linear programming: (a) Safety factor, (b) Hazard factor, (c) the modified ranges of ${\hat{S}}_{k}$ and ${\hat{H}}_{k}$.

Figure 9. Logical flowchart for high-risk waterlogging point selection based on TWD theory and PI-Priors.

Figure 10. Urban waterlogging risk connection cloud model based on PI-Priors.

Figure 11. Distribution of HWR and qualified supplementary waterlogging points: (a) qualified ($Ψ^{+}$) and non-qualified ($Ψ^{-}$) supplementary waterlogging points, (b) historical waterlogging records (HWR).

Figure 12. Average ROC curve and omission–predicted area curve of the MaxEnt model: (a) average ROC curve showing sensitivity vs. 1–specificity, (b) average omission and predicted area across cumulative thresholds.

Figure 13. Response curves of different UWSA indicators.

Figure 14. Jackknife of regularized training gain for waterlogging.

Figure 15. Comparison of UWSA predictions based on MaxEnt: (a) Cloglog probability from HWR-Only strategy, (b) Jenks classification from HWR-Only strategy, (c) Cloglog probability from PI-Priors-Aided strategy, (d) Jenks classification from PI-Priors-Aided strategy.

Figure 16. Comparison of waterlogging susceptibility predictions under PI-Priors-Aided and HWR-Only strategies: (a) validation waterlogging point dataset map, (b) sorted predicted probabilities, (c) boxplot of predicted probabilities, (d) probability prediction value distribution, (e) sample count by susceptibility level.

Figure 17. Global spatial autocorrelation analysis results using Moran’s I statistic: (a) Moran scatterplot under HWR-Only strategy (Moran’s I = 0.68), (b) Moran scatterplot under PI-Priors-Aided strategy (Moran’s I = 0.74).

Figure 18. LISA cluster maps for waterlogging susceptibility: (a) LISA cluster map under HWR-Only strategy, (b) LISA cluster map under PI-Priors-Aided strategy.

Table 1. Index system for UWSA.

Dimension	Indicator	Reference	Data Source
Natural condition	Elevation ( $N_{1}$ )	[33,34]	https://www.rivermap.cn/home/mapdata.html (accessed on 15 July 2025)
	Roughness ( $N_{2}$ )	[35,36]	/
	Relief ( $N_{3}$ )	[37,38]	/
	Slope ( $N_{4}$ )	[39,40]	/
	Precipitation ( $N_{5}$ )	[19,41]	https://gre.geodata.cn/ (accessed on 21 June 2025)
Social capital	Population density ( $S_{1}$ )	[42,43]	https://landscan.ornl.gov/ (accessed on 23 June 2025)
Social capital	GDP ( $S_{2}$ )	[44,45]	https://www.resdc.cn/doi/doi.aspx?DOIid=33 (accessed on 25 June 2025)
Infrastructure	Distance to overpass ( $I_{1}$ )	[46,47]	Gaode map
	Distance to concave-down overpass ( $I_{2}$ )	[48,49]	Plan 2020
	Distance to underpass ( $I_{3}$ )	[50]	Plan 2020
	* Distance to main stormwater pipe ( $I_{4}$ )	[51,52]	Plan 2020
	* Density of main stormwater pipe ( $I_{5}$ )	[51,52]	Plan 2020
	Road density ( $I_{6}$ )	[53,54]	https://www.rivermap.cn/home/mapdata.html (accessed on 8 July 2025)
Built environment	Distance to surface water system ( $B_{1}$ )	[55,56]	/
	Building density ( $B_{2}$ )	[22,57]	https://www.rivermap.cn/home/mapdata.html (accessed on 8 July 2025)
	Building height ( $B_{3}$ )	[58,59]	https://www.rivermap.cn/home/mapdata.html (accessed on 8 July 2025)
	Impervious surface ( $B_{4}$ )	[60,61]	https://zenodo.org/records/12779975 (accessed on 9 July 2025)
	Water density ( $B_{5}$ )	[62,63]	https://zenodo.org/records/12779975 (accessed on 10 July 2025)
	Green space ( $B_{6}$ )	[64,65]	https://doi.org/10.57760/sciencedb.07049 (accessed on 12 July 2025) [66]
	Land use ( $B_{7}$ )	[67,68]	https://zenodo.org/records/12779975 (accessed on 8 July 2025)

* Note: Owing to historical construction practices, the old urban areas of Hefei use a combined sewer system in which stormwater and wastewater share the same drainage infrastructure. As a result, the stormwater pipeline statistics for these areas also include a portion of the wastewater pipes. In addition, all density and length-related indicators are expressed in units of m/km² and m²/km², respectively.

Table 2. The initial urban waterlogging risk assessment criterion based on PI-Priors.

State	Safety Factor $S_{k} = P_{k}^{N} + P_{k}^{L}$	Hazard Factor $H_{k} = P_{k}^{M} + P_{k}^{H}$
Waterlogging-prone area	$0 \leq S_{k} \leq ξ$	$1 - ξ < H_{k} \leq 1$
Waterlogging-safe area	$ξ < S_{k} \leq 1$	$0 < H_{k} \leq 1 - ξ$
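
The criterion in Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, the value ξ = 0.5 is only a placeholder, and the superscripts N, L, M, and H are assumed to index four components of grid cell k that sum to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellComponents:
    """Four components of grid cell k (assumed here to sum to 1)."""
    p_n: float  # P_k^N
    p_l: float  # P_k^L
    p_m: float  # P_k^M
    p_h: float  # P_k^H


def initial_state(cell: CellComponents, xi: float) -> tuple[float, float, str]:
    """Return (S_k, H_k, state) following the Table 2 criterion."""
    safety = cell.p_n + cell.p_l  # S_k = P_k^N + P_k^L
    hazard = cell.p_m + cell.p_h  # H_k = P_k^M + P_k^H
    # Table 2: prone when 0 <= S_k <= xi, safe when xi < S_k <= 1.
    state = "waterlogging-prone" if safety <= xi else "waterlogging-safe"
    return safety, hazard, state


if __name__ == "__main__":
    xi = 0.5  # placeholder threshold, not a value taken from the paper
    for cell in (CellComponents(0.1, 0.2, 0.3, 0.4),
                 CellComponents(0.5, 0.3, 0.1, 0.1)):
        print(initial_state(cell, xi))
```

The sketch decides the state from $S_{k}$ alone. Under the sum-to-one assumption, $H_{k} = 1 - S_{k}$, so the hazard column carries the same information.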

Table 3. A modified urban waterlogging risk assessment criterion based on PI-Priors.

State	Modified Safety Factor ${\hat{S}}_{k}$	Modified Hazard Factor ${\hat{H}}_{k}$
Waterlogging-prone area	$0 \leq {\hat{S}}_{k} < w_{L} \cdot ξ$	$w_{H} \cdot (1 - ξ) \leq {\hat{H}}_{k} \leq w_{H}$
Waterlogging-fuzzy area	$w_{L} \cdot ξ \leq {\hat{S}}_{k} < w_{N} \cdot ξ$	$w_{M} \cdot (1 - ξ) \leq {\hat{H}}_{k} < w_{H} \cdot (1 - ξ)$
Waterlogging-safe area	$w_{N} \cdot ξ \leq {\hat{S}}_{k} \leq w_{N}$	$0 \leq {\hat{H}}_{k} < w_{M} \cdot (1 - ξ)$
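
The three-way split in Table 3 can be illustrated with the following sketch. It is an assumption-laden example, not the paper's code: the function names are hypothetical, the weights and ξ in the usage lines are placeholders, and the orderings $w_{L} \leq w_{N}$ and $w_{M} \leq w_{H}$ are assumed so that the intervals do not overlap. Each column of the table is evaluated separately, because the table itself does not state how the two verdicts are combined.

```python
def state_from_safety(s_hat: float, xi: float, w_l: float, w_n: float) -> str:
    """Classify a cell by its modified safety factor (Table 3, first column)."""
    if not 0.0 <= s_hat <= w_n:
        raise ValueError("modified safety factor outside [0, w_N]")
    if s_hat < w_l * xi:
        return "prone"
    if s_hat < w_n * xi:
        return "fuzzy"
    return "safe"


def state_from_hazard(h_hat: float, xi: float, w_m: float, w_h: float) -> str:
    """Classify a cell by its modified hazard factor (Table 3, second column)."""
    if not 0.0 <= h_hat <= w_h:
        raise ValueError("modified hazard factor outside [0, w_H]")
    if h_hat >= w_h * (1.0 - xi):
        return "prone"
    if h_hat >= w_m * (1.0 - xi):
        return "fuzzy"
    return "safe"


if __name__ == "__main__":
    # Placeholder parameters chosen only to make the thresholds easy to read:
    # safety thresholds 0.4 and 0.6, hazard thresholds 0.4 and 0.6.
    xi, w_l, w_n, w_m, w_h = 0.5, 0.8, 1.2, 0.8, 1.2
    for s_hat in (0.3, 0.5, 0.9):
        print("S_hat", s_hat, state_from_safety(s_hat, xi, w_l, w_n))
    for h_hat in (0.7, 0.5, 0.1):
        print("H_hat", h_hat, state_from_hazard(h_hat, xi, w_m, w_h))
```

Compared with Table 2, the only structural change is the middle "fuzzy" band, which holds cells that are neither clearly prone nor clearly safe.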

Table 4. MaxEnt model parameter settings.

Random Seed	Random Test Percentage	Replicates	Maximum Iterations	Replicated Run Type	Output Format
True	25%	10	1000	Bootstrap	Cloglog

Table 5. Contribution analysis of each index.

Dimension	Index	Percent Contribution (%)	Permutation Importance (%)
Natural condition	Elevation ( $N_{1}$ )	3.2	12.2
	Roughness ( $N_{2}$ )	0.4	0.8
	Relief ( $N_{3}$ )	2.9	2.8
	Slope ( $N_{4}$ )	1.9	1.6
	Precipitation ( $N_{5}$ )	6.6	12.0
Social capital	Population density ( $S_{1}$ )	9.5	5.1
Social capital	GDP ( $S_{2}$ )	4.7	5.1
Infrastructure	Distance to overpass ( $I_{1}$ )	1.5	2.5
	Distance to concave-down overpass ( $I_{2}$ )	1.1	2.5
	Distance to underpass ( $I_{3}$ )	4.2	22.2
	Distance to stormwater drainage pipe ( $I_{4}$ )	13.5	7.6
	Density of stormwater drainage pipe ( $I_{5}$ )	8.7	0.7
	Road density ( $I_{6}$ )	18.9	11.3
Built environment	Distance to surface water system ( $B_{1}$ )	2.7	3.5
	Building density ( $B_{2}$ )	2.2	2.3
	Building height ( $B_{3}$ )	2.3	1.5
	Impervious surface ( $B_{4}$ )	13.9	2.6
	Water density ( $B_{5}$ )	0.5	1
	Green space ( $B_{6}$ )	1.3	2.8
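
As a reading aid for Table 5, the short sketch below sums both measures by dimension and lists the highest-ranked indices. The numbers are copied from the table; the helper names and the short index labels (N1, S1, and so on) are illustrative choices, not part of the original analysis.

```python
from collections import defaultdict

# (dimension, index, percent contribution, permutation importance) from Table 5
TABLE_5 = [
    ("Natural condition", "N1", 3.2, 12.2),
    ("Natural condition", "N2", 0.4, 0.8),
    ("Natural condition", "N3", 2.9, 2.8),
    ("Natural condition", "N4", 1.9, 1.6),
    ("Natural condition", "N5", 6.6, 12.0),
    ("Social capital", "S1", 9.5, 5.1),
    ("Social capital", "S2", 4.7, 5.1),
    ("Infrastructure", "I1", 1.5, 2.5),
    ("Infrastructure", "I2", 1.1, 2.5),
    ("Infrastructure", "I3", 4.2, 22.2),
    ("Infrastructure", "I4", 13.5, 7.6),
    ("Infrastructure", "I5", 8.7, 0.7),
    ("Infrastructure", "I6", 18.9, 11.3),
    ("Built environment", "B1", 2.7, 3.5),
    ("Built environment", "B2", 2.2, 2.3),
    ("Built environment", "B3", 2.3, 1.5),
    ("Built environment", "B4", 13.9, 2.6),
    ("Built environment", "B5", 0.5, 1.0),
    ("Built environment", "B6", 1.3, 2.8),
]


def totals_by_dimension(rows):
    """Sum both measures per dimension, rounded to one decimal place."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for dimension, _, contribution, importance in rows:
        totals[dimension][0] += contribution
        totals[dimension][1] += importance
    return {d: (round(c, 1), round(i, 1)) for d, (c, i) in totals.items()}


def top_indices(rows, column, n=3):
    """Return the n index labels with the largest value in the given column."""
    ranked = sorted(rows, key=lambda row: row[column], reverse=True)
    return [row[1] for row in ranked[:n]]


if __name__ == "__main__":
    for dimension, (contribution, importance) in totals_by_dimension(TABLE_5).items():
        print(f"{dimension}: contribution {contribution}%, permutation {importance}%")
    print("Top by percent contribution:", top_indices(TABLE_5, 2))
    print("Top by permutation importance:", top_indices(TABLE_5, 3))
```

With the tabulated values, the infrastructure indices together account for 47.9% of the percent contribution and 46.8% of the permutation importance. The two measures rank individual indices differently: road density ($I_{6}$) leads on percent contribution, whereas distance to underpass ($I_{3}$) leads on permutation importance.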

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, G.; Guan, W.; Xu, J.; Koh, C.G.; Xu, Z. Fusing Historical Records and Physics-Informed Priors for Urban Waterlogging Susceptibility Assessment: A Framework Integrating Machine Learning, Fuzzy Evaluation, and Decision Analysis. Appl. Sci. 2025, 15, 10604. https://doi.org/10.3390/app151910604

