An Explainable Ensemble Machine Learning Framework for Flood Susceptibility Mapping Using Social Media Data: A Case Study of Guangzhou, China

Zhou, Yuhan; Lu, Haipeng; Liu, Sicen; Zhang, Shuliang

doi:10.3390/rs18101495


by Yuhan Zhou ¹,², Haipeng Lu ¹,²,*, Sicen Liu ¹,² and Shuliang Zhang ¹,²,³

¹ Key Laboratory of VGE of Ministry of Education, Nanjing Normal University, Nanjing 210023, China
² Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application, Nanjing 210023, China
³ State Key Laboratory of Climate System Prediction and Risk Management, Nanjing Normal University, Nanjing 210023, China
* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(10), 1495; https://doi.org/10.3390/rs18101495

Submission received: 24 March 2026 / Revised: 29 April 2026 / Accepted: 7 May 2026 / Published: 10 May 2026

(This article belongs to the Section Earth Observation for Emergency Management)


Highlights

What are the main findings?

An interpretable ensemble machine-learning framework integrating social media–derived flood inventories, optimized non-flood sampling, and GeoShapley-based explainability achieved strong flood susceptibility mapping performance in Guangzhou, with an AUC of 0.893 and a precision of 0.859.
The flood susceptibility map produced in this study indicates that areas with High and Very-high susceptibility together cover about 26% of the study area (1897.23 km²). Interpretability analysis identifies the nighttime light index, impervious surface percentage, and population density as the most strongly associated positive factors in the model.

What are the implications of the main findings?

A non-flood sampling strategy that jointly considers sample similarity and diversity can significantly improve model performance and generalization ability in flood susceptibility mapping.
By improving both predictive accuracy and model interpretability, the proposed framework provides scientific support for flood risk identification, spatial planning, and targeted urban flood mitigation strategies.

Abstract

With the intensification of global climate change and rapid urbanization, urban flooding poses an increasing threat to urban safety and sustainable development. Flood susceptibility mapping (FSM) serves as a practical approach for recognizing areas that may be vulnerable to flooding and is therefore essential for flood mitigation and urban planning. In this study, an interpretable ensemble machine-learning framework for urban FSM was developed using social media data. First, the spatial locations of flood events were extracted from social media posts and news reports to construct a flood inventory. Subsequently, a non-flood sample selection strategy, termed Similarity- and Diversity-Based Representative Sampling (SDRS), was proposed to ensure both sample similarity and diversity. Based on these samples, a heterogeneous bagging-based ensemble machine learning model was established for flood susceptibility assessment. To enhance model interpretability, the GeoShapley method was introduced to quantify the contributions of key conditioning factors and reveal their directional effects. The findings indicated that the proposed SDRS strategy delivered the best performance, yielding an AUC of 0.893 and a test-set precision of 0.859. The resulting susceptibility map exhibited a clear south-to-north decreasing gradient, with High- and Very-high-susceptibility zones accounting for approximately 26% of the study area (1897.23 km²). The interpretability analysis further indicated that the Nighttime Light Index (NLI), Impervious Surface Percentage (ISP), and population density were among the most strongly associated positive factors in the model, with a Global Spatial Share of 7.18%. These findings demonstrate that the proposed framework can reliably recognize areas vulnerable to flooding and offer a scientific basis for urban flood management in Guangzhou.

Keywords:

flood susceptibility mapping; ensemble learning; GeoShapley; social media data; Guangzhou

1. Introduction

In recent years, as climate change has intensified and urbanization has accelerated, extreme rainstorm events have become more frequent, and the resulting urban flooding has posed severe challenges to urban safety and sustainable socioeconomic development [1,2]. In 2024, floods and geological disasters across China affected 53.449 million people, left 709 people dead or missing, and caused direct economic losses of CNY 263.04 billion [3]. As human activities intensify, the increasing modification of urban underlying surfaces and the growing concentration of population have further increased urban flood risk [4]. Therefore, Flood Susceptibility Mapping (FSM) is of great significance for enhancing urban resilience, advancing sponge city construction, and supporting environmental management.

Against the background of continuously increasing urban flood risk, research on FSM has continued to deepen, giving rise to several commonly used methodological approaches, including multi-criteria decision analysis (MCDA) methods, statistical methods, and hydrological and hydrodynamic models [5]. Among these approaches, MCDA methods, represented by the analytic hierarchy process (AHP) and the technique for order preference by similarity to an ideal solution (TOPSIS), produce flood susceptibility zonation maps by assigning weights and integrating multiple flood-conditioning factors, such as slope, elevation, land use, and rainfall [6,7]. In addition to MCDA, statistical models, including the frequency ratio, information value, statistical index, and logistic regression, have also been widely used. Based on flood inventory data, these models characterize the contributions of conditioning factors to flood susceptibility and enable the quantitative identification and comparative analysis of flood-susceptible areas [8]. However, MCDA methods rely heavily on expert judgment in weight determination and class delineation, making them inherently subjective and prone to uncertainty, which may, in turn, affect the consistency and reproducibility of results across different regions and scales [9]. Meanwhile, hydrological and hydrodynamic models can characterize key process information, including inundation extent, water depth, and flow velocity, through numerical simulation, thereby providing a more direct physical basis for susceptibility validation and fine-scale local assessment. However, the application of hydrological modeling depends on long-term hydrological, meteorological, and topographic data, which can be difficult to obtain in basins with incomplete records or lacking streamflow monitoring [10].

To address the limitations of the aforementioned approaches, machine learning has been increasingly introduced into FSM in recent years to extract latent response patterns from historical flood inventories and multi-source environmental factors [11,12]. Machine learning can effectively handle nonlinear relationships and high-dimensional data, thereby improving mapping accuracy [13]. At present, algorithms including random forest (RF), support vector machine (SVM), k-nearest neighbor (kNN), and gradient-boosting models have been validated and applied to FSM in different regions [11,14]. However, the accuracy of a single model may vary depending on its assumptions about data distribution, sensitivity to extreme values, internal mechanisms, and design purpose [15]. Therefore, ensemble learning has gradually become an effective strategy for improving the robustness of FSM. Among these approaches, Bootstrap Aggregating (Bagging) constructs multiple base learners and aggregates their outputs through voting or averaging, thereby effectively reducing model variance and alleviating overfitting [16]. Existing studies have shown that Bagging-based ensemble frameworks generally exhibit better predictive performance and mapping reliability than single models in flood susceptibility assessment [17,18].

Moreover, the quality of both flood and non-flood samples strongly affects the accuracy of FSM. Flood samples may be derived from a range of sources, such as citizen reports, satellite imagery, and official social media posts [19,20,21], and can be used in machine learning after geocoding. This process helps ensure the quality of flood samples. For non-flood samples, the two commonly used generation methods are random sampling (RS) [22] and stratified sampling (SS) [23]; however, neither includes an explicit assessment of non-flood sample quality. One potentially serious problem is sample contamination, in which the non-flood training set contains samples that share characteristics with documented flood records [24]. Therefore, the strategy for selecting non-flood samples and evaluating their quality still requires further improvement.

To address these gaps, this study proposes an interpretable ensemble machine-learning framework based on social media data and validates it through a case study in Guangzhou. Specifically, flood locations are first extracted from social media posts and news reports to construct a flood inventory for modeling. A non-flood sample selection method oriented toward similarity and diversity is then developed to improve the reliability of the training data. Subsequently, a heterogeneous Bagging-based flood susceptibility model is established to enhance mapping accuracy, stability, and generalization ability. Finally, GeoShapley is introduced to interpret the model outputs and reveal the contributions and directional effects of key conditioning factors together with their spatial heterogeneity. The main objective of this study is therefore to develop a technical framework for social media-driven urban flood susceptibility mapping that integrates flood inventory construction, non-flood sample optimization, ensemble learning, and spatial interpretability analysis, with the aim of supporting urban flood-control planning and emergency management.

2. Study Area and Data

2.1. Study Area

Guangzhou, located on the southeastern coast of China, covers an area of 7434 km² (Figure 1). In 2024, it had 18.978 million permanent residents and a gross domestic product (GDP) of CNY 3.1 trillion, underscoring its role as a major central city in the Guangdong–Hong Kong–Macao Greater Bay Area [25]. The study area has a subtropical oceanic monsoon climate and receives 1600–1900 mm of annual precipitation, with around 80% occurring from April to September, resulting in frequent flood disasters [26]. From 18 to 22 April 2024, Guangzhou experienced persistent heavy rainfall, resulting in damage to homes, road blockages, and landslides.

2.2. Datasets

2.2.1. Flood Inventory

The flood inventory used in this study was derived from social media data and news texts. The social media data were collected from Sina Weibo (https://weibo.com), while the news data were obtained from major news portals and local media reports. Using a Python-based web crawler, a combined keyword search was conducted with “Guangzhou” and terms such as “waterlogging”, “ponding”, “inundation”, “flooding”, and “rainstorm”, with the publication period restricted to 18–23 April 2024. The raw data were then subjected to duplicate removal and validity screening, and records with locations unrelated to flooding were excluded. Only texts referring to specific roads, communities, or landmarks were retained. In total, 3835 flood-related text records were obtained, including 3373 Weibo posts and 462 news reports. Based on the textual information, the locations of flood events were further extracted and georeferenced. Ultimately, 200 flood-occurrence locations were identified and used as the flood inventory for the subsequent flood-susceptibility analysis in Guangzhou.

2.2.2. Conditioning Factors

The conditioning factors considered in this study are summarized in Table 1 and include: (1) Digital elevation model (DEM): obtained from ASTER GDEM data at a resolution of 30 m; (2) River network distribution: derived from OpenStreetMap (OSM), with the dataset updated in January 2020; (3) Normalized Difference Vegetation Index (NDVI): calculated from Landsat 8 imagery, with a resolution of 30 m; (4) Annual precipitation: obtained from the annual average dataset of meteorological elements in China released by the Resource and Environmental Science Data Center, Chinese Academy of Sciences (RESDC; https://www.resdc.cn/), at a resolution of 1 km; (5) Road distribution: derived from OSM, with the dataset updated in January 2020; (6) Land use: obtained from the multi-period land-use remote sensing monitoring dataset of China released by RESDC, at a resolution of 30 m; (7) Population density: with a resolution of 1 km, obtained from the China population spatial distribution kilometer-grid dataset published by RESDC; (8) GDP: the China GDP spatial distribution kilometer-grid dataset released by RESDC, with a spatial resolution of 1 km; (9) Nighttime light: with a spatial resolution of 0.0083°, obtained from the annual nighttime light dataset of China published by RESDC; and (10) Emergency shelters: shelter-related points of interest (POIs) obtained from Amap.

3. Methodology

Figure 2 illustrates a three-step framework for FSM and interpretability analysis. First, based on social media posts and news texts, the XLNet-BiLSTM-CRF model was used to extract the spatial locations of flood events and construct a flood inventory. Subsequently, a non-flood sample selection method oriented toward similarity and diversity, termed Similarity- and Diversity-Based Representative Sampling (SDRS), was proposed to generate non-flood samples. Next, 14 conditioning factors were constructed from both natural environmental and socioeconomic dimensions, and feature selection was performed through multicollinearity testing. Finally, a heterogeneous Bagging framework integrating RF, Light Gradient Boosting Machine (LightGBM), and multilayer perceptron (MLP) was constructed to generate the flood susceptibility map of the study area. GeoShapley was further introduced to interpret the model outputs and reveal the contributions of key conditioning factors and their spatial heterogeneity.

3.1. Flood Location Extraction from Social Media Text Using XLNet-BiLSTM-CRF

To obtain the spatial locations of flood events from social media texts, an XLNet-BiLSTM-CRF-based location-entity recognition method was developed to automatically extract place names. First, the collected flood-related social media texts were cleaned and standardized, including the removal of irrelevant symbols and the unification of text formats. A location-entity dataset was then constructed from manual annotations, and the boundaries of place names were labeled using the BIO tagging scheme. Subsequently, the samples were partitioned into training and test sets in a 7:3 ratio for model training and performance evaluation.

In terms of model construction, XLNet was first employed to encode the contextual semantics of the texts to fully capture the semantic information embedded in the non-standard expressions commonly found in social media. Subsequently, a bidirectional long short-term memory network (BiLSTM) was incorporated to better capture contextual dependencies within the sequence and enhance the representation of contextual features associated with location entities. Finally, a conditional random field (CRF) layer was employed for global decoding of the label sequence, and the transition constraints between labels were utilized to optimize entity boundary recognition. After model training was completed, the model was applied to unlabeled texts to automatically identify flood-related location entities, thereby providing spatial sample support for the subsequent flood susceptibility analysis.
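The BIO tagging scheme mentioned above can be illustrated with a small decoding routine. This is only a minimal sketch of how a tagged sequence is turned back into location entities; the tag names `B-LOC`, `I-LOC`, and `O`, the function name `decode_bio`, and the example sentence are assumptions made for illustration and are not taken from the study.

```python
from typing import List, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, int, int]]:
    """Collect location entities from a BIO-tagged token sequence.

    Returns (entity_text, start_index, end_index_exclusive) tuples.
    """
    entities = []
    start = None  # index where the current entity began, or None
    for i, tag in enumerate(tags):
        if tag == "B-LOC":
            if start is not None:  # a new entity closes the previous one
                entities.append((" ".join(tokens[start:i]), start, i))
            start = i
        elif tag == "I-LOC" and start is not None:
            continue  # extend the current entity
        else:  # "O", or an orphan "I-LOC" with no preceding "B-LOC"
            if start is not None:
                entities.append((" ".join(tokens[start:i]), start, i))
            start = None
    if start is not None:  # entity that runs to the end of the sequence
        entities.append((" ".join(tokens[start:]), start, len(tokens)))
    return entities


# Hypothetical word-level example (tokens are joined with spaces).
tokens = ["Flooding", "on", "Tianhe", "Road", "near", "Gangding"]
tags = ["O", "O", "B-LOC", "I-LOC", "O", "B-LOC"]
locations = decode_bio(tokens, tags)
```

In the pipeline described above, the tag sequence would come from the CRF layer; the extracted strings would then still need geocoding before they can serve as flood samples.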

3.2. Non-Flood Sampling

3.2.1. Random Sampling (RS)

Random sampling (RS) is a commonly used method for selecting non-flood samples in FSM. This method assumes that each grid cell within the study area has the same probability of not experiencing flooding. Therefore, after excluding the grid cells corresponding to recorded flood events, non-flood samples equal in number to the flood samples are randomly selected across the entire study area and used for model training and analysis.
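As a rough illustration of the RS baseline, the sketch below draws non-flood cells uniformly from all cells that are not recorded flood cells. The grid representation, the function name `random_non_flood`, and the seed value are assumptions made for the example.

```python
import random
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row, col) index of a grid cell


def random_non_flood(all_cells: List[Cell], flood_cells: Set[Cell],
                     seed: int = 42) -> List[Cell]:
    """Draw as many non-flood cells as there are flood cells, uniformly at random."""
    candidates = [c for c in all_cells if c not in flood_cells]
    if len(candidates) < len(flood_cells):
        raise ValueError("not enough candidate cells")
    rng = random.Random(seed)
    # sample() draws without replacement, so no cell is selected twice
    return rng.sample(candidates, k=len(flood_cells))


# Hypothetical 20 x 20 grid with three recorded flood cells.
grid = [(r, c) for r in range(20) for c in range(20)]
floods = {(1, 1), (5, 7), (12, 3)}
non_floods = random_non_flood(grid, floods)
```

Because every remaining cell is equally likely to be chosen, nothing prevents a selected cell from lying next to a recorded flood location, which is the contamination risk that motivates the alternatives below.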

3.2.2. Stratified Sampling (SS)

When the sampling population shows marked heterogeneity across subgroups, it should be partitioned into several strata so that between-group differences are maximized while within-group differences are minimized. The number of samples drawn from each group is determined by that group’s proportion in the overall population, for example, an area-weighted proportion. The samples obtained from different groups are then combined to form a stratified sample. In this study, the area was divided into five clusters, labeled G1, G2, G3, G4, and G5, by applying k-means clustering to elevation, slope, the Topographic Wetness Index (TWI), distance to rivers, impervious surface percentage, and road density. As an unsupervised method, k-means can effectively identify spatial units with similar environmental conditions. Under an area-weighted scheme, non-flood samples were drawn from the five clusters [27]. The number of non-flood samples assigned to subgroup $i$ was determined as follows:

$$M_{nf}(i) = M_{nf} \times \frac{S_{i}}{\sum_{j=1}^{5} S_{j}} \tag{1}$$

where $M_{nf}(i)$ represents the number of non-flood samples assigned to subgroup $i$, $S_{i}$ is the area of that subgroup, and $M_{nf}$ represents the total number of non-flood samples.
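Equation (1) generally yields non-integer quotas, so some rounding rule is needed in practice. The sketch below applies the area-weighted allocation with largest-remainder rounding so that the quotas still add up to the requested total. The rounding rule, the function name `allocate_quota`, and the cluster areas are assumptions; the source does not state how fractional quotas were handled.

```python
from typing import Dict


def allocate_quota(areas: Dict[str, float], total: int) -> Dict[str, int]:
    """Area-weighted allocation following Equation (1), rounded to integers."""
    area_sum = sum(areas.values())
    exact = {g: total * s / area_sum for g, s in areas.items()}
    quota = {g: int(v) for g, v in exact.items()}  # round down first
    leftover = total - sum(quota.values())
    # Give the remaining samples to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - quota[g], reverse=True)
    for g in by_remainder[:leftover]:
        quota[g] += 1
    return quota


# Hypothetical cluster areas in km^2 (not the values used in the study).
areas = {"G1": 2500.0, "G2": 1800.0, "G3": 1500.0, "G4": 1000.0, "G5": 634.0}
quota = allocate_quota(areas, total=200)
```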

3.2.3. Similarity- and Diversity-Based Representative Sampling (SDRS)

This study proposes a non-flood sample selection method oriented toward similarity and diversity (Figure 3). At the pixel level, a non-flood suitability score, $S(p)$, is constructed. Based on this score, the non-flood sample selection follows a two-stage procedure consisting of candidate-pool construction and representativeness-constrained sampling. First, the boundary of the study area is moderately contracted to reduce boundary effects, and a candidate pool is constructed from areas with high $S$ values. Subsequently, candidate points are grouped according to the quantiles of $S$ within the candidate pool, and within-group sampling quotas are assigned to balance high-scoring candidates against overall spatial representativeness. Finally, the selection order within each group is randomized, and greedy screening is performed under a global minimum-distance constraint to avoid excessive sample clustering and improve spatial coverage, until the predefined number of non-flood sample points is obtained. Let $p$ denote the center of a grid cell, and let the flood sample set be $F = \{x_{i}\}_{i=1}^{n}$. The suitability score $S(p)$ is defined as a weighted combination of a distance factor and a density factor.

(1): Distance Factor

The distance factor is used to characterize the degree of spatial separation between a grid cell and its nearest flood sample. First, the nearest distance from each grid cell to the flood sample set is calculated. A fractional normalization with a scale parameter is then adopted to prevent the distance effect in remote areas from saturating too rapidly. The formulation is given as follows:

$$d(p) = \min_{x_{i} \in F} \lVert p - x_{i} \rVert \tag{2}$$

$$\alpha = \operatorname{median}_{i}\, d_{i} \tag{3}$$

$$f_{d}(p) = \frac{d(p)}{d(p) + \alpha} \tag{4}$$

where $\lVert \cdot \rVert$ denotes the planar Euclidean distance; $d_{i}$ represents the distance from the $i$th flood sample to its nearest neighboring flood sample; and $\alpha$ is a scale parameter. In this study, the median nearest-neighbor distance of the flood samples is used for $\alpha$ to achieve scale adaptivity.
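A brute-force sketch of Equations (2) to (4) is given below, assuming projected coordinates in metres. The function names and the three flood locations are hypothetical, and a real raster would call for a spatial index rather than this exhaustive search.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]  # projected coordinates in metres


def nearest_distance(p: Point, points: List[Point]) -> float:
    """Planar Euclidean distance from p to the closest point in `points`."""
    return min(math.dist(p, q) for q in points)


def scale_parameter(floods: List[Point]) -> float:
    """alpha: median nearest-neighbour distance among flood samples, Eq. (3)."""
    nn = []
    for i, x in enumerate(floods):
        others = floods[:i] + floods[i + 1:]  # exclude the sample itself
        nn.append(nearest_distance(x, others))
    return median(nn)


def distance_factor(p: Point, floods: List[Point], alpha: float) -> float:
    d = nearest_distance(p, floods)  # Eq. (2)
    return d / (d + alpha)           # Eq. (4), always in [0, 1)


# Hypothetical flood locations.
floods = [(0.0, 0.0), (1000.0, 0.0), (0.0, 3000.0)]
alpha = scale_parameter(floods)
f_at_flood = distance_factor((0.0, 0.0), floods, alpha)
f_at_alpha = distance_factor((1000.0, 1000.0), floods, alpha)
```

The form of Equation (4) implies that a cell exactly $\alpha$ away from its nearest flood sample receives a factor of 0.5, and that the factor approaches 1 only gradually as the distance grows.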

(2): Density Factor

The density factor is used to characterize the spatial clustering intensity of flood samples and thereby reduce the likelihood that non-flood samples are selected in neighborhoods where flood records are highly clustered. In this study, kernel density estimation (KDE) with a Gaussian kernel is employed to calculate the density of flood samples at location $p$, followed by min–max normalization. To enhance the discrimination of low- and medium-density areas, the complement of the square-root-transformed normalized density is taken to obtain the density factor:

$$K(p) = \frac{1}{n} \sum_{i=1}^{n} \exp\left(-\frac{\lVert p - x_{i} \rVert^{2}}{2h^{2}}\right) \tag{5}$$

$$\tilde{K}(p) = \frac{K(p) - K_{\min}}{K_{\max} - K_{\min}} \tag{6}$$

$$f_{k}(p) = 1 - \sqrt{\tilde{K}(p)} \tag{7}$$

where $h$ is the kernel bandwidth ($h = 500$ m in this study), and $K_{\min} = \min_{q} K(q)$ and $K_{\max} = \max_{q} K(q)$ are the minimum and maximum densities taken over grid cells $q$.
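Equations (5) to (7) can be sketched in a few lines. The version below uses an unnormalized Gaussian kernel with a negative exponent, which is what a density that decays with distance requires, and evaluates the minimum and maximum over the supplied cells. The function names, the example coordinates, and the handling of a completely flat density surface are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected coordinates in metres


def kde(p: Point, floods: List[Point], h: float = 500.0) -> float:
    """Gaussian kernel density of flood samples at p, Eq. (5)."""
    return sum(math.exp(-math.dist(p, x) ** 2 / (2.0 * h * h))
               for x in floods) / len(floods)


def density_factor(cells: List[Point], floods: List[Point],
                   h: float = 500.0) -> List[float]:
    """f_k for every cell: min-max normalize K (Eq. 6), then 1 - sqrt (Eq. 7)."""
    k = [kde(p, floods, h) for p in cells]
    k_min, k_max = min(k), max(k)
    span = k_max - k_min
    if span == 0.0:
        # Assumption: a flat density surface carries no clustering signal,
        # so every cell is treated as equally suitable.
        return [1.0 for _ in k]
    return [1.0 - math.sqrt((v - k_min) / span) for v in k]


# Hypothetical flood samples and three cells at increasing distance.
floods = [(0.0, 0.0), (200.0, 0.0)]
cells = [(100.0, 0.0), (1000.0, 0.0), (5000.0, 0.0)]
f_k = density_factor(cells, floods)
```

The square root stretches small normalized densities, so cells on the fringe of a flood cluster are penalized more strongly than a linear complement would penalize them.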

(3): Integrated Suitability Score

The non-flood sample suitability score for each grid cell in the study area is obtained by combining the distance factor and the density factor in a weighted manner, yielding the integrated score:

$$S(p) = w_{d} f_{d}(p) + w_{k} f_{k}(p) \tag{8}$$

$$w_{d} + w_{k} = 1, \quad w_{d} > 0, \quad w_{k} > 0 \tag{9}$$

where $w_{d}$ and $w_{k}$ denote the weights of the distance factor and the density factor, respectively. In this study, $w_{d} = 0.6$ and $w_{k} = 0.4$. A larger $S(p)$ indicates that the location is farther from flood samples and lies in an area with a lower clustering intensity of flood samples, and is therefore more suitable as a candidate unit for non-flood samples.
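The integrated score of Equation (8) and the greedy minimum-distance screening described at the start of this subsection can be sketched together. This is a simplified illustration: it omits the boundary contraction and the quantile-based quotas, and the minimum distance of 200 m, the seed, and the function names are hypothetical values chosen for the example, not parameters reported in the study.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # projected coordinates in metres


def suitability(f_d: float, f_k: float,
                w_d: float = 0.6, w_k: float = 0.4) -> float:
    """Integrated suitability score S(p), Eq. (8), with the weights from the text."""
    return w_d * f_d + w_k * f_k


def greedy_select(candidates: List[Point], n: int, min_dist: float,
                  seed: int = 42) -> List[Point]:
    """Randomize the order, then keep a point only if it is at least
    `min_dist` away from every point kept so far."""
    rng = random.Random(seed)
    order = candidates[:]
    rng.shuffle(order)
    kept: List[Point] = []
    for p in order:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
            if len(kept) == n:
                break
    return kept


# Hypothetical candidate pool: a 10 x 10 lattice with 100 m spacing.
candidates = [(100.0 * i, 100.0 * j) for i in range(10) for j in range(10)]
selected = greedy_select(candidates, n=5, min_dist=200.0)
```

A greedy pass of this kind may return fewer than `n` points when the candidate pool is small relative to the distance constraint, so the pool size and the minimum distance have to be chosen together.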

3.3. Flood Susceptibility Modeling

3.3.1. Selection of Conditioning Factors

Based on the flood-generating mechanisms of urban flooding and data availability, this study identified 14 conditioning factors from two dimensions, namely the natural environment and socioeconomic conditions (Table 2, Figure 4). To make the different factors comparable, all raster datasets were first converted to a unified spatial reference system (WGS_1984_UTM_Zone_49N) and subsequently resampled to 30 m. This resolution was selected to ensure consistency with the main environmental variables and to adequately represent the local spatial heterogeneity of urban flood susceptibility in the study area.

(1): Topographic factors. Elevation (Elv) was directly acquired from ASTER GDEM data. Slope (SLO) and aspect (ASP) were then derived from the DEM in ArcMap 10.8 to characterize the potential energy conditions of surface runoff and flow-direction characteristics; in this study, ASP was retained as a continuous variable to maintain a consistent preprocessing scheme for topographic factors, although this treatment may not fully represent its circular directional property. The TWI was used to characterize topographic convergence and potential water accumulation conditions.
(2): Hydrological factors. River Density was used to characterize the development of flow-conveyance pathways and runoff connectivity. Based on river network line features from OSM, a continuous raster was generated using kernel density estimation. Distance to River (DR) was used to quantify the strength of river-channel influence, and the Euclidean distance algorithm was applied to calculate the distance from any point in the study area to the nearest river channel.
(3): Vegetation and meteorological factors. Rainfall was represented by annual precipitation in 2024 and was used to describe the external triggering conditions for flood occurrence. Vegetation conditions were represented by NDVI, which was calculated from Landsat 8 imagery and used to characterize the regulating effects of vegetation cover on infiltration, runoff generation, and flow concentration.
(4): Urbanization and socioeconomic factors. Based on road distribution data from OSM, Road Density was generated using the Line Density tool in ArcMap 10.8 to characterize urban transportation connectivity and its potential influence on surface runoff organization. Impervious Surface Percentage (ISP) was derived by extracting the built-up land category from land-use data and calculating its area proportion within each grid cell to reflect the runoff-enhancing effect of the underlying surface. Population Density and GDP were obtained, respectively, from the China population spatial distribution kilometer-grid dataset and the China GDP spatial distribution kilometer-grid dataset. Both were then uniformly processed to a 30 m spatial resolution for use as proxy variables of urban activity intensity and asset concentration. Nighttime Light Index (NLI) was used to reflect urban economic vitality and the intensity of human activities and was derived by standardizing nighttime light radiance. Emergency Shelter Density was generated from shelter-related POIs obtained from Amap, including explicitly designated emergency shelters and potentially convertible shelter-carrying spaces, such as parks and squares, schools, and sports venues. After category screening and spatial deduplication, kernel density estimation was performed in ArcMap 10.8 to derive this factor, which was used to characterize the spatial supply intensity of emergency shelter resources and was incorporated as an auxiliary indicator of urban emergency support and adaptive capacity.

3.3.2. Multicollinearity Analysis of Conditioning Factors

In this study, the Variance Inflation Factor (VIF) and Tolerance (TOL), which is the reciprocal of VIF, were employed to assess multicollinearity among the conditioning factors, so that redundant factors could be screened out and the performance and stability of model classification improved. In general, a TOL value of less than 0.1 indicates that the variable can be well explained linearly by the remaining variables and is therefore subject to a significant risk of multicollinearity. When VIF is less than 10, no significant multicollinearity is considered to exist. Factors with VIF values greater than or equal to 10 were regarded in this study as highly collinear variables and were therefore removed or replaced [10].
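For readers who want to reproduce the diagnostic, the sketch below computes VIF and TOL from the correlation matrix of the factors, using the standard identity that the VIF of factor $j$ equals the $j$th diagonal element of the inverse correlation matrix. The implementation is plain Python for self-containment; the function names and the toy factor values are assumptions, and it is not the software used in the study.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def correlation_matrix(columns: List[List[float]]) -> List[List[float]]:
    """Pearson correlation matrix of the given factor columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])  # standardize each factor
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear factors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tol(columns: List[List[float]]) -> Tuple[List[float], List[float]]:
    inv = invert(correlation_matrix(columns))
    vif = [inv[j][j] for j in range(len(columns))]
    return vif, [1.0 / v for v in vif]  # TOL = 1 / VIF


# Hypothetical factor values: the second column is nearly a copy of the first.
factor_a = [1.0, 2.0, 3.0, 4.0]
factor_b = [1.0, 2.1, 2.9, 4.0]
vif, tol = vif_and_tol([factor_a, factor_b])
flagged = [v >= 10 for v in vif]  # threshold used in the text
```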

3.3.3. Ensemble Machine Learning and Model Training

To enable the synergistic integration of multi-source evidence in urban flood susceptibility identification and to improve model stability and generalization under complex urban environmental conditions, this study develops a heterogeneous ensemble machine-learning model for FSM. The model adopts RF, LightGBM, and MLP as base learners and achieves collaborative multi-model training and result fusion through a heterogeneous Bagging strategy [40,41].

Specifically, the ensemble is constructed using 200 bootstrap iterations, in which each base learner is randomly selected according to predefined probabilities (RF: 0.4, LightGBM: 0.4, MLP: 0.2). During each iteration, a resampled dataset is generated using a spatial block-based bootstrap strategy (block size = 2000 m), which preserves the original class distribution while mitigating the influence of spatial autocorrelation and potential information leakage. Each sub-model independently learns the mechanisms of flood occurrence based on different algorithmic structures and feature subsets, thereby increasing diversity at both the data and model levels. Upon training, each base learner produces probabilistic predictions of flood occurrence. These predictions are subsequently aggregated using a performance-based weighting scheme, where the weight assigned to each sub-model is determined by its discriminative performance on out-of-bag (OOB) samples. This mechanism prioritizes more reliable models while attenuating the influence of less informative ones, thereby improving the overall robustness of the ensemble. Furthermore, to enhance the reliability of probabilistic outputs, a Platt scaling calibration step based on out-of-fold (OOF) predictions is incorporated into the ensemble framework, which improves probability calibration without affecting the ranking performance of the model.
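Three mechanics of this procedure can be sketched without any machine-learning library: drawing a base-learner type with the stated probabilities, resampling whole spatial blocks, and averaging member probabilities with performance-based weights. The sketch is deliberately simplified and rests on several assumptions: it does not stratify by class (so it does not preserve the class distribution as the study's procedure does), it leaves out model fitting and Platt scaling, and all function names and coordinates are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

LEARNER_PROBS: Dict[str, float] = {"RF": 0.4, "LightGBM": 0.4, "MLP": 0.2}  # from the text
BLOCK_SIZE = 2000.0  # metres, from the text


def block_id(x: float, y: float, size: float = BLOCK_SIZE) -> Tuple[int, int]:
    """Index of the square block that contains the point (x, y)."""
    return (int(x // size), int(y // size))


def block_bootstrap(coords: Sequence[Tuple[float, float]],
                    rng: random.Random) -> Tuple[List[int], List[int]]:
    """Resample whole blocks with replacement.

    Returns (in_bag, out_of_bag) sample indices; samples in blocks that were
    never drawn form the out-of-bag set.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[block_id(x, y)].append(i)
    ids = sorted(blocks)
    drawn = [rng.choice(ids) for _ in ids]
    drawn_set = set(drawn)
    in_bag = [i for b in drawn for i in blocks[b]]
    oob = [i for b in ids if b not in drawn_set for i in blocks[b]]
    return in_bag, oob


def pick_learner(rng: random.Random) -> str:
    """Choose the base-learner type for one bootstrap iteration."""
    names = list(LEARNER_PROBS)
    return rng.choices(names, weights=[LEARNER_PROBS[n] for n in names], k=1)[0]


def weighted_probability(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Performance-weighted average of member probabilities for one cell."""
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)


rng = random.Random(0)
coords = [(100.0, 100.0), (500.0, 900.0), (2500.0, 100.0),
          (2600.0, 300.0), (4100.0, 4100.0), (100.0, 4500.0)]
in_bag, oob = block_bootstrap(coords, rng)
learner = pick_learner(rng)
fused = weighted_probability([0.8, 0.4], weights=[3.0, 1.0])
```

Keeping neighbouring samples in the same block means that a model is never scored out-of-bag on a point whose close neighbour it saw during training, which is the leakage the block design is meant to limit.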

Mechanistically, RF is well-suited for capturing local decision rules and exhibits strong noise tolerance; LightGBM can efficiently characterize the nonlinear responses and high-order interactions between environmental factors and flood risk; and MLP complements these models by enhancing the representation of latent patterns in continuous feature space. Through heterogeneous integration, the three models form complementary advantages, enabling a better trade-off between bias and variance. Considering the spatial autocorrelation of flood samples, a spatial block-based resampling strategy is further adopted during training to reduce the potential bias induced by spatial dependence, thereby improving the model’s generalization ability across different regions [42].

3.3.4. Model Performance Evaluation

To evaluate the performance of the proposed flood susceptibility model, the dataset was randomly divided into training and testing subsets with a ratio of 7:3. To ensure reproducibility, a fixed random seed (RANDOM_SEED = 42) was used during the data partitioning process. This strategy enables a straightforward assessment of model generalization performance on unseen data.

Model performance was evaluated using multiple statistical indicators, namely Accuracy, Precision, Recall, and F1-score [43]. In addition, the classification capability of the model was evaluated using ROC-AUC [44]. AUC ranges from 0 to 1, where 0.5 corresponds to random guessing; a useful model therefore scores between 0.5 and 1, and a higher value indicates better discrimination.
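The indicators named above can be computed directly from labels and predicted probabilities. In the sketch below, AUC is obtained from its rank interpretation (the probability that a randomly chosen flood sample receives a higher score than a randomly chosen non-flood sample). The 0.5 decision threshold, the function names, and the toy data are assumptions; the study does not state the threshold it used.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_prob: List[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, Precision, Recall, and F1 at a fixed probability threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: List[int], y_prob: List[float]) -> float:
    """AUC as the share of (flood, non-flood) pairs ranked correctly; ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = flood) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
metrics = classification_metrics(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
```

Unlike the four threshold-based indicators, AUC depends only on the ranking of the predictions, which is why a monotonic calibration step such as Platt scaling leaves it unchanged.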

3.4. GeoShapley-Based Model Explainability

To examine how the effects of different conditioning factors on flood susceptibility vary across space, this study employed GeoShapley analysis. Based on the traditional Shapley Additive exPlanations (SHAP) framework, GeoShapley directly incorporates geographic coordinates into the feature interaction model [45], thereby enabling the contribution of spatial location to model outputs to be quantified and effectively explaining the spatial heterogeneity in geospatial machine-learning results [46]. The contribution of the location feature group is calculated as follows:

$$\phi_{GEO} = \sum_{s=0}^{p-g} \; \sum_{\substack{S \subseteq M \setminus \{GEO\} \\ |S| = s}} \frac{s!\,(p-s-g)!}{(p-g+1)!} \left[ f(S \cup \{GEO\}) - f(S) \right] \tag{10}$$

where $p$ denotes the total number of features in FSM (the selected conditioning factors). $M \setminus \{GEO\}$ denotes the set of all features excluding the location feature group $GEO$. $S$ is a subset of $M \setminus \{GEO\}$ with cardinality $s$; accordingly, $s = |S|$ and ranges from 0 to $p - g$. $GEO$ denotes the location feature group, with cardinality $g$; $g$ is the number of geographic coordinates, and in this study, $g = 2$, corresponding to longitude and latitude. $f(S)$ refers to the model output when only the feature subset $S$ is used, whereas $f(S \cup \{GEO\})$ denotes the model output obtained after adding the location feature group $GEO$ to $S$. $\phi_{GEO}$ measures the marginal contribution of the location feature group to the model output while the remaining features are held constant [45]. The summation in Equation (10) is taken over all admissible subsets $S \subseteq M \setminus \{GEO\}$. GeoShapley decomposes the model output $\hat{y} \in \mathbb{R}^{n}$ into the sum of four components:

$$\hat{y} = \phi_{0} + \phi_{GEO} + \sum_{j=1}^{p} \phi_{j} + \sum_{j=1}^{p} \phi_{GEO,j} \tag{11}$$

where $\phi_{0}$ denotes the global baseline term, representing the average output level and serving as the global intercept; $\phi_{GEO} \in \mathbb{R}^{n}$ measures the intrinsic location effect for each observation; $\phi_{j} \in \mathbb{R}^{n}$ denotes the location-invariant effect of the $j$th conditioning factor; and $\phi_{GEO,j} \in \mathbb{R}^{n}$ denotes the spatially varying interaction effect induced by its interaction with location, thereby capturing how the effect of this factor varies across space.
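To make Equation (10) concrete, the sketch below evaluates it by brute-force enumeration for a toy value function. It assumes that $p$ counts the $g$ location coordinates, so that there are $p - g$ non-location features and the weights reduce to ordinary Shapley weights for a game with $p - g + 1$ players, with the location group acting as a single player. The value function, the feature names, and the function name `phi_geo` are hypothetical, and exhaustive enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, Sequence


def phi_geo(features: Sequence[str], g: int,
            value: Callable[[FrozenSet[str]], float]) -> float:
    """Contribution of the location group GEO following Equation (10).

    `features` lists the non-location features, so p = len(features) + g.
    """
    p = len(features) + g
    total = 0.0
    for s in range(len(features) + 1):  # s = 0 .. p - g
        weight = factorial(s) * factorial(p - s - g) / factorial(p - g + 1)
        for subset in combinations(features, s):
            S = frozenset(subset)
            total += weight * (value(S | {"GEO"}) - value(S))
    return total


def toy_value(S: FrozenSet[str]) -> float:
    """Hypothetical model output for a feature subset: additive effects
    plus one interaction between location and ISP."""
    out = 0.0
    if "ISP" in S:
        out += 0.10
    if "NLI" in S:
        out += 0.05
    if "GEO" in S:
        out += 0.20
    if "GEO" in S and "ISP" in S:
        out += 0.04
    return out


phi = phi_geo(["ISP", "NLI"], g=2, value=toy_value)
```

In this toy case the location group receives its own additive effect of 0.20 plus half of the 0.04 interaction with ISP, giving 0.22; the other half is credited to ISP, which mirrors how Shapley values split an interaction equally between the two players involved.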

4. Results

4.1. Validation of Social-Media-Derived Flood Locations

Table 3 presents the validation results of the XLNet-BiLSTM-CRF model for flood location-entity recognition. On the test set, the model achieved an F1-score of 0.822 and a Recall of 0.852, indicating that it identified flood-related location entities in social media texts well enough to meet the requirements of flood sample extraction. These results suggest that flood location information derived from social media texts is reasonably reliable and can provide credible flood samples for the subsequent flood susceptibility analysis.

4.2. Multicollinearity Diagnostics of Conditioning Factors

Table 4 presents the results of the multicollinearity diagnostics. The VIF values of all 14 conditioning factors were far below the critical threshold of 10, and all TOL values were above the critical threshold of 0.1. These results indicate that no serious multicollinearity existed among the 14 conditioning factors; therefore, all conditioning factors were retained as model inputs.
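As an illustration of how such diagnostics can be obtained, the sketch below computes VIF and TOL from the inverse of the correlation matrix, using the standard identity that the j-th diagonal entry of that inverse equals 1/(1 − R_j²). The three columns are hypothetical toy values, not data from this study, and the helper names are illustrative.

```python
from statistics import mean, pstdev

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, sd = mean(col), pstdev(col)
        z.append([(v - m) / sd for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def vif_and_tol(columns):
    """VIF_j is the j-th diagonal of the inverse correlation matrix; TOL_j = 1 / VIF_j."""
    inv = invert(correlation_matrix(columns))
    vifs = [inv[j][j] for j in range(len(columns))]
    return vifs, [1.0 / v for v in vifs]

# Hypothetical toy factors: elevation and slope are deliberately correlated.
elv = [12, 15, 9, 30, 22, 18, 7, 25]
slo = [2.1, 2.9, 1.6, 6.2, 4.1, 3.8, 1.2, 5.0]
ndvi = [0.31, 0.62, 0.18, 0.44, 0.70, 0.25, 0.52, 0.37]

vifs, tols = vif_and_tol([elv, slo, ndvi])
for name, v, t in zip(["Elv", "SLO", "NDVI"], vifs, tols):
    flag = "check" if v >= 10 or t <= 0.1 else "ok"
    print(f"{name}: VIF={v:.2f} TOL={t:.3f} {flag}")
```

The thresholds in the last loop mirror those stated above (VIF of 10 and TOL of 0.1).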

4.3. Model Performance Results

As shown in Table 5, model performance differed markedly among the three sampling approaches, with SDRS achieving the best performance. Across Accuracy, Precision, Recall, and F1-score, SDRS performed better than both RS and SS. By contrast, SS showed only a slight improvement over RS, indicating that optimizing sample distribution solely through stratification constraints provided relatively limited gains in model performance. In comparison, SDRS, by jointly considering the similarity and diversity of non-flood samples, more effectively enhanced the model’s discrimination between flood and non-flood units.
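For reference, the four metrics compared here can be derived from a binary confusion matrix as in the following minimal sketch. The labels are hypothetical toy values (1 = flood, 0 = non-flood), and the function name is illustrative rather than taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall and F1-score for binary labels (1 = flood)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical toy predictions: one missed flood unit and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision is the metric most directly affected by contaminated non-flood samples, because mislabelled negatives that resemble flood units tend to appear as false positives.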

The ROC curves and AUC values for the different sampling approaches are shown in Figure 5, and the results were consistent with the evaluation metrics reported in Table 5. SDRS achieved the highest AUC values in both the training (0.914) and test (0.893) sets, indicating stronger discriminative ability and better generalization performance. The AUC values of RS and SS were overall very close. Specifically, on the training set, SS (0.861) was slightly higher than RS (0.846), whereas on the test set, RS (0.796) was slightly higher than SS (0.794), suggesting that the two conventional sampling approaches still had certain limitations in improving the model’s discrimination ability. In contrast, by preferentially selecting highly suitable samples that were distant from flood samples and located in low-density areas, SDRS effectively reduced the interference caused by sample misselection and spatial clustering.
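The AUC can be read as the probability that a randomly chosen flood unit receives a higher predicted score than a randomly chosen non-flood unit. The sketch below computes it in that pairwise form on hypothetical toy scores and then derives the train-test gaps from the AUC values reported above; the function name is illustrative.

```python
def roc_auc(y_true, scores):
    """Pairwise (Mann-Whitney) form of the AUC; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy scores: 8 of the 9 flood/non-flood pairs are ordered correctly.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(y_true, scores)
print(round(auc, 3))  # 0.889

# Train and test AUC values as reported in the text for each sampling approach.
reported = {"SDRS": (0.914, 0.893), "SS": (0.861, 0.794), "RS": (0.846, 0.796)}
gaps = {name: round(train - test, 3) for name, (train, test) in reported.items()}
print(gaps)  # {'SDRS': 0.021, 'SS': 0.067, 'RS': 0.05}
```

The smaller train-test gap for SDRS (0.021, against 0.067 for SS and 0.050 for RS) is consistent with the better generalization described above.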

4.4. Flood Susceptibility Mapping Results

Figure 6 depicts how flood susceptibility is distributed in space under the three sampling approaches: RS, SS, and SDRS. Using the K-means clustering method, the flood susceptibility index (FSI) was classified into five levels: Very-low, Low, Medium, High, and Very-high. Overall, the spatial patterns of flood susceptibility obtained from the three sampling approaches were generally consistent. Very-low and Low-susceptibility zones were mainly located in the north and northeast of the study area. In contrast, the south, southwest, and south-central areas formed relatively continuous belts of High and Very-high susceptibility. This pattern indicates that flood-prone areas in the study area generally exhibited a spatial differentiation characterized by increasing susceptibility from north to south. Recorded flood samples were primarily concentrated within High- and Very-high susceptibility zones in the southern and southwestern parts, while a distinct local hotspot also appeared in the central area. These results indicate that all three sampling approaches were able to identify the major flood-prone areas and their spatial clustering characteristics in the study area reasonably well.
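A minimal sketch of the five-level classification step is given below, assuming a one-dimensional K-means (Lloyd's algorithm) applied to FSI values, with classes ordered by their cluster centres. The initialization scheme, the toy FSI values, and the function names are assumptions made for illustration; the study's actual K-means settings are not stated here.

```python
LEVELS = ["Very-low", "Low", "Medium", "High", "Very-high"]

def kmeans_1d(values, k=5, iters=100):
    """One-dimensional K-means; returns the cluster centres in ascending order."""
    ordered = sorted(values)
    # Assumed deterministic start: centres spread evenly over the sorted values.
    centres = [ordered[int((i + 0.5) * len(ordered) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centre.
        updated = [sum(g) / len(g) if g else centres[i]
                   for i, g in enumerate(groups)]
        if updated == centres:          # assignments no longer change
            break
        centres = updated
    return sorted(centres)

def classify(fsi, centres):
    """Assign an FSI value to the susceptibility level of its nearest centre."""
    nearest = min(range(len(centres)), key=lambda c: abs(fsi - centres[c]))
    return LEVELS[nearest]

# Hypothetical FSI values for fifteen mapping units.
fsi_values = [0.02, 0.05, 0.08, 0.22, 0.25, 0.28, 0.45, 0.50, 0.55,
              0.70, 0.72, 0.75, 0.92, 0.95, 0.98]
centres = kmeans_1d(fsi_values)
print([round(c, 3) for c in centres])
print(classify(0.97, centres), classify(0.01, centres))
```

Because class breaks follow the data distribution rather than fixed intervals, the same FSI value can fall into different levels under RS, SS, and SDRS.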

Although the overall spatial patterns were similar, the sampling approaches still differed markedly in susceptibility-level allocation and hotspot boundary delineation (Figure 7). SS assigned more area to the Very-low and Low classes (2444.45 and 2065.41 km², respectively), indicating that this sampling approach tended to enlarge the extent of low-susceptibility zones and produced more conservative zoning results. RS had the largest Medium area (1525.88 km²), suggesting that this approach preserved a wider transitional belt between High- and Low-susceptibility zones and thus exhibited a stronger spatial gradient of differentiation. In contrast, SDRS produced the largest Very-high area (1038.64 km²), and the combined area of the High and Very-high classes reached 1897.23 km², indicating that it identified the core High-susceptibility hotspots in a more concentrated manner and more readily highlighted the spatial continuity of High-susceptibility zones. Together with the performance metrics in Table 5, this concentrated delineation supports the applicability of SDRS to FSM.

4.5. Feature Importance and Model Explainability Results

The bar chart of mean absolute SHAP values (Figure 8a) quantitatively revealed the ranking of the conditioning factors’ importance in the model outputs. Among them, NLI, ISP, and Population Density had the highest mean absolute SHAP values, indicating that the spatial differentiation of flood susceptibility in the study area was strongly associated with urban development intensity and population exposure. This rank ordering should, however, be interpreted in light of the social-media reporting bias discussed in Section 5.3, because information-active areas may be more likely to contribute reported flood records to the inventory. NDVI (0.288) and Emergency Shelter Density (0.254) showed the next highest contributions, suggesting that ecological buffering capacity and the allocation of disaster-prevention facilities also played significant roles in regulating the susceptibility pattern. In contrast, GDP, Elv, and Road Density made moderate contributions, whereas SLO, DR, ASP, River Density, TWI, and Rainfall had relatively weak overall effects, acting mainly as background constraints. The SHAP summary plot (Figure 8b) further revealed the direction of the effect of each input variable on the model output. High values of NLI, ISP, and Population Density were mainly distributed in the positive SHAP range, suggesting that these features tended to contribute positively to the predicted flood susceptibility scores, particularly in areas characterized by intense human activity, high imperviousness, and concentrated population exposure. Among these factors, the positive effect of ISP was the most direct, reflecting that surface hardening weakened infiltration and intensified surface runoff concentration. In contrast, high NDVI values were mainly associated with negative SHAP values, indicating that greater vegetation cover suppressed flood susceptibility by intercepting rainfall, promoting infiltration, and slowing runoff concentration. 
Elv and SLO showed similar patterns: samples with low elevation and low slope were more likely to make positive contributions, suggesting that low-lying, flat areas were more conducive to water accumulation. Overall, the SHAP results indicated that flood susceptibility in the study area was jointly controlled primarily by urban development intensity, population exposure, and local ecological and topographic conditions.
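The global ranking described above rests on the mean absolute SHAP value of each factor across all samples. The following sketch shows that aggregation on a hypothetical matrix of SHAP values (rows are samples, columns are factors); the numbers are invented for illustration and are not the study's results.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values for three samples and three factors.
features = ["NDVI", "TWI", "NLI"]
shap_rows = [
    [-0.30, 0.02, 0.40],
    [0.10, -0.01, -0.20],
    [-0.20, 0.03, 0.60],
]
ranking = mean_abs_shap(shap_rows, features)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, which is why the direction of each effect has to be read separately from the summary plot.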

Building upon the interpretation of attribute factors, GeoShapley further identified the additional contribution of spatial effects to the model output. The Global Spatial Share was 7.18%, indicating that the model output was dominated by the non-spatial effects of attribute variables and that spatial effects played a limited but still meaningful supplementary role in explaining flood susceptibility. This suggests that, beyond variables such as socioeconomic conditions, underlying surface characteristics, and topographic factors, some spatial dependence remained within the study area and contributed additionally to the local manifestation of flood susceptibility. The map of the Local Spatial Share (GeoRatio) further showed that the spatial effect was not uniformly distributed, with pronounced local heterogeneity (Figure 9). The GeoRatio ranged from 0.0347 to 0.1456 and exhibited a patchy pattern, with higher values concentrated in the central urban area and in parts of the southwestern and southeastern regions, and lower values mainly in peripheral areas. This indicates that spatial effects were more pronounced in specific local areas, suggesting that flood susceptibility in these regions may be further influenced by interactions among neighboring units in addition to environmental and anthropogenic factors. Overall, flood susceptibility in the study area was primarily associated with urban development intensity and population exposure, while the observed spatial pattern was also related to ecological buffering capacity, local topographic conditions, and spatial neighborhood effects.

5. Discussion

5.1. Policy Implications

By overlaying flood susceptibility with the GeoRatio, this study further identifies four mechanism-based zones: low-susceptibility/low-spatial-share (LL), low-susceptibility/high-spatial-share (LH), high-susceptibility/low-spatial-share (HL), and high-susceptibility/high-spatial-share (HH). In this classification, both flood susceptibility and GeoRatio are dichotomized into “low” and “high” categories, with the 50th percentile (median) of their distributions within the study area serving as the threshold. Figure 10 shows that extensive HH clusters are mainly located in the southern and central-western areas of the study region. In contrast, the HL, LH, and LL zones show different degrees of mosaic-like and transitional distributions. These zone types suggest differentiated spatial characteristics in the formation and propagation of flood risk, thereby providing a preliminary analytical basis for zone-specific and differentiated flood management.
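The median-based dichotomization described above can be sketched as follows. The values are hypothetical, and the treatment of units lying exactly on the median (counted as "low" here) is an assumption, since the tie rule is not stated.

```python
from statistics import median

def mechanism_zones(susceptibility, geo_ratio):
    """Label each unit LL, LH, HL or HH: first letter susceptibility, second GeoRatio."""
    s_med = median(susceptibility)
    g_med = median(geo_ratio)
    zones = []
    for s, g in zip(susceptibility, geo_ratio):
        # Assumption: only values strictly above the median count as "high".
        zones.append(("H" if s > s_med else "L") + ("H" if g > g_med else "L"))
    return zones

# Hypothetical values for four mapping units.
susceptibility = [0.9, 0.8, 0.2, 0.1]
geo_ratio = [0.14, 0.04, 0.12, 0.05]
zones = mechanism_zones(susceptibility, geo_ratio)
print(zones)  # ['HH', 'HL', 'LH', 'LL']
```

Because both thresholds are medians of the study-area distributions, the zoning is relative: roughly half of the units are "high" on each dimension by construction, regardless of absolute risk levels.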

Among these zone types, HH areas should be prioritized for coordinated regional governance. Such areas not only exhibit high flood susceptibility but may also amplify risk accumulation, transmission, and neighborhood spillover across adjacent subregions. They should therefore be treated as key targets for coordinated regional intervention, with emphasis placed on improving the conveyance capacity of trunk drainage corridors, rehabilitating critical flow-convergence nodes, and coordinating the allocation of detention and retention spaces to weaken their risk-propagation effects. In contrast, HL areas are more characterized by pronounced localized flood impacts, while their spatial diffusion effects are relatively limited. Accordingly, they are better suited to site-specific and fine-grained interventions centered on local hazard-generating factors, including the remediation of low-lying spots, enhancement of surface permeability, and improvement of small-scale stormwater detention and retention facilities, thereby reducing the severity of localized flood impacts. Meanwhile, although LH areas are not the most prominent local high-risk zones, their relatively high spatial share indicates that they may still play a potential role in runoff convergence, risk transfer, and neighborhood spillover, reflecting a certain degree of hidden vulnerability or potential spatial spillover effects. These areas should therefore be included in the scope of preventive control and priority inspection and receive particular attention regarding development-intensity regulation, the preservation of blue–green corridors, and coordinated governance with adjacent subregions. By contrast, LL areas exhibit relatively low risk levels and a comparatively clear mechanism of influence. They can therefore serve as benchmark zones for routine monitoring and for evaluating governance effectiveness. 
Overall, mechanism-based zoning, derived from the dual dimensions of susceptibility and spatial share, helps align governance measures with the mechanisms of risk formation. Therefore, it may provide a useful reference for shifting urban flood governance from universal, study-area-wide remediation toward more risk-identification-based and spatially differentiated strategies.

5.2. Contribution to Flood Susceptibility Mapping

This study extends existing FSM research through three key advances. First, this study transforms heterogeneous flood information from social media and news texts into a flood inventory that can be directly used for modeling, rather than treating such information merely as an auxiliary source for event identification or information supplementation. By constructing flood samples through text extraction and geolocation, this study not only expands the use of emerging data sources for sample acquisition in FSM but also provides a more operational pathway for constructing flood inventories in regions where official disaster records are insufficient.

Second, this study further addresses a longstanding methodological issue in FSM related to non-flood sample selection. Compared with conventional non-flood sample construction strategies, the proposed SDRS approach jointly considers the similarity and diversity of non-flood samples, thereby reducing the influence of sample contamination and class confusion on mapping results. Unlike the inverse-occurrence sampling approach of Wang et al. (2023) [27], which allocates non-flood samples across WOE–k-means-derived subpopulations in inverse proportion to historical flood-report counts, SDRS operates at the pixel level by integrating distance and density factors into a suitability score and further applying representativeness-constrained selection to balance sample similarity and diversity.

Finally, this study integrates ensemble machine learning with spatial interpretability analysis, enabling FSM not only to identify flood-susceptible areas but also to reveal the directions of influence, relative contributions, and spatial heterogeneity of different conditioning factors. Compared with research pathways that focus only on susceptibility zoning results, the main contribution of this study is therefore to advance FSM from result-oriented mapping toward mechanism identification through flood inventory construction, non-flood sample optimization, and interpretability analysis of model outputs.

5.3. Research Limitations and Prospects

Although the framework developed in this study shows strong capability and interpretability in identifying flood susceptibility in the Guangzhou case, several limitations remain. First, the flood inventory used in this study is mainly derived from social media texts and news reports collected during the extreme rainfall event in April 2024. Such data improve the feasibility of sample acquisition where official disaster records are insufficient; however, their spatial and temporal distributions are inevitably affected by selective reporting bias, geolocation errors, and differences in information coverage, which may concentrate event samples in information-active areas. The flood inventory should therefore be understood as event-based reported flood locations rather than a complete and unbiased representation of all actual flood occurrences, although it can still provide useful sample support for general flood susceptibility mapping. In addition, independent external validation using hydrological observations or satellite-derived inundation data was not conducted in this study. Second, SDRS shows better discriminative ability and stability than RS and SS, indicating that it can effectively improve sample construction and model identification under the current research setting. Nevertheless, more rigorous statistical significance testing and repeated spatial resampling are needed in future studies to verify the robustness of this advantage. Under the current experimental setting, a strict paired statistical test is not straightforward, because SDRS, RS, and SS generate different non-flood sample sets and therefore different sample compositions within each spatial fold. As a result, the fold-wise performance estimates across sampling strategies cannot be regarded as fully matched observations under identical experimental conditions. 
Future work should therefore consider controlled-comparison designs with strictly matched samples or repeated spatial cross-validation schemes to enable more rigorous statistical inference of performance differences. In addition, future work should further examine SDRS’s sensitivity to key parameter settings, compare it with more robust negative sampling strategies, and analyze flood/non-flood distance distributions across sampling methods to more comprehensively assess its robustness and applicability. Comparative experiments under different sample sizes and flood/non-flood sample configurations would also help to further assess the stability of the proposed framework under varying data conditions. Finally, this study is validated using only the single case of Guangzhou. It has not yet been systematically compared across cities with different climatic backgrounds, topographic conditions, and drainage system characteristics. Therefore, the cross-regional applicability and generalization ability of the proposed framework still require further evaluation.

Further research should integrate multi-event, multi-city, and multi-source flood inventories to enhance sample representativeness and to assess the robustness and generalizability of this framework under different urban contexts. Priority should also be given to multi-source cross-validation by incorporating satellite-derived inundation extent, hydrodynamic simulation outputs, and higher-quality official flood or disaster records where available. In addition, incorporating more dynamic variables, such as drainage network density, drainage capacity, and short-duration rainfall intensity, would help to more fully characterize the hydrological mechanisms and human–environment coupling involved in urban flood formation. Systematic cross-city applications and comparative analyses are likewise essential for rigorously evaluating and enhancing the applicability and generalizability of this framework. Further integrating interpretable machine learning with hydrodynamic simulation, scenario analysis, and transfer learning also represents an important pathway for advancing FSM from static pattern recognition toward mechanism interpretation oriented to climate adaptation and planning-oriented flood governance.

6. Conclusions

This study developed an interpretable ensemble machine-learning framework for FSM using social media data and validated it through a case study in Guangzhou. The framework integrated flood inventory extraction from social media texts, optimized non-flood sample selection, ensemble learning-based modeling, and spatial interpretability analysis, thereby improving both the classification performance and the interpretability of urban FSM. The main findings are summarized as follows.

(1): Social media texts provided reliable support for urban flood inventory construction. On the test set, the XLNet-BiLSTM-CRF-based location-entity recognition model achieved an F1-score of 0.822 and a Recall of 0.852, demonstrating that it was able to extract flood-related location entities with satisfactory accuracy and meet the requirements of flood sample geolocation, providing credible samples for subsequent FSM.
(2): Among the three non-flood sampling approaches, the proposed SDRS performed best in flood susceptibility modeling. Its AUC on the test set reached 0.893, representing increases of 0.097 and 0.099 over RS and SS, respectively. Its Precision reached 0.859, representing increases of 0.137 and 0.132 over RS and SS, respectively. These results indicate that a non-flood sample construction approach that jointly considers sample similarity and diversity can substantially improve the model’s discriminative ability and generalization performance.
(3): The final flood susceptibility map showed a clear spatial differentiation pattern, with higher susceptibility in the south, lower susceptibility in the north, and evident hotspot clustering. Very-low- and Low-susceptibility zones together made up about 59% of the study area, Medium-susceptibility zones accounted for approximately 15%, and High- and Very-high-susceptibility zones accounted for approximately 26%. Among them, the combined area of High and Very-high susceptibility reached 1897.23 km² and was mainly distributed across the southern, southwestern, and south-central parts of the study area. Meanwhile, the northern and northeastern parts were dominated by Very-low and Low susceptibility.
(4): The interpretability analysis showed that flood susceptibility in Guangzhou was primarily associated with urban development intensity and population exposure. The SHAP results indicated that NLI, ISP, and Population Density were the most important positive factors, whereas NDVI showed a negative effect. The GeoShapley results showed that the Global Spatial Share was 7.18%, indicating that the spatial differentiation of flood susceptibility was still mainly associated with attribute factors, while local spatial dependence provided only relatively limited supplementary explanation.

Author Contributions

Conceptualization, H.L. and Y.Z.; methodology, Y.Z. and H.L.; software, Y.Z. and S.L.; validation, Y.Z. and H.L.; formal analysis, Y.Z. and H.L.; investigation, Y.Z.; resources, H.L. and S.Z.; data curation, Y.Z. and S.L.; writing—original draft preparation, Y.Z.; writing—review and editing, H.L. and S.Z.; visualization, Y.Z. and S.L.; supervision, H.L. and S.Z.; project administration, H.L.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (Grant No. 42271483).

Data Availability Statement

The data that support the findings of this research are available from the corresponding author upon reasonable request.

Acknowledgments

We would like to express our gratitude to the editors and the reviewers for their valuable comments and suggestions, which helped to improve the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, Y.; Li, C.; Liu, M.; Cui, Q.; Wang, H.; Lv, J.; Li, B.; Xiong, Z.; Hu, Y. Spatial Characteristics and Driving Factors of Urban Flooding in Chinese Megacities. J. Hydrol. 2022, 613, 128464.
2. Lu, Z.; Tian, Z.; Zhang, H.; Lu, Y.; Chen, X. Flood Susceptibility and Risk Assessment in Myanmar Using Multi-Source Remote Sensing and Interpretable Ensemble Machine Learning Model. ISPRS Int. J. Geo-Inf. 2026, 15, 45.
3. Wu, Z.; Chen, L.; Shi, L. When the Bordering Levee Breaks: The Impact of Floods on Corporate Financialization in China. Econ. Model. 2026, 155, 107432.
4. Liu, J.; Shao, W.; Xiang, C.; Mei, C.; Li, Z. Uncertainties of Urban Flood Modeling: Influence of Parameters for Different Underlying Surfaces. Environ. Res. 2020, 182, 108929.
5. Kaya, C.M.; Derin, L. Parameters and Methods Used in Flood Susceptibility Mapping: A Review. J. Water Clim. Change 2023, 14, 1935–1960.
6. Allafta, H.; Opp, C. GIS-Based Multi-Criteria Analysis for Flood Prone Areas Mapping in the Trans-Boundary Shatt al-Arab Basin, Iraq-Iran. Geomat. Nat. Hazards Risk 2021, 12, 2087–2116.
7. Duan, C.; Zhang, J.; Chen, Y.; Lang, Q.; Zhang, Y.; Wu, C.; Zhang, Z. Comprehensive Risk Assessment of Urban Waterlogging Disaster Based on MCDA-GIS Integration: The Case Study of Changchun, China. Remote Sens. 2022, 14, 3101.
8. Sharma, A.; Poonia, M.; Rai, A.; Biniwale, R.B.; Tügel, F.; Holzbecher, E.; Hinkelmann, R. Flood Susceptibility Mapping Using GIS-Based Frequency Ratio and Shannon’s Entropy Index Bivariate Statistical Models: A Case Study of Chandrapur District, India. ISPRS Int. J. Geo-Inf. 2024, 13, 297.
9. de Brito, M.M.; Evers, M. Multi-Criteria Decision-Making for Flood Risk Management: A Survey of the Current State of the Art. Nat. Hazards Earth Syst. Sci. 2016, 16, 1019–1033.
10. Yang, H.; Yao, R.; Dong, L.; Sun, P.; Zhang, Q.; Wei, Y.; Sun, S.; Aghakouchak, A. Advancing Flood Susceptibility Modeling Using Stacking Ensemble Machine Learning: A Multi-Model Approach. J. Geogr. Sci. 2024, 34, 1513–1536.
11. Gudiyangada Nachappa, T.; Tavakkoli Piralilou, S.; Gholamnia, K.; Ghorbanzadeh, O.; Rahmati, O.; Blaschke, T. Flood Susceptibility Mapping with Machine Learning, Multi-Criteria Decision Analysis and Ensemble Using Dempster Shafer Theory. J. Hydrol. 2020, 590, 125275.
12. Tien Bui, D.; Hoang, N.-D.; Martínez-Álvarez, F.; Ngo, P.-T.T.; Hoa, P.V.; Pham, T.D.; Samui, P.; Costache, R. A Novel Deep Learning Neural Network Approach for Predicting Flash Flood Susceptibility: A Case Study at a High Frequency Tropical Storm Area. Sci. Total Environ. 2020, 701, 134413.
13. Janizadeh, S.; Kim, D.; Jun, C.; Bateni, S.M.; Pandey, M.; Mishra, V.N. Impact of Climate Change on Future Flood Susceptibility Projections under Shared Socioeconomic Pathway Scenarios in South Asia Using Artificial Intelligence Algorithms. J. Environ. Manag. 2024, 366, 121764.
14. Arora, A.; Durga, G.P.; Pandey, M.; Arabameri, A. Machine Learning Model Optimization for Flood Susceptibility Zonation over the Kosi Megafan, Himalayan Foreland Basin, India. Sci. Rep. 2025, 15, 32757.
15. Shafizadeh-Moghadam, H.; Valavi, R.; Shahabi, H.; Chapi, K.; Shirzadi, A. Novel Forecasting Approaches Using Combination of Machine Learning and Statistical Models for Flood Susceptibility Mapping. J. Environ. Manag. 2018, 217, 1–11.
16. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140.
17. Shahabi, H.; Shirzadi, A.; Ghaderi, K.; Omidvar, E.; Al-Ansari, N.; Clague, J.J.; Geertsema, M.; Khosravi, K.; Amini, A.; Bahrami, S.; et al. Flood Detection and Susceptibility Mapping Using Sentinel-1 Remote Sensing Data and a Machine Learning Approach: Hybrid Intelligence of Bagging Ensemble Based on K-Nearest Neighbor Classifier. Remote Sens. 2020, 12, 266.
18. Pham, B.T.; Jaafari, A.; Phong, T.V.; Yen, H.P.H.; Tuyen, T.T.; Luong, V.V.; Nguyen, H.D.; Le, H.V.; Foong, L.K. Improved Flood Susceptibility Mapping Using a Best First Decision Tree Integrated with Ensemble Learning Techniques. Geosci. Front. 2021, 12, 101105.
19. Fu, S.; Lyu, H.; Wang, Z.; Hao, X.; Zhang, C. Extracting Historical Flood Locations from News Media Data by the Named Entity Recognition (NER) Model to Assess Urban Flood Susceptibility. J. Hydrol. 2022, 612, 128312.
20. Plataridis, K.; Mallios, Z. Flood Susceptibility Mapping Using Hybrid Models Optimized with Artificial Bee Colony. J. Hydrol. 2023, 624, 129961.
21. Lu, H.; Zhang, S.; Gao, Y.; Jin, H.; Zhao, P.; Gao, Y.; Li, Y.; Wang, W.; Zhang, Y. Using Social Media Data to Construct and Analyze Knowledge Graph for “7.20” Henan Rainstorm Flood Event. Int. J. Disaster Risk Reduct. 2025, 116, 105129.
22. Tehrany, M.S.; Pradhan, B.; Jebur, M.N. Flood Susceptibility Mapping Using a Novel Ensemble Weights-of-Evidence and Support Vector Machine Models in GIS. J. Hydrol. 2014, 512, 332–343.
23. Zhao, G.; Pang, B.; Xu, Z.; Peng, D.; Xu, L. Assessment of Urban Flood Susceptibility Using Semi-Supervised Machine Learning Model. Sci. Total Environ. 2019, 659, 940–949.
24. Huang, H.; Tao, Z.; Zhan, J.; Wang, C. Contrast or Diversity: Non-Flood Sampling in Urban Flood Susceptibility Modelling. J. Hydrol. 2025, 656, 133053.
25. Ruan, J.; Chen, Y.; Yang, Z. Assessment of Temporal and Spatial Progress of Urban Resilience in Guangzhou under Rainstorm Scenarios. Int. J. Disaster Risk Reduct. 2021, 66, 102578.
26. Li, Y.; Wang, W.; Chang, M.; Wang, X. Impacts of Urbanization on Extreme Precipitation in the Guangdong-Hong Kong-Macau Greater Bay Area. Urban Clim. 2021, 38, 100904.
27. Wang, C.; Lin, Y.; Tao, Z.; Zhan, J.; Li, W.; Huang, H. An Inverse-Occurrence Sampling Approach for Urban Flood Susceptibility Mapping. Remote Sens. 2023, 15, 5384.
28. Tehrany, M.S.; Jones, S.; Shabani, F. Identifying the Essential Flood Conditioning Factors for Flood Prone Area Mapping Using Machine Learning Techniques. CATENA 2019, 175, 174–192.
29. Rafiei-Sardooi, E.; Azareh, A.; Choubin, B.; Mosavi, A.H.; Clague, J.J. Evaluating Urban Flood Risk Using Hybrid Method of TOPSIS and Machine Learning. Int. J. Disaster Risk Reduct. 2021, 66, 102614.
30. Arora, A.; Arabameri, A.; Pandey, M.; Siddiqui, M.A.; Shukla, U.K.; Bui, D.T.; Mishra, V.N.; Bhardwaj, A. Optimization of State-of-the-Art Fuzzy-Metaheuristic ANFIS-Based Machine Learning Models for Flood Susceptibility Prediction Mapping in the Middle Ganga Plain, India. Sci. Total Environ. 2021, 750, 141565.
31. Rahman, M.; Ningsheng, C.; Mahmud, G.I.; Islam, M.M.; Pourghasemi, H.R.; Ahmad, H.; Habumugisha, J.M.; Washakh, R.M.A.; Alam, M.; Liu, E.; et al. Flooding and Its Relationship with Land Cover Change, Population Growth, and Road Density. Geosci. Front. 2021, 12, 101224.
32. Islam, T.; Zeleke, E.B.; Afroz, M.; Melesse, A.M. A Systematic Review of Urban Flood Susceptibility Mapping: Remote Sensing, Machine Learning, and Other Modeling Approaches. Remote Sens. 2025, 17, 524.
33. Zhang, T.; Wu, K.; Wang, X.; Li, X.; Li, L.; Chen, L. Impact of Land Use Patterns on Flood Risk in the Chang-Zhu-Tan Urban Agglomeration, China. Remote Sens. 2025, 17, 2889.
34. Choudhury, S.; Basak, A.; Biswas, S.; Das, J. Flash Flood Susceptibility Mapping Using GIS-Based AHP Method. In Spatial Modelling of Flood Risk and Flood Hazards; Pradhan, B., Shit, P.K., Bhunia, G.S., Adhikary, P.P., Pourghasemi, H.R., Eds.; GIScience and Geo-Environmental Modelling; Springer International Publishing: Cham, Switzerland, 2022; pp. 119–142.
35. Qin, X.; Wang, S.; Meng, M.; Long, H.; Zhang, H.; Shi, H. Enhancing Urban Resilience through Machine Learning-Supported Flood Risk Assessment: Integrating Flood Susceptibility with Building Function Vulnerability. npj Urban Sustain. 2025, 5, 19.
36. Wu, M.; Wei, X.; Ge, W.; Chen, G.; Zheng, D.; Zhao, Y.; Chen, M.; Xin, Y. Analyzing the Spatial Scale Effects of Urban Elements on Urban Flooding Based on Multiscale Geographically Weighted Regression. J. Hydrol. 2024, 645, 132178.
37. Hu, Y.; Liu, Y.; Zeng, H. Whole Process Assessment of Flood Resilience in Urban and Rural Communities Based on Nighttime Lights: A Case Study of Zhuozhou Flood. Urban Clim. 2025, 61, 102438.
38. Wang, Z.; Chen, X.; Qi, Z.; Cui, C. Flood Sensitivity Assessment of Super Cities. Sci. Rep. 2023, 13, 5582.
39. Zhu, S.; Jiang, Y.; Zhang, J.; Dai, Q.; Yang, X. Evaluating the Effect of Urban Flooding on Spatial Accessibility to Emergency Shelters Based on Social Sensing Data. Trans. GIS 2024, 28, 23–39.
40. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
41. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30, pp. 3146–3154.
42. Roberts, D.R.; Bahn, V.; Ciuti, S.; Boyce, M.S.; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, J.J.; Schröder, B.; Thuiller, W.; et al. Cross-Validation Strategies for Data with Temporal, Spatial, Hierarchical, or Phylogenetic Structure. Ecography 2017, 40, 913–929.
43. Zhao, G.; Pang, B.; Xu, Z.; Peng, D.; Zuo, D. Urban Flood Susceptibility Assessment Based on Convolutional Neural Networks. J. Hydrol. 2020, 590, 125235.
44. Pham, B.T.; Tien Bui, D.; Dholakia, M.B.; Prakash, I.; Pham, H.V. A Comparative Study of Least Square Support Vector Machines and Multiclass Alternating Decision Trees for Spatial Prediction of Rainfall-Induced Landslides in a Tropical Cyclones Area. Geotech. Geol. Eng. 2016, 34, 1807–1824.
45. Li, Z. GeoShapley: A Game Theory Approach to Measuring Spatial Effects in Machine Learning Models. Ann. Am. Assoc. Geogr. 2024, 114, 1365–1385.
46. Ke, E.; Zhao, J.; Zhao, Y. Investigating the Influence of Nonlinear Spatial Heterogeneity in Urban Flooding Factors Using Geographic Explainable Artificial Intelligence. J. Hydrol. 2025, 648, 132398.

Figure 1. Study Area.

Figure 2. Technical flow chart of this study.

Figure 3. Flowchart for Similarity- and Diversity-Based Representative Sampling (SDRS).

Figure 4. Conditioning factors: (a) Elv; (b) ASP; (c) SLO; (d) TWI; (e) River Density (normalized); (f) DR; (g) Rainfall; (h) NDVI; (i) Road Density (normalized); (j) ISP; (k) Population Density; (l) GDP; (m) NLI; (n) Emergency Shelter Density.

Figure 5. ROC curves and AUC values of the models under three sampling approaches (RS, SS, and SDRS) for flood susceptibility mapping, using a 7:3 train–test split: (a) training set; (b) test set.

Figure 6. Flood susceptibility maps generated using three sampling approaches: (a) RS, (b) SS, (c) SDRS.

Figure 7. The flood-susceptible area under different classes for the three sampling approaches.

Figure 8. Contribution of each feature to the model’s outputs based on SHAP: (a) bar chart, (b) summary plot.

Figure 9. Spatial distribution of GeoShapley-based GeoRatio.

Figure 10. Bivariate zoning of flood susceptibility and GeoRatio.

Table 1. Description of the datasets used in this study.

Data	Source	Resolution	Year
DEM	ASTER GDEM	30 m	2020
River network	OpenStreetMap	-	2020
NDVI	Landsat 8	30 m	2020
Annual precipitation	Resource and Environmental Science Data Center (RESDC)	1 km	2024
Road distribution	OpenStreetMap	-	2020
Land use	RESDC	30 m	2020
Population density	RESDC	1 km	2020
GDP	RESDC	1 km	2020
Nighttime light	RESDC	0.0083°	2020
Emergency shelters	Amap	-	2020

Table 2. List of Conditioning Factors.

Conditioning Factors	Descriptions	References
Elevation (Elv)	Reflects the topographic conditions related to surface potential energy, flow accumulation, and waterlogging.	[28]
Slope (SLO)	Affects overland flow velocity and the time of concentration.	[29]
Aspect (ASP)	Influences local precipitation receipt and incoming solar radiation conditions.	[30]
Topographic Wetness Index (TWI)	Characterizes topographic convergence and the propensity for water accumulation, thereby identifying terrain prone to waterlogging.	[28,30]
River Density	Indicates the degree of development of drainage pathways and is associated with runoff connectivity and flood propagation routes.	[31,32]
Distance to River (DR)	Areas closer to river channels are more susceptible to overflow influence and are therefore more prone to flooding.	[28,31]
Normalized Difference Vegetation Index (NDVI)	Vegetation affects runoff generation and water accumulation by intercepting rainfall, enhancing infiltration, and increasing surface roughness.	[32,33]
Rainfall	Greater rainfall intensity or amount is associated with a higher likelihood of flood occurrence.	[30]
Road Density	Roads are commonly associated with increased imperviousness and the reconfiguration of runoff pathways.	[31,34]
Impervious Surface Percentage (ISP)	Impervious surfaces reduce infiltration, amplify peak runoff, and increase drainage pressure.	[35]
Population Density	Represents the intensity of urban human activities.	[36,37]
GDP	Serves as a proxy for economic activity and development intensity, reflecting the level of urbanization and potential exposure.	[37]
Nighttime Light Index (NLI)	Depicts the spatial distribution of human activity intensity and socioeconomic vitality.	[38]
Emergency Shelter Density	Represents the spatial supply of emergency shelter resources and adaptive capacity.	[39]

Table 3. Performance of the XLNet-BiLSTM-CRF model for flood location extraction.

Model	Precision	Recall	F1-Score
XLNet-BiLSTM-CRF	0.794	0.852	0.822

Table 4. Multicollinearity diagnostics of the conditioning factors (VIF: variance inflation factor; TOL: tolerance).

Conditioning Factors	VIF	TOL
Elv	1.392	0.718
ASP	1.190	0.840
SLO	1.537	0.651
TWI	1.341	0.746
River Density	1.655	0.604
DR	1.404	0.712
Rainfall	1.440	0.694
NDVI	1.545	0.647
Road Density	1.092	0.916
ISP	3.180	0.314
Population Density	1.914	0.523
GDP	1.937	0.516
NLI	3.616	0.277
Emergency Shelter Density	1.333	0.750
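
The two columns of Table 4 are linked by TOL = 1/VIF (for example, 1/3.616 ≈ 0.277 for NLI). As a minimal sketch of how such diagnostics can be obtained, the following Python code regresses each factor on the remaining factors and converts the resulting R² into VIF and TOL. The function name `vif_and_tolerance` and the synthetic data are illustrative assumptions; this is not the authors' implementation, and the values in Table 4 were computed on the study's own samples.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (VIF, TOL) arrays, one entry per column of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Ordinary least squares of factor j on an intercept plus all other factors
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vif[j] = 1.0 / (1.0 - r2)
    return vif, 1.0 / vif

# Synthetic, purely illustrative factors (not the study data)
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)                 # independent of a
c = a + 0.5 * rng.normal(size=500)       # deliberately correlated with a
vif, tol = vif_and_tolerance(np.column_stack([a, b, c]))
print(np.round(vif, 3), np.round(tol, 3))
```

In this toy example the correlated pair (`a`, `c`) receives a clearly higher VIF than the independent factor `b`, which mirrors how ISP and NLI stand out from the other factors in Table 4.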

Table 5. Performance indicators of different sampling approaches.

Sampling Approach	Accuracy	Precision	Recall	F1-Score
RS	0.728	0.722	0.740	0.731
SS	0.738	0.727	0.760	0.743
SDRS	0.833	0.859	0.795	0.826
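
As a quick consistency check on Table 5, the F1-score can be recomputed from the reported precision and recall as their harmonic mean, F1 = 2PR/(P + R). The short Python sketch below does this for the three sampling approaches; the dictionary `table5` simply copies the values from the table, and the helper name `f1_score` is an illustrative assumption.

```python
# (precision, recall, reported F1-score) copied from Table 5
table5 = {
    "RS": (0.722, 0.740, 0.731),
    "SS": (0.727, 0.760, 0.743),
    "SDRS": (0.859, 0.795, 0.826),
}

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Recompute F1 to three decimals for comparison with the reported column
recomputed = {
    name: round(f1_score(p, r), 3) for name, (p, r, _) in table5.items()
}

for name, (p, r, reported) in table5.items():
    print(f"{name}: reported F1 = {reported:.3f}, recomputed F1 = {recomputed[name]:.3f}")
```

The recomputed values agree with the reported F1-scores to three decimal places, so the precision, recall, and F1 columns are internally consistent.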

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhou, Y.; Lu, H.; Liu, S.; Zhang, S. An Explainable Ensemble Machine Learning Framework for Flood Susceptibility Mapping Using Social Media Data: A Case Study of Guangzhou, China. Remote Sens. 2026, 18, 1495. https://doi.org/10.3390/rs18101495
