Article

Flash Flood Risk Classification Using GIS-Based Fractional Order k-Means Clustering Method

1
Zhejiang Institute of Hydraulics & Estuary (Zhejiang Institute of Marine Planning & Design), Hangzhou 310020, China
2
Forestry and Water Conservancy Bureau of Changshan County, Quzhou 324299, China
3
School of Hydraulic Engineering, Zhejiang University of Water Resources and Electric Power, Hangzhou 310018, China
4
Zhejiang Key Laboratory of River-Lake Water Network Health Restoration, Hangzhou 310018, China
*
Author to whom correspondence should be addressed.
Fractal Fract. 2025, 9(9), 586; https://doi.org/10.3390/fractalfract9090586
Submission received: 3 June 2025 / Revised: 1 September 2025 / Accepted: 2 September 2025 / Published: 4 September 2025
(This article belongs to the Section Probability and Statistics)

Abstract

Flash floods arise from the interaction of rugged topography, short-duration intense rainfall, and rapid flow concentration. Conventional risk mapping often builds empirical indices with expert-assigned weights or trains supervised models on historical event inventories—approaches that degrade in data-scarce regions. We propose a fully data-driven, unsupervised Geographic Information System (GIS) framework based on fractional order k-means, which clusters multi-dimensional geospatial features without labeled flood records. Five raster layers—elevation, slope, aspect, 24 h maximum rainfall, and distance to the nearest stream—are normalized into a feature vector for each 30 m × 30 m grid cell. In a province-scale case study of Zhejiang, China, the resulting risk map aligns strongly with the observations: 95% of 1643 documented flash flood sites over the past 60 years fall within the combined high- and medium-risk zones, and 65% lie inside the high-risk class. These outcomes indicate that the fractional order distance metric captures physically realistic hazard gradients while remaining label-free. Because the workflow uses commonly available GIS inputs and open-source tooling, it is computationally efficient, reproducible, and readily transferable to other mountainous, data-poor settings. Beyond reducing subjective weighting inherent in index methods and the data demands of supervised learning, the framework offers a pragmatic baseline for regional planning and early-stage screening.

1. Introduction

Flash floods are among the fastest-acting and most destructive natural hazards, capable of transforming small upland streams into lethal torrents within an hour of intense rainfall [1]. Effective flash flood risk zoning underpins early warning, land-use control, and infrastructure planning in steep, densely dissected basins where minutes matter [2,3]. In regions like Zhejiang, which is characterized by short-duration intense storms, complex topography, and heterogeneous settlement patterns, risk mapping must remain reliable even where event inventories are sparse or biased. This motivates methods that (i) avoid dependence on labeled floods, (ii) reduce subjective weighting of factors, and (iii) scale to province-level rasters with transparent, reproducible workflows.
Conventional approaches often construct an empirical flash flood evaluation index by combining elevation, slope, rainfall intensity, land cover, and drainage density into an index system whose factor weights are calibrated from historical records [4,5,6,7,8,9]. While such index-based methods can perform well where event catalogs are extensive and spatially uniform, many mountainous catchments suffer from fragmentary or biased records. With the development of computer science, supervised machine learning models such as random forest and neural network models have been used to learn relationships between environmental variables and observed events [10,11,12,13,14,15,16]. However, all of these methods are supervised [17]. That is, they require sufficiently large and well-distributed labeled inventories, which are often unavailable in mountain areas.
Unsupervised clustering offers a data-driven alternative. By grouping raster cells with similar geo-hydrological characteristics, it leverages the full spatial resolution of digital elevation models (DEMs), gridded rainfall, and channel proximity data without relying on historical flood records [18,19,20]. Instead of transferring weights calibrated in one basin to another, unsupervised clustering adapts automatically to local physiography, which is advantageous in data-scarce, topographically complex settings.
Typical clustering methods include the hierarchical clustering method, Gaussian mixture model (GMM), and k-means clustering [21,22]. Hierarchical clustering constructs a dendrogram by iteratively merging clusters based on a chosen linkage criterion, which allows one to explore cluster structure at multiple scales [23]. GMMs assume that the dataset follows a mixture of multivariate normal distributions, and the Expectation–Maximization algorithm is used to estimate mixture weights, means, and covariances [24]. In contrast, k-means assigns each point to exactly one cluster (hard membership) and alternates between assignment and centroid update to minimize within-cluster squared distances [25].
Among these, k-means is particularly appealing for raster-based risk zoning [26,27]. It is simple, scales well, and is straightforward to implement in a GIS environment where each grid cell corresponds to a feature vector [28,29]. However, in high-dimensional geospatial feature spaces, the Euclidean metric can blur contrasts between ordinary and extreme cells, yielding overly smooth maps that miss hazardous combinations (e.g., steep, rain-saturated slopes near streams) or, conversely, fragmented clusters [30,31,32]. By extending k-means to a fractional-order distance metric, one gains additional flexibility in capturing subtler variations among geo-environmental variables while retaining the original algorithm’s computational benefits [13,33,34].
In this study, we develop a GIS-based fractional order k-means framework and apply it to 30 m raster data for Zhejiang Province [35,36,37]. Here, GIS processing was performed in QGIS (free and open-source), and the clustering algorithm was implemented in Python 3.13. Five geo-variables, including elevation, slope, aspect, 24 h maximum rainfall, and distance to the nearest river channel, are normalized and clustered into three risk classes [38]. This work’s novelties are multi-fold: (a) we propose an unsupervised, GIS-based fractional order k-means framework that maps flash flood risk without historical event labels, reducing subjectivity and improving regional transferability; (b) we introduce a fractional order distance metric that better separates hazardous feature combinations while retaining the simplicity and efficiency of k-means; and (c) we deliver a province-scale application (Zhejiang) using open-source tooling and commonly available rasters with validation against 1643 documented sites to demonstrate robustness.
The remainder of this article is organized as follows: Section 2 describes the study area; Section 3 details the selected input variables and the mathematical principles of the GIS-based fractional order k-means clustering algorithm; Section 4 describes the original raster dataset; Section 5 presents the pre-processing and clustering results, as well as the model validation based on historical flash floods; Section 6 discusses the limitations and advantages of the proposed risk classification method; and Section 7 concludes with key findings and future research directions.

2. Study Area

Zhejiang Province lies on the southeast coast of China, bordered by the East China Sea (Figure 1a). Administratively, it comprises 11 prefecture-level cities (Figure 1b). The province transitions rapidly from low-lying coastal plains to steep mountain ranges in the interior, producing dense, deeply incised drainage networks. Mountainous counties are highly prone to short-lived but destructive flash floods during the warm season.
Zhejiang has a humid subtropical monsoon climate with a marked wet season from late spring to early autumn. Two synoptic regimes dominate extreme precipitation: (i) the Mei-yu (plum-rain) front, which can stall and deliver multi-day accumulations, and (ii) landfalls or peripheral passages of western North Pacific typhoons, which bring intense short-duration bursts. Orographic enhancement along the windward slopes further concentrates rainfall, while convective cells embedded in frontal bands produce high sub-daily intensities conducive to rapid runoff generation.
The combination of steep headwater catchments, thin regolith, and high drainage density yields short concentration times, so rainfall spikes quickly translate into channel response. Transportation corridors, culverts, and channelization within populated valleys can accelerate conveyance and redirect flows, whereas small reservoirs and check dams may locally attenuate peaks. Rapid urbanization in the coastal valley belt increases impervious surfaces and exposure. These factors, together with spatially heterogeneous settlements, shape both hazard and reporting patterns.
Given this setting and the availability of documented historical flash flood sites, Zhejiang provides a suitable test bed to evaluate our GIS-based fractional order k-means clustering framework for flash flood risk zoning. The added climate and exposure context also motivates the predictor set used in this study (terrain attributes, extreme rainfall indicator, and channel proximity) and the choice of a fine (30 m) grid to resolve slope and drainage controls.

3. Method

3.1. GIS-Based Raster Quantification

We select five physiographic and hydrological input variables, namely elevation $Z_i$, slope $S_i$, aspect $\Theta_i$, 24 h maximum rainfall $R_i$, and distance to the nearest stream $D_i$, based on their critical roles in mountain flash flood generation [39]. Elevation $Z_i$ governs potential energy and flow pathways; slope $S_i$ controls the acceleration and volume of surface runoff; aspect $\Theta_i$ influences microclimate, vegetation, and soil moisture; the 24 h maximum rainfall $R_i$ denotes the single largest 24 h accumulation observed in the past 60 years and directly triggers peak flows; and the distance to the nearest stream $D_i$ determines flow concentration and timing.
The recorded 24 h maximum rainfall $R_i$ is available only at 167 scattered station locations. To assign a rainfall value $R_i$ to each DEM grid cell $i$, we employ the Inverse Distance Weighting (IDW) method, which estimates $R_i$ as a weighted average of nearby station observations [40,41].
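The IDW step can be illustrated with a minimal NumPy sketch. The text does not state the distance exponent or how many neighboring stations are used, so the exponent `power=2.0`, the use of all stations, and the function name `idw_interpolate` are assumptions made only for illustration.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, query_xy, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at each query location.

    power and eps are illustrative assumptions, not values from the article.
    """
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)
    # Pairwise distances, shape (n_query, n_station)
    diff = query_xy[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # eps avoids division by zero when a query point coincides with a station
    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights * station_values).sum(axis=1) / weights.sum(axis=1)
```

For a province-scale raster, the query points would be processed in blocks to keep the pairwise distance array within memory.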
To compute the approximate Euclidean distance $D_i$ from each grid cell $i$ to its nearest river channel, we first rasterize the vector river network shapefile onto the same 30 m grid. All cells intersecting a river segment are assigned a binary value of 1, and all other cells are 0. Let this binary raster be denoted by $B_{m,n}$, where $B_{m,n} = 1$ if cell $(m, n)$ lies on a river and $B_{m,n} = 0$ otherwise. We then apply a two-pass Chamfer Distance Transform (CDT) to approximate the Euclidean distance [42]. In the forward pass, each cell's distance $D^{(1)}_{m,n}$ is initialized as follows:
$$D^{(1)}_{m,n} = \begin{cases} 0, & B_{m,n} = 1, \\ \infty, & B_{m,n} = 0. \end{cases}$$
The update rule is as follows:
$$D^{(1)}_{m,n} = \min\left\{ D^{(1)}_{m,n},\; D^{(1)}_{m-1,n} + w_v,\; D^{(1)}_{m,n-1} + w_h,\; D^{(1)}_{m-1,n-1} + w_d^{(1)},\; D^{(1)}_{m-1,n+1} + w_d^{(2)} \right\},$$
where $w_v = w_h = 30$ and $w_d^{(1)} = w_d^{(2)} = 30\sqrt{2} \approx 42.4$ (in meters, matching the 30 m cell size). In the backward pass, we set the following:
$$D_{m,n} = \min\left\{ D^{(1)}_{m,n},\; D^{(1)}_{m+1,n} + w_v,\; D^{(1)}_{m,n+1} + w_h,\; D^{(1)}_{m+1,n+1} + w_d^{(1)},\; D^{(1)}_{m+1,n-1} + w_d^{(2)} \right\}.$$
After both passes, each cell's value $D^{(\mathrm{CDT})}_{m,n} = D_{m,n}$ approximates the true Euclidean distance in meters. Thus, we have the following:
$$D_i = D^{(\mathrm{CDT})}_{m,n}.$$
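A minimal sketch of the two-pass chamfer transform, assuming a boolean river mask held in memory as a NumPy array. The function name `chamfer_distance` is hypothetical, and the sketch updates the array in place (the usual way such transforms are coded), so the backward pass reads neighbor values that may already have been refined.

```python
import numpy as np

def chamfer_distance(river_mask, cell=30.0):
    """Two-pass chamfer distance (same units as `cell`) to the nearest river cell."""
    w_axis = cell                  # vertical / horizontal step
    w_diag = cell * np.sqrt(2.0)   # diagonal step, about 42.4 for 30 m cells
    rows, cols = river_mask.shape
    # Pad with inf so neighbors outside the grid never win the minimum
    d = np.full((rows + 2, cols + 2), np.inf)
    d[1:-1, 1:-1] = np.where(river_mask, 0.0, np.inf)
    # Forward pass: top-left to bottom-right, using already visited neighbors
    for m in range(1, rows + 1):
        for n in range(1, cols + 1):
            d[m, n] = min(d[m, n],
                          d[m - 1, n] + w_axis,
                          d[m, n - 1] + w_axis,
                          d[m - 1, n - 1] + w_diag,
                          d[m - 1, n + 1] + w_diag)
    # Backward pass: bottom-right to top-left
    for m in range(rows, 0, -1):
        for n in range(cols, 0, -1):
            d[m, n] = min(d[m, n],
                          d[m + 1, n] + w_axis,
                          d[m, n + 1] + w_axis,
                          d[m + 1, n + 1] + w_diag,
                          d[m + 1, n - 1] + w_diag)
    return d[1:-1, 1:-1]
```

The pure-Python loops are only meant to mirror the equations; a province-scale raster would need a compiled or vectorized implementation.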
Both slope $S_i$ and aspect $\Theta_i$ are derived directly from the DEM using a standard 3 × 3 moving-window algorithm. Let $z_{m,n}$ denote the elevation at the $(m, n)$th cell within a 3 × 3 neighborhood centered on cell $i$, and denote the cell size by $\Delta x = \Delta y = 30$ m. We first compute the partial derivatives of elevation in the x- and y-directions, $q_x$ and $q_y$; the slope magnitude is then given by the following:
$$S_i = \arctan\sqrt{q_x^2 + q_y^2}.$$
The aspect $\Theta_i$ reflects the compass direction of steepest descent, which can be computed as follows:
$$\Theta_i = \begin{cases} \operatorname{arctan2}(q_y, q_x), & \text{if } q_x \neq 0 \text{ or } q_y \neq 0, \\ \text{undefined (flat)}, & \text{if } q_x = 0 \text{ and } q_y = 0, \end{cases}$$
where $\operatorname{arctan2}(y, x)$ returns an angle in $(-\pi, \pi]$. $\Theta_i$ is shifted to $[0, 2\pi)$ by adding $2\pi$ to negative values. Any cell where $q_x = q_y = 0$ is flat, and its aspect is masked.
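The slope and aspect formulas can be sketched as follows. The text does not specify the finite-difference kernel used inside the 3 × 3 window, so this example assumes the central differences provided by `numpy.gradient`; the function name `slope_aspect` and the treatment of array rows as the y-direction are likewise assumptions.

```python
import numpy as np

def slope_aspect(dem, cell=30.0):
    """Slope (radians) and aspect (radians in [0, 2*pi), NaN where flat)."""
    dem = np.asarray(dem, dtype=float)
    # np.gradient returns the derivative along axis 0 (rows, y) first,
    # then along axis 1 (columns, x)
    q_y, q_x = np.gradient(dem, cell)
    slope = np.arctan(np.sqrt(q_x ** 2 + q_y ** 2))
    aspect = np.arctan2(q_y, q_x)
    # Shift negative angles into [0, 2*pi)
    aspect = np.where(aspect < 0, aspect + 2.0 * np.pi, aspect)
    # Mask flat cells, where aspect is undefined
    flat = (q_x == 0) & (q_y == 0)
    aspect = np.where(flat, np.nan, aspect)
    return slope, aspect
```

GIS packages often use weighted 3 × 3 kernels (for example, Horn's method) instead, which would give slightly smoother derivatives than this sketch.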

3.2. Fractional Order k-Means Clustering

First, each input raster variable is standardized to have zero mean and unit variance, thereby removing differences in units and scales. The exact transformation formulas and their associated statistical parameters for each variable are detailed in Table 1.
Here, $i$ is the index of a raster cell; $\mu_Z$ and $\sigma_Z$ are the mean and standard deviation of the elevations; $\mu_S$ and $\sigma_S$ are the mean and standard deviation of the slopes; $\mu_{\sin\Theta}$ and $\sigma_{\sin\Theta}$ are the mean and standard deviation of $\sin(\Theta_i)$; $\mu_{\cos\Theta}$ and $\sigma_{\cos\Theta}$ are the mean and standard deviation of $\cos(\Theta_i)$; $\mu_R$ and $\sigma_R$ are the mean and standard deviation of the 24 h maximum rainfall $R_i$; $\mu_{\ln D}$ and $\sigma_{\ln D}$ are the mean and standard deviation of $\ln(D_i + 1)$; and $z_i$, $s_i$, $a_{i,1}$, $a_{i,2}$, $r_i$, $d_i$ are the normalized features for cell $i$.
The final feature vector of each valid cell $i$ can be written as follows:
$$x_i = [z_i, s_i, a_{i,1}, a_{i,2}, r_i, d_i] \in \mathbb{R}^6.$$
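A short sketch of how the six-dimensional feature matrix might be assembled. The exact formulas of Table 1 are not reproduced in this text, so the z-score form used here is inferred from the parameter definitions above (means and standard deviations of $Z$, $S$, $\sin\Theta$, $\cos\Theta$, $R$, and $\ln(D+1)$); the function name `build_features` is hypothetical.

```python
import numpy as np

def build_features(Z, S, theta, R, D):
    """Return the standardized (n_valid, 6) feature matrix and the valid-cell mask."""
    raw = np.column_stack([
        np.ravel(Z),
        np.ravel(S),
        np.sin(np.ravel(theta)),    # aspect enters as a sine/cosine pair
        np.cos(np.ravel(theta)),
        np.ravel(R),
        np.log(np.ravel(D) + 1.0),  # log-distance to the nearest channel
    ]).astype(float)
    # Cells with any missing value (e.g., masked aspect on flat cells) are dropped
    valid = ~np.isnan(raw).any(axis=1)
    X = raw[valid]
    # Zero mean and unit variance per feature, computed over valid cells only
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    return X, valid
```

Encoding aspect as a sine/cosine pair avoids the artificial jump between 0 and $2\pi$ that the raw angle would introduce into a distance-based method.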
Figure 2 illustrates the iterative procedure of the k-means clustering for the case of three clusters [43]. Initially, three centroids (C1, C2, and C3) are randomly placed in the feature space. Then, each data point is assigned to the nearest centroid according to Euclidean distance, forming three preliminary clusters, as shown in Figure 2b. Once all points have been labeled, the centroids are recomputed by calculating the arithmetic mean of the feature vectors within each of the clusters. Figure 2d shows the new cluster assignments after updating the centroids: points whose closest centroid has changed are reassigned, and centroids are again recalculated until the assignment no longer changes. This process of assign and update repeats iteratively until convergence is reached. For the fractional-order k-means framework, the same workflow applies, except that the distance metric in the assignment step is modified to a fractional-order norm, allowing for more flexible weighting of feature differences and improved separation of risk zones in complex terrains.
The empirical weights are collected in the following vector:
$$\mathbf{w} = [w_z, w_s, w_{a_1}, w_{a_2}, w_r, w_d],$$
where $w_z$, $w_s$, $w_{a_1}$, $w_{a_2}$, $w_r$, and $w_d$ are the empirical weights of $z$, $s$, $a_1$, $a_2$, $r$, and $d$, respectively. Here, we set the weights $w_z$, $w_s$, $w_r$ larger than $w_{a_1}$, $w_{a_2}$, $w_d$ so that differences in $z$, $s$, $r$ influence the clustering more than differences in $a_1$, $a_2$, $d$. In the hazard index, distance is subtracted, so being farther from a river lowers the risk, while more rainfall and steeper slopes raise it.
We select the initial centroids using the k-means++ strategy under the weighted fractional distance $d_{p,w}$, where $p$ is the order of the $L_p$ (p-norm) distance metric. First, choose one centroid uniformly at random from all feature vectors. Then, for each remaining centroid, sample a new point with probability proportional to $d_{p,w}(x_i, C_{\text{chosen}})^p$, where $C_{\text{chosen}}$ is the set of centroids already chosen during k-means++ seeding; it starts with one random centroid and grows until $K$ centroids are selected. This seeding favors well-separated starting points, so the three initial centroids tend to represent contrasting conditions, such as regions of high rainfall, steep slope, and close proximity to streams.
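The seeding procedure can be sketched as below. The distance from a point to the set of chosen centroids is taken as the distance to the nearest chosen centroid, which is the usual k-means++ convention but is an assumption here; the function names are hypothetical.

```python
import numpy as np

def frac_dist(X, c, w, p):
    """Weighted fractional distance d_{p,w} from every row of X to centroid c."""
    return (w * np.abs(X - c) ** p).sum(axis=1) ** (1.0 / p)

def kmeanspp_seed(X, k, w, p, rng):
    """k-means++ style seeding under the weighted fractional distance."""
    n = X.shape[0]
    # First centroid: uniformly at random from all feature vectors
    centroids = [X[rng.integers(n)]]
    for _ in range(1, k):
        # Distance to the nearest already chosen centroid (assumed convention)
        d = np.min([frac_dist(X, c, w, p) for c in centroids], axis=0)
        prob = d ** p
        prob = prob / prob.sum()
        centroids.append(X[rng.choice(n, p=prob)])
    return np.array(centroids)
```

Because an already chosen point has zero distance to itself, it receives zero sampling probability and cannot be selected twice.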
For each feature vector $x_i$, compute its distance to all centroids using
$$d_{p,w}(x_i, c_k) = \left( \sum_{j=1}^{6} w_j \, |x_{i,j} - c_{k,j}|^p \right)^{1/p}.$$
Assign $x_i$ to the cluster $C_k$ whose centroid minimizes this distance. Here, $C_k$ is the set of samples assigned to cluster $k$, and $c_k$, a single 6D vector in feature space, is the centroid of cluster $k$. Because $0 < p < 1$, this fractional distance tends to pull out “extreme” cells, highlighting high-risk areas more sharply than the Euclidean norm. The number of clusters is set to 3, corresponding to high, medium, and low risk of mountain flash floods, respectively. The objective function is thus as follows:
$$J_{p,w} = \sum_{k=1}^{3} \sum_{x_i \in C_k} \sum_{j=1}^{6} w_j \, |x_{i,j} - c_{k,j}|^p.$$
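The assignment step and the objective can be written compactly in NumPy. This is a sketch with hypothetical function names; note that the objective sums the $p$-th power of the distance, that is, the weighted sum without the outer $1/p$ root.

```python
import numpy as np

def pairwise_frac_dist(X, C, w, p):
    """d_{p,w} between every point (rows of X) and every centroid (rows of C)."""
    diff = np.abs(X[:, None, :] - C[None, :, :])  # shape (n, k, n_features)
    return (w * diff ** p).sum(axis=2) ** (1.0 / p)

def assign_and_objective(X, C, w, p):
    """Nearest-centroid labels and the objective J_{p,w}."""
    dist = pairwise_frac_dist(X, C, w, p)
    labels = dist.argmin(axis=1)
    # J accumulates d^p for each point and its assigned centroid
    J = (dist[np.arange(len(X)), labels] ** p).sum()
    return labels, J
```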
In the update step, for each cluster $k$ and feature index $j$, solve
$$\sum_{x_i \in C_k} \operatorname{sgn}(c_{k,j} - x_{i,j}) \, |c_{k,j} - x_{i,j}|^{p-1} = 0 \;\Longrightarrow\; c_{k,j} = L_p \text{ median}.$$
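One possible way to compute the per-feature $L_p$ median is sketched below; it is not necessarily the solver used by the authors. For $0 < p < 1$, the cost $\sum_i |c - x_i|^p$ is concave between consecutive sample values, so a global minimizer can always be found among the sample values themselves. The sketch evaluates the cost at each distinct sample value; the function name `lp_median_1d` is hypothetical.

```python
import numpy as np

def lp_median_1d(values, p):
    """Minimizer of sum_i |c - values_i|^p for 0 < p < 1, searched over the samples."""
    values = np.asarray(values, dtype=float)
    candidates = np.unique(values)
    # Cost of placing the centroid coordinate at each candidate value
    cost = (np.abs(candidates[:, None] - values[None, :]) ** p).sum(axis=1)
    return candidates[cost.argmin()]
```

The exhaustive search needs memory proportional to the number of candidates times the number of samples, so for large clusters the candidates would have to be restricted (for example, to a set of quantiles), at the price of an approximate solution.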
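Then, stop iterating when either
$$\max_k \left\lVert c_k^{(\text{new})} - c_k^{(\text{old})} \right\rVert < 10^{-3} \quad \text{or} \quad \frac{|J_{\text{new}} - J_{\text{old}}|}{J_{\text{old}}} < 10^{-4}.$$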
If any cluster becomes empty, re-initialize its centroid by selecting the cell with the largest current distance error. To mitigate local minima, we run the algorithm from n = 5 random initializations and report the solution with the smallest J p , w . Increasing n beyond five produced negligible changes in the objective and cluster assignments in our pilots while increasing runtime linearly.
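The assign/update loop, the two stopping criteria, the empty-cluster rule, and the multi-start strategy can be combined in a compact sketch. Several choices are assumptions made for brevity: seeding is plain random sampling rather than k-means++, the centroid shift is measured with the Euclidean norm, and the per-feature $L_p$ median is searched over the sample values. All function names are hypothetical, and the code is only practical for small samples.

```python
import numpy as np

def point_costs(X, C, w, p):
    """d_{p,w}^p from every point to every centroid, shape (n, k)."""
    return (w * np.abs(X[:, None, :] - C[None, :, :]) ** p).sum(axis=2)

def lp_median(col, p):
    """Minimizer of sum |c - x|^p, searched over the sample values."""
    cand = np.unique(col)
    return cand[(np.abs(cand[:, None] - col[None, :]) ** p).sum(axis=1).argmin()]

def frac_kmeans(X, k, w, p, n_init=5, max_iter=100, tol_c=1e-3, tol_j=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Plain random seeding for brevity (an assumption of this sketch)
        C = X[rng.choice(len(X), size=k, replace=False)].copy()
        J_old = np.inf
        for _ in range(max_iter):
            cost = point_costs(X, C, w, p)
            labels = cost.argmin(axis=1)
            nearest = cost[np.arange(len(X)), labels]
            J = nearest.sum()
            C_new = C.copy()
            for c in range(k):
                members = X[labels == c]
                if len(members) == 0:
                    # Empty cluster: restart it at the worst-fitted cell
                    C_new[c] = X[nearest.argmax()]
                else:
                    C_new[c] = [lp_median(members[:, j], p) for j in range(X.shape[1])]
            shift = np.linalg.norm(C_new - C, axis=1).max()
            rel = abs(J - J_old) / J_old if np.isfinite(J_old) and J_old > 0 else np.inf
            C, J_old = C_new, J
            if shift < tol_c or rel < tol_j:
                break
        # Re-evaluate labels and objective for the final centroids
        cost = point_costs(X, C, w, p)
        labels = cost.argmin(axis=1)
        J = cost.min(axis=1).sum()
        if best is None or J < best[0]:
            best = (J, C, labels)
    return best  # (objective, centroids, labels) of the best restart
```

Minimizing $d_{p,w}^p$ gives the same assignment as minimizing $d_{p,w}$, because raising to the power $1/p$ is monotone; the sketch therefore skips the outer root.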
Aspect is included during clustering via $\sin\Theta_i$ and $\cos\Theta_i$ to shape the feature space, but we omit it from the monotone hazard index $H_k$ because its effect is directional and non-monotonic at the province scale; instead, aspect influences risk indirectly through the learned clusters. We quantify the relative mountain flash flood risk of each cluster by a linear hazard index $H_k$ defined over the cluster’s mean feature values:
$$H_k = \beta_r \bar{r}_k + \beta_s \bar{s}_k + \beta_z \bar{z}_k - \beta_d \bar{d}_k,$$
where $\bar{r}_k$, $\bar{s}_k$, $\bar{z}_k$, $\bar{d}_k$ are the means of the normalized rainfall, slope, elevation, and log-distance features in cluster $k$, respectively, and $\beta_r, \beta_s, \beta_z, \beta_d > 0$ are weighting coefficients reflecting the physical importance of each factor, with $\sum_j \beta_j = 1$. Risk is primarily driven by the elevation gradient (slope); accordingly, we set $\beta_s > \beta_z$. Absolute elevation is used only as a weak proxy for headwater physiography and is interpreted jointly with slope. In regions where elevation is not monotonic with hazard, $\beta_z$ may be reduced or replaced by a relative relief/TPI term.
Once $H_k$ is computed for $k = 1, 2, 3$, we rank the clusters in descending order of $H_k$:
  • $\max_k H_k$ → high-risk area;
  • middle $H_k$ → medium-risk area;
  • $\min_k H_k$ → low-risk area.
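The ranking of clusters by the hazard index can be sketched as follows. The text gives no numerical values for the $\beta$ coefficients, so the values in the usage note are placeholders that merely satisfy $\sum_j \beta_j = 1$ and $\beta_s > \beta_z$; the function name `rank_clusters` is hypothetical.

```python
import numpy as np

def rank_clusters(means, beta):
    """Hazard index H_k per cluster and the resulting risk label of each cluster.

    means: dict with keys 'r', 's', 'z', 'd', each an array of k cluster means.
    beta:  dict with the same keys holding positive weights that sum to 1.
    """
    H = (beta["r"] * np.asarray(means["r"], dtype=float)
         + beta["s"] * np.asarray(means["s"], dtype=float)
         + beta["z"] * np.asarray(means["z"], dtype=float)
         - beta["d"] * np.asarray(means["d"], dtype=float))  # distance is subtracted
    order = np.argsort(-H)  # cluster indices in descending order of H_k
    names = ["high", "medium", "low"]
    risk = {int(c): names[i] for i, c in enumerate(order)}
    return H, risk
```

For example, with placeholder weights `beta = {"r": 0.35, "s": 0.35, "z": 0.1, "d": 0.2}`, a cluster with above-average rainfall and slope and below-average distance to channels receives the largest $H_k$ and is labeled high risk.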
Figure 3 shows the flowchart of the GIS-based fractional order k-means clustering algorithm. Here, all input variables are represented as spatially referenced raster grids, each grid cell is identified by its geographic coordinates, and the corresponding feature values (i.e., elevation, slope, aspect, 24 h maximum rainfall, and distance to nearest river) are stored in array form so that the data sequence directly reflects the spatial arrangement of each 30 m × 30 m grid cell.

4. Dataset

As detailed in Section 3.1, five input variables are used for clustering the flash flood risk level of each 30 m × 30 m grid cell: elevation $Z_i$, slope $S_i$, aspect $\Theta_i$, 24 h maximum rainfall $R_i$, and distance to the nearest river channel $D_i$. The original datasets consist of a 30 m × 30 m DEM (GeoTIFF), a vector river network shapefile (SHP), and recorded 24 h maximum rainfall observations at 167 stations. Figure 4a displays the 30 m resolution digital elevation model (DEM) of Zhejiang Province, where the terrain slopes downward from the forested mountains in the southwest, whose peaks exceed 1800 m, to the deeply indented coastline and deltaic plains in the northeast. All raster layers used in this study were reprojected into the China Geodetic Coordinate System 2000 (CGCS2000) and clipped to Zhejiang’s administrative boundary to ensure spatial consistency across datasets. Figure 4b presents the province-wide spatial distribution of the river network (SHP file) derived from the DEM and supplementary remote sensing data.
Figure 5 illustrates the spatial distribution of the 167 rainfall monitoring stations across Zhejiang Province. These stations have been recording 24 h precipitation totals since the 1970s, capturing both common monsoonal rains and extreme events. The provincial maximum 24 h total exceeds 600 mm, reflecting the potential for exceptionally intense convective outbreaks during the Mei-yu and typhoon seasons. The long temporal span of the dataset (over 50 years) provides a robust basis for characterizing the climatological baseline, which is a critical input for our flash flood risk zoning.
Figure 6 overlays a 5 km scale resolution DEM with the river network for a typical mountainous area in Liandu district. A lattice of steep ridges and narrow V-shaped valleys dominates the landscape. The drainage pattern is equally dense—tributaries emerge within a few hundred meters of most ridge crests and merge rapidly into deeply incised valley floors, leaving little buffering time between rainfall input and channel response. In many places, the blue river polylines run almost flush against dark-shaded valley walls, underscoring how closely streams track the steepest gravitational flow paths. High-resolution data (30 m) are therefore essential to (i) resolve the short hillslope travel distances that characterize headwater torrents, (ii) distinguish micro-topographic controls on flow direction, and (iii) preserve the sharp, valley-confined rainfall maxima typical of convective storms in this humid subtropical setting.

5. Results

5.1. GIS-Based Raster Quantification

The GIS-based raster quantification derives the five predictor variables from the original source layers while preserving the 30 m × 30 m resolution for every grid cell $i$. Figure 7a,b illustrate the resulting slope and aspect for each grid cell, respectively, both derived from the DEM (see Figure 4a). Steeper slopes concentrate in the interior mountain belts, while low slopes dominate the coastal plains; aspect patterns illustrate prevailing hillslope orientations, which, together with slope, govern runoff concentration and thus risk.
Figure 8 displays the spatially interpolated 24 h maximum rainfall surface R i over the study area. The analysis indicates that the 24 h maximum rainfall in Zhejiang Province varies between 150 mm and 680 mm. Overall, the region is characterized by frequent short-duration, high-intensity precipitation events, which substantially increase the likelihood of flood occurrence. Spatial heterogeneity is evident, with the southeastern coastal areas exhibiting markedly higher 24 h maximum rainfall, whereas the western regions experience comparatively lower values.
Figure 9 illustrates the resulting distance of each grid cell to the nearest river channel. Overall, the river network in Zhejiang Province is highly dense, with most areas located less than 3 km from the nearest river. In the northeastern part of the province, the network is particularly dense, with much of the land situated within 1 km of a river, and many areas even less than 500 m away. In contrast, the southwestern mountainous region has a lower river network density, yet the distance from most areas to a river remains relatively short. From the perspective of river network density, the province as a whole faces inherently high natural risks of flood disasters. Furthermore, considering the topographic and climatic conditions, the southwestern region not only has a developed river system but also features steep mountainous terrain and high rainfall, with frequent short-duration heavy precipitation, all of which create favorable conditions for the occurrence of flash floods.

5.2. Flash Flood Risk Classification

A GIS-based fractional order k-means algorithm is employed to classify the flash flood risk level of the study area based on the five input variables determined in Section 5.1 (i.e., elevation $Z_i$, slope $S_i$, aspect $\Theta_i$, 24 h maximum rainfall $R_i$, and distance to the nearest river channel $D_i$). Given a grid resolution of 30 m × 30 m and a total area of 105,600 km² for Zhejiang Province, the number of grid cells exceeds $1.17 \times 10^8$. Training on the entire dataset is thus computationally expensive. To reduce the computing cost, 20% of the grid cells are randomly selected for training to determine the cluster centroids, and the remaining 80% are then assigned to the nearest centroid [44]. Each raster was normalized to zero mean and unit variance following the formulas in Table 1, producing the six-dimensional feature vector $x_i$ used for clustering. Figure 10 shows the variation in the within-cluster sum of squared distances with the number of clusters $k$. An evident elbow is observed at $k = 3$, suggesting that three clusters provide an appropriate partition of the data [45]. These clusters are accordingly defined as low-risk, medium-risk, and high-risk areas.
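The sample-then-assign strategy can be sketched as follows: a random 20% of the cells is set aside for fitting the centroids, and all cells are afterwards labeled block by block so that the distance array never exceeds a fixed size. The function names, the block size, and the `int8` label type are illustrative assumptions; the centroids `C` are taken as given.

```python
import numpy as np

def split_sample(n_cells, frac=0.2, seed=0):
    """Random split of cell indices into a training part and the remainder."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_cells)
    n_train = int(round(frac * n_cells))
    return idx[:n_train], idx[n_train:]

def assign_in_chunks(X, C, w, p, chunk=1_000_000):
    """Label every row of X with its nearest centroid, one block at a time."""
    labels = np.empty(len(X), dtype=np.int8)
    for start in range(0, len(X), chunk):
        block = X[start:start + chunk]
        # d_{p,w}^p is enough for the argmin because the 1/p root is monotone
        cost = (w * np.abs(block[:, None, :] - C[None, :, :]) ** p).sum(axis=2)
        labels[start:start + chunk] = cost.argmin(axis=1)
    return labels
```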
Figure 11 distinguishes three flash flood susceptibility classes: low risk (yellow), medium risk (orange), and high risk (red). Medium-risk and high-risk zones are mainly in southwestern Zhejiang Province, where steep, dissected terrain and large elevation gradients prevail. In contrast, the northern part of Zhejiang Province occupies the Hangjiahu Plain. Despite its dense river network, the broad, low-relief topography keeps the flash flood risk relatively low. Most high-risk cells lie along river corridors and at the foot of mountains, settings that favor rapid runoff concentration during short-duration, high-intensity rainfall. As mountain catchments possess large contributing areas relative to channel length, intense rainfall in mountain areas is efficiently converted into peak discharges, making severe flash floods possible.
Figure 12 presents the spatial distribution of each class along (a) latitude X and (b) longitude Y. Low-risk grid cells are the most abundant, followed by high-risk grid cells, whereas medium-risk grid cells are the least numerous. The concentration of high-risk grid cells in southwestern Zhejiang Province reflects the region’s steep, highly dissected terrain and dense river network. Grid cells situated at the foot of mountains and along river corridors are therefore frequently classified as high risk. By contrast, the northeastern part of the province is dominated by the broad, low-relief Hangjiahu Plain, so a large proportion of its grid cells fall into the low-risk class. Medium-risk grid cells mainly occupy the transition belt between these two extremes; hence, they have a smaller overall count.
Table 2 shows that factor means are not monotonic from low to high risk. This is expected because clusters are defined jointly in the 6D feature space, and high, medium, and low labels are assigned after clustering using the composite hazard index H k (positive for rainfall and slope, negative for distance to channel). Consequently, class means can move in different directions across variables due to trade-offs. In our case, the medium-risk cluster represents headwater or interfluve terrain—higher elevations but larger distances to channels and moderate slopes—whereas the high-risk cluster captures steeper, channel-proximal valley sides or hollows. Low-risk areas are coastal plains and wide valleys with gentle slopes. Thus, the ranking reflects the combined effect of rainfall, slope, elevation, and proximity to channels rather than a monotonic change along any single factor. It can be seen that the low-risk zone occupies the lowest, flattest terrain with a mean elevation of 82 m and mean slope of 8.5°. The medium-risk zone corresponds to the highest, steepest ridges with a mean elevation of 887 m and mean slope of 20°, but these zones are slightly farther from rivers than high-risk zones. High-risk cells sit at intermediate elevations with a mean elevation of 433 m, exhibit large slope variability, and are, on average, closest to river channels, mirroring the dissected foothills and valley bottoms where flash flood impacts concentrate. Rainfall extremes are similar across classes, so topography and channel proximity, rather than precipitation alone, drive the differentiation of flash flood susceptibility.

5.3. Model Validation Based on Historical Data

Figure 13 compares the clustering results with the 1643 documented flash flood events in Zhejiang Province since the 1970s. The background colors denote risk zones, i.e., red for high-risk areas, orange for medium-risk areas, and yellow for low-risk areas. Green circles indicate locations of historical flash flood events. Over 95% of recorded events fall within the medium-risk or high-risk areas, and 75% of these occur in the high-risk areas. Only 5% lie in low-risk regions. Because most historical events were reported by residents of nearby villages, the data are logged as points rather than areas. Many of the apparently mispredicted points are situated just outside the boundary of the high- and medium-risk zones or along minor tributaries that the river network extraction threshold may have omitted, suggesting that a finer-resolution drainage mask or local rainfall and runoff factors might reconcile these discrepancies. In addition, some high-risk classifications extend into sparsely populated ridges where no events have been recorded. These areas have topography and rainfall characteristics similar to those of documented sites but lack gauge coverage, so their apparent over-prediction likely reflects reporting bias rather than model error.
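The point-in-zone comparison amounts to sampling the classified raster at the event locations and tabulating the share of events per class. The sketch below assumes the event coordinates have already been converted to raster row and column indices and that classes are coded 0 (low), 1 (medium), and 2 (high); both the coding and the function name `class_shares` are hypothetical.

```python
import numpy as np

def class_shares(risk_raster, rows, cols, classes=(0, 1, 2)):
    """Fraction of event points falling in each risk class."""
    hits = risk_raster[rows, cols]  # class code at every event location
    return {c: float(np.mean(hits == c)) for c in classes}
```

With this coding, the combined share of events in the medium- and high-risk zones is `shares[1] + shares[2]`.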

6. Discussion

This study introduced a GIS-based fractional-order k-means clustering framework for flash flood risk zoning in mountainous regions, with Zhejiang Province, China, serving as the case study. The proposed method produces risk classifications without relying on extensive historical flood records. The fractional-order formulation allows for more flexible weighting of inter-feature distances, enabling the clustering algorithm to capture subtle variations in terrain and hydro-meteorological conditions. The validation against 1643 documented flash flood sites indicates that the proposed model is robust within the study area and suggests that it is transferable, which is particularly valuable for data-scarce areas where conventional supervised models may falter due to insufficient historical data.
Unlike supervised models such as SVMs and ANNs that require historical labels, and unlike index-based schemes that depend on expert weighting, our method is an unsupervised, fully quantitative framework that learns risk structure directly from geospatial features. Despite these advantages, several limitations remain. First, the current workflow does not incorporate hydraulic infrastructure (e.g., dams, levees, engineered drainage networks) that can modify flow pathways and attenuate peaks. In heavily engineered basins, geomorphically high-risk zones may exhibit a less-realized flash flood hazard because of upstream retention and conveyance. Second, we use the recorded 24 h maximum rainfall as the sole precipitation indicator; this omits frequency/return-period information, so two watersheds with identical 24 h maxima can have very different exceedance probabilities. The limited availability of long-term, high-resolution precipitation observations in many mountain regions currently constrains the inclusion of frequency-based metrics. Third, as noted in Section 5.3, the inventory of historical flash flood sites derives from resident reports; sparsely populated areas are therefore under-represented, which may inflate apparent performance in populated valleys while leaving remote catchments insufficiently validated.
Several directions for future work could address these shortcomings. Incorporating infrastructure layers as geospatial predictors, or explicitly modeling their effects on routing and attenuation, would allow risk classifications to reflect both natural susceptibility and engineered protection. Developing a rainfall frequency module (e.g., regional intensity–duration–frequency curves or extreme value analysis) would provide a more realistic characterization of precipitation extremes. To mitigate population-based reporting bias, future studies should leverage independent validation sources such as remote sensing-derived inundation footprints and stream-gauge records to capture events in less accessible areas. Finally, extending the framework to semi-supervised or hybrid clustering could improve accuracy where limited labels exist while preserving the transferability benefits of an unsupervised approach.
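To illustrate why a recorded 24 h maximum alone can mislead, the following minimal sketch fits a Gumbel distribution by the method of moments to two annual-maximum series and compares the return period of the same rainfall depth. It is not part of the published workflow: the two rainfall series are invented for illustration, the function names `gumbel_fit_moments` and `return_period` are hypothetical, and the Gumbel model is only one of several extreme value distributions that a frequency module might use.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_fit_moments(annual_maxima):
    """Method-of-moments Gumbel fit; returns (location, scale)."""
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_period(depth_mm, location, scale):
    """Return period in years of an annual maximum exceeding depth_mm."""
    cdf = math.exp(-math.exp(-(depth_mm - location) / scale))
    return 1.0 / (1.0 - cdf)


# Hypothetical annual 24 h maxima (mm); both records share a 300 mm maximum.
wet_basin = [210, 240, 255, 260, 275, 280, 285, 290, 295, 300]
dry_basin = [60, 70, 75, 80, 85, 90, 95, 100, 110, 300]

t_wet = return_period(300.0, *gumbel_fit_moments(wet_basin))
t_dry = return_period(300.0, *gumbel_fit_moments(dry_basin))
print(f"wet basin: {t_wet:.1f} yr, dry basin: {t_dry:.1f} yr")
```

Under these assumptions the same 300 mm depth corresponds to a return period several times longer in the drier record, which is the kind of information that the recorded maximum by itself cannot convey.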

7. Conclusions

In this work, we developed a GIS-based fractional-order k-means clustering framework for flash-flood risk zoning and applied it to Zhejiang Province, China. The study area was discretized into 30 m × 30 m grid cells and clustered using five geospatial factors: elevation $Z_i$, slope $S_i$, aspect $\Theta_i$, 24 h maximum rainfall $R_i$, and distance to the nearest river channel $D_i$. The central innovation is the fractional order distance metric, which flexibly weights inter-feature differences, yielding a quantitative, label-free alternative to index-based schemes and supervised models. Our concluding remarks are as follows:
(1) Evidence of effectiveness. Validation against 1643 documented flash flood sites shows that 95% of recorded events fall within the model’s high- and medium-risk zones, with 65% falling within the high-risk zone. These results indicate that the unsupervised formulation captures key terrain and hydro-meteorological controls without relying on historical labels.
(2) Practical implications and transferability. Because the workflow operates on commonly available GIS layers and an extreme rainfall indicator, it is computationally efficient and readily transferable to data-scarce mountainous regions. In practice, users need only assemble the input stack, choose k via internal clustering metrics and spatial realism checks, and tune the fractional order α to local feature contrasts.
(3) Limitations. The current implementation does not explicitly represent hydraulic infrastructure (dams, levees, engineered drainage), and it uses the recorded 24 h maximum rainfall without incorporating frequency/return-period information. In addition, validation sites are drawn largely from populated valleys, introducing potential reporting bias.
(4) Future directions. Future work will (i) integrate infrastructure layers or routing effects, (ii) add rainfall frequency information, (iii) broaden validation with remote sensing-derived inundation footprints and gauge records, and (iv) explore semi-supervised or hybrid clustering to leverage limited labels while preserving transferability.
Overall, the fractional-order k-means clustering method offers a robust and computationally efficient tool for flash flood risk zoning in topographically complex regions. The framework combines DEM-derived morphometrics with a rainfall proxy for rapid screening and prioritization. By using a fractional order distance metric, it produces risk maps that better capture local hydrologic extremes while damping noise in sparse observations.
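The practical steps summarized above (cluster the normalized feature vectors, then pick k from an internal metric) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the dissimilarity is assumed to take the form $\sum_j |x_j - c_j|^{\alpha}$, the centroid update uses the per-feature mean (which is only a heuristic when α ≠ 2), the value α = 0.8 is arbitrary, the data are synthetic, and the names `frac_dist` and `frac_kmeans` are hypothetical.

```python
import random


def frac_dist(x, c, alpha):
    """Assumed fractional order dissimilarity: sum_j |x_j - c_j| ** alpha."""
    return sum(abs(a - b) ** alpha for a, b in zip(x, c))


def frac_kmeans(points, k, alpha=0.8, n_iter=50, seed=0):
    """Lloyd-style iterations; returns (labels, centroids, total cost)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid under the fractional dissimilarity.
        new_labels = [
            min(range(k), key=lambda j: frac_dist(p, centroids[j], alpha))
            for p in points
        ]
        # Update step: per-feature mean of the members (heuristic for alpha != 2).
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    cost = sum(frac_dist(p, centroids[lab], alpha) for p, lab in zip(points, labels))
    return labels, centroids, cost


# Synthetic stand-in for normalized grid-cell features (6 values per cell).
data_rng = random.Random(42)
blob_centers = [(-2.0,) * 6, (0.0,) * 6, (2.0,) * 6]
cells = [[data_rng.gauss(m, 0.3) for m in c] for c in blob_centers for _ in range(40)]

# Elbow-style scan: total within-cluster cost as a function of k.
costs = {k: frac_kmeans(cells, k)[2] for k in range(1, 6)}
for k, c in costs.items():
    print(k, round(c, 1))

# A few restarts reduce sensitivity to the random initial centroids.
labels, centroids, cost = min(
    (frac_kmeans(cells, 3, seed=s) for s in range(5)), key=lambda run: run[2]
)
```

Mapping the resulting clusters to low, medium, and high risk still requires inspecting their feature statistics, since the cluster indices themselves carry no ordering.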

Author Contributions

Conceptualization, H.L.; methodology, H.L. and Z.M.; validation, J.H., L.H. and J.Z.; formal analysis, X.Z., Y.F. and L.W.; writing—original draft preparation, H.L.; writing—review and editing, Z.M.; supervision, X.W.; project administration, X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Funds of Scientific Research Institutes for the Provincial Institute of Zhejiang (Grant No. ZIHEYS24002) and the Joint Funds of the Zhejiang Provincial Natural Science Foundation of China (Grant Nos. LZJMZ24D050007 and LZJWY24E090005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hapuarachchi, H.; Wang, Q.; Pagano, T. A review of advances in flash flood forecasting. Hydrol. Processes 2011, 25, 2771–2784. [Google Scholar] [CrossRef]
  2. Montz, B.E.; Gruntfest, E. Flash flood mitigation: Recommendations for research and applications. Glob. Environ. Change Part B Environ. Hazards 2002, 4, 15–22. [Google Scholar] [CrossRef]
  3. Gao, J.; Shi, H.; Zang, J.; Liu, Y. Mechanism analysis on the mitigation of harbor resonance by periodic undulating topography. Ocean Eng. 2023, 281, 114923. [Google Scholar] [CrossRef]
  4. Schroeder, A.J.; Gourley, J.J.; Hardy, J.; Henderson, J.J.; Parhi, P.; Rahmani, V.; Reed, K.A.; Schumacher, R.S.; Smith, B.K.; Taraldsen, M.J. The development of a flash flood severity index. J. Hydrol. 2016, 541, 523–532. [Google Scholar] [CrossRef]
  5. Davis, R.S. Flash flood forecast and detection methods. In Severe Convective Storms; Springer: Berlin/Heidelberg, Germany, 2001; pp. 481–525. [Google Scholar]
  6. Doswell, C.A., III; Brooks, H.E.; Maddox, R.A. Flash flood forecasting: An ingredients-based methodology. Weather Forecast. 1996, 11, 560–581. [Google Scholar] [CrossRef]
  7. Zhang, J.; Zhang, X.; Li, H.; Fan, Y.; Meng, Z.; Liu, D.; Pan, S. Optimization of Water Quantity Allocation in Multi-Source Urban Water Supply Systems Using Graph Theory. Water 2025, 17, 61. [Google Scholar] [CrossRef]
  8. Gao, J.; Ma, X.; Dong, G.; Chen, H.; Liu, Q.; Zang, J. Investigation on the effects of Bragg reflection on harbor oscillations. Coast. Eng. 2021, 170, 103977. [Google Scholar] [CrossRef]
  9. Kim, B.S.; Kim, H.S. Evaluation of flash flood severity in K orea using the modified flash flood index (MFFI). J. Flood Risk Manag. 2014, 7, 344–356. [Google Scholar] [CrossRef]
  10. Ali, K.; Bajracharyar, R.M.; Raut, N. Advances and challenges in flash flood risk assessment: A review. J. Geogr. Nat. Disasters 2017, 7, 2. [Google Scholar] [CrossRef]
  11. Chen, H.; Huang, S.; Qiu, H.; Xu, Y.P.; Teegavarapu, R.S.; Guo, Y.; Nie, H.; Xie, H.; Xie, J.; Shao, Y.; et al. Assessment of ecological flow in river basins at a global scale: Insights on baseflow dynamics and hydrological health. Ecol. Indic. 2025, 178, 113868. [Google Scholar] [CrossRef]
  12. Chen, H.; Xu, B.; Qiu, H.; Huang, S.; Teegavarapu, R.S.; Xu, Y.P.; Guo, Y.; Nie, H.; Xie, H. Adaptive assessment of reservoir scheduling to hydro-meteorological comprehensive dry and wet condition evolution in a multi-reservoir region of southeastern China. J. Hydrol. 2025, 648, 132392. [Google Scholar] [CrossRef]
  13. Meng, Z.; Hu, Y.; Jiang, S.; Zheng, S.; Zhang, J.; Yuan, Z.; Yao, S. Slope Deformation Prediction Combining Particle Swarm Optimization-Based Fractional-Order Grey Model and K-Means Clustering. Fractal Fract. 2025, 9, 210. [Google Scholar] [CrossRef]
  14. Liu, X.; Li, X.; Ma, G.; Rezania, M. Characterization of spatially varying soil properties using an innovative constraint seed method. Comput. Geotech. 2025, 183, 107184. [Google Scholar] [CrossRef]
  15. Gao, J.; Hou, L.; Liu, Y.; Shi, H. Influences of bragg reflection on harbor resonance triggered by irregular wave groups. Ocean Eng. 2024, 305, 117941. [Google Scholar] [CrossRef]
  16. Gao, J.; Ma, X.; Zang, J.; Dong, G.; Ma, X.; Zhu, Y.; Zhou, L. Numerical investigation of harbor oscillations induced by focused transient wave groups. Coast. Eng. 2020, 158, 103670. [Google Scholar] [CrossRef]
  17. Barlow, H.B. Unsupervised learning. Neural Comput. 1989, 1, 295–311. [Google Scholar] [CrossRef]
  18. Abu El-Magd, S.A.; Orabi, H.O.; Ali, S.A.; Parvin, F.; Pham, Q.B. An integrated approach for evaluating the flash flood risk and potential erosion using the hydrologic indices and morpho-tectonic parameters. Environ. Earth Sci. 2021, 80, 694. [Google Scholar] [CrossRef]
  19. Liu, T.; Yu, H.; Blair, R.H. Stability estimation for unsupervised clustering: A review. Wiley Interdiscip. Rev. Comput. Stat. 2022, 14, e1575. [Google Scholar] [CrossRef] [PubMed]
  20. Grira, N.; Crucianu, M.; Boujemaa, N. Unsupervised and semi-supervised clustering: A brief survey. In A Review of Machine Learning Techniques for Processing Multimedia Content; 2025; Volume 1, pp. 9–16. Available online: https://deptinfo.cnam.fr/~crucianm/src/BriefSurveyClustering.pdf (accessed on 2 June 2025).
  21. Omran, M.G.; Engelbrecht, A.P.; Salman, A. An overview of clustering methods. Intell. Data Anal. 2007, 11, 583–605. [Google Scholar] [CrossRef]
  22. García-Escudero, L.A.; Gordaliza, A.; Matrán, C.; Mayo-Iscar, A. A review of robust clustering methods. Adv. Data Anal. Classif. 2010, 4, 89–109. [Google Scholar] [CrossRef]
  23. Murtagh, F.; Contreras, P. Algorithms for hierarchical clustering: An overview. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2012, 2, 86–97. [Google Scholar] [CrossRef]
  24. Viroli, C.; McLachlan, G.J. Deep Gaussian mixture models. Stat. Comput. 2019, 29, 43–51. [Google Scholar] [CrossRef]
  25. Kodinariya, T.M.; Makwana, P.R. Review on determining number of Cluster in K-Means Clustering. Int. J. 2013, 1, 90–95. [Google Scholar]
  26. Sinaga, K.P.; Yang, M.S. Unsupervised K-means clustering algorithm. IEEE Access 2020, 8, 80716–80727. [Google Scholar] [CrossRef]
  27. Li, J.; Meng, Z.; Zhang, J.; Chen, Y.; Yao, J.; Li, X.; Qin, P.; Liu, X.; Cheng, C. Prediction of Seawater Intrusion Run-Up Distance Based on K-Means Clustering and ANN Model. J. Mar. Sci. Eng. 2025, 13, 377. [Google Scholar] [CrossRef]
  28. Grubesic, T.H.; Murray, A.T. Detecting hot spots using cluster analysis and GIS. In Proceedings of the Fifth Annual International Crime Mapping Research Conference, Dallas, TX, USA, 1–4 December 2001; Volume 26. [Google Scholar]
  29. Fan, B. A hybrid spatial data clustering method for site selection: The data driven approach of GIS mining. Expert Syst. Appl. 2009, 36, 3923–3936. [Google Scholar] [CrossRef]
  30. Hamfelt, A.; Karlsson, M.; Thierfelder, T.; Valkovsky, V. Beyond K-means: Clusters identification for GIS. In Information Fusion and Geographic Information Systems Towards the Digital Ocean; Springer: Berlin/Heidelberg, Germany, 2011; pp. 93–105. [Google Scholar]
  31. Eghtesadifard, M.; Afkhami, P.; Bazyar, A. An integrated approach to the selection of municipal solid waste landfills through GIS, K-Means and multi-criteria decision analysis. Environ. Res. 2020, 185, 109348. [Google Scholar] [CrossRef]
  32. Soor, S.; Challa, A.; Danda, S.; Sagar, B.D.; Najman, L. Iterated watersheds, a connected variation in k-means for clustering gis data. IEEE Trans. Emerg. Top. Comput. 2019, 9, 626–636. [Google Scholar] [CrossRef]
  33. Sinan, M.; Leng, J.; Shah, K.; Abdeljawad, T. Advances in numerical simulation with a clustering method based on K-means algorithm and Adams Bashforth scheme for fractional order laser chaotic system. Alex. Eng. J. 2023, 75, 165–179. [Google Scholar] [CrossRef]
  34. Xu, D.; Li, Y.; Yuan, Y.H.; Qiang, J.; Zhu, Y. Incomplete Multi-Kernel k-Means Clustering With Fractional-Order Embedding. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–18 December 2024; pp. 1144–1150. [Google Scholar]
  35. Liu, X.; Sun, Y.; Chen, H.; Hong, J.; Chen, C.; Dong, H.; Fang, K. Investigation and Analysis of Torrential Rain and Flooding Caused by Typhoon Lekima in Wenzhou City, Zhejiang Province. In Proceedings of the 2024 IEEE International Conference on Smart Internet of Things (SmartIoT), Shenzhen, China, 14–16 November 2024; pp. 452–456. [Google Scholar]
  36. Feng, L.; Hong, W. Characteristics of drought and flood in Zhejiang Province, East China: Past and future. Chin. Geogr. Sci. 2007, 17, 257–264. [Google Scholar] [CrossRef]
  37. Cui, Y.l.; Hu, J.h.; Xu, C.; Zheng, J.; Wei, J.b. A catastrophic natural disaster chain of typhoon-rainstorm-landslide-barrier lake-flooding in Zhejiang Province, China. J. Mt. Sci. 2021, 18, 2108–2119. [Google Scholar] [CrossRef]
  38. Yuhao, W.; Xiaohui, W.; Xiaoli, G.; Ju, T.; Yan, J.; Wangxing, X. Investigating the characteristics of short-duration heavy rainfall during flood season in Zhejiang Province using the minute rain gauge data. Torrential Rain Disasters 2025, 44, 60–70. [Google Scholar]
  39. Diakakis, M.; Deligiannakis, G.; Pallikarakis, A.; Skordoulis, M. Factors controlling the spatial distribution of flash flooding in the complex environment of a metropolitan urban area. The case of Athens 2013 flash flood event. Int. J. Disaster Risk Reduct. 2016, 18, 171–180. [Google Scholar] [CrossRef]
  40. Chen, F.W.; Liu, C.W. Estimation of the spatial rainfall distribution using inverse distance weighting (IDW) in the middle of Taiwan. Paddy Water Environ. 2012, 10, 209–222. [Google Scholar] [CrossRef]
  41. Bartier, P.M.; Keller, C.P. Multivariate interpolation to incorporate thematic surface data using inverse distance weighting (IDW). Comput. Geosci. 1996, 22, 795–799. [Google Scholar] [CrossRef]
  42. Kwon, J.S.; Choi, J.H.; Choi, J.S. Two-dimensional object recognition using chamfer distance transform on morphological skeleton. In Proceedings of the Visual Communications and Image Processing’95, Taipei, China, 23–26 May 1995; Volume 2501, pp. 1750–1761. [Google Scholar]
  43. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 2020, 9, 1295. [Google Scholar] [CrossRef]
  44. Zhu, X.; Vondrick, C.; Fowlkes, C.C.; Ramanan, D. Do we need more training data? Int. J. Comput. Vis. 2016, 119, 76–92. [Google Scholar] [CrossRef]
  45. Pham, D.T.; Dimov, S.S.; Nguyen, C.D. Selection of K in K-means clustering. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2005, 219, 103–119. [Google Scholar] [CrossRef]
Figure 1. The study area, Zhejiang Province: (a) geographical location; (b) administrative divisions.
Figure 2. The process of k-means clustering: (a) randomly placing three centroids; (b) assigning objects to clusters; (c) separating the three clusters; (d) moving the centroids to new positions.
Figure 3. Flowchart of the GIS-based fractional order k-means clustering algorithm.
Figure 4. (a) DEM of the study area; (b) spatial distribution of the river network.
Figure 5. Distribution of rainfall monitoring stations in Zhejiang Province.
Figure 6. The 5 km scale elevation and river network of a typical mountainous area in Liandu district.
Figure 7. The slope $S_i$ and aspect $\Theta_i$ of each grid cell: (a) slope; (b) aspect.
Figure 8. The integrated rainfall data using the IDW method.
Figure 9. Distance from each grid cell to the nearest river channel.
Figure 10. Variation in the sum of squared distances with the number of clusters k.
Figure 11. Classifying the study area into three zones using GIS-based fractional order k-means clustering: low-risk, medium-risk, and high-risk zones.
Figure 12. Number of grid cells corresponding to (a) latitude X and (b) longitude Y of each class.
Figure 13. Model validation against historical flash flood records.
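Table 1. Normalization of input raster variables.
| Raw raster | Symbol | Normalized value |
| --- | --- | --- |
| Elevation (m) | $Z_i$ | $z_i = \dfrac{Z_i - \mu_Z}{\sigma_Z}$ |
| Slope (°) | $S_i$ | $s_i = \dfrac{S_i - \mu_S}{\sigma_S}$ |
| Aspect (°) | $\Theta_i$ | $a_{i,1} = \dfrac{\sin\Theta_i - \mu_{\sin\Theta}}{\sigma_{\sin\Theta}}$, $a_{i,2} = \dfrac{\cos\Theta_i - \mu_{\cos\Theta}}{\sigma_{\cos\Theta}}$ |
| 24 h max rainfall (mm) | $R_i$ | $r_i = \dfrac{R_i - \mu_R}{\sigma_R}$ |
| Distance to stream (m) | $D_i$ | $d_i = \dfrac{\ln(D_i + 1) - \mu_{\ln D}}{\sigma_{\ln D}}$ |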
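The normalization in Table 1 can be sketched in a few lines. This is an illustrative sketch only: the four sample cells are invented, the function names `zscore` and `build_features` are hypothetical, and the use of the population standard deviation is an assumption, since the table does not say which estimator is used. A real raster workflow would apply the same formulas to full arrays rather than short lists.

```python
import math
import statistics


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def build_features(elev_m, slope_deg, aspect_deg, rain_mm, dist_m):
    """Return one 6-element feature vector per grid cell, following Table 1."""
    z = zscore(elev_m)
    s = zscore(slope_deg)
    # Aspect is circular, so it is encoded by its sine and cosine.
    a1 = zscore([math.sin(math.radians(t)) for t in aspect_deg])
    a2 = zscore([math.cos(math.radians(t)) for t in aspect_deg])
    r = zscore(rain_mm)
    # Distance to stream is log-transformed before standardization.
    d = zscore([math.log(x + 1.0) for x in dist_m])
    return [list(row) for row in zip(z, s, a1, a2, r, d)]


# Four hypothetical grid cells.
features = build_features(
    elev_m=[50, 400, 900, 1200],
    slope_deg=[2, 12, 25, 35],
    aspect_deg=[10, 350, 180, 90],
    rain_mm=[310, 325, 340, 300],
    dist_m=[20, 150, 800, 2500],
)
for row in features:
    print([round(v, 2) for v in row])
```

The sine and cosine encoding keeps aspects of 10° and 350° close together in feature space, which a raw angle in degrees would not.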
Table 2. Summary of statistical features for each class.
| Statistic | Low risk | Medium risk | High risk |
| --- | --- | --- | --- |
| DEM Mean | 82.207 | 887.223 | 432.993 |
| DEM Std | 119.052 | 587.835 | 183.051 |
| Slope Mean | 8.505 | 19.981 | 12.916 |
| Slope Std | 24.843 | 20.266 | 50.728 |
| Aspect Mean | −195.148 | 179.673 | 177.247 |
| Aspect Std | 360.246 | 149.512 | 215.051 |
| Precip Mean | 327.832 | 312.102 | 325.617 |
| Precip Std | 54.964 | 63.480 | 62.651 |
| Distance Mean | −4.254 | −2.125 | −0.683 |
| Distance Std | 45.841 | 103.168 | 58.626 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Li, H.; Huang, J.; Zhang, X.; Meng, Z.; Fan, Y.; Wu, X.; Wang, L.; Hu, L.; Zhang, J. Flash Flood Risk Classification Using GIS-Based Fractional Order k-Means Clustering Method. Fractal Fract. 2025, 9, 586. https://doi.org/10.3390/fractalfract9090586
