A New Hybrid Firefly–PSO Optimized Random Subspace Tree Intelligence for Torrential Rainfall-Induced Flash Flood Susceptible Mapping

Abstract: Flash flood is one of the most dangerous natural phenomena because of its high magnitude and sudden occurrence, resulting in heavy damage to people and property. Our work proposes a state-of-the-art model for flash flood susceptibility mapping using the decision tree random subspace ensemble optimized by hybrid firefly–particle swarm optimization (HFPS), namely the HFPS-RSTree model. Eleven conditioning variables (geology, soil type, river density, rainfall, elevation, slope, aspect, topographic wetness index (TWI), normalized difference vegetation index (NDVI), plan curvature, and profile curvature) were used as explanatory variables. These indicators were compiled from a geological and mineral resources map, soil type map, and topographic map, ALOS PALSAR DEM 30 m, and Landsat-8 imagery. The HFPS-RSTree model was trained and verified using the inventory map and the eleven conditioning variables and then compared with four machine learning algorithms, i.e., the support vector machine (SVM), the random forests (RF), the C4.5 decision trees (C4.5 DT), and the logistic model trees (LMT) models. We employed a range of standard statistical metrics to assess the predictive performance of the proposed model. The results show that the HFPS-RSTree model had the best predictive performance among the benchmarks, reaching an overall accuracy of over 90% in predicting flash floods. It can be concluded that the proposed approach provides new insights into flash flood prediction in mountainous regions.


Introduction
Flash floods that occur in tropical and semi-tropical areas, caused by extraordinary rainfall, are among the most dangerous natural phenomena because of the significant socio-economic damage and loss of human lives they cause, particularly in the cyclone-prone regions of Southeast Asia [1,2]. Floods are often classified into different types, i.e., city flooding, river flooding, coastal flooding, and flash floods [3], of which flash floods are the most severe because they develop over very short timescales [4,5]. Prior studies suggest that most areas are exposed to destructive flooding, resulting in increasing damage, casualties, and financial losses during flooding events. Thus, in order to prevent and control floods, susceptible areas, where the potential flood risk is high, should be identified and mapped [6]. In addition, human factors, i.e., deforestation and unplanned land-use changes, also contribute considerably to the occurrence of sudden flooding [7] because forests play an important role in reducing surface runoff and transferring excess water to groundwater. Moreover, population growth causes land conversion from forested areas to new settlements built in flood-prone areas. This situation becomes more severe under a changing climate combined with land-use changes, with damage anticipated to exceed 1 trillion US$ by 2050 [8]. However, accurate and timely prediction of flash floods remains challenging because of the complex nature of this phenomenon [9]. Thus, the development of a cost-effective, reliable, and accurate model for predicting and mapping the occurrence of flash floods in areas with frequent heavy rainfall is essential to support sustainable land-use planning [10].
A previous literature review shows that a large number of studies have been conducted to predict the probability of flooding using three main categories of methods: traditional analysis, the rainfall-runoff approach, and pattern classification [11]. In recent years, the rapid development of innovative technologies, including earth observation (EO), geographic information system (GIS)-based approaches, and machine learning techniques, has provided promising tools to account for the complexity of spatial flood modeling [12]. Importantly, the integration of satellite remotely sensed imagery and GIS data has been proven an effective way to map and evaluate flash flood damage [13,14]. For instance, Klemas [14] reported the use of satellite imagery and modeling techniques to predict flood vulnerability, whereas Lee et al. [15] reported the usability of random forest methods for mapping flood vulnerability in a metropolis. Recently, Khosravi et al. [16] used GIS-based frequency and weight ratio bivariate statistical models for mapping flash flood susceptibility. A wide range of attempts have been made to map flash flooding using various artificial intelligence techniques optimized by metaheuristic algorithms [17,18]. More recent studies used different machine learning algorithms for predicting and zoning flash flood areas [19–21]. However, only a few studies integrated remotely sensed data and spatial data in machine learning techniques to improve the accuracy of spatial prediction of flash floods, despite the fact that air-borne remote sensing data provide a number of benefits such as easier repeatability, low cost, and wider area coverage [21,22]. The result is a lack of cost-effective, precise, and timely models for the susceptibility mapping of flash floods.
Thus, this study aims at developing a state-of-the-art model incorporating Sentinel-1 C band free-of-charge data and an advanced machine learning algorithm using the decision tree-based random subspace optimized by hybrid swarm intelligence, namely the HFPS-RSTree model, for the spatial prediction of flash floods in a mountainous area in Northwestern Vietnam.

Decision Tree Algorithm
The decision tree (DT) algorithm is a simple supervised learning classifier [23]. While other supervised learning algorithms collate all available features together to determine each individual label, the DT applies a sequence of decision rules to decide the class label. It creates a tree-like structure; that is, nodes represent tests on attributes, while branches and leaves represent the outcomes of the tests and a category label, respectively. At each node, tests can be applied to one or more of the attributes, giving univariate and multivariate applications. The univariate application analyzes a single attribute, while the multivariate application tests several attributes simultaneously. For instance, the Gini index can be applied for single-attribute (univariate) splits, whereas the support vector machine [24] can be used for multivariate approaches.
The benefit of this approach is that it is simple and flexible. It can be applied to both categorical and continuous data, and it runs quickly because it requires only simple mathematical calculations. Its structure can be easily visualized for interpretation. However, it sometimes provides a non-optimal result or overfits the training data. Overfitting can be mitigated by pruning branches.
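As a minimal sketch of a univariate Gini split, the following Python fragment scores the test `value <= threshold` on a single attribute. The function names and the toy slope values are illustrative assumptions and are not taken from the study.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(values, labels, threshold):
    """Weighted Gini impurity after the univariate test `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical toy data: slope in degrees, label 1 = flood, 0 = non-flood.
slope = [2.0, 4.0, 6.0, 20.0, 30.0, 40.0]
flood = [1, 1, 1, 0, 0, 0]

parent = gini(flood)                     # impurity before splitting
good = split_gini(slope, flood, 6.0)     # separates the two classes
poor = split_gini(slope, flood, 4.0)     # leaves one flood sample on the right
```

A tree learner would evaluate every candidate threshold in this way and keep the one with the lowest weighted impurity.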

Random Subspace Ensemble
Ensemble-based learning techniques are well-known methods for combining multiple classifiers [25,26], of which the random subspace (RS), proposed first by Ho [23], has proven to be one of the most potent techniques. To handle the deficiency of a single decision tree method, RS takes full advantage of multiple decision tree classifiers. To predict the flash flood output, each decision tree votes for a class, and the majority vote of all decision trees determines the final outcome. This can reduce overfitting and non-optimal solutions, which may be major issues of single-classifier approaches. More significant improvements have been reported by training each classifier with a random subset of the reference data as opposed to using only a subset of input attributes for that classifier. Although this reduction process may reduce the performance of individual classifiers, it can deal with overly strong correlations that cause unreliable solutions. It also reduces the calculation time, while the remaining reference data can be used for an independent accuracy assessment of the random subspace algorithm.
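The voting scheme can be sketched as follows, assuming the random subspace method in Ho's sense, where every tree is grown on a random subset of the attributes and the ensemble takes a majority vote. The tree learner is a deliberately small Gini-based stand-in, and the names `m_ss`, `p_ffi`, and `d_max` as well as the toy data are illustrative assumptions rather than the authors' implementation.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def build_tree(rows, labels, features, depth, d_max):
    """Grow a univariate Gini tree that may only test `features`."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= d_max or len(set(labels)) == 1:
        return majority
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows})[:-1]:   # candidate thresholds
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:                                   # no usable split
        return majority
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li],
                       features, depth + 1, d_max),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                       features, depth + 1, d_max))

def predict_tree(node, row):
    while isinstance(node, tuple):                     # internal node
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node                                        # leaf label

def fit_random_subspace(rows, labels, m_ss, p_ffi, d_max, seed=0):
    """Train m_ss trees, each on p_ffi randomly chosen attributes."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    return [build_tree(rows, labels, rng.sample(range(n_features), p_ffi), 0, d_max)
            for _ in range(m_ss)]

def predict_ensemble(trees, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in trees).most_common(1)[0][0]

# Hypothetical samples: (slope, elevation, noise attribute); 1 = flood.
rows = [(2.0, 900.0, 1.0), (3.0, 950.0, 0.0), (5.0, 800.0, 1.0),
        (25.0, 300.0, 0.0), (30.0, 200.0, 1.0), (35.0, 250.0, 0.0)]
labels = [1, 1, 1, 0, 0, 0]
trees = fit_random_subspace(rows, labels, m_ss=15, p_ffi=2, d_max=2)
```

Because each tree sees only part of the attribute set, the trees differ from one another, which is the source of diversity that the vote relies on.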

Hybrid Firefly-Particle Swarm Algorithm (HFPS)
One difficulty in using ensemble-based learning for predicting flash floods is in optimizing the model parameters, and in this context, metaheuristic optimization algorithms have demonstrated their superiority [27,28]. In this research, a hybrid optimization algorithm (HFPS) proposed by Aydilek [29], which is an integration of the firefly algorithm (FA) and particle swarm optimization (PSO), was used. The reason for this selection is that the HFPS algorithm inherits the fast computation of PSO and the robustness of FA to form a new powerful algorithm. Consequently, the HFPS algorithm outperforms benchmark algorithms in various engineering problems [29]. Overall, the procedure of the HFPS can be briefly described as follows.
(1) Determine the population of the swarm, the position and the velocity of each particle, and the total number of iterations. (2) Establish a cost function to measure the fitness of each particle, record the best position found by each particle (pbest), and then compare all pbests to obtain the global best (gbest). (3) For each iteration, calculate and update the position (Pos) and the velocity (Vel) of all particles in the swarm using Equations (1) and (2) and then compute pbest and gbest. If the fitness is not improved, Pos and Vel of each particle are updated using Equations (3) and (4).
where Pos_i and Vel_i are the position and the velocity of the i-th particle or i-th firefly, a is the random parameter from 0 to 1, r_ij is the distance between the two fireflies, γ is the light absorption coefficient of FA, B_0 is the attractiveness value, w is the inertia weight of PSO, tp is the temporal position, r_1 and r_2 are random parameters ∈ [0, 1], C_1 and C_2 are the acceleration coefficients, and t is the current iteration. (4) Select the best gbest over all iterations, and then extract the coordinates of the particle with this gbest. These coordinate values are the optimized parameters for the flash flood ensemble model.
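The procedure can be sketched in Python as follows. Since Equations (1) to (4) are not reproduced in this text, the update rules below are assumptions: a textbook firefly-style move toward the previous gbest for particles whose fitness is at least as good as that gbest, and a textbook PSO update otherwise. The constant inertia weight `w` and the values of `b0`, `gamma`, and `a` are illustrative choices, not settings from the study.

```python
import math
import random

def hfps_minimize(cost, bounds, n_particles=30, n_iter=1000, w=0.7,
                  c1=1.49445, c2=1.49445, b0=1.0, gamma=1.0, a=0.2, seed=0):
    """Hybrid firefly/PSO search; returns (gbest position, gbest cost)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x, d):
        return min(max(x, bounds[d][0]), bounds[d][1])

    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    best = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[best][:], pbest_cost[best]

    for _ in range(n_iter):
        prev_gbest, prev_cost = gbest[:], gbest_cost
        for i in range(n_particles):
            if cost(pos[i]) <= prev_cost:
                # Firefly-style move: attraction decays with squared distance.
                r_sq = sum((x - g) ** 2 for x, g in zip(pos[i], prev_gbest))
                beta = b0 * math.exp(-gamma * r_sq)
                tp = pos[i][:]  # temporal (previous) position
                pos[i] = [clip(x + beta * (g - x) + a * (rng.random() - 0.5), d)
                          for d, (x, g) in enumerate(zip(tp, prev_gbest))]
                vel[i] = [x - t for x, t in zip(pos[i], tp)]
            else:
                # PSO move: inertia plus pulls toward pbest and gbest.
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][d] = (w * vel[i][d]
                                 + c1 * r1 * (pbest[i][d] - pos[i][d])
                                 + c2 * r2 * (prev_gbest[d] - pos[i][d]))
                    pos[i][d] = clip(pos[i][d] + vel[i][d], d)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
            if c < gbest_cost:
                gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost

# Toy cost (sphere function) standing in for the model error.
sphere = lambda p: sum(x * x for x in p)
best_pos, best_cost = hfps_minimize(sphere, [(-5.0, 5.0)] * 2, n_iter=200)
```

In the flash flood setting, the cost function would train the ensemble with the parameters encoded in a particle and return its error, so each cost evaluation is far more expensive than in this toy example.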

Descriptions of the Study Site
We conducted the current work in the Van Ban district, located in the mountainous Lao Cai Province (approximately 263 km northwest of Hanoi), Vietnam. The total area is approximately 1435 km², accounting for around 22.5% of Lao Cai Province. It lies between latitudes of 21°57′32″N and 22°17′12″N and between longitudes of 103°57′18″E and 104°30′38″E (Figure 1).
The study area has complex terrain, lying between two large mountain ranges, the Hoang Lien Son in the northwest and the Con Voi in the southeast. The topography comprises approximately 90% mountainous area and around 10% lowland area. The former consists of various hills and mountain ranges, with altitudes ranging from 700 m to 1500 m and an average slope of 25° to 35°, exceeding 50° in some areas. The remaining areas are valleys at an altitude of 400-700 m. The highest place is located in the Nam Chay commune, at a height of approximately 2875 m, while the lowest is along the Nam Chan stream, with an altitude of 85 m. In the study area, there are small streams and springs starting from the Hoang Lien Son and the Nui Voi mountainous areas and discharging into the Hong River in the northeast. The study area is highly vulnerable to flash floods due to the complex terrain and dense drainage network; they particularly occur when rapid runoff from hilly and mountainous areas discharges into the Ngoi Nhu, Nam Tha, and Ngoi Chan streams in a short time before reaching the Hong River [30]. Geologically, the study site has a total of fifteen formations and a complex outcrop. Seven formations occupy over 89% of the total study site: Ye Yen Sun (26.98%), TLNT (21.60%), Sinh Quyen (11.66%), Bac Ha (11.46%), PS complex (5.98%), Cam Duong (6.01%), and Suoi Bang (4.87%) (Figure 2a). The dominant lithology consists of biotite granite, marble, motley limestone, clay shale, quartz-plagioclase-biotite, clay sericite shale schist, and crystalline schist (see Table 1).
The study site has a subtropical monsoon climate with two seasons, of which the rainy season lasts from April to September and the dry season starts in October and ends in March. Average annual precipitation ranges from 1500 mm to 2500 mm and falls mainly in the rainy season, which accounts for 70.74-89.25% of the total annual precipitation. Notably, very high-intensity precipitation events often occur within short periods in the rainy season over steep slopes, leading to frequent flash floods and landslides in the study area.

Flash Flood Inventory Map
Flood inventory maps are often required in order to investigate the relationship between floods and causative agents [27,31,32]. To prepare the flood-prone map, the initial step is to acquire the relevant data and to construct a spatial geodatabase. In the present study, a total of 200 flash flood locations was interpreted using Sentinel-1 C band free-of-charge data to generate an inventory map, as suggested by Nguyen et al. [21]. Accordingly, 2653 flash flooding polygons occurring in the rainy season of 2018 were identified. These flooded polygons were used to construct an inventory map for the study site (Figure 1), of which 1858 polygons were employed for the training phase, and the remaining 796 polygons were utilized for the testing phase to validate the predictive performance [31].

Flash Flood Indicators
It is widely recognized that flash flooding events often take place on a local scale, depending mainly on the rainfall, land use, topography, and soil features of the region [16,31,33,34]. Therefore, identification of the conditioning factors associated with flash floods is often required for predicting the possibility of flash floods occurring in a specific region. However, the conditioning or influencing factors in each flash flood event are complicated and interdependent. In the current work, we selected eleven influencing factors for flash flood modeling, including geology, soil type, river density, rainfall, slope, elevation, aspect, plan curvature, profile curvature, TWI, and NDVI, based on the literature [19,35,36]. Herein, the geology, soil type, and river density indicators were derived from the geology map with a scale of 1:50,000, acquired from the geological and mineral resources map of the Van Ban district [37], and from the soil type and the topographic maps of Vietnam at a scale of 1:50,000. The rainfall indicator was obtained from the stations in the study area, whereas the slope, elevation, plan curvature, aspect, profile curvature, and TWI conditioning factors were computed from the ALOS-PALSAR DEM 30 m [38]. The NDVI factor was computed from Landsat-8 imagery (acquired on 20 December 2017) at a spatial resolution of 30 m. All indicators were transformed into a raster format at 30 m spatial resolution to construct a flash flood susceptibility map.
Geology: Geology is an important factor related to flash flooding occurrence [39,40]. This is because different geological terrains have different water absorption capacities and therefore can be susceptible to rapid runoff generation during high rainfall events, exacerbating the potential for severe flooding in downstream regions [27,40]. The geology characteristics were classified into fifteen formations (Figure 2a) consisting of different parent rocks such as sedimentary, igneous, and metamorphic components. The geological characteristics of the study site are presented and summarized in Table 2.
Soil type: Hydrologically, soil type is among the crucial factors influencing the infiltration, runoff generation, and soil erosion processes that affect flash flood characteristics [41]. It is widely recognized that different soil types have different properties (soil moisture, soil texture, and soil profile). Thus, the type of soil directly influences the formation of flash flood flow and its components (i.e., water, mud, and alluvium) [42]. In this study, thirteen soil types were observed in the study area, of which the Fa, Ha, and Fs soil types cover more than 90% of the total area, followed by the Fj type and other soils (Figure 2b).
River density: River density plays a vital role in carrying flash flood flow out of the watershed [43]. Although the characteristics of the flash flood may vary according to different topographical conditions, a higher river density likely has a more significant impact on flood flow expansion [16]. For instance, a dense river system in flat areas could lead to rapidly expanding flood flow, while it has a reverse trend in steep areas. Thus, we considered the river density factor, which was generated from the digital elevation model (DEM) (Figure 2c).

Rainfall: Rainfall, which is characterized by its intensity, duration, and frequency, is the most important influencing factor of any hydrological process within a watershed. Although flash flood flow likely depends on a number of factors, high rainfall intensity tends to produce the high energy and fast mass transfer of a flash flood in a specific area. As the study site is located in a tropical monsoon region with high rainfall intensity occurring in steep areas, the site is highly vulnerable to flash floods and landslides [19]. In this work, the highest cumulative 72 h rainfall events of the last three years were employed to compute the rainfall map using the inverse distance weighting (IDW) method [31]. The rainfall recorded within three days was around 177.5 mm in the west and approximately 11.8 mm in the southeastern areas, as interpolated in the ArcGIS software (Figure 2d).
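To illustrate the IDW step, the sketch below estimates rainfall at an unsampled location as a distance-weighted mean of station values. The station coordinates are hypothetical, the two rainfall values are borrowed from the range quoted above purely for illustration, and the power of 2 is an assumed setting, since the text does not state the one used.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    `stations` is a list of (x_s, y_s, value) tuples.
    """
    num = den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value          # exactly on a station: use its value
        wgt = 1.0 / d ** power    # closer stations weigh more
        num += wgt * value
        den += wgt
    return num / den

# Hypothetical stations on a 10 km line with 72 h rainfall in mm.
stations = [(0.0, 0.0, 177.5), (10.0, 0.0, 11.8)]
midpoint = idw(stations, 5.0, 0.0)   # equal distances give the plain mean
near_west = idw(stations, 1.0, 0.0)  # dominated by the western station
```

Applying the function to every 30 m cell centre would give a raster comparable in form to the rainfall layer.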
Slope and elevation: Slope and elevation are two main topographical drivers that control the speed of flash flood flow within a watershed. For example, high altitude coupled with steep slopes has a greater probability of contributing to extreme flash floods, even in the event of low rainfall intensity [19,31]. Meanwhile, high rainfall intensity in flat areas with gentle slopes will probably result in fewer flash flood occurrences. It was noted that most of the steep slopes are located in the high-elevation areas, where they generate runoff and form and accelerate flash flood flow in the study area. In contrast, the low slopes along streams (Ngoi Nhu, Nam Tha, and Ngoi Chan) may decrease the capacity for carrying flash flood flow out of the watershed. The slope layer revealed a large amount of variation, ranging from 0.01 to 68.16 degrees (Figure 2e), whereas the elevation map was generated from a DEM with 30 m spatial resolution, showing elevation ranging between approximately 32 m and 3000 m (Figure 2f). The DEM was generated using the national topographic map at a scale of 1:50,000, acquired from the Vietnam Institute of Geosciences and Mineral Resources.
Aspect: The aspect factor is another component of topography, representing the potential stream flow direction and the sensitivity of the processes that regulate the components of a flash flood [44,45]. The aspect map created for the study area was categorized into eight classes [16,46], as shown in Figure 2g. The positions of flash floods in the study area were overlaid on the aspect map to show the level of influence of this indicator on the probability of flash flood occurrence.
Plan curvature: Plan curvature reveals the morphometrical characteristics and indicates the change in a slope's inclination or aspect [45]. Plan curvature may largely affect the acceleration and deceleration of water and muds/sediment during downslope flow and, therefore, likely influences the velocity of the flash flood [45].
More importantly, the plan curvature influences the divergence of flow, thus deeply affecting the flash flood energy and mass transfer from upstream to downstream in a specific watershed. Plan curvature values are generally defined as concavity (positive), convexity (negative), and flat (zero), which are largely affected by the runoff processes [47,48]. We generated the plan curvature map using the DEM with a pixel size of 30 m × 30 m. Figure 2h shows that roughly 75% of the study area is covered by the concave zones.
Profile curvature: Profile curvature corresponds to the direction of the maximum slope, thus indicating the convergence and the divergence of a surface flow [49]. A negative value in the top of the mountains indicates that the surface is upwardly convex, while a positive value reveals that the surface is upwardly concave at that location (Figure 2i). A zero value of the profile curvature shows that the surface is linear. The profile curvature often influences the acceleration or deceleration of flash flow across the surface area. We used the DEM with a grid of 30 m × 30 m to generate the profile curvature map in the current work. Figure 2i shows that approximately 80% of the study area is occupied by concave zones.

Topographic wetness index (TWI): The TWI is considered the most critical parameter measuring topographic controls of basic hydrological processes [50]. The TWI map was created using the altitude map by applying Equation (5) [51,52].
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \quad (5)$$
where A_s is the upslope area, and β is the slope angle at one pixel. Figure 2j shows that the TWI ranges from 5.10 to 21.62, in which high TWI values indicate a greater capability for water accumulation in the study area.
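A minimal sketch of the per-pixel computation is given below, using the standard form TWI = ln(A_s / tan β). The floor on the slope angle is an assumption added to avoid division by zero on flat cells, and the input values are hypothetical.

```python
import math

def twi(upslope_area, slope_deg, min_slope_deg=0.01):
    """Topographic wetness index ln(A_s / tan(beta)) for one pixel.

    `min_slope_deg` is an assumed floor that keeps tan(beta) > 0 on flat cells.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))

# Hypothetical pixels: a gentle valley cell and a steep ridge cell.
valley = twi(upslope_area=5000.0, slope_deg=2.0)
ridge = twi(upslope_area=50.0, slope_deg=35.0)
```

A large contributing area combined with a gentle slope gives the higher index, which matches the interpretation of high TWI as greater water accumulation.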

Normalized difference vegetation index (NDVI): The NDVI is a crucial indicator of the degree of vegetation coverage, which largely influences flood processes [31]. Greater NDVI values indicate higher vegetation coverage, while lower values indicate less vegetation. Previous studies show that low vegetation coverage indicates a high probability of flash flood occurrence [27,36].
The NDVI map (Figure 2k) for the study area was computed from Landsat-8 Operational Land Imager (OLI) multispectral imagery with a pixel size of 30 m × 30 m using Equation (6) [31].
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \quad (6)$$
where RED and NIR are the surface reflectance of the red and the near-infrared wavelengths derived from Landsat-8 OLI, respectively.
The NDVI values range between −0.19 and 0.59, indicating the different impact levels of vegetation coverage on flash flood processes.
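The NDVI computation can be sketched per pixel as follows. The reflectance values are hypothetical, and returning 0.0 where both bands are zero is an assumed convention for no-data cells.

```python
def ndvi(nir, red):
    """NDVI = (NIR - RED) / (NIR + RED) for one pixel of surface reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention for empty pixels
    return (nir - red) / total

def ndvi_grid(nir_band, red_band):
    """Apply NDVI cell by cell to two equally sized 2D lists."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]

# Hypothetical 2 x 2 reflectance tiles (Landsat-8 OLI band 5 = NIR, band 4 = red).
nir_band = [[0.40, 0.30], [0.10, 0.05]]
red_band = [[0.10, 0.10], [0.10, 0.08]]
grid = ndvi_grid(nir_band, red_band)
```

Densely vegetated cells reflect strongly in the near-infrared and give values toward the upper end of the range, while bare soil or water gives values near or below zero.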

Proposed HFPS-RSTree for Flash Flood Susceptibility Modeling
In this work, the flash flood indicators and inventories were processed using ArcMap 10.6 (see Figure 3). The HFPS-RSTree model was implemented by the authors in the Matlab environment. The RSTree is available through the Python Weka Wrapper API [53], whereas the HFPS code in Matlab was introduced by Aydilek [29].

Database Establishment
The geospatial database for flash floods in the Van Ban district was constructed using ArcCatalog. The flash flood inventory map and the eleven influencing factors were converted into a raster format with a spatial resolution of 30 m. Note that in the proposed model, a number of factors (slope, elevation, plan curvature, profile curvature, TWI, NDVI, river density, and rainfall) were represented as continuous values, while the remaining categorical indicators, including the aspect, soil type, and lithology factors, were converted into numeric values using the method suggested in [31].
The flash flood inventories were randomly split into two subsets for the flash flood modeling in the next phase, of which the training dataset consisted of 1858 polygons and the validating dataset contained 796 polygons.

Configuration of the HFPS-RSTree Model
The structure of the proposed HFPS-RSTree model consists of three algorithms: the RS ensemble, the decision tree algorithm, and the HFPS optimization. Using the training dataset, the RS ensemble will generate m subsets (m-ss), and each subset will have p flash flood indicators (p-ffi). Each subset will be used to generate a tree using the decision tree algorithm, where the maximum depth of the tree (d-max) must be defined. Therefore, the HFPS-RSTree model was configured using values determined by the three above parameters, m-ss, p-ffi, and d-max.
Herein, the HFPS algorithm was integrated to search for the best combination of these three parameters automatically. The HFPS parameter settings followed the suggestions of Aydilek [29]. Accordingly, the acceleration coefficients C_1 and C_2 were both set to 1.49445. The swarm population was 30, whereas the total number of iterations was 1000. The search space was as follows: m-ss ∈ [10, 500], p-ffi ∈ [1, 11], and d-max ∈ . It should be noted that the maximum depth of the tree takes an integer value.

The Objective Function and Training the HFPS-RSTree Model
To quantitatively measure the best combination of the three parameters (m-ss, p-ffi, and d-max), an objective function (ObjF) must be established, and in this work, the ObjF suggested in [27] was used, as shown below.
where Predict_i is the estimated value of the HFPS-RSTree model; Target_i is the flash flood inventory value; n is the total number of samples.
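The following sketch shows how a particle position could be turned into model parameters and scored. It rests on two assumptions that the text does not confirm: the objective is taken to be a root-mean-square error over Predict_i and Target_i, and the d-max range `(1, 20)` is a hypothetical placeholder because the range is not given above.

```python
import math

def decode(position, d_max_range=(1, 20)):
    """Map a continuous particle position to integer (m_ss, p_ffi, d_max).

    The m_ss and p_ffi ranges follow the stated search space;
    d_max_range is a hypothetical placeholder.
    """
    ranges = [(10, 500), (1, 11), d_max_range]
    return tuple(int(min(max(round(v), lo), hi))
                 for v, (lo, hi) in zip(position, ranges))

def rmse(predicted, target):
    """Root-mean-square error, assumed here as the form of ObjF."""
    n = len(target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)

params = decode((3.2, 11.7, 5.4))              # out-of-range values are clamped
error = rmse([1, 0, 1, 1], [1, 0, 0, 1])       # one of four samples is wrong
```

Within the optimizer, each particle would be decoded, the ensemble trained with the resulting parameters, and the error on the training samples returned as the particle's fitness.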

Model Performance Assessment
The model's performance was assessed using a number of statistical measures, such as the receiver operating characteristic (ROC) curve and area under the curve (AUC), the overall accuracy, and the kappa coefficient, because these metrics have been widely used for checking the performance of flash flood modeling in the literature. Detailed formulas of these statistical measures can be found in [27,54,55].
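For orientation, the three measures can be computed as in the sketch below, which uses the usual definitions for a two-class problem. The confusion-matrix counts and scores are hypothetical.

```python
def overall_accuracy(tp, fp, fn, tn):
    """Share of correctly classified samples."""
    return (tp + tn) / (tp + fp + fn + tn)

def cohen_kappa(tp, fp, fn, tn):
    """Agreement beyond chance: (po - pe) / (1 - pe)."""
    n = tp + fp + fn + tn
    po = (tp + tn) / n
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (po - pe) / (1 - pe)

def auc(scores, labels):
    """AUC as the probability that a flood sample outranks a non-flood sample.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical balanced test set of 100 samples.
oa = overall_accuracy(tp=45, fp=5, fn=5, tn=45)
kappa = cohen_kappa(tp=45, fp=5, fn=5, tn=45)
area = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

When the two reference classes are equally represented, the expected agreement is 0.5 and kappa reduces to 2 × OA − 1, a relation that is consistent with the reported pairs such as an overall accuracy of 92.99% and a kappa of 0.860.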

Variable Importance Ranking
In this study, variable importance was assessed using the random forest algorithm. The results in Table 2 show that the slope, aspect, and elevation factors had the highest importance for assessing flash flood susceptibility in the study area. Remarkably, the slope factor is likely the most important factor for predicting the spatial distribution of flash floods in this study. The remaining factors, namely aspect, elevation, plan curvature, profile curvature, TWI, NDVI, river density, lithology, rainfall pattern, and soil type, were ranked from 2 to 11, respectively.

Figure 4 and Table 3 show the predictive performance of the HFPS-RSTree, the RF, the C4.5-DT, the LMT, and the SVM algorithms in the training and the testing phases. The AUC for the prediction-rate curve demonstrates how well the model predicts flash floods. The results in Figure 4 and Table 3 show that the five algorithms performed very well on both the training and the validation datasets. The AUC values of the HFPS-RSTree, the RF, the C4.5-DT, the LMT, and the SVM models were 0.973, 0.970, 0.920, 0.945, and 0.964, respectively, on the training data, whereas the corresponding values were 0.967, 0.965, 0.914, 0.927, and 0.951 on the testing data, showing satisfactory results for the spatial prediction of flash floods in the study area.

Overall, the HFPS-RSTree model yielded the highest predictive performance both in the training phase (kappa = 0.860, overall accuracy = 92.99%) and in the testing phase (kappa = 0.838, overall accuracy = 91.88%), followed by the RF algorithm. In contrast, the SVM algorithm produced the lowest performance (kappa = 0.844, overall accuracy = 92.18% in the training set and kappa = 0.790, overall accuracy = 89.48% in the testing set).
The results showed that the ensemble-based methods using the decision tree learning algorithm yielded better predictive performance than the other well-known machine learning (ML) algorithms in this study. Our results are in agreement with recent studies [21,56]. We conclude that, among the five ML algorithms, the proposed model using a combination of the decision tree ensemble-based algorithm and an advanced optimization technique produced the most precise results for the spatial prediction of flash floods in the study area.
Table 4 shows the Wilcoxon rank-sum test results for the five ML models. All pairwise comparisons were statistically significant except the RF vs. C4.5-DT (p-value = 0.099) and the RF vs. LMT (p-value = 0.055).
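As a rough illustration of such a pairwise comparison, the sketch below computes a two-sided rank-sum p-value with the normal approximation. It uses average ranks for ties but omits the tie correction of the variance, and the two samples are hypothetical values standing in for whatever per-model quantities were compared in Table 4.

```python
import math
from statistics import NormalDist

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation)."""
    pooled = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        i = j
    n1, n2 = len(x), len(y)
    w = sum(ranks[v] for v in x)             # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical samples: clearly separated versus interleaved.
p_separated = rank_sum_test(list(range(1, 11)), list(range(11, 21)))
p_interleaved = rank_sum_test([1, 3, 5, 7], [2, 4, 6, 8])
```

A p-value below the chosen significance level, commonly 0.05, would lead to the conclusion that the two samples differ, which is how the non-significant RF pairs above are read.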

Flash Flood Susceptibility Map
Since the HFPS-RSTree produced the best predictive performance regarding the AUC, overall accuracy, and kappa index among the five ML models, we employed this model to compute the flash flood susceptibility map of the study area. The final model results were transformed into a raster format and interpreted in the ArcGIS environment. The flash flood susceptibility map was generated and visualized, as shown in Figure 5. The susceptibility index ranged from 0.01 to 1.00; the darker blue color in the map represents a high probability of flash flood occurrence, whereas the brighter yellow color shows a low probability.

Figure 5 shows that the highest probability of flash floods was found in Khanh Yen town, followed by the Van Son, the Dan Thang, and the Nam Chay communes. These areas are flat and located closer to the rivers, and they were likely the most affected by flash floods during the last five years. Therefore, policymakers and local authorities should pay more attention to these areas when prioritizing the development of flood risk measures. In contrast, the other areas have a lower probability of flash floods. This is possibly because the terrain in these areas is steep, which may prevent water accumulation.

Discussion
In the last decade, the adverse effects of global warming have resulted in a higher frequency of floods in various regions around the globe [57][58][59]; therefore, new studies to develop better tools for flood prediction are highly necessary. In this research, we proposed a new modeling approach, named the HFPS-RSTree, for the spatial prediction of flash flood susceptibility, with a case study of a high-frequency torrential rainfall area. The proposed HFPS-RSTree is a new machine learning ensemble consisting of three components: the decision tree (Tree), the random subspace (RS), and the HFPS technique. Herein, the flood ensemble model was created using the RS and Tree, while the HFPS was integrated in order to optimize the model.
The high accuracy of the HFPS-RSTree model for the spatial prediction of flash floods indicates that the combination of the HFPS, the RS, and the Tree techniques is efficient in predicting potential flash flood areas. This is due to the mechanism of ensemble-based learning, in which the RS plays a vital role in generating flood subsets to ensure the diversity of the final ensemble model. During the last ten years, decision tree ensemble-based learning methods have confirmed their high prediction power in various domains [60][61][62], including flood studies [15,26]. The results of the HFPS-RSTree model confirm the above statement.
The success of building the HFPS-RSTree model is also strongly dependent on three parameters, namely the number of subsets (m-ss), the number of indicators used in these subsets (p-ffi), and the maximum depth of the tree (d-max); therefore, these parameters should be carefully determined. The highest performance of the HFPS-RSTree model, compared to the RF, C4.5-DT, LMT, and SVM, shows that these parameters have been searched and optimized successfully by the HFPS algorithm. This is a reasonable result because the HFPS has proven its capacity in searching and optimizing parameters in various engineering domains recently [29].
In this research, eleven indicators were considered for flash flood modeling, and the superior performance of the HFPS-RSTree model demonstrates that these indicators were selected and processed properly. Among these indicators, the slope degree and slope direction are likely the most important factors for mapping and predicting flash floods in the present study. This result is consistent with the results reported by Tehrany et al. [63], showing that flood-prone areas are often located in flat areas and low altitudes. On the other hand, as the slope increases, the rate of water infiltration decreases, and the water velocity increases [16].

Concluding Remarks
We proposed a new ensemble machine learning model, namely the HFPS-RSTree model, for the spatial prediction of flash floods in the present work. The Van Ban district, located in the northern mountainous region of Vietnam, was selected as a case study. The predictive performance of the HFPS-RSTree was compared with four machine learning techniques, namely the RF, C4.5 DT, LMT, and SVM models. The following conclusions can be drawn from the results of the current study: (1) The integration of HFPS, RS, and Tree, which results in a new ensemble model, is capable of predicting flash floods accurately. (2) HFPS is a useful tool for optimizing the RSTree model. (3) The HFPS-RSTree model yielded higher predictive performance than the other benchmarks, i.e., the RF, C4.5-DT, LMT, and SVM models, which was confirmed by the Wilcoxon rank-sum test. This indicates that the HFPS-RSTree model is a promising tool to be considered for flash flood studies. (4) Regarding the 11 conditioning flash flood indicators, the slope and the aspect factors are the most important features. (5) Finally, the flash flood susceptibility map may assist local authorities and policymakers with watershed management and sustainable development in the district.