A Novel Hybrid Swarm Optimized Multilayer Neural Network for Spatial Prediction of Flash Floods in Tropical Areas Using Sentinel-1 SAR Imagery and Geospatial Data

Flash floods are widely recognized as one of the most devastating natural hazards in the world; therefore, prediction of flash flood-prone areas is crucial for public safety and emergency management. This research proposes a new methodology for spatial prediction of flash floods based on Sentinel-1 SAR imagery and a new hybrid machine learning technique. The SAR imagery is used to detect flash flood inundation areas, whereas the new machine learning technique, a hybrid of the firefly algorithm (FA), Levenberg–Marquardt (LM) backpropagation, and an artificial neural network (named FA-LM-ANN), is used to construct the prediction model. The Bac Ha Bao Yen (BHBY) area in the northwestern region of Vietnam was used as a case study. Accordingly, a Geographical Information System (GIS) database was constructed using 12 input variables (elevation, slope, aspect, curvature, topographic wetness index, stream power index, toposhade, stream density, rainfall, normalized difference vegetation index, soil type, and lithology) and subsequently the output of flood inundation areas was mapped. Using the database and FA-LM-ANN, the flash flood model was trained and verified. The model performance was validated via various performance metrics including the classification accuracy rate, the area under the curve, precision, and recall. Then, the flash flood model that produced the highest performance was compared with benchmarks; the comparison indicates that the combination of FA and LM backpropagation is very effective and that the proposed FA-LM-ANN is a new and useful tool for predicting flash flood susceptibility.


Introduction
Floods are considered one of the major natural disasters in the world in terms of human casualties and financial losses [1,2]. Among the several types of floods, flash floods are typically disastrous and are distinguished from regular floods by their rapid occurrence on short timescales, i.e., less than six hours [3]. Flash flood hazards are often triggered by heavy downpours, torrential rainfalls, or tropical rainstorms. The destructive effects of flash floods on human lives have been reported worldwide [4][5][6][7][8][9]. Human factors, e.g., deforestation and unplanned land use, also contribute to the occurrence of flash floods. Deforestation weakens the capability of flood prevention because forests significantly reduce surface water runoff and transfer the excess water into the groundwater and aquifers [10]. In addition, population growth means that many newly built settlements are located in areas susceptible to floods.
Due to the devastating economic, environmental, and social effects of flash floods, many studies have been dedicated to spatial modeling of floods and establishing flood susceptibility maps at a regional scale [11][12][13][14]. This is because the determination of flood-prone areas is an essential step in the prevention and management of future floods [15,16]. Nevertheless, the construction of flash flood susceptibility maps is a difficult task, especially in large areas, because flash floods are complicated processes which have region-dependent features and occur nonlinearly across a variety of spatio-temporal scales [17].
In recent years, the rapid advancement of Geographic Information System (GIS), remote sensing, and machine learning has given scientists effective tools for dealing with the complexity of spatial flood modeling [18][19][20]. The spatial data extracted from GIS greatly enhance the understanding and the assessment of flood risks for the whole region under analysis. Moreover, these GIS-based datasets can be combined with modern machine learning approaches to construct powerful tools for spatial prediction of floods. New remote sensing sensors, e.g., Sentinel-1A and B, provide new tools for flood detection and mapping with high accuracy [21,22]. Machine learning methods, with their capability of dealing with nonlinear and multivariate data, have proven their usefulness in establishing flood susceptibility maps in various countries around the world [23].
Moreover, positive results of machine learning applications to this problem have been reported extensively in the literature. This is because machine learning has the ability to explore complicated relationships between factors in various real-world problems [24,25]. For flood modeling, Nandi, et al. [26] constructed a flood hazard map in Jamaica based on logistic regression and principal component analysis. A GIS-based flood susceptibility assessment and mapping using frequency ratio and weights-of-evidence bivariate statistical models was put forward by Khosravi, et al. [27]. Tien Bui, et al. [15] and Razavi Termeh, et al. [28] proposed novel data-driven methods based on artificial intelligence optimized by metaheuristic algorithms for flood susceptibility. Lee, et al. [29] investigated the applicability of boosted-tree and random forest techniques for flood susceptibility prediction in a metropolitan city. A probabilistic model based on a Bayesian framework for spatial prediction of floods was proposed by Tien Bui and Hoang [30]. Chapi, et al. [31] combined a bagging algorithm and a logistic model tree to create a new tool for flood susceptibility mapping. Sachdeva, et al. [32] recently incorporated GIS, a support vector machine, and a swarm optimization algorithm to formulate a flood risk assessment model applied in India. Rahmati and Pourghasemi [33] analyzed the spatial data and identified critical flood-prone areas with the help of various techniques including the evidential belief function and classification trees.
Among machine learning methods, artificial neural networks (ANNs) are perhaps some of the most extensively used in flood modeling [34,35] as well as spatial predictions of other natural hazards [36][37][38][39]. This method possesses a strong capability in analyzing nonlinear and multivariate data as well as the ability of universal modeling. Despite these advantages, the application of ANNs in GIS-based modeling of flash flood susceptibility is still limited. In addition, previous works applying ANN in spatial modeling of natural hazards often resorted to gradient-based algorithms with backpropagation as a conventional way for training the models. This conventional approach updates the weights of an ANN model to minimize the prediction errors during the training phase. Although gradient-based algorithms with backpropagation are fast, this training method suffers from the risk of being trapped in local minima, especially in a multi-modal error space [40]. This disadvantage significantly deteriorates the predictive capability of ANN-based flash flood prediction models.
To counteract the aforementioned limitation of gradient-based algorithms, metaheuristics as a global searching method have been employed to improve the ANN training phase. Various metaheuristic algorithms, such as cuckoo search optimization [41], bat optimization [42], monarch butterfly optimization [43], shuffled frog leap algorithm [44], kidney-inspired algorithm [45], and an improved particle swarm optimization [46], have been recently proposed and investigated. Previous studies show improved performances of metaheuristic-assisted models compared to the traditional models. A review by Ojha, et al. [47] pointed out an increasing trend of applying metaheuristics as a tool for ANN models' construction phase.
The construction of an ANN model involves the optimization of connecting weights; in addition, the landscape of the error function can be highly complicated with numerous local minima. These facts entail that the stochastic search of a metaheuristic must involve the cooperation of a considerable number of searching agents (also called population members). The search space exploration of such searching agents typically represents a huge computational burden and has a slow convergence rate. Metaheuristic algorithms often require a large number of function evaluations during the optimization of the ANN model's weights. Therefore, it is necessary to combine the advantages of both metaheuristic and gradient-based algorithms to come up with an effective method for ANN model training.
This study puts forward a novel method that combines the gradient-based Levenberg-Marquardt backpropagation algorithm with the metaheuristic firefly algorithm. In this integrated framework, the firefly algorithm acts as a global search engine and the backpropagation algorithm plays the role of a local search with the aim of accelerating the optimization process. To train and verify the new ANN model used for flash flood susceptibility mapping, the Bac Ha Bao Yen (BHBY) area in the northwestern region of Vietnam was selected as a case study. This area belongs to a region which is highly susceptible to flash flooding due to its relief characteristics, i.e., rough and steep terrains [10]. Reports on the loss of human lives after flash floods in this area are regular news in the mass media. For instance, in August 2017, flash floods isolated many towns in this region and killed 18 people [48].

Flash-Flood Detection from Multitemporal Sentinel-1A SAR Imagery
Spatial prediction of areas prone to flash flooding using machine learning requires understanding and learning from events that occurred in the past and present [30,49]; therefore, establishment of a flash-flood inventory map is a key and mandatory task. A literature review points out that mapping of flash flood inventories is still the most critical task because flash floods are usually characterized by short temporal and spatial scales that are difficult to observe and detect [49]. Optical images are not suitable because they are sensitive to illumination and bad weather conditions [22]. Most published works collected flash-flood event data using handheld GPS devices and field surveys, which are costly and time-consuming, e.g., [16,20].
In this research, Sentinel-1A SAR imagery is used for deriving flood inventories. Sentinel-1A is a satellite launched on 3 April 2014 by the European Space Agency (ESA) in the Copernicus Programme [50]. The mission has a repeat cycle of 12 days, providing C-band SAR data (wavelength 3.75-7.5 cm, frequency 4-8 GHz) in four acquisition modes: interferometric wide-swath (IW), extra wide-swath (EW), wave mode (WV), and strip map (SM). Sentinel-1A provides dual-polarized data in two channels, co-polarized vertical transmit/vertical receive (VV) and cross-polarized vertical transmit/horizontal receive (VH); the VV data provide better results [51,52] and were therefore selected for flash flood detection in this study. Accordingly, four images (Table 1) were acquired in IW mode (250 km swath width and 10-m resolution), Level-1 ground range detected (GRD) format, and ascending direction. The proposed methodological approach to obtain flash-flood inventories for the study area using Sentinel-1A SAR imagery is shown in Figure 1. This approach uses the concept of change detection, which requires image pairs captured before and after flash flood events on the same satellite track. The processing of the Sentinel-1 GRD imagery consists of the following main tasks: (1) the satellite position and velocity information was updated using the precise orbit files, and then the Lee filter [53] and multi-looking were applied to remove the speckle in these images; (2) radiometric calibration was used to remove radiometric bias and ensure that pixel values are the real backscatter of the reflecting surface; (3) Range-Doppler terrain correction was applied using the shuttle radar topography mission digital elevation model (SRTM DEM) to remove image distortions, and the resulting images were re-projected to the UTM 48N projection of the study area.
Once the processing of these images was completed, co-registration between the pre-flash flood and post-flash flood images was performed, and subsequently, flash flood areas were detected. These flood areas were manually digitized using ArcGIS. Finally, these flash flood results were randomly checked in the fieldwork phase using handheld GPS. Figure 2 shows flash flood areas detected from the above Sentinel-1A SAR imagery.
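The change-detection idea can be illustrated with a minimal sketch. It rests on the well-known behavior that open water acts as a smooth reflector and lowers C-band backscatter, so newly inundated pixels show a drop between the pre-event and post-event images. The automatic thresholding below is an assumption made for illustration; in this study the flooded areas were digitized manually. The function names, the 3 dB threshold, and the toy pixel values are all hypothetical.

```python
import math

def to_db(sigma0):
    """Convert linear backscatter (sigma nought) to decibels."""
    return 10.0 * math.log10(sigma0)

def detect_flood(pre, post, drop_db=3.0):
    """Flag pixels whose backscatter fell by at least drop_db decibels.

    pre, post: co-registered 2-D lists of calibrated linear VV backscatter.
    drop_db:   hypothetical threshold, not a value taken from the study.
    """
    mask = []
    for row_pre, row_post in zip(pre, post):
        mask.append([
            (to_db(a) - to_db(b)) >= drop_db
            for a, b in zip(row_pre, row_post)
        ])
    return mask

# Toy 2 x 2 image pair (invented values): two pixels darken after the event.
pre = [[0.10, 0.12], [0.09, 0.11]]
post = [[0.10, 0.02], [0.01, 0.11]]
flood_mask = detect_flood(pre, post)
```

In practice the resulting mask would still need cleaning (for example, removing permanent water and radar shadow), which is one reason a manual check such as the field survey described above remains useful.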

Artificial Neural Network for Flash Flood Modeling
A multilayer artificial neural network (ANN) is a supervised machine learning algorithm which imitates the characteristics of actual biological neural networks [54]. An ANN can be trained with input data (flash flood conditioning factors) with ground truth labels (flash-flood and non-flash-flood); the trained ANN model is then used to predict the output class labels of flash flood occurrences. Generally, the structure of an ANN is arranged into three connected layers: input, hidden, and output (see Figure 3). The first layer contains neurons that represent the flash flood conditioning factors. The second layer, consisting of individual neurons, performs the task of information processing to yield the class labels of flood susceptibility in the output layer.

The aim of training the flash flood prediction model is to determine a mapping function $f: X \in \mathbb{R}^D \rightarrow Y \in \mathbb{R}^C$, where $D$ denotes the number of input flash flood factors and $C = 2$ is the number of output classes, no flood ($C_1 = -1$) and flood ($C_2 = +1$). The mapping function $f$ can be briefly described in the following form [55]:

$$f(X) = b_2 + W_2 \times f_A(b_1 + W_1 \times X) \quad (1)$$

where $W_1$ and $W_2$ are two weight matrices (see Figure 3); $b_1$ and $b_2$ are bias vectors; and $f_A$ denotes the log-sigmoid activation function, given as follows:

$$f_A(n_j) = \frac{1}{1 + \exp(-n_j)} \quad (2)$$

where $n_j$ is the net input of the $j$th neuron and $j = 1, 2, \ldots, N$.
In the ANN learning phase, the weight matrices and the bias vectors are adapted via the framework of error backpropagation [56]. The Mean Square Error (MSE) is used as the objective function:

$$MSE = \frac{1}{M}\sum_{i=1}^{M} er_i^2 = \frac{1}{M}\sum_{i=1}^{M} \left(Y_{i,P} - Y_{i,A}\right)^2 \quad (3)$$

where $M$ is the total number of samples in the training set; $er_i$ is the output error; and $Y_{i,P}$ and $Y_{i,A}$ are the predicted and actual values, respectively. Notably, for data sets that are not large, the Levenberg-Marquardt (LM) algorithm [57,58] is a suitable method for training ANN structures. The advantage of the LM method lies in its fast and stable convergence [59]. In this approach, the weights of an ANN model are adapted by Equation (4) [57]:

$$w_{k+1} = w_k - \left(J_k^T J_k + \lambda I\right)^{-1} J_k^T\, er_k \quad (4)$$

where $w_k$ is the weight vector at iteration $k$; $J$ denotes the Jacobian matrix of the errors with respect to the weights; $I$ represents the identity matrix; and $\lambda$ is the learning rate parameter.
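As a rough illustration of the forward mapping, the MSE objective, and a single LM update, the following pure-Python sketch may help. It is not the authors' implementation: the Jacobian is approximated by finite differences (an assumption made to keep the example short, whereas ANN training normally obtains it through backpropagation), the linear system is solved with a small Gaussian elimination helper, and the toy straight-line data are invented.

```python
import math

def logsig(n):
    """Log-sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-n))

def forward(x, W1, b1, W2, b2):
    """Single-hidden-layer mapping: b2 + W2 * logsig(b1 + W1 * x)."""
    hidden = [logsig(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(W1, b1)]
    return [b + sum(w * h for w, h in zip(row, hidden))
            for row, b in zip(W2, b2)]

def mse(predicted, actual):
    """Mean of squared errors (predicted minus actual)."""
    errors = [p - a for p, a in zip(predicted, actual)]
    return sum(e * e for e in errors) / len(errors)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z

def lm_step(w, residuals, lam, h=1e-6):
    """One LM update: w - (J^T J + lam * I)^-1 J^T er.

    residuals(w) returns the list of errors er; the Jacobian columns are
    estimated by forward finite differences (an assumption of this sketch).
    """
    er = residuals(w)
    n, m = len(w), len(er)
    cols = []  # cols[j][i] approximates d er_i / d w_j
    for j in range(n):
        wp = w[:]
        wp[j] += h
        erp = residuals(wp)
        cols.append([(a - b) / h for a, b in zip(erp, er)])
    A = [[sum(cols[a][i] * cols[b][i] for i in range(m))
          + (lam if a == b else 0.0) for b in range(n)] for a in range(n)]
    g = [sum(cols[a][i] * er[i] for i in range(m)) for a in range(n)]
    delta = solve(A, g)
    return [wi - di for wi, di in zip(w, delta)]

# Toy check on a straight line y = 1 + 2x (invented data, not flood data).
xs, ys = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0]

def line_residuals(w):
    return [w[0] + w[1] * x - y for x, y in zip(xs, ys)]

w_new = lm_step([0.0, 0.0], line_residuals, lam=1e-9)
```

With a very small $\lambda$ the step behaves like Gauss-Newton and recovers the line almost exactly; a larger $\lambda$ shortens the step and makes it resemble gradient descent. For an ANN, `residuals` would wrap `forward` over the training samples with the weights unpacked from the vector `w`.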

Firefly Algorithm (FA) for Optimizing Flash Flood Model
FA is a swarm-based algorithm proposed by Yang [60], which was inspired by the flashing communication of fireflies. Each firefly in the swarm is attracted to brighter ones while, at the same time, it explores and searches for prey randomly. FA is considered a global optimization method in which swarm intelligence is used to search for the best solution effectively [61]. Thus, FA has proven to be a highly suitable tool for dealing with complex optimization problems in continuous space, including the problem of neural network training [62,63]. Recent studies have shown excellent performance of FA when applied in various domains [64][65][66][67]. In general, the FA method utilizes the following rules [68]:

• All fireflies of a swarm are unisex; therefore, a firefly will be attracted to other fireflies regardless of their sex.

• The attractiveness degree of a firefly is directly related to its brightness. The attractiveness decreases as the distance increases. If no brighter signal is received from other fireflies, the firefly will move randomly.

• The brightness of a firefly is determined in terms of the cost function.

The FA pseudo code is illustrated in Figure 4 below:

Begin FA
  Establish the cost function f(x)
  Create an initial swarm with n fireflies
  Relate the light intensity I to f(x) and determine the absorption coefficient γL
  ...
Figure 4. Pseudo code of the firefly algorithm.

The light intensity $I(r)$ is computed using Equation (5):

$$I(r) = I_o \exp\left(-\gamma_L r^2\right) \quad (5)$$

where $I_o$ represents the light intensity of the firefly source; $\gamma_L$ is the light absorption coefficient; and $r$ denotes the distance from the firefly source. The attractiveness degree $\beta$ of a firefly in the swarm is estimated using Equation (6):

$$\beta = \beta_0 \exp\left(-\gamma_L r^2\right) \quad (6)$$

The distance between any two fireflies $x_i$ and $x_j$ of the swarm in the $D$-dimensional space is defined using Equation (7):

$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{D} \left(x_{i,k} - x_{j,k}\right)^2} \quad (7)$$

When a specific firefly $x_i$ receives a brighter signal from firefly $x_j$, it moves toward the $j$th firefly using Equation (8):

$$x_i = x_i + \beta_0 \exp\left(-\gamma_L r_{ij}^2\right)\left(x_j - x_i\right) + \alpha\,\omega \quad (8)$$

where $\gamma_L$ is the light absorption coefficient; $\beta_0$ is the attractiveness at $r_{ij} = 0$; $\alpha$ denotes a trade-off constant; and $\omega$ is a random number drawn from the Gaussian distribution.
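The rules and movement equations above can be sketched as a basic firefly search in pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: lower cost is treated as higher brightness, positions are clipped to box bounds, the best solution found so far is stored separately, and all parameter defaults (`n`, `iters`, `beta0`, `gamma`, `alpha`, the bounds, and the sphere test function) are hypothetical choices.

```python
import math
import random

def firefly_minimize(cost, dim, n=15, iters=50, beta0=1.0, gamma=1.0,
                     alpha=0.2, lower=-2.0, upper=2.0, seed=0):
    """Minimise cost(x) over [lower, upper]^dim with a basic firefly search."""
    rng = random.Random(seed)
    swarm = [[rng.uniform(lower, upper) for _ in range(dim)]
             for _ in range(n)]
    costs = [cost(x) for x in swarm]
    k = min(range(n), key=lambda idx: costs[idx])
    best_x, best_cost = swarm[k][:], costs[k]

    def step(x, target, beta):
        # Attraction toward target plus a Gaussian random term, clipped to
        # the search bounds (the clipping is an assumption of this sketch).
        return [min(upper, max(lower,
                    a + beta * (b - a) + alpha * rng.gauss(0.0, 1.0)))
                for a, b in zip(x, target)]

    for _ in range(iters):
        for i in range(n):
            moved = False
            for j in range(n):
                if costs[j] < costs[i]:  # lower cost = brighter firefly
                    # Squared Euclidean distance between fireflies i and j.
                    r2 = sum((a - b) ** 2
                             for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    swarm[i] = step(swarm[i], swarm[j], beta)
                    costs[i] = cost(swarm[i])
                    moved = True
            if not moved:  # no brighter firefly seen: purely random move
                swarm[i] = step(swarm[i], swarm[i], 0.0)
                costs[i] = cost(swarm[i])
            if costs[i] < best_cost:
                best_x, best_cost = swarm[i][:], costs[i]
    return best_x, best_cost

def sphere(x):
    """Toy cost function with its minimum at the origin."""
    return sum(v * v for v in x)

best_x, best_cost = firefly_minimize(sphere, dim=2)
```

When training an ANN, `cost` would be the MSE of the network whose weights are decoded from the position vector, and each firefly position would hold one candidate weight set.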

Study Area
The study area (see Figure 5) covers two districts, Bac Ha and Bao Yen (BHBY), which belong to Lao Cai Province in the northwestern area of Vietnam. BHBY occupies an area of about 1510.4 km², between longitudes of 104°10′ E-105°37′ E and latitudes of 22°5′ N-22°40′ N. The altitude ranges from 38.9 m in the river valleys to 1878.69 m above sea level in the mountain range of Bac Ha. This is typically a mountainous region with a complex network of rivers. Two main rivers flow in the study area, the Hong River and the Chay River. The first one, which bisects the province and flows through the study area over a length of approximately 28.7 km, is the biggest river. The second one is the major river flowing from north to south, with an estimated length of 91.6 km.
Since BHBY is a typical mountainous area, it has a cold, dry season, which usually lasts from October to March. The other months, from April to September, correspond to the rainy season. According to the Lao Cai statistical yearbook for 2010-2016 (measured at the Bac Ha station) [69], monthly rainfall varied from 9.0 mm (March 2010) to 540 mm (August 2016) and the total rainfall per year ranged from 1280.2 mm (2015) to 1844.9 mm (2016). More than 80% of the total rainfall per year was received in the rainy season. The rainfall is concentrated especially in three months (June to August), with the total rainfall of these three months accounting for more than 50% of the yearly rainfall [69]. For the period of 2010-2016, the annual average temperature varied between 19.27 °C and 23.77 °C, with the lowest monthly temperature being 10.6 °C in January 2014 (measured at the Bac Ha station) and the highest monthly temperature being 29.5 °C in June 2015 (measured at the Bao Yen station) [69].
The total population of the study area was 145,208 in 2017 [69]. The inhabitants mainly belong to ethnic minority groups that are highly vulnerable to natural hazards, especially flash floods, due to population growth and deforestation [70]. For instance, severe and torrential rainstorms caused by a tropical depression in October 2017 in northern Vietnam (including the study area) created widespread flash floods and destroyed more than 16,000 houses.

Flood Inventory Map and Conditioning Factors
Prediction of flash-flood prone areas in this research is based on the statistical assumption that future flash-flooding areas are governed by the same conditions which generated flash-flooded zones in the present and the past [30]. Therefore, flash-flood inventories and their geo-environmental conditions (i.e., topological, climatic, and hydrological characteristics) in the past and present must be extensively studied and collected [20,28].
In this research, the flash-flood inventory map with 654 flash flood polygons was used (see Figure 5). The map was constructed based on the change detection of the Sentinel-1A SAR imagery as mentioned in Section 2.1. Although the data for this study are from 2017, flash floods are recurrent events; therefore, the inventory reveals the severe flash flood locations in the BHBY area. The next step, a crucial task, is to determine the flash-flood influencing factors. The literature shows that there is still no consensus on which factors must be used; in general, factors should be selected based on flash-flood characteristics and the availability of geospatial data in the study area [28,71]. Accordingly, a total of 12 conditioning factors were considered in this study: elevation (IF1), slope (IF2), aspect (IF3), curvature (IF4), topographic wetness index (TWI) (IF5), stream power index (SPI) (IF6), toposhade (IF7), stream density (IF8), rainfall (IF9), normalized difference vegetation index (IF10), soil type (IF11), and lithology (IF12).
To prepare data for flash-flood modeling, a GIS database (see Figure 6) was constructed. Next, a Python tool was programmed by the authors to generate the flash-flood susceptibility map, in the form of the indices produced by the flash-flood model, in the ArcGIS environment. The compiled inventory database includes two class outputs: "flood" and "non-flood". As stated above, 654 flood locations have been recorded in this study; therefore, 654 data samples with the "flood" label are extracted from the flood inventory map. Because flash-flood modeling in this research is based on machine learning classification, which differs from traditional flood modeling approaches, 654 data samples of non-flood areas are randomly generated from not-yet-flooded areas [73]. An equal proportion of the two classes is used to avoid bias [73][74][75]. Consequently, a total of 1308 data samples are derived.
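The balanced sampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: grid cells are represented by integer identifiers, the candidate pool size of 5000 is invented, and the function name `build_samples` is hypothetical. The labels +1 (flood) and −1 (no flood) follow the class coding used for the ANN.

```python
import random

def build_samples(flood_cells, candidate_cells, seed=42):
    """Pair every flood cell with one randomly drawn non-flood cell.

    Returns a shuffled list of (cell_id, label) tuples, where the label is
    +1 for flood and -1 for non-flood.
    """
    rng = random.Random(seed)
    flooded = set(flood_cells)
    # Non-flood candidates are all cells that were never mapped as flooded.
    pool = [c for c in candidate_cells if c not in flooded]
    non_flood = rng.sample(pool, len(flood_cells))
    samples = [(c, 1) for c in flood_cells] + [(c, -1) for c in non_flood]
    rng.shuffle(samples)
    return samples

# Hypothetical cell identifiers: 654 flooded cells out of 5000 candidates.
flood_cells = list(range(654))
candidate_cells = list(range(5000))
samples = build_samples(flood_cells, candidate_cells)
```

Each sampled cell would then be joined with its 12 conditioning-factor values from the GIS database to form one training record.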

The Proposed Metaheuristic-Optimized Neural Network Model for Flash Flood Susceptibility Prediction
This section describes the proposed flash flood prediction model, which integrates the ANN machine-learning model and the FA metaheuristic approach improved by the Levenberg-Marquardt (LM) algorithm. The hybrid of FA and LM, denoted as FA-LM, is proposed as the method for training the ANN model. After being trained, the FA-LM trained ANN, denoted as FA-LM-ANN, can assign a class label (either non-flash flood or flash flood) to each input sample containing the aforementioned 12 conditioning factors.
The overall structure of the proposed model is depicted in Figure 8.

Encoding the ANN Structure for Flash Flood Modeling
The structure of an ANN model is generally determined by its weight matrices $W_1$ and $W_2$. The size of the matrix $W_1$ is $N_R \times (N_I + 1)$, where $N_R$ and $N_I$ denote the numbers of hidden neurons and input neurons, respectively. The number of columns of $W_1$ is $N_I + 1$ because one column holds the bias vector. In this analysis, $N_I = 12$, which is the number of flash flood conditioning factors. The number of neurons in the hidden layer should be large enough for the network to learn and infer complex mapping functions. However, $N_R$ should not be too large, since the resulting ANN model can be difficult to train and an excessively complex model is highly susceptible to overfitting.
Following the recommendation of Heaton [76], $N_R$ is roughly set to $N_R = 2N_I/3 + N_O$, where $N_I = 12$ (flash flood conditioning factors) and $N_O = 2$ (output neurons for flood susceptibility), which gives a value of 10. Moreover, a value of $N_R$ that exceeds $1.5 \times N_I$ often results in longer training time without significant improvement in predictive accuracy. Based on these suggestions and several trial-and-error runs, $N_R = 9$ is chosen for the ANN trained with the collected data set. The size of the matrix $W_2$ is $N_O \times (N_R + 1)$. A solution of the FA-LM algorithm must be coded as a vector. Hence, the two matrices $W_1$ and $W_2$ are vectorized and then concatenated to construct a solution. Accordingly, the total number of decision variables optimized by FA-LM is $N_R \times (N_I + 1) + N_O \times (N_R + 1) = 9 \times 13 + 2 \times 10 = 137$.
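The encoding described above can be sketched in a few lines of Python. This is only an illustration: the function names (`solution_length`, `encode`, `decode`) and the row-by-row flattening order are assumptions, since the text states only that the two matrices are vectorized and concatenated.

```python
# Layer sizes as stated in the text: 12 inputs, 9 hidden neurons, 2 outputs.
N_I, N_R, N_O = 12, 9, 2


def solution_length(n_i, n_r, n_o):
    """Number of decision variables: W1 is n_r x (n_i + 1), W2 is n_o x (n_r + 1)."""
    return n_r * (n_i + 1) + n_o * (n_r + 1)


def encode(w1, w2):
    """Flatten both weight matrices row by row (assumed order) and concatenate them."""
    return [v for row in w1 for v in row] + [v for row in w2 for v in row]


def decode(vector, n_i, n_r, n_o):
    """Rebuild W1 and W2 (as lists of rows) from a flat solution vector."""
    split = n_r * (n_i + 1)
    w1 = [vector[r * (n_i + 1):(r + 1) * (n_i + 1)] for r in range(n_r)]
    tail = vector[split:]
    w2 = [tail[r * (n_r + 1):(r + 1) * (n_r + 1)] for r in range(n_o)]
    return w1, w2


# Arbitrary illustrative weights; the last column of each matrix is the bias.
w1 = [[0.1 * r + 0.01 * c for c in range(N_I + 1)] for r in range(N_R)]
w2 = [[-0.2 * r + 0.03 * c for c in range(N_R + 1)] for r in range(N_O)]
vector = encode(w1, w2)
print(len(vector))  # 137
```

Decoding the vector returns the original matrices, so the optimizer can work on flat vectors while the network is evaluated with $W_1$ and $W_2$.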

Proposed Cost Function for Flash-Flood Modeling
During the search process of the FA-LM optimization, a cost function must be defined to express the fitness of each solution. The cost function (CF) of the FA-LM algorithm is defined in Equation (9) in terms of $MSE_{TR}$ and $MSE_{VA}$, which denote the mean squared error (MSE) for the training dataset (80% of the total model construction samples) and the validating dataset (20% of the total model construction samples), respectively. The rationale of the cost function described in Equation (9) is to guide the FA-LM search toward solutions that minimize the prediction error for both the training dataset and the validating dataset. Validating samples are included in the cost function to alleviate overfitting, which happens when the constructed model performs very well on the training set but very poorly when predicting novel input data. Thus, it is important that the ANN model has good prediction accuracy on both the training set and the validating set.
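As a minimal sketch of such a cost function, the Python snippet below combines the two error terms. The exact form of Equation (9) is not reproduced here, so the equal-weight average used in `cost_function` is an assumption made purely for illustration; the helper names and sample values are hypothetical as well.

```python
def mse(targets, predictions):
    """Mean squared error between target labels and model outputs."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def cost_function(y_tr, p_tr, y_va, p_va):
    """ASSUMED form: equal-weight average of the training and validating MSE.

    The source only states that both errors are minimized; the averaging
    used here is an illustrative choice, not the published Equation (9).
    """
    return 0.5 * (mse(y_tr, p_tr) + mse(y_va, p_va))


# Toy labels and predictions (not data from the study).
y_tr, p_tr = [0, 1, 1, 0], [0.1, 0.9, 0.8, 0.2]
y_va, p_va = [1, 0], [0.6, 0.4]
cf = cost_function(y_tr, p_tr, y_va, p_va)
print(round(cf, 4))  # 0.0925
```

A solution that fits the training samples well but the validating samples poorly receives a high cost under this form, which is the behavior the text asks for.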

The FA-LM Algorithm: A Hybridization of Metaheuristic Optimization and LM Backpropagation
The FA-LM optimization algorithm is employed in this study as the training algorithm. FA-LM is a combination of the FA and LM backpropagation algorithms. The FA metaheuristic serves as the main optimization method. Starting from the initially created population, this algorithm guides the population of ANN model structures to better solutions. Since the problem of constructing an ANN model from a data set is highly complex and features many local minima [53], using FA as a metaheuristic helps the training process avoid premature convergence and reduces the chance of being trapped in local minima. The LM algorithm has been implemented with the help of MATLAB's Statistics and Machine Learning Toolbox [77]. The FA and the hybrid FA-LM algorithms have been programmed in MATLAB by the authors.
In addition, LM backpropagation is used as a local search method at certain generations of the FA optimization process. To accelerate the optimization, prevent stagnation of the FA population, and limit the computational expense, LM backpropagation is applied to a randomly selected member of the current population once every 10 generations. The integrated FA-LM algorithm is illustrated by the pseudo code in Figure 9. The population size of the FA is 100 and the search domain is [−10, 10]. The population is optimized by the FA-LM algorithm with a maximum number of generations ($G_{MAX}$) of 1000. The number of backpropagation training epochs is 1000 and the learning rate is 0.01. After the FA-LM optimization process is completed, the trained ANN model is ready for the spatial prediction of flash flood occurrences.
Figure 9 (pseudo code of the FA-LM algorithm; initialization steps recovered from the figure):
Set the range of solution RX = [−10, +10], population size PS = 100
Generate an initial population X within RX
Define the cost function CF, locate the best-found solution Xbest
Set the current generation iter = 1 and switching probability p = 0.
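The interplay between the swarm search and the periodic local refinement can be sketched as follows. This is a simplified illustration, not the authors' MATLAB implementation: the firefly parameters `beta0`, `gamma`, and `alpha` are assumed values, and the local step `perturb_local_search` is a random hill-climbing stand-in used in place of Levenberg-Marquardt backpropagation so that the example stays self-contained.

```python
import math
import random


def perturb_local_search(x, cost, rng, steps=20, scale=0.1):
    """Stand-in local refinement (random hill climbing); NOT Levenberg-Marquardt."""
    best, best_cost = list(x), cost(x)
    for _ in range(steps):
        trial = [v + rng.gauss(0.0, scale) for v in best]
        c = cost(trial)
        if c < best_cost:
            best, best_cost = trial, c
    return best


def fa_with_local_search(cost, dim, pop_size=100, g_max=1000, bounds=(-10.0, 10.0),
                         local_every=10, local_search=None,
                         beta0=1.0, gamma=0.01, alpha=0.2, seed=0):
    """Firefly search with a local step on one random member every `local_every` generations."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(v):
        return max(lo, min(hi, v))

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    costs = [cost(x) for x in pop]
    for gen in range(1, g_max + 1):
        for i in range(pop_size):
            for j in range(pop_size):
                if costs[j] < costs[i]:  # firefly j is brighter, so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    pop[i] = [clip(a + beta * (b - a) + alpha * (rng.random() - 0.5))
                              for a, b in zip(pop[i], pop[j])]
                    costs[i] = cost(pop[i])
        if local_search is not None and gen % local_every == 0:
            k = rng.randrange(pop_size)  # randomly selected member of the population
            candidate = [clip(v) for v in local_search(pop[k], cost, rng)]
            c = cost(candidate)
            if c < costs[k]:  # keep the refined solution only if it improves
                pop[k], costs[k] = candidate, c
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def sphere(x):
    """Toy cost used only to exercise the loop."""
    return sum(v * v for v in x)


best_x, best_cost = fa_with_local_search(sphere, dim=3, pop_size=8, g_max=30,
                                         local_search=perturb_local_search, seed=1)
print(best_x, best_cost)
```

The defaults mirror the settings reported in the text (population of 100, 1000 generations, search domain [−10, 10], local step every 10 generations); the toy run uses a much smaller budget. In the actual model, `cost` would decode a 137-element vector into the ANN weights and evaluate the cost function on the training and validating data.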

Training Results and Performance Assessment
As mentioned earlier, the dataset of 1308 samples is used to construct and verify the ANN-based flash flood susceptibility prediction model. The data set is randomly divided into two separate groups: data for model construction (70%) and data for testing (30%) [16,20,31,78,79]. The first group is further partitioned into a training set (80% of the model construction samples) and a validating set (20% of the model construction samples). Moreover, the 12 flood influencing factors have been converted from categorical classes (shown in Figure 7) into continuous values within the range of 0.01 to 0.99 using the approach described in Tien Bui et al. [80]. The purpose of this conversion is to prevent flash-flood conditioning factors with large values from dominating those with small values. The statistical description of the flash flood influencing factors is provided in Table 2.
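The nested split described above can be sketched as follows. The function name `split_indices`, the random seed, and the floor-based rounding of subset sizes are assumptions, so the exact counts used in the study may differ by a sample or two.

```python
import random


def split_indices(n_samples, construction_pct=70, training_pct=80, seed=42):
    """Shuffle sample indices, then split into training, validating, and testing sets.

    Sizes are rounded down with integer arithmetic (an assumed convention).
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_construction = n_samples * construction_pct // 100
    n_training = n_construction * training_pct // 100
    training = idx[:n_training]
    validating = idx[n_training:n_construction]
    testing = idx[n_construction:]
    return training, validating, testing


train, valid, test = split_indices(1308)
print(len(train), len(valid), len(test))  # 732 183 393
```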
It is also worth noting that, to further facilitate the training phase of the ANN, the data set is then normalized by the Z-score transformation [81]:
$$IF_N = \frac{IF_O - m_{IF}}{s_{IF}} \qquad (10)$$
where $IF_N$ and $IF_O$ denote the normalized and the original influencing factor (IF), respectively, and $m_{IF}$ and $s_{IF}$ are the mean value and the standard deviation of the IF, respectively. Additionally, to measure the predictive performance of the flash-flood model, the classification accuracy rate (CAR) for class $i$ is calculated using Equation (11):
$$CAR_i = \frac{R_i^C}{R_i^A} \times 100\% \qquad (11)$$
where $R_i^C$ and $R_i^A$ are the number of correctly classified samples in the $i$-th class and the total number of samples in the $i$-th class, respectively. Recall that there are two class labels, flash flood and non-flash flood.
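A small Python sketch of these two computations is given below. The helper names are hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not say which variant is applied.

```python
import statistics


def z_score(values):
    """Z-score transform: subtract the mean and divide by the standard deviation."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (assumed variant)
    return [(v - m) / s for v in values]


def car(actual, predicted, label):
    """Classification accuracy rate (%) for one class label."""
    total = sum(1 for a in actual if a == label)
    correct = sum(1 for a, p in zip(actual, predicted) if a == label and p == label)
    return 100.0 * correct / total


# Toy values within the 0.01 to 0.99 range mentioned in the text.
z = z_score([0.01, 0.5, 0.99])
print([round(v, 3) for v in z])

# Toy labels: 1 = flash flood, 0 = non-flash flood.
actual = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(car(actual, predicted, 1), car(actual, predicted, 0))  # 50.0 100.0
```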
Besides CAR, other statistical metrics can be used to assess the performance of the flash-flood models, i.e., the true positive rate (TPR), false positive rate (FPR), false negative rate (FNR), and true negative rate (TNR) [82,83]:
$$TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{FP + TN}, \quad FNR = \frac{FN}{TP + FN}, \quad TNR = \frac{TN}{TN + FP} \qquad (12)$$
where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. In addition, the precision and recall, which are computed using Equations (13) and (14) below, can be used:
$$Precision = \frac{TP}{TP + FP} \qquad (13)$$
$$Recall = \frac{TP}{TP + FN} \qquad (14)$$
In addition to the above performance indices, the Receiver Operating Characteristic (ROC) curve [84] is used to summarize the overall performance of the flash-flood model; a better model is characterized by a higher value of the area under the ROC curve (AUC).
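The confusion-matrix metrics listed above can be computed as in the following sketch. It uses the standard definitions of these rates; the function names and the toy labels are illustrative assumptions and are not data from the study.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Standard rates derived from the confusion matrix."""
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "FNR": fn / (tp + fn),
        "TNR": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Toy labels: 1 = flash flood, 0 = non-flash flood.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
metrics = rates(*counts)
print(counts)   # (3, 1, 3, 1)
print(metrics)
```

Note that recall and TPR are the same quantity, which is why they take identical values in the output.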
The optimization process of the hybrid FA-LM algorithm is illustrated in Figure 10. The figure shows that the proposed training algorithm helps the ANN model converge quickly within the allowable number of optimization iterations. The predictive performance of the proposed FA-LM-ANN model is reported in Table 3. The FA-LM-ANN model performs well in both the training phase (CAR = 92.188% and AUC = 0.985) and the testing phase (CAR = 93.750% and AUC = 0.970). The model also achieves desirable values of Precision (0.938) and Recall (0.968) in the testing phase. The ROC curves of the FA-LM-ANN are illustrated in Figure 11.
The final trained FA-LM-ANN model in this research is shown in Figure 12, where a total of 137 weight parameters have been searched and optimized using the proposed FA-LM algorithm. In addition, details of the predicted and actual output data in both the training and testing sets are illustrated in Figure 13. To simplify the presentation of the figure, the class labels of non-flood and flood have been encoded as 0 and 1, respectively. The mean and the standard deviation of the prediction deviation for the training set are 0.039 and 0.320, respectively. For the testing set, the mean and the standard deviation of the prediction deviation are 0.050 and 0.324, respectively.

Model Comparison
For the purpose of comparison, the performance of the proposed FA-LM-ANN is benchmarked against those of the LM-ANN, FA-ANN, support vector machine (SVM), and classification tree (CT). These models are selected because both SVM and CT have been successfully employed in flood susceptibility assessment [16,20,38,39,84] and for other natural hazards such as landslides [36,38,85–87]. These benchmark models are implemented in the MATLAB environment via the Statistics and Machine Learning Toolbox [77]. ANN models trained with the conventional backpropagation algorithm have also been employed in spatial prediction of natural hazards [37,39,88]. In addition, comparing the performance of the ANN trained with the FA metaheuristic alone against that of the proposed FA-LM-ANN helps to point out the advantage of the new hybrid training algorithm.
To employ the LM-ANN, FA-ANN, SVM, and CT models, it is necessary to select their tuning parameters. In this section, the tuning parameters that lead to the best testing performance of each model are selected. For the CT model, the minimal number of observations per tree leaf is set to 1, which is the default setting in the MATLAB toolbox [77]. The crucial parameter of LM-ANN and FA-ANN is $N_R$ (the number of hidden neurons). In the experiment, as suggested by Heaton [76], this parameter is set to 9 for both ANN models, which is equal to $N_R$ of the proposed FA-LM-ANN. In addition, the LM-ANN model is trained with a maximum of 5000 training epochs and the FA-ANN is optimized with a maximum of 1000 iterations. For the SVM model, the regularization parameter and the RBF kernel parameter are selected by the grid search explained in Hoang and Bui [89].
The prediction results of the models are summarized in Table 4. Considering the model performances in the testing phase, the proposed FA-LM-ANN model has achieved the highest values of CAR (93.750%), AUC (0.970), Precision (0.938), and Recall (0.968). The second-best model is SVM with CAR = 91.667%, AUC = 0.960, Precision = 0.909, and Recall = 0.968, followed by FA-ANN, CT, and LM-ANN. There is an improvement in CAR when the ANN model is trained by the FA algorithm (91.667%) instead of the LM backpropagation (88.931%); however, the AUC value of the first approach (0.917) is worse than that of the second approach (0.937). In addition, Figure 14 compares the convergence rates of the two ANN training approaches, FA-LM and LM. The figure shows that the model training phase converges faster with FA-LM than with LM.
To further confirm the predictive capability of the proposed model, a ten-fold cross validation process is also performed in this section. In the cross validation process, the training and testing phases of the prediction models are carried out 10 times. Each time, 90% of the data set is employed for model construction and 10% of the data set is reserved for model testing. The experimental outcomes are reported in Table 5. Overall, compared with FA-ANN and LM-ANN, there are significant enhancements in terms of both CAR and AUC when the ANN is constructed by means of the hybrid FA-LM approach.
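The ten-fold cross validation scheme can be sketched as below. The generator name `k_fold_indices`, the random seed, and the interleaved assignment of shuffled indices to folds are assumptions; the text only states that each run uses 90% of the data for model construction and 10% for testing.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (construction, testing) index lists for each of the k folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        testing = folds[i]
        construction = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield construction, testing


splits = list(k_fold_indices(1308, k=10))
print(len(splits))                           # 10
print(sorted({len(te) for _, te in splits}))  # [130, 131]
```

Each sample appears in exactly one testing fold, so every observation is used once for testing and nine times for model construction.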

Establishment of the Flash Flood Susceptibility Map
Because both the training and testing results indicate that FA-LM-ANN is the best model for the dataset collected in the BHBY area, the model is then employed to compute the flash-flood susceptibility index for every pixel in the study area.
The predicted flash flood susceptibility values are transformed to a grid format using the Python tool (mentioned in Section 3.2) and opened in ArcGIS 10.4 software (ESRI Inc., Redlands, CA, USA). Based on these computed indices, the flash-flood susceptibility map (see Figure 15) was obtained and visualized by means of five classes: very high, high, low, very low, and no. The thresholds for dividing the computed indices into the five classes were determined using the natural breaks classification method [90].
Interpretation of the flash-flood susceptibility map shows that all flash flood locations fall in the two classes very high and high, indicating that the proposed FA-LM-ANN model has successfully identified flash flood prone areas.
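To illustrate how natural breaks thresholds can be derived from a set of susceptibility indices, the sketch below minimizes the within-class sum of squared deviations with dynamic programming. It is an illustrative stand-in, assuming this optimality criterion; the implementation in ArcGIS may differ in detail, and the function name `natural_breaks` and the sample values are hypothetical.

```python
def natural_breaks(values, n_classes):
    """Return the upper bound of each class, minimizing within-class squared deviation."""
    xs = sorted(values)
    n = len(xs)
    # Prefix sums allow the squared deviation of any sorted slice to be computed in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        """Sum of squared deviations from the mean for xs[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[c][j]: minimal cost of splitting xs[:j] into c classes.
    # cut[c][j]: index where the last of those c classes starts.
    best = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    best[0][0] = 0.0
    for c in range(1, n_classes + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cost = best[c - 1][i] + ssd(i, j)
                if cost < best[c][j]:
                    best[c][j], cut[c][j] = cost, i
    # Walk back through the stored cut points to collect each class's upper bound.
    bounds, j = [], n
    for c in range(n_classes, 0, -1):
        bounds.append(xs[j - 1])
        j = cut[c][j]
    return bounds[::-1]


# Toy values with three obvious clusters (not data from the study).
sample = [1, 2, 3, 10, 11, 12, 20, 21, 22]
print(natural_breaks(sample, 3))  # [3, 12, 22]
```

For the susceptibility map, the same routine would be called with `n_classes=5` on the per-pixel indices, and the returned upper bounds would serve as class thresholds. The cubic-time loop is adequate for a small sample of indices but would need a faster variant for a full raster.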

Conclusions
This research proposes a new methodology that uses Sentinel-1 SAR imagery and machine learning techniques for spatial prediction of flash flood hazards. The SAR imagery was used to detect flash flood locations, whereas the proposed FA-LM-ANN was used to establish the flash flood prediction model. The methodology was applied to the Bac Ha Bao Yen (BHBY) area, a highly flood-prone area in Vietnam. Accordingly, a GIS database was established containing information on historical flash flood events and 12 flood-conditioning factors.
The advantage of Sentinel-1 SAR imagery combined with the change detection method is the ability to capture and detect flash flood areas with high accuracy. However, flash floods often occur within a short time; therefore, this method is feasible for flash flood mapping only if the Sentinel sensor captures images at the time of flash flood occurrence. Regarding the proposed FA-LM-ANN, this artificial intelligence model is capable of improving prediction performance. This is because FA is employed as a swarm intelligence method to optimize the parameters of the ANN so that a decision boundary for classifying non-flood and flood locations can be identified accurately, whereas LM backpropagation serves as a local search method that speeds up the convergence of the swarm intelligence-based training algorithm.
Because the proposed FA-LM-ANN is constructed with 12 input neurons, nine hidden neurons, and two output neurons, which results in 137 weights (including biases), the search space of the FA has 137 dimensions. In other words, the coordinates of each firefly consist of 137 parameters. The swarm of 100 fireflies run for 1000 iterations has resulted in 100,000 searches over possible combinations of the weights of the FA-LM-ANN model. Consequently, the high prediction capability of the proposed flash-flood model indicates that the hybridization of FA, a metaheuristic algorithm, and LM backpropagation has trained the model successfully.
Compared with the benchmarks LM-ANN, FA-ANN, SVM, and CT, the proposed model yields better prediction results; therefore, it can be concluded that the proposed FA-LM-ANN is a very promising tool to assist decision makers, especially local authorities, in developing effective flash flood countermeasures and land-use planning. Future extensions of the current study may include applying the newly constructed model to predict flood risks in other study areas and enhancing the learning capability of the proposed model with other metaheuristic optimization algorithms.