Urban Flood Hazard Modeling Using Self-Organizing Map Neural Network

Floods are the most common natural disaster globally and lead to severe damage, especially in urban environments. This study evaluated the efficiency of a self-organizing map neural network (SOMN) algorithm for urban flood hazard mapping in the case of Amol city, Iran. First, a flood inventory database was prepared using field survey data covering 118 flooded points. A 70:30 data ratio was applied for training and validation purposes. Six factors (elevation, slope percent, distance from river, distance from channel, curve number, and precipitation) were selected as predictor variables. After building the model, the odds ratio skill score (ORSS), efficiency (E), true skill statistic (TSS), and the area under the receiver operating characteristic curve (AUC-ROC) were used as evaluation metrics to scrutinize the goodness-of-fit and predictive performance of the model. The results indicated that the SOMN model performed excellently in modeling flood hazard in both the training (AUC = 0.946, E = 0.849, TSS = 0.716, ORSS = 0.954) and validation (AUC = 0.924, E = 0.857, TSS = 0.714, ORSS = 0.945) steps. The model identified around 23% of the Amol city area as being in high or very high flood risk classes that need to be carefully managed. Overall, the results demonstrate that the SOMN model can be used for flood hazard mapping in urban environments and can provide valuable insights about flood risk management.


Introduction
Urban environments are vulnerable to flood damages due to the density of economic and social assets and amount of infrastructure, and there is increasing attention to implementing flood risk reduction measures. Flood impacts can be mitigated by improved prediction, awareness (early warning), and mapping. In flood studies, it has been widely accepted that absolute flood protection is impossible. Instead, growing attention has been given to the spatial prediction of flood risk. Future flood consequences can be limited through risk assessment and flood management measures such as changes in building codes and land uses, improved flood defenses, selective relocation of vulnerable

Description of the Study Area
The city of Amol in Mazandaran province in northern Iran (36°26′-36°29′ N; 52°19′-53°23′ E) was selected as the study site (Figure 1). Amol is an old city with a population of 237,528 in 2017, making it the third-largest city in Mazandaran province [31]. It is located at an altitude of 59-137 m above sea level (a.s.l.) and occupies an area of 27 km², based on Landsat data for 2017. Amol lies on the Mazandaran plain, along the banks of the river Haraz, and is surrounded mainly by agricultural land to the north, east, and west and by high, forest-covered mountains to the south. The mean annual precipitation in Amol city is 680 mm. Over recent decades, there has been considerable urbanization in the region, and the associated changes in land use are affecting the risk of flooding in the city. The degradation of vegetation, increases in impervious surface, and inadequate drainage networks are increasing surface runoff from precipitation, leading to extensive changes in the hydrological processes affecting Amol city [32].

Methodology
A flowchart of the work is presented in Figure 2.

Urban Flood Inventory
Flood inundation in Amol is driven mainly by rainfall over the city. Water flows from higher-elevation areas and collects in lower terrain, in the rivers and channels. However, the rivers and channels that cross the city have limited capacity to convey flood water, so areas near them are at risk of flooding. Flooded locations in Amol city were identified based on an inventory of areas inundated during the heavy rainfall in 2018-2019, a field survey using GPS (Garmin 76cx; Garmin, Olathe, Kansas, USA), and historical flood inundation data (Figure 1). These points served as the dependent variable for the prediction model. Since the input data included an inventory of areas under water during flood events, a point-based flood inventory map was used as the dependent variable in the analysis, with each point referring to an actual previously inundated area in Amol city. Only one point was mapped per flooded area, both to reduce the effect of spatial autocorrelation between observations (which arises when several points are taken from the same flooded area) and to avoid uncertainties in mapping flooded boundaries. In preparing the flood hazard map, we selected 118 flooded points (i.e., the flood inundation inventory), which were divided into two groups: model training data (70% of the flood inventory, n = 83) and model validation data (30% of the flood inventory, n = 35). Following the literature, a 1:1 ratio of flood presence to flood absence was applied. Therefore, 118 non-flooded locations were randomly selected in flood-free areas [21,25]. Like the flood inundation inventory, the non-flooded locations were randomly split into two groups: model training data (70% of non-flooded locations, n = 83) and model validation data (30% of non-flooded locations, n = 35).
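The 70:30 split with 1:1 presence-to-absence sampling described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study: the point identifiers, the `split_points` helper, and the random seeds are all hypothetical, and the real inventory consists of surveyed coordinates.

```python
import random


def split_points(points, train_fraction=0.7, seed=0):
    """Randomly split a list of points into (training, validation) subsets."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical stand-ins for the 118 flooded and 118 non-flooded locations.
flooded = [("flooded", i) for i in range(118)]
non_flooded = [("non_flooded", i) for i in range(118)]

# Each class is split separately so that both subsets keep the 1:1 ratio.
flood_train, flood_valid = split_points(flooded, seed=1)
dry_train, dry_valid = split_points(non_flooded, seed=2)

print(len(flood_train), len(flood_valid))  # 83 35
print(len(dry_train), len(dry_valid))      # 83 35
```

Splitting each class on its own, instead of shuffling all 236 points together, is one simple way to guarantee that the training and validation sets each contain equal numbers of flooded and non-flooded points.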

Factors Influencing Urban Flood Inundation
There are no universal guidelines for selecting flood conditioning factors in urban areas. In this study, six factors were used as input to the model in order to map urban flood susceptibility, with factor selection based on the literature [14,21,33]. These factors were elevation, slope percent, distance from river, distance from channel, curve number, and precipitation. A digital elevation model (DEM) with a resolution of 5 m was obtained from the Amol city authority (Figure 3a). It confirmed that the study area can be characterized as lowland, with elevation ranging from 59.3 to 137.3 m a.s.l. The slope percent factor plays a major role in flooding, as it affects water velocity. In addition, flatlands and lowlands have gentle slopes, which pose a constant threat of flooding [4,34-36]. The slope map was extracted from the DEM of the study area in ArcGIS 10.3 (ESRI, Redlands, California, USA) to quantify topographical controls on hydrological processes. The slope varies from more than 25% in the north to <1% in the south and west of the study area (Figure 3b). The distance from the river (Figure 3c) plays an important role in urban flood mapping, as nearby areas are more affected by flooding. Distance from a channel (Figure 3d) strongly influences runoff conditions, as channels and drainage systems in urban environments collect surface water. Distance from river and distance from channel were calculated using the Euclidean Distance module in ArcGIS 10.3 and range from 0 to 2322 m and from 0 to 1499 m, respectively. Curve number (CN), a parameter developed by the United States Soil Conservation Service (USCS), is a function of land-use treatments and hydrological conditions, antecedent soil moisture, and soil type. Land-use and hydrologic soil group (HSG) maps were used here to estimate the contribution of rainfall to runoff.
The CN map for the study area (Figure 3e) was extracted based on the land-use map, the HSG map, and a lumped CN value, using the ArcCN-runoff tool in ArcGIS software [15]. As shown in Figure 3e, CN takes values from 40 to 100. Different CN classes were given corresponding codes, with larger values indicating stronger runoff generation capability. Annual rainfall data from 15 rainfall gauges (Alasht, Amol, Babolsar, Baladeh, Bandar-E-Amirabad, Galugah, Gharakhil, Kiyasar, Kajur, Noshahr, Polsefid, Ramsar, Sari, Dashte-E-Naz, and Siahbisheh) were obtained from the Iranian Meteorological Organization (IRIMO) to produce a mean annual precipitation map. The recorded annual precipitation varies from 672 mm in the east of the study area to 674 mm in the west (Figure 3f).
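To illustrate why larger CN values indicate stronger runoff generation, the sketch below applies the standard SCS curve number relation in metric units, S = 25400/CN - 254 and Q = (P - Ia)² / (P - Ia + S) with Ia = 0.2·S. This is a worked example of the general method only; it is not a reproduction of the ArcCN-runoff workflow used in the study, and the 50 mm storm depth and the function name are assumptions chosen for illustration.

```python
def runoff_depth_mm(rainfall_mm, curve_number, ia_ratio=0.2):
    """Direct runoff depth (mm) from the standard SCS curve number relation."""
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must lie in (0, 100]")
    # Potential maximum retention S (mm); S = 0 for a fully impervious surface.
    retention = 25400.0 / curve_number - 254.0
    # Initial abstraction Ia: rainfall lost before any runoff starts.
    initial_abstraction = ia_ratio * retention
    if rainfall_mm <= initial_abstraction:
        return 0.0
    excess = rainfall_mm - initial_abstraction
    return excess ** 2 / (excess + retention)


# Hypothetical 50 mm storm over the CN range reported for the study area.
for cn in (40, 60, 80, 100):
    print(cn, round(runoff_depth_mm(50.0, cn), 2))
```

For the same rainfall depth, runoff rises monotonically with CN: a cell with CN = 40 produces no runoff from this storm, whereas a cell with CN = 100 converts all of the rainfall into runoff.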

Application of the SOMN Algorithm
The SOMN algorithm was first developed by Kohonen (1997) [37] and has been employed in diverse areas of research, such as hydrological studies and the definition of hydrologically homogeneous regions [38,39]. The SOMN method does not require prior knowledge of the relationships between input and target variables, which helps the modeler predict target variables in datasets where those relationships are unknown. The model applies a topology-preserving transform from high-dimensional, complex input variables to a simple output layer [40]. Since SOMN can map a highly nonlinear, high-dimensional input space onto a lower-dimensional space (usually one or two dimensions), it is also commonly used to visualize high-dimensional data. For large-scale datasets, dimensionality often has to be reduced, because further analysis and exploration may otherwise seem impossible or yield no new insights. Since dimension reduction of a multivariate dataset is one of the advantages of SOMN, it can be highly effective at providing a reduced-dimension representation for spatial modeling of flood inundation. The Euclidean distance (ED) is used to specify the weight of the variables [39]. SOMN calculates the ED between each input vector and each neuron and looks for the winning neuron (WN) by applying the nearest neighbor rule. The ED is calculated as:
$$ d_{jk} = \lVert x^{p} - w_{jk} \rVert = \sqrt{\sum_{i=1}^{n} \left( x_{i}^{p} - w_{jk,i} \right)^{2}} \qquad (1) $$
where $x^{p} = (x_{1}^{p}, x_{2}^{p}, \ldots, x_{n}^{p})$, $w_{jk} = (w_{jk,1}, w_{jk,2}, \ldots, w_{jk,n})$, $x_{i}^{p}$ = the ith component of the pth input vector $x^{p}$, and $w_{jk,i}$ = the weight link of $x_{i}^{p}$ and the neuron located at (j,k) of the Kohonen layer. The WN (M) is calculated as:
$$ M = \arg\min_{(j,k)} \lVert x^{p} - w_{jk} \rVert \qquad (2) $$
The SOMN updates the weight vector of the unit i using the so-called "self-organization" learning rule as:
$$ w_{i}(t+1) = w_{i}(t) + a(t)\, h_{ci}(r_{jk}(t)) \left[ x(t) - w_{i}(t) \right] \qquad (3) $$
where t denotes time, $a(t)$ is the learning rate, which lies in [0,1], and $h_{ci}(r_{jk}(t))$ is the neighborhood kernel around the winner unit c with a neighborhood distance $r_{jk}(t)$.
If a small learning rate is taken, the model will take a very long time to converge. On the other hand, if the learning rate is large, the model may oscillate and result in unstable learning, because it may step over a minimum.
In this study, a constant value of 0.6 was selected for a(t) based on a trade-off between speed and accuracy. The modeling process is repeated until the maximum number of iterations (tmax) is reached or the change in the weight magnitudes falls below a specified threshold. SOMN first assigns random values to the initial weights. Then, it looks for the WN using the ED: the neuron most similar to the input is chosen as the WN at this stage. Finally, SOMN tunes the weights of the WN and its neighbors toward the input vector [25]. The size of this adjustment diminishes with distance from the WN. The procedure is repeated for a large number of cycles until the layer is unfolded.
The SOMN includes two phases, coarse-tuning (also termed rough-tuning) and fine-tuning [39]. The former is an unsupervised clustering learning process, while the latter is learning vector quantization (LVQ) built on the result of the former phase. LVQ can be categorized as a supervised learning process [40]: it uses the known information for the input set to tune the weights of the output map. In this study, the SOMN model was run for 20,000 training iterations (i.e., 10,000 iterations each in the rough-tuning and fine-tuning phases).
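The unsupervised coarse-tuning phase described in this section (random initial weights, search for the winning neuron by Euclidean distance, and a neighborhood-weighted update with a constant learning rate of 0.6) can be sketched in a few lines. This is a minimal illustration and not the study's implementation: the 3 × 3 grid, the Gaussian neighborhood kernel, the linearly shrinking radius, the iteration count, and the two-cluster toy data are all assumptions, and the supervised LVQ fine-tuning phase is omitted.

```python
import math
import random


def find_winner(weights, x):
    """Return the grid position (j, k) of the neuron closest to x (Euclidean distance)."""
    return min(weights, key=lambda pos: math.dist(x, weights[pos]))


def train_som(data, rows=3, cols=3, iterations=2000, learning_rate=0.6,
              initial_radius=1.5, seed=0):
    """Coarse-tuning (unsupervised) phase only; returns {(j, k): weight vector}."""
    rng = random.Random(seed)
    n_features = len(data[0])
    # Random initial weights, one vector per neuron of the Kohonen layer.
    weights = {(j, k): [rng.random() for _ in range(n_features)]
               for j in range(rows) for k in range(cols)}
    for t in range(iterations):
        x = rng.choice(data)
        cj, ck = find_winner(weights, x)
        # Assumed schedule: the neighborhood radius shrinks linearly to a floor.
        radius = max(initial_radius * (1 - t / iterations), 0.1)
        for (j, k), w in weights.items():
            grid_dist_sq = (j - cj) ** 2 + (k - ck) ** 2
            h = math.exp(-grid_dist_sq / (2 * radius ** 2))  # Gaussian kernel (assumed)
            # Move each neuron toward x, scaled by learning rate and kernel value.
            for i in range(n_features):
                w[i] += learning_rate * h * (x[i] - w[i])
    return weights


# Hypothetical, already-normalized data: two clusters in a 2-D feature space.
rng = random.Random(42)
low = [[0.1 + rng.uniform(-0.05, 0.05), 0.1 + rng.uniform(-0.05, 0.05)] for _ in range(30)]
high = [[0.9 + rng.uniform(-0.05, 0.05), 0.9 + rng.uniform(-0.05, 0.05)] for _ in range(30)]
som = train_som(low + high)
winner_low = find_winner(som, [0.1, 0.1])
winner_high = find_winner(som, [0.9, 0.9])
print(winner_low, winner_high)
```

After training, inputs from the two clusters should activate different neurons on the grid, which is the topology-preserving behavior the method relies on. In a flood application, each input vector would hold the six normalized conditioning factors for one location.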

Accuracy Assessment
Performance assessment of the model is based on a confusion matrix, from which a variety of performance metrics can be derived. True positive (TP), true negative (TN), false positive (FP), and false negative (FN) are the four main elements of the confusion matrix. True positive and true negative indicate, respectively, the probability of correctly predicting presences (i.e., flooded areas) and absences (i.e., non-flooded areas) as observed in nature. Conversely, false positive and false negative indicate, respectively, the probability of incorrectly predicting true absences as presences (i.e., incorrectly predicting a non-flooded location as flooded) and true presences as absences. Combining these four elements in different ways produces two main categories of performance metrics: cutoff-dependent and cutoff-independent [41]. The former relies on a predefined threshold value, based on which the final susceptibility map is divided into two parts: high susceptibility and low susceptibility. A variety of cutoff-dependent metrics can then be calculated, each addressing a different aspect of model behavior. From the different cutoff-dependent metrics, we selected five, namely the true positive rate (TPR), false positive rate (FPR), odds ratio skill score (ORSS), efficiency, and true skill statistic (TSS). Cutoff-independent metrics, as their name implies, do not depend on a threshold and indicate the overall performance of a model. Among these, the receiver operating characteristic (ROC) curve is widely used; the area under this curve (AUC), ranging from 0.5 to 1, summarizes it in a single value. The closer the AUC value is to 1, the better the performance of the model. The methodological details of the metrics used here are described below.
TPR (also known as recall or sensitivity) and FPR (also known as fall-out or 1 − specificity) are calculated as:
$$ \mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN} \qquad (4) $$
Efficiency, also termed overall accuracy, represents the overall success of the predictive model [11] and is calculated as:
$$ E = \frac{TP + TN}{TP + TN + FP + FN} \qquad (5) $$
The ORSS metric is the odds ratio rescaled to the range from −1 to +1. The odds ratio compares the odds of a true prediction with the odds of a false prediction [42] and is considered a useful metric for rare events [41]. Higher values of ORSS indicate that the model is inclined toward true predictions. The ORSS metric is calculated as:
$$ \mathrm{ORSS} = \frac{TP \cdot TN - FP \cdot FN}{TP \cdot TN + FP \cdot FN} \qquad (6) $$
The TSS metric indicates the ability of a model to distinguish between presence and absence localities [43] and is calculated as:
$$ \mathrm{TSS} = \mathrm{TPR} - \mathrm{FPR} = \frac{TP}{TP + FN} - \frac{FP}{FP + TN} \qquad (7) $$
The ROC curve, as the overall performance indicator of a model, is drawn by plotting 1 − specificity on the x-axis against sensitivity on the y-axis [44]. An AUC value of 0.5 indicates randomly driven results, while a value of 1 represents a perfect model. In this study, all the metrics listed above were calculated in two steps: training and validation. The performance results in the training stage represent the goodness-of-fit of the model, whereas those in the validation stage indicate its prediction power. All performance assessments were carried out using the Performance Measure Tool (PMT) in ArcGIS 10.3 [11].
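The metrics defined in this section can be computed directly from the four confusion matrix counts, and the AUC can be obtained as the fraction of flooded/non-flooded pairs that the model ranks correctly. The sketch below is illustrative only and is not the PMT tool: the counts (30, 30, 5, 5) and the probability scores are hypothetical and are not taken from the study's Table 1, although those counts happen to give E ≈ 0.857 and TSS ≈ 0.714 for a 35 + 35 point validation set.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Cutoff-dependent metrics from the four confusion matrix counts."""
    tpr = tp / (tp + fn)                      # sensitivity / recall
    fpr = fp / (fp + tn)                      # fall-out, 1 - specificity
    efficiency = (tp + tn) / (tp + tn + fp + fn)
    orss = (tp * tn - fp * fn) / (tp * tn + fp * fn)
    tss = tpr - fpr
    return {"TPR": tpr, "FPR": fpr, "E": efficiency, "ORSS": orss, "TSS": tss}


def auc_from_scores(positive_scores, negative_scores):
    """AUC as the share of (flooded, non-flooded) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical counts for 35 flooded and 35 non-flooded validation points.
metrics = confusion_metrics(tp=30, tn=30, fp=5, fn=5)
for name, value in metrics.items():
    print(name, round(value, 3))

# Hypothetical predicted probabilities for three flooded and three non-flooded points.
auc = auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print("AUC", round(auc, 3))
```

The pairwise formulation of the AUC is equivalent to the area under the ROC curve and avoids choosing any threshold, which is why the AUC is described as cutoff-independent.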

Urban Flood Hazard Map
The SOMN model produced a flood hazard map for Amol city showing flood inundation probability, based on the point data and the independent variables (Figure 4a). Here, flood inundation probability refers to the probability of flood occurrence at each pixel over the city, derived from the observed flood data and the conditioning factors using the SOMN model. To simplify interpretation of the results, the flood hazard map was reclassified into five classes using the equal interval classification method in ArcGIS 10.5: very low (0-0.2), low (0.2-0.4), medium (0.4-0.6), high (0.6-0.8), and very high (0.8-1) (Figure 4b). These classes occupy 11.2%, 39.7%, 25.9%, 17.4%, and 5.8% of the study area, respectively. The flood hazard zonation map demonstrated the high spatial heterogeneity of the city's urban areas and captured fine spatial detail, consistent with the accuracy results. It indicated regions with high and low flood inundation probability. The areas with high flood inundation probability were located in the north, with some in the center of Amol city, while areas with low flood inundation probability were mostly located in the east, south, and west of the study area. Based on the flood inventory and field survey, zones with the highest (0.92) and lowest (0.07) flood inundation probability were successfully identified by the SOMN model. Due to extreme rainfall intensity in Amol city, in addition to the flooding and demolition of some houses, primary schools in the city were damaged during recent flood events, and children were put at risk. The SOMN model successfully identified the areas of high flood inundation probability and confirmed the urban flooding records.
The results showed that Navid, Mustafa, and Forooghdanesh primary schools, Ahmadi high school, and the Technical Training Center of Shahid Madani have a high probability of flood inundation.
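The equal-interval reclassification of inundation probability into five hazard classes can be sketched as follows. This is an illustrative stand-in for the ArcGIS reclassification, assuming that each class includes its lower bound and that a probability of exactly 1 falls in the top class; the function name is hypothetical. The class area shares are the values reported in this section.

```python
CLASS_NAMES = ["very low", "low", "medium", "high", "very high"]


def hazard_class(probability):
    """Map a flood inundation probability in [0, 1] to one of five equal-interval classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Equal intervals of width 0.2; a probability of 1.0 is placed in the top class.
    index = min(int(probability * len(CLASS_NAMES)), len(CLASS_NAMES) - 1)
    return CLASS_NAMES[index]


# Highest and lowest probabilities identified by the model.
print(hazard_class(0.92))  # very high
print(hazard_class(0.07))  # very low

# Reported share of the study area in each class (%).
area_share = dict(zip(CLASS_NAMES, [11.2, 39.7, 25.9, 17.4, 5.8]))
high_or_very_high = area_share["high"] + area_share["very high"]
print(round(high_or_very_high, 1))  # 23.2
```

Summing the two upper classes gives 23.2% of the study area, which agrees with the figure of approximately 23% quoted for the high and very high hazard zones.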
With the increasing intensity and frequency of floods in recent decades, accompanied by high urban densities and the loss of soil infiltration areas, flooding is causing high economic losses in Iran, and flood management has become a severe urban challenge. Sustainable urban flood planning involves managing drainage systems, sewer networks, permeable surfaces, and land use. Such planning could yield economic and social benefits, prevent environmental damage to cities, and enhance economically, socially, and environmentally sustainable development, working toward a higher-quality urban future.
Figure 5 shows the ROC curves of the SOMN model in the training and validation steps. The AUC values indicate that the model achieved outstanding performance in terms of both goodness-of-fit and prediction power. The AUC value was higher in the training stage (Figure 5a) than in the validation stage (Figure 5b), which was expected given that the model had already been exposed to the training data during the modeling process. While the AUC values indicate the good overall performance of the model in correctly predicting the presence of the flooded locations, their usefulness in predicting the absence of the flooded locations (non-flooded points) is not clear, and the results may have been driven by random chance. This feature, together with some other aspects of the SOMN model's performance, was examined using cutoff-dependent metrics.
According to Table 1, the efficiency values showed the great success of the model in accurately training and predicting the presence and absence locations in the training and validation steps, respectively. Low values of FPR and high values of TPR confirmed the results of the efficiency metric, showing that the model successfully trained and predicted the positive locations. However, based on the equations in Section 3.4, FPR and TPR are unable to provide information regarding model performance in predicting the absence locations.
Additionally, although the efficiency metric can highlight the performance of the model in correctly predicting absences (i.e., true negatives), an incorrect prediction of positives as negatives (i.e., false negatives) is not anticipated in its equation. In contrast, TSS provides an overall value of the model's success by using all the elements of the confusion matrix. In this regard, the SOMN model showed an overall acceptable performance (0.7-0.8), according to the ranges proposed by Hosmer and Lemeshow (2000) [45] and Hosmer et al. (2013) [46]. The TSS values in the training and validation steps are lower than those of efficiency, FPR, and TPR because its equation (Equation (7)) includes both the FN and FP elements, which concurrently penalize model performance and therefore result in a lower, yet true and representative, performance value. The ORSS values are quite transparent (Table 1). They are in line with the other metrics, as ORSS values close to 1 imply that SOMN has a high potential to produce true predictions and can be considered highly desirable or nearly perfect.

Conclusions
Flood hazard mapping in urban environments is challenging because of a lack of hydraulic and hydrological data, but accurate flood hazard prediction is necessary for urban planning and management. This study applied the self-organizing map neural network (SOMN) algorithm as a machine learning technique for urban flood hazard mapping, based on observed flood data for Amol city and a set of conditioning factors. The results showed how the SOMN algorithm can be applied to urban flood hazard mapping when no detailed hydrologic and hydraulic datasets are available. In policy terms, the results highlighted that the distance to channels and population density are important factors in flood hazard mapping. Furthermore, the validation results demonstrated that the SOMN model produced reliable results without complex hydrodynamic information. Different evaluation metrics were used to scrutinize the goodness-of-fit and predictive performance of the model. The following conclusions were drawn: The SOMN model showed excellent performance in modeling flood hazard in both the training (AUC = 0.946, E = 0.849, TSS = 0.716, ORSS = 0.954) and validation (AUC = 0.924, E = 0.857, TSS = 0.714, ORSS = 0.945) steps. Therefore, it had outstanding learning and predictive performance. SOMN is an accurate and robust model that can help urban policymakers formulate more efficient plans for controlling flood hazards. Approximately 23% of the study area of Amol city fell into the high or very high flood hazard classes. Some primary schools (Navid, Mustafa, and Forooghdanesh), Ahmadi high school, and the Technical Training Center of Shahid Madani were shown to lie in high or very high flood hazard zones. Therefore, these parts of Amol city should be carefully managed. Due to a lack of hydrological and hydraulic data, we used just six predictive factors that influence flood inundation in urban areas.
This was a major limitation of the study, and further studies focusing on the role of other factors, such as rainfall intensity, are required.