Flash Flood Susceptibility Mapping in Sinai, Egypt Using Hydromorphic Data, Principal Component Analysis and Logistic Regression

Abstract: Flash floods in the Sinai often cause significant damage to infrastructure and even loss of life. In this study, the susceptibility to flash flooding is determined using hydro-morphometric characteristics of the catchments. Basins and their hydro-morphometric features are derived from a digital elevation model from NASA Earthdata. Principal component analysis is used to identify principal components with a clear physical meaning that explain most of the variation in the data. The probability of flash flooding is estimated by logistic regression using the principal components as predictors and by fitting the model to flash flood observations. The model prediction results are cross-validated. The logistic model is used to classify Sinai basins into four classes: low, moderate, high and very high susceptibility to flash flooding. The map indicating the susceptibility to flash flooding in Sinai shows that the large basins in the mountain ranges of the southern Sinai have a very high susceptibility to flash flooding, several basins in the southwest Sinai have a high or moderate susceptibility, some sub-basins of wadi El-Arish in the center have a high susceptibility, while small to medium-sized basins in flatter areas in the center and north usually have a moderate or low susceptibility. These results are consistent with observations of flash floods that occurred in different regions of the Sinai and with the findings or predictions of other studies.


Introduction
Flash floods are among the deadliest natural disasters in the world, responsible for 85% of inundations and a high death rate of more than 5000 people lost each year [1]. Egypt has experienced many flash floods with loss of life and serious damage to vital infrastructure and buildings, especially in the Sinai, the north coast and the Red Sea coast. Well-known examples are the 1979 flash flood in El-Quseir and Marsa Alam, which killed 19 people and destroyed a coastal road along the Red Sea, the flash flood in Marsa Alam in 1991, the flash flood in Alexandria in 1993, which killed 21 people, and the flash flooding in Assiut in November 1994, which resulted in loss of life and damage to infrastructure [2]. More recently, on 26-27 October 2016, heavy rainfall in Ras Gharib on the Red Sea coast resulted in flash flooding that killed dozens and caused damage to infrastructure and property [3]. On 14 November 2019, heavy rains led to flooding in wadi El-Sukkari and further along the Idfu-Marsa Alam Road, fortunately without losses [4].
Sinai in particular is a flood-prone area where flash floods cause significant damage to infrastructure, displacement of populations and sometimes loss of life. An overview Arfa [4] used a ranking method of 13 morphometric parameters to derive the degree of flash flood hazard of basins between Marsa Alam and Abu Ghuson along the Red Sea coast. Prama et al. [9] derived a flood hazard map for the Dahab region in southern Sinai, by an unweighted combination of normalized morphometric parameters. El-Fakharany and Mansour [10] evaluated morphometric parameters and the occurrence of flash floods in wadi El-Aawag in southwestern Sinai.
The main aim of this research is to quantify the risk of flash flooding in the Sinai Peninsula, Egypt. Various methods are used for this: (1) hydro-morphometric features that are relevant for estimating the sensitivity to flash flooding are derived using satellite images and spatial analysis tools; (2) principal component analysis is applied to reveal relationships between the different catchment properties and to identify the most significant hydro-morphometric parameters; (3) the probability of flash flooding is estimated by logistic regression using observed flash flood events as the dependent variable and the significant principal components of the hydro-morphometric parameters as explanatory variables; (4) prediction results are cross-validated to assess the robustness of the modeling approach; (5) the logistic model is used to generate a map of flash flood probability for the Sinai Peninsula. The novelty of the approach consists primarily in generating a flood susceptibility map for the entire Sinai Peninsula, which has not been presented before; previous studies [9,10,26-28,30,31,34] considered only some local sub-basins. A complete flash flood susceptibility map for the Sinai Peninsula can be useful to authorities and decision makers for an overall impact assessment and can contribute to flash flood management and to the planning and implementation of mitigation measures. This study also aims to demonstrate how hydro-morphometric parameters can be used for flood risk assessment in Egypt through a combination of principal component analysis and logistic regression that is robust, reliable and validated.
To date, most flood susceptibility studies conducted in Egypt have used a ranking method proposed by Davis [36] to derive flash flood hazard from hydro-morphometric data, by standardizing morphometric parameters, usually in the range of 1 to 5, and then combining and classifying them into groups ranging from lowest to highest risk level [25-35]. However, the number of parameters can vary, all parameters are treated with the same weight as if they have an equal impact on flooding, and the classification into hazard classes is done without rules or standards. In addition, the results are usually not validated.

Study Area
The Sinai Peninsula is located in the northeastern part of Egypt between 32.5-34.8° E and 27.8-31.3° N (Figure 1). It is about 61,000 km² in size and its largest dimensions are about 385 km from north to south and 210 km from west to east. Geographically, Sinai can be divided into three parts. The northern part consists of broad coastal plains with fossil beaches and extensive sand dunes, some of which are more than 100 m high. The main part is the wadi El-Arish basin, which descends from an altitude of more than 900 m to the Mediterranean Sea and forms the largest valley of the Sinai Peninsula. The center is highland mainly composed of two limestone plateaus, El-Tih in the south and El-Egma in the north, where the sources of the wadi El-Arish arise. The southern part consists of high and rugged mountain ranges of igneous rock, reaching more than 2400 m, with Mount Catherine at 2642 m above sea level being the highest point in Egypt.
Sinai is characterized by a Mediterranean climate in the north and an arid to semi-arid climate in the center and south. In general, summer is very hot and dry, while most rain falls in winter with occasional heavy rainfall combined with thunderstorms. The amount of precipitation decreases from north to south. Most rain falls in a narrow strip along the Mediterranean Sea with values of more than 200 mm/year; further inland the rainfall varies from 100 to 200 mm per year in the north, while in the center and south it is usually less than 100 mm per year. Since the potential evaporation demand far exceeds rainfall, there are no real streams or rivers but only ephemeral riverbeds, referred to as wadis, which are generally dry but discharge drainage water after heavy rainfall, usually in winter.
Wadi El-Arish is the main drainage system to the Mediterranean in the north and center of Sinai. There are several smaller wadis in the south, some of which are known for flash floods, such as Watir, Dahab and Kid in the east which flow into the Gulf of Aqaba and Ras Sudr, Werdan, Feiran, Sedr, Gharandal and Meiar in the west which flow into the Gulf of Suez [6].

Hydro-Morphometric Parameters
An ASTER Global Digital Elevation Model (DEM) version 3 was downloaded from the NASA Earthdata website (Available online: https://search.earthdata.nasa.gov; accessed on 15 September 2021). The DEM has a latitude and longitude resolution of 1 arc-second (~30 m), and the elevation data have a resolution of 1 m and an accuracy of approximately 10 m [37]. Topographic elevations in the Sinai range from zero to 2612 m above mean sea level, as shown in Figure 2. ArcGIS spatial analysis tools are used to delineate watersheds and determine their hydro-morphometric parameters. The drainage network is extracted from the DEM using the stream order method of Strahler [31] with a threshold of 4.5 km² for the upslope drainage area as the starting point of first-order streams, which corresponds to 5000 grid cells, a standard recommended by ArcGIS spatial analysis tools (Available online: https://pro.arcgis.com; accessed on 22 January 2022). Sub-basins are delineated based on stream orders, and hydro-morphometric parameters are derived for each sub-basin using standard spatial analysis methods and equations as listed in Table 1.

Table 1. Hydro-morphometric parameters and their derivation, where n is the number of stream orders in a basin, N_i is the number of stream segments of order i, and L_i is the length of stream segments of order i.
Basin geometry: perimeter (spatial analysis); basin length (spatial analysis [11]); form factor; compactness coefficient; elongation ratio, R_e = 2√(A/π)/L_b [16].
Drainage network: stream order, S_u [-] (spatial analysis); bifurcation ratio; stream frequency; length of overland flow; texture ratio.
Basin relief: basin relief (spatial analysis); relief ratio; ruggedness number; mean slope (spatial analysis).

Three groups of parameters are considered.
The basin geometry group includes six parameters:
• Area (A): surface of a drainage basin, which is a prime determinant of the total discharge [31]; large catchments receive more precipitation and have a higher peak discharge compared to smaller catchments.
• Form factor (F_f): ratio of the width to the length of a catchment and indicative of the flood regime [11]; large form factors lead to shorter lag times and a higher peak discharge.
• Compactness coefficient (C_c): ratio of the perimeter of the drainage basin to that of a circle of equal area; low values imply a shorter concentration time and a higher peak discharge [11,12].
• Elongation ratio (R_e): ratio of the diameter of a circle with the same area as the catchment to the maximum catchment length [16]; low values mean a less circular shape and a longer flood concentration time.
The basin drainage network group includes eight parameters:
• Stream order (S_u): highest stream order in a basin according to the method designed by [14]; it is an indicative parameter of the basin dimensions, channel size and stream discharge.
• Stream number (N_u): total number of stream segments of all orders [15]; a high stream number is expected to imply faster peak flow.
• Bifurcation ratio (R_b): average ratio between the number of streams of one order and those of the next higher order [12]; indicative of the complexity of a catchment, but according to [14] less so of the flow regime, although [12] considers flooding more likely in catchments with a higher bifurcation ratio.
• Stream length (L_u): total length of all streams in a basin [11]; longer streams indicate a higher discharge-producing capacity of a catchment area [39].
• Drainage density (D_d): length of streams per unit area; an indicator of infiltration and permeability of a drainage basin [11].
• Stream frequency (F_s): number of streams per unit area; although similar to drainage density, it has less hydrologic significance [11,12].
• Length of overland flow (L_o): the average length of overland flow is equal to the reciprocal of twice the drainage density [16]; low values indicate shorter flow paths, making the basin more prone to flash flooding.
• Texture ratio (R_t): total number of first-order streams per basin circumference; indicates coarse, medium, or fine textured topography [40].
The basin relief group includes four parameters:
• Basin relief (R_f): height difference between the lowest and highest points of a basin and an essential indicator of surface runoff [16].
• Relief ratio (R_r): ratio of the basin relief to the basin length and a key element for understanding erosion and drainage [16].
• Ruggedness number (R_n): product of drainage density and basin relief; regions prone to flash flooding have higher ruggedness numbers, indicating high drainage density combined with steep slopes [38].
• Mean basin slope (S): major factor controlling infiltration and surface runoff and the resulting runoff rate and concentration time.
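The definitions in the three lists above can be turned into a small calculation. The following Python sketch is only illustrative and is not the authors' ArcGIS workflow: the function name, argument names and the example basin are hypothetical. It also assumes that the form factor is computed as A/L_b² (mean width taken as A/L_b) and that the bifurcation ratio is a simple arithmetic mean of successive stream-number ratios.

```python
import math

def derived_parameters(A, P, L_b, N_by_order, L_u, R_f):
    """Derived hydro-morphometric parameters from basic basin measurements.

    A: area (km^2), P: perimeter (km), L_b: basin length (km),
    N_by_order: stream segment counts per order, lowest order first,
    L_u: total stream length (km), R_f: basin relief (km).
    """
    N_u = sum(N_by_order)                      # stream number, all orders
    N_1 = N_by_order[0]                        # first-order streams
    ratios = [N_by_order[i] / N_by_order[i + 1]
              for i in range(len(N_by_order) - 1)]
    D_d = L_u / A                              # drainage density
    return {
        "F_f": A / L_b ** 2,                   # assumption: width = A / L_b
        "C_c": P / (2.0 * math.sqrt(math.pi * A)),   # perimeter / equal-area circle
        "R_e": 2.0 * math.sqrt(A / math.pi) / L_b,   # elongation ratio
        "R_b": sum(ratios) / len(ratios),      # assumption: arithmetic mean
        "D_d": D_d,
        "F_s": N_u / A,                        # stream frequency
        "L_o": 1.0 / (2.0 * D_d),              # length of overland flow
        "R_t": N_1 / P,                        # texture ratio
        "R_r": R_f / L_b,                      # relief ratio
        "R_n": D_d * R_f,                      # ruggedness number
    }

# Hypothetical circular basin of radius 1 km with a third-order network.
circle = derived_parameters(A=math.pi, P=2.0 * math.pi, L_b=2.0,
                            N_by_order=[6, 3, 1], L_u=4.0, R_f=0.5)
for name, value in circle.items():
    print(f"{name} = {value:.3f}")
```

For a perfectly circular basin the compactness coefficient and the elongation ratio both equal one, which gives a quick sanity check on the formulas.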

Principal Component Analysis
Before creating a predictive model, principal component analysis (PCA) [41] is used as an exploratory data analysis to identify the relevant information in the hydro-morphometric data set. PCA linearly transforms the data into orthogonal uncorrelated variables known as principal components (PCs), which preserve the total variance in the original data:

PC_j = \sum_{i=1}^{n} a_{ij} x_i    (1)

where PC_j are the principal components, a_{ij} are the scores of the linear transformation, x_i are the standardized hydro-morphometric parameters, and n is the number of parameters. Note that the hydro-morphometric parameters are standardized to remove any effect of scale and units of the observations by subtracting the sample mean and dividing by the sample standard deviation. It can be shown that the principal components are the eigenvectors of the correlation matrix of the data and that the associated eigenvalues give the variance explained by each eigenvector [41]. The eigenvectors are ordered in descending order of the eigenvalues, and PCs with eigenvalues smaller than one are ignored because they contain less information than the original variables, which reduces the dimensionality of the data. The scores that relate the remaining PCs to the original parameters provide information about the impact and relevance of the original parameters on the overall information in the data set. In practice, correlation coefficients between the PCs and the original parameters are used to explain and interpret the strength of the relationships. Large (either positive or negative) correlation coefficients indicate that a parameter has a strong effect on that principal component. The interpretation is enhanced by Varimax rotation, which aligns the PCs in directions highlighting the relationships between the PCs and the observed data [41].
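As an illustration of the procedure described above, the following Python sketch standardizes a data matrix, takes the eigen-decomposition of its correlation matrix, keeps the components with eigenvalues greater than one, and computes the correlations between the retained PCs and the original parameters. It is a minimal sketch on synthetic data, not the authors' implementation: the function name and the data are hypothetical, and the Varimax rotation is omitted.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows: basins, columns: parameters)."""
    # Standardize: subtract the sample mean, divide by the sample std deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigval, eigvec = np.linalg.eigh(R)         # symmetric matrix, ascending order
    order = np.argsort(eigval)[::-1]           # reorder to descending eigenvalues
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0                        # retain eigenvalues greater than one
    a = eigvec[:, keep]                        # coefficients a_ij of Equation (1)
    pc_values = Z @ a                          # PC_j = sum_i a_ij x_i
    # Correlation between parameter i and PC_j equals a_ij * sqrt(eigenvalue_j).
    correlations = a * np.sqrt(eigval[keep])
    explained = 100.0 * eigval / eigval.sum()  # percent of total variance
    return Z, eigval, explained, pc_values, correlations

# Synthetic example: 112 "basins", six parameters driven by two latent factors.
rng = np.random.default_rng(0)
f1, f2 = rng.standard_normal(112), rng.standard_normal(112)
X = np.column_stack([f + 0.3 * rng.standard_normal(112)
                     for f in (f1, f1, f1, f2, f2, f2)])
Z, eigval, explained, pc_values, correlations = pca_correlation(X)
print("eigenvalues:", np.round(eigval, 2))
print("explained variance (%):", np.round(explained, 1))
print("retained components:", pc_values.shape[1])
```

Because the analysis uses the correlation matrix, the eigenvalues sum to the number of parameters, so an eigenvalue below one means that the component carries less variance than a single standardized parameter.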

Logistic Regression
A logistic model is used to predict the probability of a flash flood:

logit(p) = ln(p/(1 − p)) = c_0 + \sum_{i=1}^{m} c_i y_i    (2)

where logit is the logit function (natural logarithm of the odds), p is the probability, c_0 is the model intercept, c_i are the model coefficients, y_i are the predictors (explanatory parameters), and m is the number of predictors. The logistic model is used to predict the occurrence of a flash flood in a basin using observed characteristics of the basin as predictors. Since there are too many hydro-morphometric parameters to use as predictors, we instead consider the significant PCs of the basin hydro-morphometric data as predictors. The logistic model predicts the flood probability, which can be used to assess the susceptibility of a basin to flash flooding. The model coefficients are estimated by logistic regression. For this, we use observations of flash floods reported in the literature [5,6,8-10]; basins where flooding has been observed are shown in Table 2. The observed probability of flooding is set to one for these basins, while for the other basins the probability is zero. Flash floods in wadi El-Arish were excluded from the analysis because wadi El-Arish consists of many sub-basins, while the exact location of the floods was usually not clearly observed or reported because wadi El-Arish is so vast and sparsely populated.
The model coefficients are estimated by fitting the model to these observations using maximum likelihood, for which we use the glm procedure for fitting generalized linear models of the R stats package for statistical computing [42]. The goodness of fit is assessed by the deviance, a measure of the likelihood, and the quality of the model by the Akaike information criterion (AIC) [43], a trade-off between the goodness of fit and the complexity of the model. The significance of each predictor is verified by removing the predictors one at a time, re-estimating the model coefficients with the remaining predictors and comparing the resulting deviance and AIC with those of the original model. The reliability of the model is also verified by cross-validation, where observed flood events are removed one by one, model coefficients are re-estimated by logistic regression with the remaining data, and the flood probability is predicted for the removed event and compared to what was obtained with the original model.
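The estimation and verification steps described above can be sketched in Python as follows. This is not the R glm call used in the study: it is a minimal Newton-Raphson maximum-likelihood fit written with NumPy, applied to synthetic data, and all function names, the random seed and the "true" coefficients are hypothetical choices made for illustration.

```python
import numpy as np

def design(Y, n):
    """Design matrix: a column of ones (intercept c_0) followed by the predictors."""
    return np.column_stack([np.ones(n), Y])

def fit_logistic(Y, obs, n_iter=100, tol=1e-10):
    """Maximum-likelihood estimate of (c_0, c_1, ..., c_m) by Newton-Raphson."""
    X = design(Y, len(obs))
    c = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ c))
        gradient = X.T @ (obs - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        c = c + step
        if np.max(np.abs(step)) < tol:
            break
    return c

def deviance(Y, obs, c):
    """Deviance for binary data: minus twice the log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-design(Y, len(obs)) @ c))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -2.0 * np.sum(obs * np.log(p) + (1.0 - obs) * np.log(1.0 - p))

def aic(Y, obs, c):
    """Akaike information criterion: deviance plus twice the number of coefficients."""
    return deviance(Y, obs, c) + 2.0 * len(c)

def leave_one_out(Y, obs):
    """Refit without each observed event and predict its flood probability."""
    probs = {}
    for i in np.flatnonzero(obs == 1):
        mask = np.arange(len(obs)) != i
        c = fit_logistic(Y[mask], obs[mask])
        probs[int(i)] = float(1.0 / (1.0 + np.exp(-(c[0] + Y[i] @ c[1:]))))
    return probs

# Synthetic data: 112 basins, two predictors, rare events (all values hypothetical).
rng = np.random.default_rng(1)
n = 112
Y = rng.standard_normal((n, 2))
true_logit = -2.5 + 1.2 * Y[:, 0] - 0.8 * Y[:, 1]
obs = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

c_full = fit_logistic(Y, obs)
c_null = fit_logistic(Y[:, :0], obs)            # intercept-only null model
dev_full, dev_null = deviance(Y, obs, c_full), deviance(Y[:, :0], obs, c_null)
# Drop each predictor in turn and record the deviance of the reduced model.
dev_drop = {}
for j in range(Y.shape[1]):
    Yj = np.delete(Y, j, axis=1)
    dev_drop[j] = deviance(Yj, obs, fit_logistic(Yj, obs))
loo = leave_one_out(Y, obs)
print("coefficients:", np.round(c_full, 2))
print("deviance null/full:", round(dev_null, 2), round(dev_full, 2))
print("AIC full:", round(aic(Y, obs, c_full), 2))
```

The intercept of the null model equals the logit of the observed event frequency, which is why an intercept-only model serves as a natural reference for judging the improvement brought by the predictors.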

Drainage Catchments and Hydro-Morphometric Data
The drainage network obtained from the DEM is shown in Figure 2 and consists of 112 sub-basins of different sizes and shapes, as shown in Figure 3. Large basins such as wadis El-Arish, Feiran, Dahab and Watir are subdivided to obtain a more or less uniform spatial distribution of the sub-basins over the area. Values of the hydro-morphometric parameters for each basin are given in Table S1 in the Supplementary Materials. An overview of the range of the hydro-morphometric parameters is given in Table 3, with minimum and maximum values and the mean and standard deviation necessary for standardization of the parameters in the PCA.

Principal Component Analysis
The results of the PCA are presented in Tables 4 and 5. Table 4 lists the first eight eigenvalues, the variance explained by each component and the cumulative variance, the latter two expressed as a percentage of the total variance contained in the data. Significant values are indicated in bold. Only the first four PCs have eigenvalues greater than one, while the fifth eigenvalue is lower but very close to one; together these five PCs account for 90% of the variation in the data. Therefore, the other PCs can be ignored. Table 5 lists the correlation coefficients between the first five PCs and the hydro-morphometric parameters after Varimax rotation; the corresponding scores are given in Table S2 in the Supplementary Materials. Significant values in Table 5 are shown in bold. The first component is highly correlated with several hydro-morphometric parameters: area (A), perimeter (P) and basin length (L_b), which are directly related to the size of a watershed, and stream order (S_u), stream number (N_u), stream length (L_u) and texture ratio (R_t), which are also indirectly related to size. Thus, the first and most important principal component represents the effect of basin size and accounts for 37% of the variation in the data. The second component is strongly correlated with all relief parameters: basin relief (R_f), relief ratio (R_r), ruggedness number (R_n) and mean basin slope (S); this component accounts for 19% of the variation in the data. The third component, which accounts for 15% of the total variance, is strongly correlated with drainage density (D_d) and length of overland flow (L_o). This component thus represents the drainage capacity of a river basin. The fourth component accounts for 14% of the total variance and is strongly correlated with the form factor (F_f), compactness coefficient (C_c) and elongation ratio (R_e), so this component expresses the influence of the basin shape.
The fifth component accounts for only 5% of the variance but is rather special in that it is only significantly correlated with the bifurcation ratio (R_b). Thus, this component represents the effect of stream bifurcation, which is apparently a unique basin property unrelated to any other hydro-morphometric parameter. Note that the stream frequency (F_s) is not significantly correlated with any of the PCs and thus contributes little to the information contained in the data. Principal component values for all basins are given in Table S3 in the Supplementary Materials.

Logistic Regression
Results of the logistic regression with the PCs as predictors are given in Table 6. The table shows the estimated model coefficients, the standard error, the z-value (coefficient divided by standard error) and the corresponding p-value, for which a value below 0.05 is commonly taken to indicate that a predictor is statistically significant, assuming a normal distribution. Note that this is only the case for PC_1 and PC_4 and not for PC_3 and PC_5, while PC_2 is very close to the threshold. However, the assumption of a normal distribution is not reliable if the sample size is small, as in this case. The next two columns in the table provide the deviance and AIC values, which indicate how well the model with the selected predictors fits the observations. The values corresponding to the intercept are for the null model, which is a logistic model with only an intercept and no predictors that is used as a reference to compare with other models. The values corresponding to the predictors are obtained by excluding that predictor from the full model, and the values on the last line are for the full model. Both the deviance and AIC should be as small as possible. Comparison of the deviance and AIC obtained for the full model and for the null model shows a large difference, indicating that the predictors allow a significant improvement in goodness of fit. Comparison of the deviance and AIC when one of the predictors is removed from the full model shows that all predictors are relevant and should not be removed from the model. The increase in deviance when one of the predictors is removed, compared to the full model, also indicates the importance of that predictor in the model. It follows that the order of importance of the predictors is PC_1, PC_4, PC_2, PC_5 and PC_3, as given in the last column of Table 6.
Results of the cross-validation, where observed flood events are removed one by one and the recalibrated model is used to predict the flood probability for the removed event, are given in Table 7. This shows that the model is robust: after removing one of the observed flood basins, the estimates of flood probability remain in the same range as predicted with the original model. The flood probability predicted by the logistic model for all basins is shown in Table S4 in the Supplementary Materials. A comparison between the flood probability predicted by the logistic model and the observations is shown in Figure 4. The nine basins where flash floods were observed are shown in the upper part of the graph (p = 1) and the 103 basins where no floods were observed in the lower part (p = 0). The solid line represents the logistic model, which fits the observations as closely as possible but has to compromise in the middle part of the graph, where basins with observed flooding overlap with basins where no flooding has been observed. Since there are many more basins where no flooding has been observed, the logistic model is strongly conditioned by the non-flooding events, as can be clearly seen in the graph. This should be taken into account when evaluating the model results and selecting threshold values to identify the flood-prone basins. The red dotted line in the graph represents the mean outcome predicted by the model, given by the intercept, logit(p) = c_0 = −4.74 (Table 6). Note that all basins where floods have been observed are on the right side of this line, while all basins placed by the model to the left of this line have a near-zero predicted flooding probability. Basins predicted to the right of the red line are thus prone to flooding with a probability that increases the further they are from this line. The blue dotted line in the graph shows the average outcome of the observations, logit(p) = ln(9/103) = −2.44.
All basins plotted to the right of this line have a higher probability of flooding than is observed on average, and vice versa. Seven of the basins where flash floods have been observed are to the right of this line. In addition, 15 basins where no flooding has been observed lie to the right of this line, so these basins have features that indicate a higher probability of flash flooding than observed. The black dotted line represents logit(p) equal to zero (p = 0.5). There are only six basins with a predicted logit(p) greater than zero (p > 0.5), which are thus very sensitive to flash flooding; five of these, wadis Kid, El-Aawag, Feiran upstream, Dahab and Watir, are basins where flooding has been observed, and one, wadi Feiran downstream, is a basin for which no flooding was assumed but which has similar characteristics to the other five.
The above considerations are used to classify the flash flood susceptibility of all basins. Flood sensitivity classes are defined as follows: logit(p) < −4.74 is low sensitivity to flooding, −4.74 < logit(p) < −2.44 is moderate sensitivity, −2.44 < logit(p) < 0 is high sensitivity and logit(p) > 0 is very high sensitivity. The resulting map with the sensitivity of all basins is shown in Figure 5.
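The class limits above can be expressed as a short Python function. This is only a sketch of the stated thresholds: the function and constant names are hypothetical, and the assignment of values that fall exactly on a boundary (here to the higher class) is an assumption, since the text uses strict inequalities.

```python
import math

C0 = -4.74                       # model intercept reported in the text (Table 6)
MEAN_OBS = math.log(9 / 103)     # logit of the observed frequency, about -2.44

def probability(logit_p):
    """Inverse of the logit: p = 1 / (1 + exp(-logit(p)))."""
    return 1.0 / (1.0 + math.exp(-logit_p))

def susceptibility_class(logit_p):
    """Map a predicted logit(p) to one of the four susceptibility classes."""
    if logit_p < C0:
        return "low"
    if logit_p < MEAN_OBS:
        return "moderate"
    if logit_p < 0.0:
        return "high"
    return "very high"

# Hypothetical predicted values, one per class.
for value in (-6.0, -3.0, -1.0, 0.5):
    print(value, round(probability(value), 3), susceptibility_class(value))
```

In terms of probability, the three limits correspond to p of about 0.009, 0.080 and 0.5, respectively.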

Discussion
Results of the PCA (Table 5) indicate that the hydro-morphometric characteristics of the Sinai drainage basins can be combined into five groups that account for 90% of the variation of the data and have a clear physical meaning. Ordered in decreasing importance in explaining the total variance, the first group includes all hydro-morphometric parameters related to size, the second group consists of parameters related to relief, the third group of parameters relates to basin shape, the fourth group of parameters relates to drainage capacity and the fifth group consists only of the bifurcation ratio. However, the importance of the basin characteristics for flash flood occurrence does not follow the order of explained variance in the data, but a different order based on flash flood prediction, as shown by the logistic regression (Table 6). Most important in predicting flash flooding is the basin size, followed by drainage density, relief, bifurcation ratio and basin shape. The latter two are not statistically significant in the model regression, but nevertheless appear to be relevant in terms of likelihood combined with model complexity (AIC, Table 6), and therefore should not be removed from the model. The importance of a predictor is given by the regression coefficients (Table 6). It follows that basin size and drainage density are about two times more important than relief, stream bifurcation or basin shape.
Not surprisingly, the size of a basin is the most important factor in predicting flash flooding. In the case where local thunderstorms are more or less spatially random, the probability of an extreme thunderstorm in a large basin will be greater than in a small basin, increasing the risk of flash flooding in large basins. Equally important is the converse that less or no flash flooding is observed in small basins. The PC representing the drainage density is the second most important predictor of flash flooding, but surprisingly the regression coefficient is negative, so that the flood probability decreases with increasing drainage density, which contradicts what is commonly believed. The reason is that flash flooding and drainage density are examined here on a regional scale. Locally, higher drainage density may result in faster drainage, but this is not necessarily the case when comparing drainage densities of basins of different sizes and characteristics. The drainage density of basins expresses how channels and surrounding floodplains are spatially arranged, which is strongly determined by the shape, size and relief of the basin. A low drainage density can indicate large, compact basins with a strong relief, whereas a high drainage density can indicate small basins with flat elongated or dispersed floodplains. In the present case, the values of the drainage densities for all basins where flash flooding has been observed vary between 0.38 and 0.47, which is below average (Table 3); these basins are also large with a strong relief and compact shapes.
Relief is only the third most important factor after basin size and drainage density, which is somewhat unexpected, as relief is usually considered one of the most important factors for flooding [18]. The fourth predictor relates to the bifurcation ratio, which appears to be a unique but minor factor in flash flood prediction that is nevertheless often ignored in other studies presented in the literature. The last and least important predictor relates to the shape of basins, where, as expected, compact basins are more prone to flooding than elongated basins.
Comparison of these results with results of similar studies in other countries or in Egypt [19-33] is fruitless, as all these studies were performed on a much smaller scale than the current study, usually only one basin, no analysis of variance such as principal component analysis was applied to establish relationships between hydro-morphometric parameters and to identify key parameters related to the total variance, and, importantly, no observations of flash floods were used to reveal the predictive power of the parameters for flooding. In contrast, this study shows that a large set of hydro-morphometric parameters can be reduced to a much smaller set without loss of information, indicating redundancy in the data. Hydro-morphometric parameters are thus not independent and do not add more information by their number. Therefore, in flood sensitivity analyses, it makes no sense to combine correlated factors that express similar characteristics, as is done in traditional ranking methods where all parameters are combined with the same weight.
The flash flood susceptibility map (Figure 5) shows the spatial distribution into four categories: very high, high, moderate and low susceptibility. The very high sensitivity zone is located in the mountain ranges of the southern Sinai and consists of six major basins known for their flash flooding, such as wadis El-Aawag, Feiran and Kid and the upstream sub-basins of wadis Dahab and Watir, which in total represent an area of approximately 8000 km², or 15% of the total area of Sinai. The high sensitivity zone is mainly in the center of the Sinai Peninsula and some scattered areas further north. It encompasses 16 river basins, covering a total area of approximately 15,000 km², or 28% of the Sinai area, including some upstream sub-basins of wadi El-Arish, sub-basins of wadis Dahab and Watir, and the basins of wadis Sedri, Garf and Werdan, which drain into the Gulf of Suez. The basins in the north are wadi El-Beada in Bir Al-Abd, which drains to the Mediterranean, and two sub-basins of wadi El-Arish in El-Hasana and Quasisma, respectively, which may be the source of the flash floods that have been observed in this wadi. The moderate sensitivity zone is mainly located in the north of the Sinai Peninsula and includes 36 watersheds with a total area of about 21,000 km², or 39% of the Sinai area. These are usually small to medium-sized basins located in flatter areas. The low sensitivity zone comprises 45 catchments with a total area of approximately 10,000 km², or 18% of the Sinai area. These are usually very small basins on flat terrain with short drainage paths and few branches. Most are located in the north and center along the periphery of the Sinai Peninsula; some are also found in the southern part of Sinai. The resulting flash flood sensitivity map is largely consistent with observations of flash floods that occurred in different regions of the Sinai and with the findings or predictions of other studies [6,9,10,31,34].
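As a quick consistency check, the zone statistics quoted above can be tabulated. The basin counts, approximate areas and reported percentages are taken from the text; using the sum of the four rounded zone areas as the denominator is an assumption, so the recomputed shares can differ slightly from the reported ones.

```python
# zone name: (number of basins, approximate area in km2, reported share in %)
zones = {
    "very high": (6, 8_000, 15),
    "high": (16, 15_000, 28),
    "moderate": (36, 21_000, 39),
    "low": (45, 10_000, 18),
}

total_basins = sum(n for n, _, _ in zones.values())
total_area = sum(area for _, area, _ in zones.values())

# Share of each zone relative to the sum of the rounded zone areas (assumption).
shares = {name: 100 * area / total_area for name, (_, area, _) in zones.items()}

for name, (n, area, reported) in zones.items():
    print(f"{name:>9}: {n:3d} basins, {area:6d} km2, "
          f"{shares[name]:4.1f}% (reported {reported}%)")
print(f"total: {total_basins} basins, {total_area} km2")
```

The recomputed shares agree with the reported percentages to within one percentage point, which is what can be expected from areas rounded to the nearest thousand square kilometers.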
In particular, the map indicates the high probability of flash flooding in the mountainous basins of southern Sinai, as observed and reported in several publications, such as wadi Feiran [26], wadi Watir [8,31], wadi Dahab [9] and wadi El-Aawag [10]. The map also indicates the high or moderate sensitivity of some basins in southwestern Sinai draining to the Gulf of Suez as described in the literature, such as wadi Sedri [10], wadi Sudr and wadi Wardan [27]. By indicating the high sensitivity to flooding of some sub-basins of wadi El-Arish in El-Hasana and Quasisma, the map also sheds light on the possible origin of flash floods observed in El-Arish, as discussed by Moawad [7], Elewa et al. [28] and Abdel Ghaffar et al. [30]. Nevertheless, it is clear that these results can be improved as more accurate and detailed information on the characteristics of river basins and the occurrence of flash floods in the Sinai becomes available.
Flood susceptibility mapping presented by other studies conducted in Egypt or other countries [19–33] mainly used a classification method consisting of standardization of the morphometric parameters, usually in the range of 1 to 5, and combining the resulting scores without any ranking or weighting, as if all parameters have an equal effect on flooding. In addition, the results were not validated with field data on the occurrence of flash floods, which makes it impossible to verify such simplifying assumptions and approaches. In contrast, the method presented in this study, combining principal component analysis and logistic regression, is fitted to flash flood observations and cross-validated, and proves to be robust and reliable; it is therefore superior to what has been presented before.
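The effect of equal weighting on correlated parameters can be sketched as follows. The three basins and their parameter values are hypothetical, and the linear rescaling to the range 1 to 5 is only one possible reading of the standardization used in ranking methods; for simplicity, a higher score is treated as more flood-prone for every parameter.

```python
def rank_1_to_5(values):
    """Linearly rescale raw values to scores between 1 and 5."""
    lo, hi = min(values), max(values)
    return [1 + 4 * (v - lo) / (hi - lo) for v in values]


# Hypothetical basins A, B and C (values invented for illustration).
# Stream length is nearly proportional to area, so both express basin size.
params = {
    "area": [10.0, 50.0, 100.0],
    "stream_length": [12.0, 55.0, 110.0],
    "relief": [900.0, 300.0, 100.0],
}

scores = {name: rank_1_to_5(values) for name, values in params.items()}

# Traditional approach: add all scores with the same weight.
equal_weight_total = [sum(column) for column in zip(*scores.values())]

# Same approach after dropping the redundant size parameter.
deduplicated_total = [a + r for a, r in zip(scores["area"], scores["relief"])]

for label, total, dedup in zip("ABC", equal_weight_total, deduplicated_total):
    print(f"basin {label}: equal-weight score {total:.2f}, "
          f"without duplicate size parameter {dedup:.2f}")
```

Basin C outranks basin A only because its size is counted twice; once the duplicate is removed, the two basins tie. The ranking therefore depends on how many correlated parameters happen to be included, not on any evidence about flooding.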
The drainage network extracted from satellite data and the subdivision of large basins introduce some bias regarding the range and magnitude of the derived hydro-morphometric parameters. Subdivision is, of course, necessary to obtain a map of the spatial distribution of flash flood susceptibility in the Sinai. A more detailed subdivision could improve the resolution and accuracy of such a map, but the uncertainty about the exact location of the flash flood observations needed to optimize the logistic regression model could potentially cause more bias and uncertainty. In this regard, the current study is only a first attempt, and more research is needed to assess accuracy and improve results.

Conclusions
Thunderstorms in the Sinai Peninsula often lead to flash floods that can cause significant damage to infrastructure and sometimes even loss of life. This study therefore shows how to derive the susceptibility to flash flooding using hydro-morphometric characteristics of the watersheds. Sub-basins of various sizes and shapes and their hydro-morphometric features are derived from a digital elevation model from NASA Earthdata. Principal component analysis reveals the relationships between the most important hydro-morphometric parameters and allows us to derive five significant principal components that explain 90% of the variation in the data and have clear physical significance: basin size, drainage density, relief, stream bifurcation and basin shape. This shows that hydro-morphometric parameters are not independent, which makes traditional ranking methods for estimating flood sensitivity questionable.
The flash flood probability can be estimated by logistic regression using the significant principal components as predictors and the model coefficients estimated by fitting the model to flash flood observations using maximum likelihood. Cross-validation proves that the model is robust because the estimates of flood probability remain similar to those predicted with the original model. The model shows that the size of a basin is the most important factor in predicting flash flooding, followed by drainage density, relief, bifurcation ratio and basin shape. The logistic model can be used to classify all basins in Sinai into four classes: low, moderate, high and very high susceptibility to flash flooding. The resulting map indicates that the large basins in the mountain ranges of the southern Sinai have a very high susceptibility to flash flooding, several basins in the southwestern Sinai have a high or moderate susceptibility to flash flooding, some sub-basins of wadi El-Arish in the center have a high susceptibility to flash flooding, while smaller to medium-sized basins in flatter areas in the center and north usually have a moderate or low susceptibility to flash flooding. These results are consistent with observations of flash floods that occurred in different regions of the Sinai and with the findings or predictions of other studies.
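The workflow summarized above (maximum-likelihood fit, cross-validation, classification) can be sketched on synthetic data. Everything numerical here is an assumption: the two predictors are random stand-ins for principal component scores, the "true" coefficients only generate artificial observations, leave-one-out is used as one possible cross-validation scheme, and the class thresholds of 0.25, 0.50 and 0.75 are hypothetical, as the thresholds of the study are not restated in this section.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def design(X):
    """Prepend an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X, y, n_iter=50, damping=1e-6):
    """Maximum-likelihood fit by Newton-Raphson (tiny damping for stability)."""
    A = design(X)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = sigmoid(A @ beta)
        gradient = A.T @ (y - p)  # score vector of the log-likelihood
        hessian = (A * (p * (1 - p))[:, None]).T @ A + damping * np.eye(A.shape[1])
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def predict(beta, X):
    return sigmoid(design(X) @ beta)


# Synthetic stand-in data (NOT the Sinai data set).
rng = np.random.default_rng(1)
n = 103
pcs = rng.normal(size=(n, 2))            # two hypothetical PC scores per basin
true_beta = np.array([-1.0, 1.5, -1.0])  # used only to simulate observations
y = (rng.random(n) < predict(true_beta, pcs)).astype(float)

beta_full = fit_logistic(pcs, y)
p_full = predict(beta_full, pcs)

# Leave-one-out cross-validation: refit without basin i, then predict basin i.
p_loo = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    p_loo[i] = predict(fit_logistic(pcs[keep], y[keep]), pcs[i:i + 1])[0]

# Hypothetical probability thresholds separating the four classes.
thresholds = [0.25, 0.50, 0.75]
labels = np.array(["low", "moderate", "high", "very high"])
classes = labels[np.digitize(p_full, thresholds)]

print("fitted coefficients:", np.round(beta_full, 2))
print("mean |p_full - p_loo|:", round(float(np.mean(np.abs(p_full - p_loo))), 4))
```

A small mean difference between the full-model and left-out predictions is the kind of stability that the cross-validation statement refers to; how small is small enough remains a judgment that depends on the data.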
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w14152434/s1, Table S1: Values of the hydro-morphometric parameters for each basin; Table S2: PCA scores after Varimax rotation; Table S3: Values of the principal components; Table S4: Predicted probability for flash flooding.