Flood Susceptibility Assessment Using Novel Ensemble of Hyperpipes and Support Vector Regression Algorithms

Recurrent floods are a major global threat, particularly in developing countries such as India, which has a tropical monsoon climate. Flood susceptibility (FS) mapping is therefore necessary to mitigate this natural hazard. With this in mind, we evaluated the prediction performance of FS mapping in the Koiya River basin, Eastern India. The work comprised preparing a flood inventory map, selecting eight flood conditioning variables based on topography and hydro-climatological conditions, and applying a novel ensemble of the hyperpipes (HP) and support vector regression (SVR) machine learning (ML) algorithms. The HP-SVR ensemble was also compared with the stand-alone HP and SVR algorithms. In terms of relative importance, distance to river was the most dominant factor for flood occurrence, followed by rainfall, land use land cover (LULC), and normalized difference vegetation index (NDVI). The FS maps were validated with five popular statistical measures. The accuracy evaluation showed that the ensemble approach performed best (AUC = 0.915, sensitivity = 0.932, specificity = 0.902, accuracy = 0.928, and Kappa = 0.835), followed by HP (AUC = 0.885) and SVR (AUC = 0.871).


Introduction
Among natural disasters, floods are among the most frequent and costly, as they cause massive damage to life and property, as well as biodiversity loss and ecological degradation through soil loss [1][2][3]. A flood may be defined as the overflow of river water from its channel, causing inundation of the surrounding areas [4]. Floods are among the most devastating natural phenomena worldwide and are mainly caused by prolonged rainfall or snowmelt in combination with other adverse geo-environmental conditions [5]. Human interventions in the environment have also contributed to recent floods, which now occur more frequently than before, largely because of large-scale environmental degradation driven by population growth, floodplain encroachment, urbanization, deforestation, and more [6,7]. Research studies have shown that by 2050 more than 1.3 billion people will be living in flood risk areas [8] and exposed to hazardous flooding. Moreover, changing climatic conditions associated with large amounts of rainfall within short periods are also responsible for the recurrent occurrence of floods, including flash floods [9,10]. A flood is regarded as a risk when it affects a large group of people and causes loss of infrastructure, settlements, and more. The frequency of flooding therefore strongly affects the potential damage. It is also a fact that the damage caused by flooding cannot be quantified precisely [11].
The flood scenario in Asia is also devastating, as more than 90% of the destruction on this continent is caused by flood hazards [12]. Tropical rivers are also reported to flood more frequently [13]. Developing countries like India experience frequent flood hazards because of various conditions that favor flood occurrence. Studies have revealed that India is the second most flood-affected country after Bangladesh [14]. The floods that occurred throughout India in 2000 were the most noteworthy, as declared by the Government of India: economic losses totaled approximately 5560.65 crores and 2.21 crore people were affected [9]. In 2013, the Central Water Commission (CWC) of India estimated that 7.21 million hectares of land were inundated and nearly 32 million people were affected by flooding [15]. Several areas of India are affected by devastating floods. Among them, the Kerala flooding of August 2018 was the most damaging, with nearly 4.3 million people affected, 1.4 million displaced, and a death toll of 433. Moreover, the floods in Assam (2016), Bihar (2020), Mumbai (2005), and Uttarakhand (2013) were severe, and these are among the most flood-prone areas. In the state of West Bengal, more than 50% of the area has been affected by devastating floods, particularly the districts of Maldah, Murshidabad, Birbhum, and Purba Bardhaman, which are badly affected every year by floods of various magnitudes and intensities. Flood risk therefore cannot be disregarded, and mitigating it requires proper identification and spatial prediction of flood-prone areas through flood susceptibility (FS) mapping.
Spatial environmental assessment has been carried out using remote sensing (RS) [16][17][18][19][20][21] and geographic information system (GIS) [22,23] tools. Over the last few decades, FS mapping has been carried out through several statistical methods. In recent years, various fuzzy [24][25][26][27][28], deep learning [29][30][31][32][33][34], multiple criteria decision-making (MCDM) [35][36][37], and statistical [38][39][40][41][42][43] models have been used by researchers. The most widely used statistical methods in FS assessment are the frequency ratio (FR) [38], the analytical hierarchy process (AHP) [39], and more. Over time, and to overcome the shortcomings of statistical methods, several machine learning (ML) algorithms have come into wide use for FS assessment among research groups throughout the world. Frequently used ML algorithms for FS assessment include random forest (RF) [6], support vector machine (SVM) [40], functional tree (FT) [41], evidential belief function (EBF) [42], bio-geography based optimization (BBO) [9], and more. Subsequently, ensemble approaches evolved to produce more accurate FS predictions. Researchers combine two or more statistical or ML algorithms into an ensemble to improve prediction performance. Research studies have shown that ensemble approaches are now more widely used for FS mapping than stand-alone ML algorithms. Several ensemble approaches have been used for FS assessment, including ensembles of tree-based ML algorithms [6], EBF-LR [42], and more.
In this research, we chose the sub-tropical Koiya River basin as the study area for FS assessment. Our main objective is to produce spatial predictions of FS for the mitigation and sustainable management of flooding. This type of study predicts where flooding is likely to occur rather than when, so it is not tied to a specific time period unless climate change is considered; predicting FS under climate change would need to be bounded by a certain time period. However, the duration of the flood time window (a few days to a few weeks) depends largely on local geo-environmental factors, i.e., duration and intensity of rainfall, percentage of vegetation cover, elevation, and more. We therefore selected a total of eight flood susceptibility conditioning factors (FSCFs) based on the topography and hydro-climatological conditions: land use land cover (LULC), soil type, rainfall, normalized difference vegetation index (NDVI), distance to river, elevation, topographic wetness index (TWI), and stream power index (SPI), as these conditions have a direct or indirect influence on flood occurrence. All of these factors contribute to recurrent flooding and the associated risks. A flood inventory map was also prepared from the historical data of 132 flood points, because an inventory map is a prerequisite for modelling FS. Variable importance for flood occurrence was identified through the random forest (RF) algorithm. FS modeling and mapping were carried out using two ML algorithms, hyperpipes (HP) and support vector regression (SVR), and their novel HP-SVR ensemble was employed to improve the prediction results.
The literature shows that the HP algorithm has been used in landslide susceptibility analysis and that several ensemble approaches with HP gave better results than HP alone. To the best of our knowledge, the HP-SVR ensemble has not been used before for FS mapping; this ensemble approach is therefore the novelty of this study. Each model's output was validated through five popular statistical indices, including receiver operating characteristic (ROC) curve analysis.

Study Area
The present research was carried out in the Koiya River basin; the Koiya is a non-perennial river and a main tributary of the Mayurakshi River. Frequent flooding in the Koiya River basin is a well-known phenomenon, particularly during the peak monsoon months. In eastern India, the peak monsoon months (when maximum rainfall is received) generally run from July to September because of the monsoon-dominated climate. The Kopai and Bakreshwar Rivers join to form the Koiya River, which flows through the Birbhum and Murshidabad districts. The basin extends from 23°37′41″ N to 23°57′52″ N latitude and 87°16′54″ E to 88°08′50″ E longitude, with an areal coverage of 1433.69 km² (Figure 1). The entire river basin is dominated by productive agricultural land, except for some portions of the upper course. The climate of the study area is of the tropical monsoon type, and the minimum and maximum temperatures range from 12 °C to 17 °C and 35 °C to 40 °C, respectively. Maximum rainfall occurs in the peak monsoon months (July to September) and average annual rainfall varies from 1200 to 1700 mm [42]. Elevation in the study area ranges from 4 to 145 m; the lower course of the basin is therefore highly flood prone during the rainy monsoon season.

Methodology
The methodological flow chart of the present research on spatial prediction of FS assessment is shown in Figure 2 and was carried out by following several steps.

I. Preparation of the flood inventory map and the flood susceptibility conditioning factors (FSCFs): a total of 264 points were used, of which 132 historical flood points were collected from the multi-hazard district disaster management plan of Birbhum for 2017 and 2018 (http://www.birbhum.gov.in/DMD/MH_DM_Plan_Birbhum) and verified using Google Earth satellite images. Alongside this, an extensive field survey was carried out to check flood level marker posts during the floods. Afterward, 132 non-flood points were randomly selected throughout the river basin with the help of the ArcGIS platform. Additionally, a total of eight FSCFs were chosen based on the local geo-environmental conditions, namely land use land cover (LULC), soil type, rainfall, normalized difference vegetation index (NDVI), distance to river, elevation, topographic wetness index (TWI), and stream power index (SPI).
II. Multi-collinearity analysis was carried out among the selected factors using the tolerance (TOL) and variance inflation factor (VIF) techniques to reduce bias.
III. The relative importance of the eight variables and their sub-classes was analyzed through the mean decrease accuracy (MDA) method of the random forest (RF) algorithm and step-wise weight assessment ratio analysis (SWARA).
IV. Flood susceptibility modeling and mapping were done through the hyperpipes (HP) and support vector regression (SVR) ML algorithms and their novel HP-SVR ensemble.
V. The prediction performance of the aforementioned three models was validated through the statistical methods of sensitivity, specificity, accuracy, receiver operating characteristics-area under curve (ROC-AUC), and Kappa coefficient analysis.

Flood Inventory Map
In an FS analysis, the surface area can be categorized into two zones, i.e., sites where floods have already occurred and sites where floods have not occurred but may occur in the future. This information is presented in a flood inventory map. Preparing a flood inventory map from historical flood data is thus the primary prerequisite for flood susceptibility assessment [43]. A flood inventory map presents the past and present flood-induced inundation areas within a particular river basin [44]. It is also well known that the accuracy of FS mapping depends on a good flood inventory map. Thus, to estimate the areas prone to future flooding, a detailed analysis of the historical flood data is necessary [45]. In this study, we prepared a flood inventory map (Figure 1) using a total of 264 points (132 flood and 132 non-flood). We randomly split the entire dataset into 70% (185 points) for training and 30% (79 points) for validation, such that the points covered the whole study area. Historical flood points were collected from the multi-hazard district disaster management plan of Birbhum (2017-2018), Google Earth satellite images, a field survey with a handheld GPS (global positioning system) during the floods, and discussions with local people about flood intensity. Non-flood points were added to the inventory map through random selection in ArcGIS 10.4. Figure 3 shows some ground photographs taken during the floods.
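As an illustration of the 70/30 split described above, the following minimal Python sketch shuffles 264 placeholder points and splits them into training and validation subsets. The function name `split_inventory`, the seed, and the placeholder point identifiers are assumptions made for this example; the study itself does not state how the random split was implemented.

```python
import random

def split_inventory(points, train_fraction=0.70, seed=42):
    """Shuffle inventory points and split them into training and validation sets."""
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    shuffled = points[:]               # copy, leaving the caller's list untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder inventory: 132 flood points (label 1) and 132 non-flood points (label 0).
inventory = [(i, 1) for i in range(132)] + [(i, 0) for i in range(132, 264)]
train, valid = split_inventory(inventory)
print(len(train), len(valid))  # 185 79
```

A plain random split does not guarantee equal class proportions in both subsets; a stratified split would be one way to enforce that if it were required.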

Data Preparation
The most vital step in preparing an FS map is choosing appropriate FSCFs. In general, FSCFs are selected based on local geo-environmental factors, so the relevant flood conditioning factors vary from region to region [46]. Several FS studies have used geological (tectonic), climatological, hydrological, geomorphological, and human intervention factors [19]. Keeping this in view, we chose eight appropriate FSCFs in this study, based on the previous literature and local geo-environmental conditions: land use land cover (LULC), soil type, rainfall, normalized difference vegetation index (NDVI), distance to river, elevation, topographic wetness index (TWI), and stream power index (SPI). The thematic maps of all these factors were pre-processed and prepared from several primary and secondary data sources in the ArcGIS 10.4 environment. The data sources used in this study are presented in Table 1. The FSCFs used in this study are discussed in the following sections.

Land Use Land Cover (LULC)
The LULC of an area largely influences surface runoff, infiltration rate, and evapotranspiration [47], and all of these directly or indirectly lead to flood occurrences. It is also a known fact that there is a negative correlation between vegetation density and the frequency of flood occurrences [42]. In the present study, the LULC map was prepared using a Sentinel 2A satellite image collected from the European Space Agency (ESA). The LULC map was classified into eight classes, i.e., swamps, water body, arenaceous areas, aquatic spume, agricultural land, fallow land, agricultural fallow, dense forest, and degraded forest (Figure 4a). Among the LULC classes, agricultural fallow and agricultural land covered most (87%) of the area (Figure 5a).

Soil Types
Soil is one of the important factors for the occurrence of floods [48]. The pattern of surface runoff, infiltration rate, and associated inundation depends largely on soil texture [43]. Furthermore, soil composition determines the water storage capacity, the pattern of drainage channels, and the permeability, which mainly control inundation [25]. The soil map of the area (Figure 4b) was prepared from the National Bureau of Soil Survey and Land Use Planning (NBSSLUP) soil report. Table 2 gives details of the soil types found in the study area. The W040 soil type occupies the largest area (40%), followed by W043 (28%) and W094 (14%) (Figure 5b).

Rainfall
Rainfall is the most relevant factor for the occurrence of floods [49]. The magnitude of a flood depends largely on the duration and intensity of rainfall. In this study, mean rainfall data for the peak months from 1984 to 2018 were collected from the India Meteorological Department and the India Water Portal website. Subtropical, monsoon-dominated eastern India receives maximum rainfall during the months of June to September, i.e., the peak monsoon months. In a monsoon-dominated climate, the period of maximum rainfall (a specific month or a combination of a few months) is not the same every year. We therefore selected rainfall data for the whole period from June to September, as eastern India receives its maximum rainfall during these peak months. The spatial distribution of rainfall was mapped using the inverse distance weighted (IDW) tool in the ArcGIS 10.4 platform. The Koiya River basin rainfall map was classified into six categories (Figure 4c). The map shows that the middle of the northern part and a portion of the lower areas received <385.92 mm of rainfall, while parts of the western, southern, and south-eastern zones received >389.52 mm. The class of 387.13 to 388.28 mm covered the largest share of the area (58.46%), followed by 388.29 to 389.52 mm and 385.93 to 387.12 mm (Figure 5c).
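The idea behind IDW interpolation can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS implementation: the station coordinates and rainfall values are invented, and the power of 2 is only an assumed default, since the study does not report the parameters used.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (xs, ys, value) stations."""
    num = den = 0.0
    for xs, ys, value in stations:
        d = math.hypot(x - xs, y - ys)
        if d == 0.0:
            return value               # exactly on a station: return its value
        w = 1.0 / d ** power           # closer stations receive larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical stations: (x in km, y in km, mean peak-month rainfall in mm).
stations = [(0.0, 0.0, 385.0), (10.0, 0.0, 389.0), (0.0, 10.0, 387.0)]
estimate = idw(5.0, 0.0, stations)
```

Because the estimate is a weighted average, it always lies between the smallest and largest station values, which is consistent with the narrow rainfall range seen on the interpolated map.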

Normalized Difference Vegetation Index (NDVI)
The intensity of flooding depends on vegetation cover, and dense vegetation generally reduces flood occurrence. NDVI is widely used as a vegetation index for estimating vegetation characteristics [50]. NDVI theoretically ranges from −1 to +1; over land it mostly falls between 0 and 1, where values near 0, 0.2 to 0.4, and >0.5 represent barren land or water, grassland, and forest cover, respectively [51]. In this study, the NDVI map was prepared from a Landsat 8 OLI satellite image using Equation (1). The satellite data were collected in a cloud-free month, i.e., November, after the rainy season (July to September), when vegetation is at its greenest. The NDVI map was classified into six classes (Figure 4d); the class of 0.331 to 0.395 occupied the largest area (27.32%), followed by the classes of 0.270 to 0.330 and 0.208 to 0.269 (Figure 5d). The map shows that NDVI in the upper course of the river basin is very low, i.e., <0.207, whereas the middle and lower courses, which are covered by agricultural land, grassland, and forest, have significantly higher NDVI values.
$$ \mathrm{NDVI} = \frac{NIR - R}{NIR + R} \tag{1} $$
where NIR and R are the spectral reflectances of the near infrared and red bands, respectively.
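The NDVI calculation can be illustrated per pixel with a short Python sketch. The reflectance values are invented for the example, and returning 0 when both bands are zero is an assumption made here to avoid division by zero, not a rule stated in the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0                     # assumed convention for empty pixels
    return (nir - red) / total

# Hypothetical reflectances: dense vegetation reflects strongly in the NIR band.
vegetated = ndvi(0.50, 0.10)
bare = ndvi(0.20, 0.18)
```

For Landsat 8 OLI, the red and near infrared bands are bands 4 and 5, respectively.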

Distance to River
FS assessment depends significantly on the distance to river factor [52]. Areas close to the river are more prone to flooding, and areas far from the river are less prone [53]. In this study, the distance to river map was prepared using the Euclidean distance tool in the ArcGIS 10.4 platform. The distance to river map (Figure 4e) was classified into six categories, and the class of above 1000 m occupied the largest area (68.21%) (Figure 5e).
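Conceptually, a Euclidean distance raster assigns each cell the straight-line distance to the nearest river cell. The sketch below shows this for a single cell; the grid indices and the 30 m cell size are assumptions chosen for illustration, and a brute-force search like this is only practical for small grids.

```python
import math

def distance_to_river(cell, river_cells, cell_size=30.0):
    """Straight-line distance (m) from a grid cell to the nearest river cell."""
    r, c = cell
    return cell_size * min(math.hypot(r - rr, c - cc) for rr, cc in river_cells)

# Hypothetical river cells given as (row, column) indices.
river = [(0, 0), (10, 10)]
d = distance_to_river((3, 4), river)   # nearest river cell is (0, 0): 5 cells away
```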

Elevation
Elevation is considered one of the noteworthy factors for flood occurrence and has been used in many research works [54]. Elevation basically controls the natural flow of water [55]. It is well known that high-elevation areas are less vulnerable to floods and low-elevation areas are highly affected [35,56]. The elevation map of the study area was prepared in a GIS platform from the shuttle radar topographic mission (SRTM) digital elevation model (DEM) with 30 m spatial resolution. The elevation map was also classified into six classes (Figure 4f), and the largest area (31%) was covered by the class of 48 to 61 m, i.e., the middle course of the river basin (Figure 5f). Elevation ranges between 61 m and 142 m in the upper course and from 4 m to 34 m in the lower course. Thus, the lower course of the river basin is highly susceptible to flooding.

Topographic Wetness Index (TWI)
The TWI represents the accumulation of flowing water in terms of the spreading and depletion of surface water [45,57]. It indicates the saturation level of the topography. A high TWI value indicates that the land is well saturated and susceptible to flooding, and vice versa [42]. The TWI map of the study area was prepared using Equation (2) and classified into six classes (Figure 4g). The class of 7.767 to 10.406 covered the largest area (43.24%) (Figure 5g). The TWI map shows that TWI is low throughout most of the study area, i.e., <12.061. In the middle and lower portions of the basin, there are some isolated places where TWI is significantly higher, i.e., 15.499 to 24.188. These areas are thus saturated and influenced by flooding.
$$ \mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \tag{2} $$
where $A_s$ is the catchment area in m² and $\beta$ is the gradient of the slope in radians.
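A per-cell TWI computation can be sketched as follows, using the commonly used form TWI = ln(A_s / tan β). The input values are invented, and the `min_slope` floor is an assumption added here so that perfectly flat cells do not cause a division by zero; the study does not say how flat cells were handled.

```python
import math

def twi(catchment_area, slope_rad, min_slope=1e-6):
    """Topographic wetness index ln(A_s / tan(beta)) for one cell."""
    slope = max(slope_rad, min_slope)  # assumed floor for flat terrain
    return math.log(catchment_area / math.tan(slope))

# Hypothetical cell: catchment area of 1000 m^2 on a 10% slope.
gentle = twi(1000.0, math.atan(0.10))
steep = twi(1000.0, math.atan(0.50))   # same area, steeper slope, lower TWI
```

The example shows the expected behavior: for the same catchment area, a gentler slope gives a higher TWI, i.e., a cell more likely to stay saturated.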

Stream Power Index (SPI)
The SPI represents water-induced surface runoff and the associated erosional power [7]. Higher and lower SPI values indicate high and low erosional power, respectively, and the associated inundation phenomenon. The SPI map was prepared using Equation (3). The SPI map was also classified into six classes (Figure 4h), and the largest area (30.07%) is covered by the class of 3.706 to 6.500 (Figure 5h). The SPI map shows that the upper course of the basin, some isolated patches in the southern portion, and both sides of the middle river course are more prone to erosion, as the SPI value in these areas is significantly high (12.459 to 28.277). By contrast, the lower portion of the basin is dominated by low SPI values, indicating low erosional activity. Thus, sediment eroded from the higher areas of the upper course is carried downslope to the lower course, where it reduces the water holding capacity of the river channel and causes flooding.
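For illustration, a widely used definition of the stream power index is SPI = A_s · tan β, with A_s the catchment area and β the slope gradient as in the TWI. The sketch below assumes this common form, which may differ in detail from the study's Equation (3) (some authors, for example, take the logarithm); the input values are invented.

```python
import math

def spi(catchment_area, slope_rad):
    """Stream power index in the common form A_s * tan(beta) (assumed here)."""
    return catchment_area * math.tan(slope_rad)

# Hypothetical cells with the same catchment area but different slopes.
low_power = spi(1000.0, math.atan(0.01))
high_power = spi(1000.0, math.atan(0.10))
```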

Multicollinearity (MC) Test
The multi-collinearity (MC) test is a popular statistical method for detecting linear relationships among several variables [58]. The MC problem occurs when several independent variables in a regression model are correlated with each other [59]. The MC test was carried out among the chosen FSCFs to reduce possible error in the FS modelling. In this study, we used the tolerance (TOL) and variance inflation factor (VIF) techniques for the MC test. A VIF > 5 and a TOL < 0.1 indicate an MC problem in a dataset [60]. Equations (4) and (5) were used to calculate the TOL and VIF values, respectively.
$$ \mathrm{TOL} = 1 - R_j^{2} \tag{4} $$
$$ \mathrm{VIF} = \frac{1}{\mathrm{TOL}} \tag{5} $$
where $R_j^{2}$ indicates the coefficient of determination obtained by regressing factor $j$ on all the other factors.
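The TOL and VIF of one factor can be computed by regressing it on the remaining factors and taking the resulting R². The following self-contained Python sketch does this with ordinary least squares solved through the normal equations; the helper names and the three toy columns are assumptions for illustration, not the study's data or software.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def r_squared(y, predictors):
    """R^2 of a least squares fit of y on the predictor columns plus an intercept."""
    rows = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * v for bj, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def tol_vif(columns, j):
    """Tolerance and variance inflation factor of column j against the other columns."""
    others = [c for i, c in enumerate(columns) if i != j]
    tol = 1.0 - r_squared(columns[j], others)
    return tol, 1.0 / tol

# Toy factors: b is uncorrelated with a, while c nearly duplicates a.
a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, -1.0, -1.0, 1.0]
c = [1.1, 1.9, 3.2, 3.8]
tol_ab, vif_ab = tol_vif([a, b], 0)
tol_abc, vif_abc = tol_vif([a, b, c], 0)
```

In the toy data, adding the near-duplicate column `c` pushes the VIF of `a` well above the threshold of 5, which is the situation the MC test is meant to flag.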

Relative Importance of Factors and Respective Sub-Class Factors
Several factors were used for the FS assessment, but not all of them are equally responsible for the occurrence of floods. Identifying the relative importance of the factors and their respective sub-classes is therefore necessary for a sound FS assessment. In this study, we used the random forest (RF) algorithm to identify the relative importance of the factors and the step-wise weight assessment ratio analysis (SWARA) method to identify the relative importance of their sub-classes.

Random Forest (RF)
RF is a popular ML algorithm based on an ensemble of binary decision trees, proposed by Breiman [61]. During the training phase, the RF algorithm uses the bagging approach to build its decision trees [62]. RF is usually applied to classification, regression, and unsupervised learning problems. An advantage of the RF algorithm is its ability to reduce generalization error compared with many other ML models. Here, we applied the mean decrease accuracy (MDA) index within the RF model to identify variable importance, because traditional statistical methods are less capable of handling large datasets. The following equation was used to calculate variable importance through the MDA index [63].
$$ VI(X_j) = \frac{1}{N_t}\sum_{t=1}^{N_t}\left(EP_{tj} - E_{tj}\right) \tag{6} $$
where $VI$ represents the relative importance of a variable, $N_t$ is the number of trees, $E_{tj}$ indicates the out-of-bag (OOB) error on tree $t$ before permuting the values of $X_j$, and $EP_{tj}$ indicates the OOB error on tree $t$ after permuting the values of $X_j$.
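The permutation logic behind the MDA index can be sketched without building a full forest. In the minimal Python example below, each "tree" is represented by a prediction function and the indices of its out-of-bag rows; the toy threshold rule, the data, and all names are assumptions made for illustration only.

```python
import random

def error_rate(predict, rows, labels):
    """Fraction of rows that the predictor classifies incorrectly."""
    return sum(predict(r) != y for r, y in zip(rows, labels)) / len(rows)

def mean_decrease_accuracy(trees, rows, labels, j, seed=0):
    """Average over trees of (OOB error after permuting feature j) - (OOB error before)."""
    rng = random.Random(seed)
    total = 0.0
    for predict, oob in trees:
        oob_rows = [rows[i] for i in oob]
        oob_labels = [labels[i] for i in oob]
        before = error_rate(predict, oob_rows, oob_labels)
        column = [r[j] for r in oob_rows]
        rng.shuffle(column)            # break the link between feature j and the label
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(oob_rows, column)]
        after = error_rate(predict, permuted, oob_labels)
        total += after - before
    return total / len(trees)

# Toy data: feature 0 is a distance to river (m), feature 1 is irrelevant noise.
rows = [[float(d), float(i % 3)] for i, d in enumerate(range(50, 1050, 50))]
labels = [int(r[0] < 500) for r in rows]
tree = lambda r: int(r[0] < 500)       # toy "tree" that only looks at feature 0
trees = [(tree, list(range(0, 20, 2))), (tree, list(range(1, 20, 2)))]
vi_distance = mean_decrease_accuracy(trees, rows, labels, j=0)
vi_unused = mean_decrease_accuracy(trees, rows, labels, j=1)
```

Permuting a feature that the trees never use leaves the OOB error unchanged, so its importance is zero, whereas permuting an informative feature can only keep or increase the error in this toy setup.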

Step-Wise Weight Assessment Ratio Analysis (SWARA)
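In this study, the importance of the sub-classes of the respective variables was measured through the SWARA weight, a method developed by [64]. It is a well-established decision analysis technique that follows a step-wise weight assessment procedure. Expert opinion is essential for determining the weights in the respective fields; the ranks were therefore assigned based on the experts' knowledge, experience, and understanding [65]. The highest and lowest ranks are given to the most and least important criteria, respectively [66]. The SWARA weights were computed using the following equations [67,68].
$$ k_j = \begin{cases} 1, & j = 1 \\ s_j + 1, & j > 1 \end{cases} \tag{7} $$
where $k_j$ is the coefficient and $s_j$ is the comparative importance of criterion $j$ relative to criterion $j-1$.
$$ w_j = \begin{cases} 1, & j = 1 \\ \dfrac{w_{j-1}}{k_j}, & j > 1 \end{cases} \tag{8} $$
where $w_j$ is the recalculated weight. Finally, the relative weight of the criteria was calculated by the following equation.
$$ q_j = \frac{w_j}{\sum_{i=1}^{n} w_i} \tag{9} $$
where $q_j$ is the relative weight of criterion $j$ and $n$ is the number of criteria.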
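The step-wise SWARA calculation can be illustrated with a short Python sketch that follows the standard three steps (coefficient, recalculated weight, normalized relative weight). The comparative importance values in the example are invented and do not come from the study's expert survey.

```python
def swara_weights(s):
    """Relative SWARA weights for criteria ranked from most to least important.

    s[j] is the comparative importance of criterion j relative to criterion j-1;
    s[0] is ignored because the top-ranked criterion has no predecessor.
    """
    k = [1.0] + [sj + 1.0 for sj in s[1:]]   # coefficients
    w = [1.0]                                # recalculated weights
    for kj in k[1:]:
        w.append(w[-1] / kj)
    total = sum(w)
    return [wj / total for wj in w]          # relative weights, summing to 1

# Hypothetical expert judgments for three sub-classes, ranked 1 to 3.
weights = swara_weights([0.0, 0.25, 1.0])
```

With these inputs the coefficients are 1, 1.25, and 2, the recalculated weights are 1, 0.8, and 0.4, and the relative weights decrease with rank as expected.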

Hyperpipes (HP)
The HP algorithm can perform classification in a very short time, even with a huge number of variables [69]. Classification in this algorithm is based on simple counts. The HP algorithm has been used mostly in medical science [69], although the literature shows that it has also been used in natural hazard susceptibility assessment, e.g., for landslides [70]. It is a straightforward algorithm that builds a hyperpipe for each class in a given dataset. The HP algorithm runs as follows [71]:
I. Using the training dataset, a single pipe is created for each class and associated with that class.
II. All the data are analyzed instance by instance.
III. If an attribute value has not yet occurred, it is added to the pipe of the instance's class.
IV. The attribute values of an instance are compared with the values held in the class pipes.
V. Finally, the instance is assigned to the class whose pipe gives the best match.
An instance is attributed to the class whose pipe contains the largest number of its attribute values [70]. For example, if the pipe of a class has recorded every value at least once during training, then every tested instance of that class fits the pipe and is classified with that pipe's class [72]. The HP algorithm can therefore achieve a high recall rate on such test data. The various classes will thus be recognized accurately, although false alarms also occur because instances may be falsely classified. Moreover, this algorithm is exceptionally fast, has a minimal memory footprint, and is very simple. The HP algorithm can be illustrated as follows: the dataset has four pipes, pipe1, pipe2, pipe3, and pipe4; for a given instance the numbers of matches are 7, 0, 7, and 4, respectively; the class of pipe3 is then assigned to the instance [72].
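The steps above can be sketched as a small Python class. This is a minimal illustration written for this discussion, not the implementation used in the study: each pipe stores the observed range of every numeric attribute and the set of observed values of every categorical attribute, and ties between pipes are broken in favor of the class seen first, which is an assumption of this sketch. The toy attribute values are invented.

```python
class HyperPipes:
    """Minimal hyperpipes classifier: one pipe (value summary) per class."""

    def fit(self, rows, labels):
        self.pipes = {}
        for row, label in zip(rows, labels):
            pipe = self.pipes.setdefault(label, [None] * len(row))
            for a, value in enumerate(row):
                if isinstance(value, (int, float)):
                    lo, hi = pipe[a] or (value, value)
                    pipe[a] = (min(lo, value), max(hi, value))   # widen numeric range
                else:
                    pipe[a] = (pipe[a] or set()) | {value}       # record new category
        return self

    def _matches(self, pipe, row):
        """Count how many attribute values of the row fall inside the pipe."""
        count = 0
        for bounds, value in zip(pipe, row):
            if isinstance(bounds, tuple):
                count += bounds[0] <= value <= bounds[1]
            else:
                count += value in bounds
        return count

    def predict(self, row):
        # The class whose pipe matches the most attribute values wins.
        return max(self.pipes, key=lambda label: self._matches(self.pipes[label], row))

# Toy training data: (distance to river in m, LULC class).
rows = [(50, "agricultural land"), (120, "swamps"),
        (900, "dense forest"), (1500, "fallow land")]
labels = ["flood", "flood", "non-flood", "non-flood"]
model = HyperPipes().fit(rows, labels)
near = model.predict((80, "swamps"))
far = model.predict((1000, "dense forest"))
```

Training is a single pass over the data and prediction only needs range and membership checks, which is why the algorithm is fast and light on memory.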

Support Vector Regression (SVR)
The SVR model, proposed by Vapnik et al. [73], is a supervised ML algorithm developed from the support vector machine (SVM) classifier [74]. It can model the structure of complex functions within a system. The advantage of the SVR model is that it maximizes the margin within a regression task [75]. Generally, the SVR model is applied when the training dataset is very complex, and it fits such data by constructing nonlinear (curved) margins [76]. The structural risk minimization (SRM) principle is central to the SVR model, as it governs the relationship between input and output variables [56]. The SRM-based formulation of the SVR model is given in Equations (10) and (11).
In these equations, the input data are represented by z = (z_1, z_2, ..., z_n) and the resultant value by y_b ∈ R^l. In addition, v ∈ R^l is the weight factor, c ∈ R^l is the constant term of the function, l is the data size in the respective model, and φ(z) is the nonlinear function that maps the input dataset.
The values of v and c are obtained from an optimization problem based on the SRM principle, in which P is the penalty factor that balances model flatness against empirical risk, ζ_b and ζ_b^* are slack variables, and ε is the error tolerance of the ε-insensitive loss [77,78]. This optimization problem is solved through the Lagrangian function in Equation (12), in which the Lagrangian multipliers are δ_b, δ_b^*, β_b, and β_b^*. The SVR prediction is then expressed through the kernel function m(z, z_b) = ⟨φ(z), φ(z_b)⟩.
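For reference, the standard ε-insensitive SVR formulation can be written with the symbols defined above. This is the textbook form and is assumed, not confirmed, to correspond to the numbered equations of this section; the explicit constraint set and the pairing of β_b, β_b^* with the slack variables are part of that assumption.

```latex
% Regression function
y = v^{\top}\phi(z) + c

% Primal problem derived from the SRM principle
\min_{v,\,c,\,\zeta,\,\zeta^{*}} \;
  \frac{1}{2}\lVert v \rVert^{2} + P \sum_{b=1}^{l} \left( \zeta_b + \zeta_b^{*} \right)
\quad \text{s.t.} \quad
\begin{cases}
  y_b - v^{\top}\phi(z_b) - c \le \varepsilon + \zeta_b, \\
  v^{\top}\phi(z_b) + c - y_b \le \varepsilon + \zeta_b^{*}, \\
  \zeta_b,\ \zeta_b^{*} \ge 0
\end{cases}

% Lagrangian function with multipliers \delta_b, \delta_b^{*}, \beta_b, \beta_b^{*} \ge 0
L = \frac{1}{2}\lVert v \rVert^{2} + P \sum_{b=1}^{l} \left( \zeta_b + \zeta_b^{*} \right)
  - \sum_{b=1}^{l} \delta_b \left( \varepsilon + \zeta_b - y_b + v^{\top}\phi(z_b) + c \right)
  - \sum_{b=1}^{l} \delta_b^{*} \left( \varepsilon + \zeta_b^{*} + y_b - v^{\top}\phi(z_b) - c \right)
  - \sum_{b=1}^{l} \left( \beta_b \zeta_b + \beta_b^{*} \zeta_b^{*} \right)

% Resulting SVR predictor, with kernel m(z, z_b) = \langle \phi(z), \phi(z_b) \rangle
f(z) = \sum_{b=1}^{l} \left( \delta_b - \delta_b^{*} \right) m(z, z_b) + c
```

Setting the derivative of L with respect to v to zero gives v = Σ_b (δ_b − δ_b^*) φ(z_b), which is how the kernel enters the final predictor.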

Ensemble of HP-SVR
The ensemble approach may be defined as the combination of several single statistical or ML algorithms. An ensemble model generally gives better overall prediction performance than a single stand-alone ML model. Several research studies have used various ensemble methods in natural hazard susceptibility assessment, such as slope stability [79], landslide [80], and flood susceptibility assessment [6]. Therefore, in this study, we used a novel ensemble of the HP and SVR models to obtain the highest possible prediction accuracy. The ensemble of these two single models was built in the statistical programming package R.

Accuracy Assessment
Validation and accuracy assessment is an important task for any kind of susceptibility modeling. Without validation, the output result has no practical significance. Therefore, in this study we used five popular statistical methods, namely sensitivity, specificity, accuracy, ROC-AUC, and Kappa coefficient analysis. Accuracy assessment was based on the number of correctly and incorrectly classified pixels in flood and non-flood areas. The following four indices were used for measuring accuracy: true positive (TP), true negative (TN), false positive (FP), and false negative (FN) [44]. ROC-AUC analysis is one of the most important tools for validation of a model and has been widely used among research groups. The ROC curve plots 1 − specificity (false positive rate) on the X axis against sensitivity (true positive rate) on the Y axis. The value of AUC ranges from 0 (poor performance) to 1 (good performance) [81]; a value of 0.5 corresponds to random guessing. In addition, the Kappa coefficient has been used here to validate the respective models. The value of the Kappa coefficient ranges from −1 (unreliable) to +1 (reliable) [82]. All of these measures were calculated from the TP, TN, FP, and FN counts.
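The threshold-based measures named above follow directly from the four confusion-matrix counts. The sketch below uses the standard textbook definitions (Cohen's Kappa for the Kappa coefficient); the function name `confusion_metrics` and the example counts are hypothetical and are not taken from this study.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, accuracy and Cohen's Kappa from counts."""
    n = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)   # share of flood pixels correctly found
    specificity = tn / (tn + fp)   # share of non-flood pixels correctly found
    accuracy = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts for illustration only.
print(confusion_metrics(tp=40, tn=45, fp=5, fn=10))
```

Kappa corrects the observed accuracy for the agreement expected by chance, which is why it can be negative when a model performs worse than chance.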

Multi-Collinearity (MC) Analysis
The MC test is necessary in any kind of susceptibility modeling, because removing strongly correlated variables reduces bias and improves the accuracy of the results. Thus, in this research study, the MC test was carried out among the candidate variables to select suitable factors for FS modeling. MC was estimated through the TOL and VIF techniques, and eight appropriate parameters were selected for FS modeling. The TOL and VIF values in the present research study ranged from 0.493 to 0.969 and from 1.032 to 2.029, respectively, which shows that the result is within the permissible threshold and free from MC problems. The lowest and highest TOL values were found for soil (0.493) and distance to river (0.969), respectively. Correspondingly, the highest and lowest VIF values were found for soil (2.029) and distance to river (1.032). The result of the MC test for all the variables is shown in Table 3.
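The relation between the two MC measures can be made concrete with a short sketch. TOL for a factor is 1 − R², where R² comes from regressing that factor on all the others, and VIF is its reciprocal; the reported pairs (0.493 and 2.029, 0.969 and 1.032) agree with VIF = 1/TOL to rounding. The code below uses the equivalent identity that the VIF values are the diagonal of the inverse correlation matrix. The function names and the toy data are hypothetical, and this is not the procedure or software used in the study.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    centered = [[v - mean(col) for v in col] for col in columns]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]


def invert(matrix):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tol_vif(columns):
    """Return one (TOL, VIF) pair per factor; VIF_j = [R^-1]_jj, TOL_j = 1 / VIF_j."""
    inverse = invert(correlation_matrix(columns))
    return [(1.0 / inverse[j][j], inverse[j][j]) for j in range(len(columns))]


# Two hypothetical factors with correlation 0.8.
print(tol_vif([[1, 2, 3, 4], [1, 3, 2, 4]]))
```

Commonly cited rules of thumb treat TOL below 0.1 or VIF above 10 as a sign of problematic collinearity, although stricter cut-offs are also used.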

Relative Importance of the Variables and Their Sub-Classes
Based on the literature and local geo-environmental factors, eight flood conditioning factors were selected and subjected to the MC test for FS modeling. It is also important to know which variables and which of their sub-classes contribute most to flood occurrence. To determine this, we applied the mean decrease accuracy (MDA) method of the RF algorithm for the relative importance of the factors and the SWARA weighting method for their sub-classes. Table 4 shows the relative importance of the factors identified through the MDA method. The result shows that distance to river (0.91) is the most important factor for flood occurrence, followed by rainfall (0.84), LULC (0.66), SPI (0.54), soil (0.43), TWI (0.36), NDVI (0.28), and elevation (0.18). These factors therefore strongly control flood occurrence in the study area. Beyond the relative importance of the factors, it is also necessary to know the importance of each factor's sub-classes for flood occurrence. We therefore analyzed the weightage of each factor's sub-classes by using the SWARA method, and the result is shown in Table 5. The result of the SWARA method shows that, among the LULC classes, aquatic spume with a weightage of 0.45 is most responsible for flooding, followed by agricultural land (0.16) and fallow land (0.12). On the other hand, arenaceous area, dense forest, and degraded forest, with nil SWARA weightage, are much less responsible for flooding. In the case of soil type, W044 (0.

Spatial Assessment of Flood Susceptibility Mapping
The present FS assessment was carried out using two ML algorithms, SVR and HP, and their ensemble, HP-SVR, in the Koiya river basin of the Bengal Basin. The output maps of these models are presented in Figure 6. For a better understanding of the spatial distribution and its variation, all of the output maps were classified into five categories by using the Jenks natural breaks method in the ArcGIS 10.4 platform. These classes are very low, low, moderate, high, and very high, and each class is symbolized with the same color in all of the maps prepared through the ML algorithms. All of these maps show a broadly similar pattern among the susceptibility zones prepared for FS mapping. The FS map prepared through the SVR model is presented in Figure 6a and respective areal coverage of five zones, i.e., very low, low, moderate, high, and very high are 173.01 (

Evaluation of Validation Performance
Validation and accuracy assessment of predictive performance is necessary for a sound analysis of each model's result. In this study, validation performance was evaluated using five popular statistical methods, namely sensitivity, specificity, accuracy, AUC, and Kappa coefficient analysis. These methods are quantitative and are among the most consistent for validating and assessing the accuracy of the respective model's result. Accuracy describes the quality of the information derived from several sources, generally remotely sensed data. Thus, in susceptibility modeling and mapping, where remotely sensed data are used for management and decision-making, accuracy measurement is very important. The success and prediction results of all the models were obtained by using training (flood points) and testing (non-flood points) dataset. The AUC is one of the most important tools for validation, as it measures the diagnostic capability of the model. The ROC curves for the training and testing datasets of the three ML models are presented in Figure 8a,b, respectively. In addition, the quantitative results of the five statistical indices are presented in Figure 9. The AUC value for the HP-SVR, HP, and SVR models in the training dataset is 0.915, 0.885, and 0.871 (Figure 8a) and in the testing dataset is 0.882, 0.858, and 0.849 (Figure 8b), respectively. In the case of sensitivity, the values of HP-SVR, HP, and SVR models in the training and testing dataset are 0.932, 0.922, and 0.897 (Figure 8a (Figure 9b) for training and testing dataset respectively.
From the above discussion, the ensemble model of HP-SVR is the most optimal model for FS spatial prediction, as it attains the highest values in the five validation methods, followed by HP and SVR.
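The AUC values compared above have a simple probabilistic reading: the AUC is the probability that a randomly chosen flood point receives a higher susceptibility score than a randomly chosen non-flood point, with ties counted as one half. A minimal sketch of that pairwise definition follows; the function name `roc_auc` and the scores are hypothetical, and the study's own values were not computed with this code.

```python
def roc_auc(scores, labels):
    """AUC as the share of (flood, non-flood) pairs ranked correctly.

    `labels` uses 1 for flood and 0 for non-flood; ties count as 0.5.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in positives
        for q in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores for five validation points.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic in the number of points, which is acceptable for an illustration; a rank-based computation gives the same value more efficiently on large datasets.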

Discussion
In the last decade, machine learning and fuzzy models have received considerable attention [83][84][85][86][87], because these methods are well suited to modeling spatial data [88][89][90][91][92]. Evaluating the importance of FSCFs is crucial for planners and policy makers, who must make the best use of limited resources. The importance of FSCFs in spatial modeling is commonly estimated by means of bivariate and related models, in which weights are assigned through frequency ratio (FR), weights-of-evidence (WOE), logistic regression (LR), analytic hierarchy process (AHP), etc. However, before models are used for spatial modeling and mapping, all FSCFs should be given a pre-analyzed weighting or significance assignment [2,53]. A variety of studies are available in which the relative significance of variables was determined using various techniques before spatial modeling. Different strategies have been evaluated on the same set of FSCFs to pick the best from the collection, but the validity of FSCFs depends on the kind of topography and on the data used to extract them. Therefore, the efficiency of models is affected by the choice of variables involved in preparing the FSCFs. Based on the topographical, hydro-climatic, and multi-collinearity analysis, we selected eight appropriate FSCFs for this study area. The selection of a method for determining relative significance was therefore not rigid or confined to any particular model. The random forest model was used for determining the relative importance of each conditioning factor in the current work, and the SWARA weighting method was used to identify the importance of each factor's sub-classes in FS assessment. The results show that the distance to river, rainfall, NDVI, and LULC are of the utmost importance compared to the other factors.
Rivers are the key source of fluvial floods, which frequently cause havoc, especially on plains such as the Koiya River mouth [42]. Heavy and prolonged rainfall causes a high rate of surface runoff in the downslope direction, i.e., towards the river channel. This raises the stream's water level and reduces the remaining water holding capacity of the stream channel. As a result, water overflows from the stream channel and inundates the nearby surrounding areas due to the presence of low-lying land. At the same time, rainfall and LULC are also found to be significant predictive factors for flooding due to the nearly flat surface topography of the region analyzed and its higher rainfall [42]. LULC and NDVI also play a significant role in FS, as dense vegetation reduces surface runoff, promotes percolation of water to sub-surface areas through tree roots, and minimizes the incidence of floods. The other variables, i.e., soil, elevation, TWI, and SPI, have also influenced the FS analysis. Soil texture controls surface flow and erosional activity, which in turn reduce river depth through sedimentation and aggravate flooding. TWI and SPI determine the spatial spreading of surface runoff [45]. In a case study on FS analysis in the Koiya river, the statistical models, i.e., evidential belief function (EBF) and logistic regression (LR), and the EBF-LR ensemble performed very well, with AUC-ROC values of 0.890, 0.882, and 0.906, respectively, and their performance was also assessed with additional metrics [42]. In the current study, we selected HP, SVR, and their ensemble HP-SVR for modeling and mapping of FS. The validation result showed that the AUC value ranges from 0.871 to 0.915 and from 0.849 to 0.882 in the training and validation datasets, respectively, for all models.
The performance of the ensemble model is excellent, i.e., 0.915 (training) and 0.879 (testing), which is why we propose this model for flood susceptibility modeling. A research study on landslide susceptibility analysis based on HP algorithms has shown that ensembles of the HP model with AdaBoost (0.933), Bagging (0.950), Dagging (0.937), and Real AdaBoost (0.968) gave much better AUC performance than a single HP model (0.895) [71]. With this in view, an ensemble of HP-SVR was applied here in FS analysis to obtain better predictive performance. Differences between findings obtained with the same models can emerge due to different topographic structure and the choice of FSCFs [2,93]. The efficiency of the models used during training and validation in this analysis has improved considerably. In the light of projections of global climate change, the use of such advanced, high-precision tools to forecast areas potentially impacted by floods is highly important. In the future, inundations are expected to become more frequent owing to changes in rainfall patterns, with more torrential rain [9]. It should be noted that, due to deforestation carried out over large areas in the plain region of the Indian Bengal Basin and the expansion of built-up areas, which converts further land to impervious surfaces, the severity of floods will increase in the future [94]. Deforestation and the rising intensity and frequency of flooding would also have significant consequences for potential bed load transport. In the event of a flood, sediment transport may be particularly hazardous to the properties and infrastructure affected by the event. It is therefore of considerable importance that future research work identify the torrential catchments around the Koiya basin that are vulnerable to sediment transport.

Conclusions
As one of the worst flood-affected regions in India, the eastern fringe of the Bengal basin is still lagging behind in proper policies designed to deal with the damages incurred by this threat. The aim should be to reduce losses through environmentally sustainable hydraulic works, planned development of settlement areas, and up-to-date, precise, and verifiable identification of flood-prone areas and their vulnerability, in order to support successful decision-making in the region that is critically affected by floods. In the present study, both machine learning models, SVR and HP, and their ensemble performed very well, as assessed by AUC-ROC analysis and additional metrics. The AUC value in the training and validation stages ranged from 0.871 to 0.915 and from 0.849 to 0.882, respectively, for all the models. The performance of the ensemble model was excellent, i.e., 0.915, which is why we propose this model for flood susceptibility modeling. The findings of this analysis can be used effectively by the National Disaster Management Authority, India to improve the precision of flood prediction and warning. They can also be a valuable guide for central and local authorities in terms of strategic implications, which should give priority to flood-prone areas.

Data Availability Statement: All flooding data, shapefiles, and code will be made available on request to the corresponding author's email with appropriate justification.