Flood Risk Assessment of Global Watersheds Based on Multiple Machine Learning Models

Machine learning algorithms are becoming increasingly popular in natural disaster assessment. Although the technology has been tested in flood susceptibility analysis of several watersheds, research on global flood disaster risk assessment based on machine learning methods is still rare. Because the watershed is the basic unit of water management, the purpose of this study was to assess flood risk in the global fourth-level watersheds. Thirteen conditioning factors were selected: maximum daily precipitation, precipitation concentration degree, altitude, slope, relief degree of land surface, soil type, Manning coefficient, proportion of forest and shrubland, proportion of artificial surface, proportion of cropland, drainage density, population, and gross domestic product. Four machine learning algorithms were selected: logistic regression, naive Bayes, AdaBoost, and random forest. A global susceptibility assessment model was constructed from the four algorithms, the thirteen conditioning factors, and global flood inventories. The evaluation results show that random forest performed best on the test data and is an efficient and reliable tool for flood susceptibility assessment. Sensitivity analysis of the conditioning factors showed that precipitation concentration degree and the Manning coefficient were the main factors affecting flood risk in the watersheds. The susceptibility map showed that high-risk fourth-level watersheds account for a large proportion of all watersheds worldwide. With the increase in extreme hydrological events caused by climate change, floods remain one of the most threatening natural disasters globally. The global flood susceptibility map from this study can provide a reference for global flood management.

and the scope of their impact is broader [6]. The flood risk in a basin is often related to the local precipitation characteristics, the underlying surface conditions, and the adaptability of the basin to disasters [7,8]. Widespread increases in heavy precipitation events have been observed, even in places where total amounts have decreased [9,10]. For example, statistically significant increases in the occurrence of heavy precipitation have been observed across Europe and North America [11,12]. Population growth and rapid urban development also increase the flood risk of a basin [13,14]. Areas with higher population density, more agricultural land, or denser river networks are often more prone to flood disasters. Therefore, assessing river basin flood risk on a global scale is important for reducing flood disasters and for watershed management [8,15].
Various methods have been used to identify and evaluate flood-susceptible areas. Some studies have used multi-criteria decision analysis methods, including the analytic hierarchy process and expert scoring [16–18]. These methods rely on expert knowledge and are therefore subject to uncertainty [19]. Physically based models, such as VIC and MIKE at the regional scale and other hydrological models at the continental and global scales, have also been used to study floods and have shown great advantages in regional and global flood process research [8,20–23]. Recently, machine learning methods such as artificial neural networks (ANN), support vector machines (SVM), and decision trees (DT) have been applied to flood hazard assessment; trained and tested on large amounts of data, they can identify and evaluate flood-prone areas [24–27]. Because they learn the relationship between flood occurrence and the explanatory factors from historical flood records, machine learning models avoid the subjective determination of weights [28]. Physically based models represent physical laws through simplified parameterizations and can simulate natural, time-continuous processes, which is their advantage over machine learning models. However, given the complexity of global floods, flood models at the global scale often require a large number of parameters, repeated calibration, and long computing times [19,21–23,26–28]. Machine learning methods may therefore be a good choice for obtaining global flood hazard assessments more quickly.
However, current flood risk assessments based on machine learning have mostly focused on a single watershed. For example, Tehrany et al. used SVM with different kernel types for flood susceptibility mapping in Kuala Terengganu, Malaysia [29], and Zhao et al. applied a semi-supervised machine learning model to urban flood susceptibility assessment in Beijing, China [28]. In these models, the input training samples are only the attributes of a flood occurrence or non-occurrence point in the basin. In fact, floods result from the combined properties of the basin rather than from the attributes of a single sample point. In addition, machine learning methods have rarely been used for global flood risk assessment, so flood risk assessments for global watersheds are needed.
In this study, a machine learning model for global flood risk assessment was built from 60,863 fourth-level watersheds, four machine learning methods (logistic regression, naive Bayes, AdaBoost, and random forest), and 13 conditioning factors. On this basis, a flood susceptibility map of the global watersheds was constructed to provide a reference for global watershed management and flood disaster identification.

Global Fourth-Level Watersheds
The basin is the basic unit of hydrological management. Many institutions around the world have produced global multi-level watershed maps using different techniques [30,31]. In plains areas, however, errors in remotely sensed topographic data, low spatial resolution, and small elevation differences make it difficult to derive realistic digital river networks from digital elevation data with GIS technology, which adversely affects subsequent calculations and evaluations [32].
For plains areas, we adopted the "stream burning" method, which consisted of the following steps:
(1) With the Google Earth imagery zoomed to its finest resolution, the river centerline was drawn manually along the visible river using the line drawing tool in Google Earth.
(2) The original DEM (Digital Elevation Model) was modified with the stream burning method, using the correct digital rivers obtained from Google Earth.
(3) The correct digital river network was then rebuilt from the revised DEM using the standard ArcGIS hydrological processing workflow (D8 method).
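The effect of stream burning on D8 flow routing can be illustrated on a toy raster. This is a minimal sketch, not the ArcGIS workflow: it assumes the hand-drawn centerlines have already been rasterized into a boolean `river_mask` aligned with the DEM, and it lowers those cells by a fixed `depth` (a hypothetical value; the burn depth used in the study is not stated). The helpers `burn_streams` and `d8_flow_direction` are hypothetical names. Flow direction uses the ESRI D8 codes (1 = east, 2 = southeast, 4 = south, ..., 128 = northeast) and the steepest distance-weighted drop; unlike ArcGIS, it does not fill sinks or resolve flats, which are simply coded 0.

```python
import numpy as np

# ESRI D8 codes and the matching (row, col) offsets: E, SE, S, SW, W, NW, N, NE
D8_CODES = [1, 2, 4, 8, 16, 32, 64, 128]
D8_OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]

def burn_streams(dem, river_mask, depth=10.0):
    """Lower DEM cells under the digitized river by a fixed (hypothetical) depth."""
    burned = dem.astype(float)  # astype returns a copy, so the input DEM is untouched
    burned[river_mask] -= depth
    return burned

def d8_flow_direction(dem, cellsize=1.0):
    """D8 code of the steepest downslope neighbour; 0 where no neighbour is lower."""
    rows, cols = dem.shape
    fdir = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            best_drop, best_code = 0.0, 0
            for code, (dr, dc) in zip(D8_CODES, D8_OFFSETS):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away
                    dist = cellsize * (np.sqrt(2.0) if dr and dc else 1.0)
                    drop = (dem[r, c] - dem[rr, cc]) / dist
                    if drop > best_drop:
                        best_drop, best_code = drop, code
            fdir[r, c] = best_code
    return fdir
```

On a flat 3 × 3 plain no cell has a downslope neighbour; after burning the middle row, the cells on either side drain toward it (codes 4 and 64), which is the behaviour stream burning is meant to enforce in low-relief areas.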
Based on the topological relationship of the first-level to fourth-level rivers, we coded the river networks from outlet to source, from large to small, and from coarse to fine. We used the end points of the rivers to obtain the watershed boundaries. The watershed inherited the code of the corresponding river. The global first-level to fourth-level rivers and corresponding watersheds dataset has been published on the Figshare data platform (https://doi.org/10.6084/m9.figshare.8044184.v3) [33]. The global watershed classification included 60,863 fourth-level watersheds, as shown in Figure 1.

Flood Disaster Inventory
Accurate analysis of flood susceptibility requires a precise flood inventory map showing the locations of flood occurrences [34]. Several flood databases exist, such as the International Disaster Database (EM-DAT) and the Global Active Archive of Large Flood Events, and other global-scale studies have used flood extent observations or detailed reconstructions from 2D hydraulic models [22,35]. Various non-conventional sources of information (such as amateur videos, photographs, and news reports) also provide data for reconstructing flood events [36]. The flood disaster inventory in this study was derived from the Global Active Archive of Large Flood Events of the Dartmouth Flood Observatory, University of Colorado [37]. The database is supported by NASA, the Japanese Space Agency, and the European Space Agency, and is widely used worldwide. Since 1985, the archive has recorded a large number of flood events, mostly from news, governmental, instrumental, and remote sensing sources, and it provides accurate geographical locations of flood disasters. We therefore selected this database as the source of flood disaster samples.
For this study, 4730 flood disaster records from January 1985 to March 2019 were selected. The flood sample dataset was overlaid on the global fourth-level watershed dataset in ArcGIS 10.5, yielding 3335 watersheds with flood disasters. The distribution of the flood sample points is shown in Figure 1. Sample points where no flood has occurred also strongly influence the model results. However, such non-flood sample points could not be obtained from the existing database. The usual approach is to select non-flood sample points at random from the remaining areas with no recorded floods, but this often leads to misidentification [24–27], because the existing databases cannot record every flood event. Following previous studies, we imposed additional conditions on the non-flood samples [28]. The literature and previous studies indicate that deserts and ice fields are less likely to experience floods or flood damage [1,3,37]. Therefore, 1500 watersheds were randomly selected in deserts and ice fields. Although these samples make the non-flooding condition more stringent, they correspondingly raise the flood control standard applied to the basins. Values of 1 and 0 were assigned to indicate the presence and absence of flood disaster, respectively. The samples were randomly divided into a training dataset (70%) and a testing dataset (30%) for the machine learning models.
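The sample construction described above can be sketched in a few lines of NumPy. The inputs `flood_ids` (codes of watersheds containing at least one archived flood record) and `desert_ice_ids` (codes of watersheds in deserts and ice fields) are hypothetical, and excluding desert or ice-field watersheds that already contain a recorded flood is an added assumption rather than a step stated in the text.

```python
import numpy as np

def build_samples(flood_ids, desert_ice_ids, n_nonflood=1500, train_frac=0.7, seed=0):
    """Label flooded watersheds 1 and a random draw of desert/ice watersheds 0,
    then split the pooled samples at random into training and testing sets."""
    rng = np.random.default_rng(seed)
    flood_ids = np.unique(flood_ids)  # several flood records can fall in one watershed
    # Assumption: skip desert/ice watersheds that already have a recorded flood
    candidates = np.setdiff1d(desert_ice_ids, flood_ids)
    nonflood_ids = rng.choice(candidates, size=n_nonflood, replace=False)
    ids = np.concatenate([flood_ids, nonflood_ids])
    labels = np.concatenate([np.ones(len(flood_ids), dtype=int),
                             np.zeros(n_nonflood, dtype=int)])
    order = rng.permutation(len(ids))  # random 70 % / 30 % split
    n_train = int(round(train_frac * len(ids)))
    train, test = order[:n_train], order[n_train:]
    return (ids[train], labels[train]), (ids[test], labels[test])
```

Deduplicating `flood_ids` mirrors the step in which 4730 flood records collapse to 3335 flooded watersheds.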

Flood Conditioning Factors
Identifying the conditioning factors is a key step in flood susceptibility assessment. By reviewing previous studies and the mechanisms of flooding, thirteen conditioning factors were selected: maximum daily precipitation (MDP), precipitation concentration degree (PCD), altitude, slope, relief degree of land surface (RDLS), soil type (ST), Manning coefficient (MC), proportion of forest and shrubland (PFS), proportion of artificial surface (PAS), proportion of cropland (PC), drainage density (DD), population, and gross domestic product (GDP) [19,27–29]. These factors were calculated from data such as digital elevation and land use. Unlike previous studies, this study used the average (or majority) value of each conditioning factor over each watershed as input data; that is, the average (or majority) value of the raster data within each watershed was calculated with the spatial statistics module of ArcGIS. For training and testing, the data were normalized according to the following formula:

$$f' = \frac{f}{f_{\max}}$$

where $f$ is the original value of a conditioning factor, $f_{\max}$ is the maximum value of that factor, and $f'$ is the normalized value.
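The two preprocessing steps (a per-watershed average and scaling by the maximum) can be sketched with NumPy. This is a stand-in for the ArcGIS zonal statistics, not a reproduction of it: it assumes the factor raster and a raster of watershed codes (`zones`) share one grid and that all cells have equal area, which does not hold on a geographic latitude/longitude grid. Function names are hypothetical.

```python
import numpy as np

def zonal_mean(values, zones):
    """Average of a continuous factor raster within each watershed code."""
    # Flatten first so np.unique returns a 1-D inverse index on every NumPy version
    ids, inverse = np.unique(zones.ravel(), return_inverse=True)
    sums = np.bincount(inverse, weights=values.ravel().astype(float))
    counts = np.bincount(inverse)
    return dict(zip(ids.tolist(), (sums / counts).tolist()))

def normalize_by_max(f):
    """Scale a factor by its maximum value, f / f_max."""
    f = np.asarray(f, dtype=float)
    return f / f.max()
```

Dividing by the maximum maps every non-negative factor into [0, 1] while preserving ratios between watersheds.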

Maximum daily precipitation
Precipitation is the direct driver of floods. The NCEP reanalysis is a complete, comprehensive dataset produced by the National Centers for Environmental Prediction that contains global precipitation data with a spatial resolution of 2.5° and a temporal resolution of 6 h [38]. NCEP reanalysis data are widely used worldwide. The MDP from 1985 to 2017 was calculated from this dataset, and the data were resampled to a resolution of 0.01° with the ArcGIS Resample tool. The NEAREST resampling algorithm was used because it creates no new values and therefore minimizes changes to pixel values. The global distribution of average MDP for each fourth-level watershed is shown in Figure 2a.
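A minimal sketch of the MDP and nearest-neighbour steps follows. It assumes the reanalysis precipitation has already been converted to 6-hour totals in millimetres and that the series starts at 00 UTC, so every four consecutive steps form one day; it takes the single largest daily total over the whole period, which is one reading of "the MDP from 1985 to 2017" (an average of annual maxima would be another). Going from 2.5° to 0.01° corresponds to an integer factor of 250. Function names are hypothetical.

```python
import numpy as np

def max_daily_precip(six_hourly):
    """Largest daily total per grid cell from a (time, lat, lon) array of 6-h totals."""
    n_days = six_hourly.shape[0] // 4  # four 6-h steps per day
    days = six_hourly[: n_days * 4].reshape(n_days, 4, *six_hourly.shape[1:])
    return days.sum(axis=1).max(axis=0)  # daily totals, then the wettest day

def nearest_upsample(grid, factor):
    """Nearest-neighbour resampling by an integer factor: values are copied, none created."""
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)
```

Because nearest-neighbour upsampling only copies each coarse value into a block of fine cells, the set of values in the output is exactly the set in the input, which is the property the text cites for choosing NEAREST.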

Precipitation concentration degree
PCD is an indicator of the temporal distribution of precipitation: the more concentrated the precipitation, the higher the frequency of heavy precipitation [39]. Taking the year as the calculation period and following the principle of vector analysis, the precipitation of each month was treated as a vector whose length is the monthly precipitation and whose direction is the azimuth assigned to that month, and the vectors were decomposed into x and y components. The full year corresponds to an azimuth of 360°, distributed evenly among the months (Table 1). PCD is calculated as follows:

$$R_x = \sum_{i=1}^{12} r_i \sin\theta_i, \qquad R_y = \sum_{i=1}^{12} r_i \cos\theta_i$$

$$\mathrm{PCD} = \frac{\sqrt{R_x^2 + R_y^2}}{R}$$

where $R$ is the total precipitation, $r_i$ is the precipitation of the $i$-th month, and $\theta_i$ is the azimuth corresponding to that month. PCD thus reflects how concentrated the total precipitation of a period is, with values ranging from 0 to 1. If all precipitation falls in a single month, the length of the composite vector equals the total precipitation and PCD reaches its maximum of 1. If the monthly precipitation is equal, the modulus of the composite vector is 0 and PCD reaches its minimum of 0.
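The vector formulation of PCD translates directly into NumPy. The month azimuths 0°, 30°, ..., 330° are an assumption standing in for Table 1; the PCD value does not depend on which azimuth the first month receives, because rotating all twelve vectors by the same angle leaves the length of their resultant unchanged. The function name `pcd` is hypothetical.

```python
import numpy as np

def pcd(monthly):
    """Precipitation concentration degree of 12 monthly totals (vector method)."""
    r = np.asarray(monthly, dtype=float)
    theta = np.deg2rad(30.0 * np.arange(12))  # assumed: month i at 30*(i-1) degrees
    rx = np.sum(r * np.sin(theta))  # x component of the composite vector
    ry = np.sum(r * np.cos(theta))  # y component of the composite vector
    return np.hypot(rx, ry) / r.sum()
```

The two limiting cases from the text serve as checks: equal monthly totals give PCD ≈ 0, and a year whose rain falls in one month gives PCD ≈ 1.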
Based on the NCEP reanalysis data, the global average PCD data from 1985 to 2017 were obtained. The average PCD data were then resampled to a resolution of 0.01° in ArcGIS with the NEAREST algorithm. The global distribution of average PCD for each fourth-level watershed is shown in Figure 2b.

Altitude
Altitude is also one of the most important factors affecting flood disasters [40]. In general, flood occurrence is inversely related to altitude; that is, floods are more likely to occur at lower altitudes.

Slope
Slope is an important geomorphological factor in triggering flood disasters [19,41]. Slope directly affects surface runoff generation and precipitation infiltration, and river basins with steep slopes in mountainous areas are often more prone to flood disasters. A global slope map with a 90 × 90 m pixel size was created in ArcGIS, and the average slope of each watershed was obtained with the spatial statistics module of ArcGIS. The global distribution of average slope for each fourth-level watershed is shown in Figure 2d.
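Slope in degrees can be derived from a DEM with finite differences. This sketch uses NumPy's central-difference `np.gradient`, a simpler swap-in for the 3 × 3 neighbourhood that ArcGIS uses, so values will differ slightly; it assumes a projected DEM with square 90 m cells and elevations in metres. The function name is hypothetical.

```python
import numpy as np

def slope_degrees(dem, cellsize=90.0):
    """Slope in degrees from central-difference gradients (cellsize in elevation units)."""
    dz_dy, dz_dx = np.gradient(dem.astype(float), cellsize)  # rise per metre along rows / columns
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
```

The per-watershed average then follows by masking, for example `slope_degrees(dem)[zones == code].mean()` for a hypothetical watershed code raster `zones`.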

Relief degree of land surface
The RDLS is the difference between the highest and lowest altitudes in the watershed. It is an important indicator of regional topographic features and landform types.

Soil type
Soil depth, soil texture, and soil porosity are the main soil properties affecting surface runoff. They mainly affect runoff generation by changing the infiltration and water-holding characteristics of the soil. Because different soil types indicate different soil properties, soil type was selected as a conditioning factor. Soil data were derived from the Harmonized World Soil Database v1.0 of the FAO (Food and Agriculture Organization of the United Nations) [42]. The Harmonized World Soil Database is a 30 arc-second raster database with over 15,000 different soil mapping units that combines existing regional and national updates of soil information worldwide. A global ST map with a resolution of 0.01° was built in ArcGIS, and the majority soil type of each watershed was obtained with the spatial statistics module of ArcGIS. The global distribution of the main ST for each fourth-level watershed is shown in Figure 2f.
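RDLS (highest minus lowest elevation) and the majority soil type are both per-watershed statistics and can be sketched the same way. The inputs are assumed to be a DEM or soil-class raster on the same grid as a raster of watershed codes; ties in the majority are resolved to the smallest class code, which may not match the ArcGIS tie rule. Function names are hypothetical.

```python
import numpy as np

def zonal_range(dem, zones):
    """RDLS per watershed: highest minus lowest elevation."""
    z, v = zones.ravel(), dem.ravel()
    return {int(k): float(v[z == k].max() - v[z == k].min()) for k in np.unique(z)}

def zonal_majority(classes, zones):
    """Most frequent soil class per watershed (ties go to the smallest class code)."""
    z, c = zones.ravel(), classes.ravel()
    out = {}
    for k in np.unique(z):
        codes, counts = np.unique(c[z == k], return_counts=True)  # codes come back sorted
        out[int(k)] = int(codes[np.argmax(counts)])  # argmax picks the first maximum
    return out
```

The majority (rather than the mean) is used for soil type because the class codes are categorical and an average of them would have no physical meaning.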

Manning coefficient
The Manning equation has long been applied to the analysis of river flow resistance. In practice, the Manning coefficient is used to represent the resistance of different underlying surface conditions and is a key parameter affecting flow concentration and flood routing. Following [43,44], a Manning coefficient was assigned to each land use type, and the average Manning coefficient $M$ of each fourth-level watershed was calculated as:

$$M = \sum_{i=1}^{n} m_i\, p_i$$

where $m_i$ denotes the Manning coefficient of land use type $i$, $p_i$ denotes the area ratio of land use type $i$ to the watershed, and $n$ is the number of land use types in the watershed.
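The area-weighted average can be sketched as below. The coefficient table from [43,44] is not reproduced in the text, so the mapping is passed in by the caller, and the values in the check are placeholders, not the study's coefficients. Area ratios are taken from pixel counts, which assumes equal-area pixels. The function name is hypothetical.

```python
import numpy as np

def mean_manning(landuse, manning_by_class):
    """Area-weighted Manning coefficient of one watershed: sum of m_i * p_i."""
    codes, counts = np.unique(landuse, return_counts=True)
    p = counts / counts.sum()  # area ratio of each land use type (equal-area pixels assumed)
    m = np.array([manning_by_class[int(k)] for k in codes])
    return float(np.sum(m * p))
```

For example, a watershed that is three-quarters class 1 and one-quarter class 2, with placeholder coefficients 0.03 and 0.1, gets 0.75 × 0.03 + 0.25 × 0.1 = 0.0475.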
The land use data in this paper were derived from the GlobeLand30 land use raster data, produced by the Chinese government from remote sensing data (www.globeland30.com) [45]. The data include ten land use types: land surface waters, wetlands, woodlands, grasslands, shrubs, artificial surface, arable land, glaciers and permanent snow, tundra, and bare land.
From these calculations, the Manning coefficient distribution map of the global fourth-level watersheds was obtained in ArcGIS, as shown in Figure 2g.

Drainage density
Drainage density is a basic feature of a river system that affects peak flows when rainfall occurs in a watershed [46]. It is defined as the ratio of the total river length in a basin to the basin area. In this study, rivers below level four were not included in the river length. Based on the river network data in Section 2.1.1, the drainage density of each fourth-level watershed was calculated. The global distribution of the DD for each fourth-level watershed is shown in Figure 2k.
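Drainage density is a simple ratio once each river reach carries the code of the watershed it lies in. This sketch assumes a hypothetical list of `(watershed_code, length_km)` pairs for level-1 to level-4 reaches and a mapping of watershed areas in km²; watersheds without any such reach get a density of 0.

```python
from collections import defaultdict

def drainage_density(reaches, areas_km2):
    """Total river length divided by watershed area, in km per km^2."""
    total = defaultdict(float)
    for code, length_km in reaches:
        total[code] += length_km  # sum the reaches that fall in each watershed
    return {code: total[code] / area for code, area in areas_km2.items()}
```

As a worked example, two reaches of 12.5 km and 7.5 km in a 40 km² watershed give 20 / 40 = 0.5 km/km².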

Population
The distribution of population is directly related to flood disasters, and floods in densely populated areas have more serious consequences. Because the most accurate demographic data are released by national governments, we used governmental data to revise the population data released by the World Bank and FAO, obtaining more accurate global population density data [47,48]. The population of each fourth-level watershed was then obtained in ArcGIS, as shown in Figure 2l.

Gross Domestic Product
GDP is also closely related to flood disasters. Areas with higher GDP tend to adapt better to disasters, but the losses from a disaster may be greater. The GDP of each fourth-level watershed in the world was obtained from World Bank GDP data in ArcGIS, as shown in Figure 2m [49].

Logistic Regression
Logistic regression (LR), developed by Cox, is a non-linear regression model for binary problems [50,51]. Its output lies between 0 and 1 and can therefore be interpreted as the probability that a phenomenon occurs. The logistic regression model is widely used in flood risk assessment; the closer its output is to 1, the greater the probability of a flood disaster. The basic form of the logistic regression model is:

$$p = \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

where $p$ is the probability of a flood disaster, $\beta_i$ ($i = 0, 1, 2, \ldots, n$) are the regression coefficients of the model, and $x_i$ ($i = 1, 2, \ldots, n$) are the conditioning factors.
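Evaluating the logistic model for given coefficients takes one line. The coefficients here are hypothetical inputs; in practice they are estimated by maximum likelihood from the training watersheds (for example with scikit-learn's `LogisticRegression`). Rows of `x` are watersheds and columns are the normalized conditioning factors.

```python
import numpy as np

def flood_probability(x, beta0, beta):
    """Logistic model: p = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))."""
    z = beta0 + np.dot(np.asarray(x, dtype=float), np.asarray(beta, dtype=float))
    return 1.0 / (1.0 + np.exp(-z))
```

A watershed with $z = 0$ sits exactly on the decision boundary ($p = 0.5$); larger $z$ pushes the flood probability toward 1.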

Naive Bayes
The naive Bayes (NB) model is a classification algorithm based on Bayes' theorem and the assumption that features are conditionally independent [52,53]. The Bayesian conditional probability formula is:

$$P(C_k \mid X) = \frac{P(X \mid C_k)\,P(C_k)}{P(X)}$$

Suppose the samples of the model are $(x^{(1)}_1, x^{(1)}_2, \ldots, x^{(1)}_n, y^{(1)}), (x^{(2)}_1, x^{(2)}_2, \ldots, x^{(2)}_n, y^{(2)}), \ldots, (x^{(m)}_1, x^{(m)}_2, \ldots, x^{(m)}_n, y^{(m)})$; that is, there are $m$ samples, each with $n$ features, and the model output has $k$ categories, denoted $C_1, C_2, \ldots, C_k$. The prior probability and conditional probability of naive Bayes can be obtained from the samples as follows:

$$P(Y = C_k) = \frac{1}{m}\sum_{i=1}^{m} I\left(y^{(i)} = C_k\right) \tag{9}$$

$$P(X = x \mid Y = C_k) = \prod_{j=1}^{n} P\left(X_j = x_j \mid Y = C_k\right) \tag{10}$$

where $I(\cdot)$ is the indicator function. When making predictions, the $k$ conditional probabilities are calculated according to Equation (11), and the category with the largest conditional probability is taken as the result. Because the denominator $P(X)$ is the same for every category, Equation (11) follows from Equations (9) and (10):

$$C = \arg\max_{C_k} P(Y = C_k) \prod_{j=1}^{n} P\left(X_j = x_j \mid Y = C_k\right) \tag{11}$$

In flood risk assessment, when the conditional probability of flood occurrence is greater than that of non-occurrence, the test sample is classified as flooding; otherwise, it is classified as no flooding.
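Equations (9) to (11) leave the form of $P(X_j = x_j \mid Y = C_k)$ open. The sketch below assumes a Gaussian density per factor and class, a common choice for continuous factors but not necessarily the one used in the study; a categorical factor such as soil type would instead use class-conditional frequency tables. Working in log space turns the product in Equation (10) into a sum and avoids numerical underflow. The class name is hypothetical.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with Gaussian class-conditional densities (an assumption)."""

    def fit(self, X, y, var_floor=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        # Eq. (9): prior = share of training samples in each category
        self.log_prior_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_floor
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Eq. (10) in log form: sum of per-factor Gaussian log densities, shape (samples, classes)
        sq = (X[:, None, :] - self.mean_[None, :, :]) ** 2 / self.var_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :] + sq).sum(axis=2)
        # Eq. (11): category with the largest unnormalized posterior
        return self.classes_[np.argmax(log_lik + self.log_prior_, axis=1)]
```

With labels 1 (flood) and 0 (no flood), `predict` returns 1 exactly when the flood posterior exceeds the non-flood posterior, as described in the text.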

AdaBoost
The adaptive boosting algorithm (AB) is an iterative algorithm [54,55]. Its core idea is to train different weak classifiers on the same training set and then combine them into a stronger final classifier. The model mainly includes the following steps.
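First, the weight distribution of the training data is initialized so that every training sample has the same weight, and the initial distribution of the training sample set is:

$$w_{1,i} = \frac{1}{N}, \quad i = 1, 2, \ldots, N, \qquad D_1 = \left(w_{1,1}, w_{1,2}, \ldots, w_{1,N}\right)$$

where $N$ is the number of training samples and the labels are coded as $y_i \in \{-1, +1\}$. The algorithm then iterates over $t = 1, \ldots, T$, selecting in each round the weak classifier with the smallest error rate as the $t$-th basic classifier $h_t$. The error of the weak classifier on the distribution $D_t$ is:

$$e_t = \sum_{i=1}^{N} w_{t,i}\, I\left(h_t(x_i) \neq y_i\right)$$

The weight of the weak classifier in the final classifier is:

$$\alpha_t = \frac{1}{2}\ln\frac{1 - e_t}{e_t}$$

The weight distribution $D_{t+1}$ of the training samples is updated as:

$$w_{t+1,i} = \frac{w_{t,i}\exp\left(-\alpha_t y_i h_t(x_i)\right)}{Z_t}, \qquad Z_t = \sum_{i=1}^{N} w_{t,i}\exp\left(-\alpha_t y_i h_t(x_i)\right)$$

Finally, the weak classifiers are combined according to their weights $\alpha_t$ to obtain the final strong classifier:

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$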
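The boosting steps listed above can be sketched with decision stumps (one-factor threshold rules) as the weak classifiers. The choice of stumps is an assumption for illustration, since the text does not name the weak learner, and labels must be recoded from 1/0 to +1/−1 (for example `y = 2 * label - 1`). Function names are hypothetical.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weak learner: best single-factor threshold rule under sample weights w."""
    best = (np.inf, 0, 0.0, 1)  # (weighted error, factor index, threshold, sign)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(X, j, thr, sign):
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, T=10):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    w = np.full(len(y), 1.0 / len(y))  # initial weights 1/N
    ensemble = []
    for _ in range(T):
        err, j, thr, sign = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # keep alpha finite for a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak classifier
        w = w * np.exp(-alpha * y * stump_predict(X, j, thr, sign))  # up-weight mistakes
        w /= w.sum()  # divide by Z_t
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def adaboost_predict(ensemble, X):
    X = np.asarray(X, dtype=float)
    score = sum(a * stump_predict(X, j, thr, s) for a, j, thr, s in ensemble)
    return np.sign(score)  # H(x) = sign(sum_t alpha_t h_t(x))
```

The weight update is what makes the method adaptive: samples misclassified in round $t$ receive larger weights, so the stump chosen in round $t + 1$ concentrates on them.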

Random Forest
Random forest (RF), developed by Breiman, is an ensemble machine learning algorithm that uses a large number of classification or regression trees to make a prediction [56]. In this study, the response variable was modeled using regression trees. RF generates different sample sets by sampling with replacement, trains a regression tree on each set, and then determines the classification of the data from the votes of the individual trees.
When a regression tree is trained, rules based on the response variable are established to partition the observations until the node deviance is as small as possible. The rules of a regression tree form a collection of linear partitions of the observed data that together create a nonlinear decision surface. A main problem with regression trees is that they tend to overfit the training data and therefore perform poorly on unseen data [57]. Random forest addresses this weakness: when each tree is trained, a subset of the input records and of the predictor variables is randomly selected as training input. Repeated sampling therefore produces a set of regression trees, each trained on a different randomly selected subset. Training every tree on the full sample is avoided because it would ignore the patterns present in local subsets of the samples.
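The two sources of randomness (bootstrap rows and a random subset of factors at each split) map directly onto parameters of scikit-learn's `RandomForestClassifier`. This is a sketch, not the study's Weka configuration: the data are synthetic, the hyperparameter values are hypothetical, and a classification forest is used here (`RandomForestRegressor` fitted to the 0/1 label would be the regression-tree variant the text describes).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.random((400, 13))          # synthetic stand-in for 13 normalized factors
y = (X[:, 1] > 0.5).astype(int)    # synthetic 0/1 flood label, illustration only

# 70 % / 30 % split (rows are already in random order)
X_train, X_test = X[:280], X[280:]
y_train, y_test = y[:280], y[280:]

rf = RandomForestClassifier(
    n_estimators=200,     # number of trees (hypothetical setting)
    max_features="sqrt",  # random subset of factors considered at each split
    bootstrap=True,       # each tree is grown on a bootstrap sample of the rows
    oob_score=True,       # accuracy on the rows each tree did not see
    random_state=0,
)
rf.fit(X_train, y_train)
test_accuracy = rf.score(X_test, y_test)
# scikit-learn averages per-tree class probabilities rather than counting hard votes
flood_prob = rf.predict_proba(X_test)[:, 1]
```

The out-of-bag score gives a built-in estimate of generalization error from the rows left out of each bootstrap sample, which is one practical benefit of the sampling-with-replacement design.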

Evaluation Methods
To evaluate the four models, we selected four indicators: precision (P), recall (R), F-score (F), and the area under the ROC (receiver operating characteristic) curve (AUC) [28]. These indicators represent the ability of a model to identify flood hazard risk. P, R, and F are calculated as:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F = \frac{2PR}{P + R}$$

where $TP$ is the number of true positives, $FP$ the number of false positives, and $FN$ the number of false negatives for the class being evaluated (flood or non-flood). The ROC curve plots the false positive rate on the x-axis against the true positive rate on the y-axis. AUC is the area under this curve; the higher the AUC value, the better the model performance.
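The four indicators can be computed without a library. TP, FP, and FN are counted for whichever class is treated as positive, so passing `positive=0` gives non-flood-zone figures alongside the flood-zone ones. AUC is computed with the rank-sum (Mann-Whitney) identity, the probability that a randomly chosen flood sample scores higher than a randomly chosen non-flood sample (ties count one half), which equals the trapezoidal area under the ROC curve. Function names are hypothetical, and `precision_recall_f` divides by zero if a class is never predicted.

```python
import numpy as np

def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, recall, and F-score for the chosen positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def auc(y_true, score):
    """Area under the ROC curve via the rank-sum identity; ties count 1/2."""
    y_true, score = np.asarray(y_true), np.asarray(score, dtype=float)
    pos, neg = score[y_true == 1], score[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every flood/non-flood pair
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))
```

For scores `[0.1, 0.4, 0.35, 0.8]` with labels `[0, 0, 1, 1]`, three of the four flood/non-flood pairs are ordered correctly, so the AUC is 0.75.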
In addition, we compared the computation time of the four models using the Weka 3.8 machine learning software.

Sensitivity Analysis of Conditioning Factors
Assessing the contribution of different conditioning factors to flood risk is important for flood management. This study used the sensitivity of the AUC to each conditioning factor to analyze its contribution. The sensitivity analysis was based on the jackknife test, which is widely accepted as effective for a broad range of practical problems [58]. The percentage of relative decrease (PRD) of the AUC was used to evaluate the contribution of each conditioning factor, as follows [19]:

$$\mathrm{PRD}_i = \frac{\mathrm{AUC}_{\mathrm{all}} - \mathrm{AUC}_i}{\mathrm{AUC}_{\mathrm{all}}} \times 100\% \tag{21}$$

where $\mathrm{AUC}_{\mathrm{all}}$ is the AUC value when all conditioning factors are used for prediction, and $\mathrm{AUC}_i$ and $\mathrm{PRD}_i$ are, respectively, the AUC value and the percentage of relative decrease of AUC when the $i$-th factor is removed from the prediction process. The larger the PRD, the greater the effect of a factor on the result. To evaluate the consistency of the PRD rankings across models, we calculated the SD (standard deviation) of the PRD ranking of each conditioning factor in the four models; the smaller the standard deviation, the more consistent the ranking:

$$\mathrm{SD} = \sqrt{\frac{1}{4}\sum_{i=1}^{4}\left(R_i - \bar{R}\right)^2}$$

where $R_i$ is the ranking of a conditioning factor in the $i$-th model and $\bar{R}$ is its average ranking among the four models.
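The jackknife bookkeeping of Equation (21) and the ranking consistency check can be sketched as follows. Inputs are hypothetical dictionaries of AUC values; training each model with one factor left out is not shown. The ranking spread uses the population standard deviation (dividing by the number of models, NumPy's default for `np.std`), and tied PRD values keep their input order.

```python
import numpy as np

def prd(auc_all, auc_without):
    """PRD of AUC for each left-out factor, Eq. (21), in percent."""
    return {f: 100.0 * (auc_all - a) / auc_all for f, a in auc_without.items()}

def ranking_sd(prd_by_model):
    """prd_by_model: {model: {factor: PRD}}. Rank factors within each model
    (1 = largest PRD), then return each factor's rank standard deviation across models."""
    ranks = {}
    for scores in prd_by_model.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, factor in enumerate(ordered, start=1):
            ranks.setdefault(factor, []).append(rank)
    return {f: float(np.std(r)) for f, r in ranks.items()}
```

For instance, if dropping a factor lowers the AUC from 0.90 to 0.81, its PRD is 10%; a factor ranked first by every model has a ranking SD of 0, the most consistent case.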

Model Analysis
The evaluation indicators of the four machine learning models on the testing dataset (30%) are shown in Table 2. For precision, RF performed best in both flood and non-flood zones, with P values of 0.979 and 0.927, respectively. AB had the lowest P value in the flood zone evaluation (0.929), and NB had the lowest P value in the non-flood zone evaluation (0.728). For recall, RF also performed best, with values of 0.966 and 0.954 in the flood and non-flood zones, respectively. NB had the lowest R value in the flood zone evaluation (0.838), and AB had the lowest R value in the non-flood zone evaluation (0.844). For F-score, RF performed best, while NB had the lowest F for both flood and non-flood evaluation. Figure 3 shows the ROC curves of the four models for the flood zone evaluation; RF had the largest AUC value. The computation times of the four models were all short, only a few seconds each. In summary, all four models performed well, and the RF model performed best in assessing global flood risk.

Global Flood Susceptibility Map
A flood susceptibility map is important for spatial flood prediction and watershed management. Using the four machine learning models and the conditioning factor datasets of the global fourth-level watersheds, the flood characteristics of the global fourth-level watersheds were calculated and the flood susceptibility map was obtained. The map shows the flood susceptibility level of the global fourth-level watersheds based on the flood disaster data from January 1985 to March 2019. Using the natural breaks (Jenks) classification in ArcGIS, flood susceptibility was divided into five levels (lowest, low, medium, high, and highest), as shown in Figure 4. This method groups data according to their inherent characteristics, minimizing differences within classes and maximizing differences between classes.
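The natural breaks criterion can be illustrated with Fisher's exact dynamic program, which finds the contiguous classes of the sorted values that minimize the total within-class sum of squared deviations; the Jenks algorithm used in GIS software approximates this optimum. This is a sketch for showing the criterion, not the ArcGIS implementation: its O(k·n²) Python loops are far too slow for 60,863 watersheds without vectorization or sampling. The function name is hypothetical.

```python
import numpy as np

def natural_breaks(values, k):
    """Upper bound of each of k classes minimizing total within-class squared deviation."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    s1 = np.concatenate([[0.0], np.cumsum(x)])      # prefix sums for O(1) class costs
    s2 = np.concatenate([[0.0], np.cumsum(x * x)])

    def ssd(i, j):  # sum of squared deviations of x[i:j] from its mean
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    cost = np.full((k + 1, n + 1), np.inf)  # cost[c, j]: best split of x[:j] into c classes
    cut = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is x[i:j]
                val = cost[c - 1, i] + ssd(i, j)
                if val < cost[c, j]:
                    cost[c, j], cut[c, j] = val, i
    bounds, j = [], n  # walk the stored cut points back from the last class
    for c in range(k, 0, -1):
        bounds.append(x[j - 1])
        j = cut[c, j]
    return bounds[::-1]
```

Calling it with `k = 5` on the predicted flood probabilities of all watersheds would yield the four internal breaks (plus the maximum) that separate the lowest to highest susceptibility levels.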

Although the global flood susceptibility maps obtained by the different machine learning models differed slightly, their identification of high-risk areas was basically the same. This is consistent with previous research showing that the choice of machine learning method has little effect on flood susceptibility maps [19,27,28].
From the flood susceptibility map, we obtained the flood risk of each fourth-level watershed. Table 3 shows the number of fourth-level watersheds and the percentage of area at each susceptibility level. In all four models, high-risk areas accounted for nearly 50% of the global fourth-level watersheds. Floods remain severe worldwide, with high-risk areas in southern North America, northern and eastern South America, most of Europe, Southeast Asia, Central Africa, and northeastern Australia. Northern North America, southwestern South America, the deserts of northern Africa, northwestern China, and central Australia are low-risk areas.

Assessment of Sensitivity of Conditioning Factors
In flood risk analysis, choosing appropriate conditioning factors is very important. This study analyzed the contribution of the 13 selected conditioning factors to the flood susceptibility model. Following Equation (21), machine learning models were constructed with different sets of conditioning factors, and the PRDs of the AUCs were calculated. The PRD of each conditioning factor is shown in Figure 5. The PRD results differed slightly among the machine learning models: the most important factors were MC and PCD in the LR and NB models, GDP and PCD in the AB model, and altitude and PCD in the RF model. The standard deviation of the PRD ranking of each conditioning factor is shown in Figure 6. The standard deviations of PCD, ST, population, and PC across the four models were small, indicating that the PRD rankings of these factors were essentially the same in all four models, whereas the PRD rankings of GDP, PFS, and MDP differed considerably.

In general, MC and PCD were relatively important factors in all four models, while DD and slope had little effect on the results. This differs from previous results for individual watersheds, mainly because the global assessment used the average slope of each watershed as the topographic parameter, which may smooth out extreme values.
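The smoothing effect of watershed averaging can be seen in a toy Python example. Both cell-level slope lists below are invented for illustration: watershed A is uniformly moderate, while watershed B is mostly flat with two steep headwater cells. Their mean slopes are identical, so a model given only the watershed mean cannot distinguish them, even though B contains much steeper terrain.

```python
import statistics


def summarize_slope(cell_slopes):
    """Return (mean, max) slope for one watershed's grid cells."""
    return statistics.fmean(cell_slopes), max(cell_slopes)


# Hypothetical watersheds (slopes in degrees): A is uniformly moderate,
# B is a flat floodplain with a few steep headwater cells.
watershed_a = [8.5] * 10
watershed_b = [1.0, 1.5, 2.0, 1.0, 0.5, 35.0, 40.0, 1.0, 2.0, 1.0]

print(summarize_slope(watershed_a))  # (8.5, 8.5)
print(summarize_slope(watershed_b))  # (8.5, 40.0)
```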

Discussions
With the continuous development of machine learning algorithms, their applications in hydrology are becoming increasingly widespread [27][28][29][59][60][61]. Flood susceptibility maps, an important basis for watershed planning and management, have also evolved from traditional human judgment to statistical analysis based on big data. However, research on global flood susceptibility maps is still relatively rare; most current flood susceptibility maps cover a single basin. This approach has two problems:
1. Selection of flood sample points. Within a single basin, accurately defining the criteria for and locations of flood disasters is very difficult, because floods are generally large-scale events [29].
2. Data set of conditioning factors. When a risk assessment is conducted for a single watershed, the input conditioning factors are often the values at the flood sample points. In fact, whether a flood occurs at a given location within the watershed depends not only on the conditioning factors at that point but also on those of the entire contributing (confluence) area upstream of it.
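The second point amounts to aggregating each factor over the upstream contributing area rather than sampling it at the flood point. The Python sketch below makes two assumptions: the watershed is discretized into cells, each with a single known downstream neighbour (as in D8 flow-direction routing), and the flow network is acyclic. The function name and data layout are hypothetical. Cells are processed from headwaters to outlet in topological order (Kahn's algorithm), so each cell's catchment sum and cell count are complete before they are passed downstream.

```python
from collections import deque


def upstream_mean(downstream, values):
    """Mean of a conditioning factor over each cell's upstream contributing area.

    downstream: dict cell -> cell it drains into (None at an outlet).
    values:     dict cell -> factor value at that cell (same keys as downstream).
    """
    # Count how many cells drain directly into each cell.
    indegree = {cell: 0 for cell in values}
    for cell, target in downstream.items():
        if target is not None:
            indegree[target] += 1

    total = dict(values)                  # running catchment sum per cell
    count = {cell: 1 for cell in values}  # running catchment size per cell
    queue = deque(cell for cell, d in indegree.items() if d == 0)  # headwaters

    while queue:
        cell = queue.popleft()
        target = downstream.get(cell)
        if target is None:
            continue
        # This cell's catchment is complete, so pass it downstream.
        total[target] += total[cell]
        count[target] += count[cell]
        indegree[target] -= 1
        if indegree[target] == 0:
            queue.append(target)

    return {cell: total[cell] / count[cell] for cell in values}


# Hypothetical four-cell network: a and b drain to c, c drains to outlet d.
flow = {"a": "c", "b": "c", "c": "d", "d": None}
factor = {"a": 10.0, "b": 20.0, "c": 0.0, "d": 2.0}
print(upstream_mean(flow, factor))  # {'a': 10.0, 'b': 20.0, 'c': 10.0, 'd': 8.0}
```

At the outlet d the point value of the factor is 2.0, but the upstream mean is 8.0, so a model trained on point values would receive a very different input from one trained on contributing-area values.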
These problems are likely to produce erroneous estimates of flood susceptibility for a watershed. Machine learning provides an analytical method, but its application must not violate the principles of hydrology.
This study produced a flood disaster risk assessment for the global fourth-level watersheds. However, a global flood risk assessment obtained by machine learning is a static rather than a dynamic result, and in this respect physically based models have an advantage: they can often give more detailed information on flood hazards, such as flow and inundation extent, whereas current machine learning studies focus on qualitative assessment of flood hazards [21][22][23]. Such qualitative evaluations offer only limited guidance for watershed management. In the future, combining machine learning with physically based models should provide more complete flood disaster assessment information.
The performance of machine learning models depends heavily on the accuracy of the labeled flood inventories. Besides the flood inventories, the distribution and size of the non-flood sample data also affect model accuracy [27,28]. Future studies should explore how to make effective use of both flood and non-flood inventories, together with other large datasets, to improve the accuracy of the results.
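The two properties of non-flood samples mentioned above, their size and their spatial distribution, correspond to two parameters in one simple sampling strategy: a non-flood-to-flood ratio and an exclusion buffer around recorded floods. The Python sketch below illustrates that strategy under stated assumptions: planar projected coordinates, a brute-force distance check, and hypothetical default values. It is not the procedure used in this study.

```python
import math
import random


def sample_non_flood(flood_points, candidates, ratio=1.0, min_dist=10.0, seed=0):
    """Draw non-flood samples that lie outside a buffer around flood points.

    flood_points, candidates: lists of (x, y) tuples in projected (planar) units.
    ratio:    non-flood samples per flood sample (hypothetical default).
    min_dist: exclusion radius around each flood point (hypothetical default).
    seed:     fixed seed so the draw is reproducible.
    """
    # Keep only candidates at least min_dist from every recorded flood point.
    eligible = [c for c in candidates
                if all(math.dist(c, f) >= min_dist for f in flood_points)]
    # Sample size follows the requested ratio, capped by what is available.
    k = min(len(eligible), round(ratio * len(flood_points)))
    return random.Random(seed).sample(eligible, k)


floods = [(0.0, 0.0), (100.0, 0.0)]
candidates = [(1.0, 1.0), (50.0, 0.0), (50.0, 50.0), (99.0, 0.0), (0.0, 80.0)]
print(sample_non_flood(floods, candidates, ratio=1.0, min_dist=10.0))
```

For a global inventory, the pairwise distance check would normally be replaced by a spatial index, and varying `ratio` and `min_dist` is one way to test how sensitive a model is to the non-flood sample design.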
Because of the huge volume of global data, the conditioning factors selected in this paper may not include all of the key factors. In addition, the watershed-average value of each conditioning factor was used as the independent variable for prediction, which may have introduced some error into the results. The selection of non-flood sample points may also have led to an overestimation of flood susceptibility levels. How to obtain accurate flood and non-flood sample points, and how to derive the conditioning factors corresponding to those points, remain open questions for future flood susceptibility mapping.

Conclusions
Based on four machine learning models (logistic regression, naive Bayes, AdaBoost, and random forest), this paper conducted a flood risk assessment of the global fourth-level watersheds and produced global flood susceptibility maps. The results show that the random forest model performed best in prediction. According to the susceptibility maps, high-risk areas account for a large proportion of the global fourth-level watersheds. As extreme hydrological events caused by climate change increase, this threat may not ease in the near future, and floods remain one of the most threatening natural disasters. Sensitivity analysis of the conditioning factors showed that precipitation concentration degree and Manning coefficient were the main factors affecting flood risk in the watersheds. The methods and ideas of this study can provide a reference for flood management worldwide.