Flood Hazard Analysis Based on Rainfall Fusion: A Case Study in Dazhou City, China


Introduction
A flood disaster is a complex phenomenon closely related to the natural environment and the human social system [1]. The hazard and frequency of floods and their impacts are expected to increase, especially in the low latitudes of Africa and Asia [2-5], owing to climate change, population growth, and economic development. Although many structural and non-structural measures have been proposed to reduce flood damage, the effects of flood hazards remain impactful at the county level [6]. As one of the countries most threatened by flood disasters, China suffered 125 major disasters from 1950 to 2004, with 1.465 billion people affected and USD 11.675 billion in losses [7], which seriously restricts the sustainable development of China's economy [8,9]. Liu et al. [10] counted the flood disasters in seven major river basins in China over the past 150 years and found that each basin had witnessed more than 20, especially the Yangtze River and Yellow River basins. In the 21st century, the significant damage and loss caused by flood disasters remain a severe problem. Thus, it is urgent to improve the accuracy of research related to flood disasters and to develop scientific and practical measures for flood control and disaster reduction [11-13].
Flood disaster research based on historical statistical data often uses statistical methods, such as time series and regression analysis, to clarify the distribution law and identify flood disaster trends so that effective countermeasures can be developed [14,15]. The early warning and assessment of flood disasters have always been a hotspot in flood research [16,17]. Researchers use remote sensing technology to quickly obtain flood disaster information, such as the degree [18], time [19], and dynamic characteristics of floods [20], as well as flooded social attributes such as affected houses, population, crops, and infrastructure [21-23]. Then, combined with historical disaster data and GIS technology, the scope, nature, and degree of a flood disaster hazard are analyzed by using an artificial neural network [24,25], random forest [26,27], or analytic hierarchy process [28,29].
A flood hazard analysis, as the essential element of flood control assessment, is crucial for the prediction and early warning of regional flood disasters [30]. Moreover, identifying the flood factors, such as rainfall, topography, and the distribution of river networks, is the most critical step of a flood hazard analysis [31,32]. Yalcin et al. [33] selected seven natural factors, including the annual rainfall, watershed size, basin slope, gradient of the main drainage channel, drainage density, land use, and soil types, for the assessment of a flood hazard. Stefanidis et al. [34] added three anthropogenic factors related to human intrusion and flood control technology, increasing the number of considered flood hazard factors to 10. Moreover, multi-criteria analysis methods were proposed by Ologunorisa [35], Mansor et al. [36], and Sanyal and Lu [37] to combine the factors related to a flood hazard. However, none of them considered the importance of each factor, which is necessary for an accurate flood hazard analysis.
Rainfall significantly impacts a flood hazard analysis, especially heavy rainfall over a short period, as it is the most critical and direct cause of a flood disaster [38,39]. The rainfall data currently used for a flood hazard analysis are mainly collected from gauge observation and satellite estimation [40-43]. However, due to the uneven distribution of rainfall gauges and the low accuracy of satellite data, the accuracy of precipitation for a flood hazard analysis is insufficient [44-47]. Therefore, obtaining highly accurate and wide-ranging rainfall data is a meaningful way to improve flood hazard analysis, which could be beneficial in reducing the impact of flood disasters.
Rainfall data fusion is an effective and feasible way to improve the accuracy of satellite rainfall data [48-50]. The commonly used fusion methods can be divided into two categories. One is simple correction methods, such as Mean Bias Correction (MBC) [51], Double-Kernel Smoothing (DS) [52], and Linear Regression (LR) [53]; the other is local correction methods, such as Optimal Interpolation (OI) [54], Kalman Filter (KF) [55], and Geographically Weighted Regression (GWR) [56]. Liu et al. [57] applied the LR method to fuse the TRMM 3B42 data and gauge rainfall data in China, and the accuracy of the fused rainfall data improved at the monthly and annual scales. Li et al. [58] fused the rainfall data from TRMM 3B42 and gauges in Australia by using the DS method. The cross-validation results showed that the DS method could improve the accuracy of the fusion rainfall. Duan et al. [59] employed LR, GWR, and KF to fuse the TRMM 3B42 and the gauges in Sichuan Province, China. The results showed that LR had the best merging effect at both the daily and monthly scales, while the KF presented the highest accuracy. These studies show that the optimal fusion method for TRMM 3B42 rainfall data is not unique across study areas, due to their different climatic characteristics. Therefore, exploring the appropriate method to fuse rainfall data in a specific area with few gauges is necessary. Moreover, the selection of hazard factors and their evaluation is critical for a flood hazard analysis.
This paper aims to conduct a flood hazard analysis based on fused rainfall data and nine hazard factors, to address the low accuracy of rainfall data and the incomplete flood hazard evaluation in Dazhou City, Sichuan Province, China. This study employs four fusion methods, OI, GWR, KF, and LR, for fusing the TRMM 3B42 satellite rainfall and the gauge rainfall. Once the applicable fusion method is identified, it is applied to obtain the fusion rainfall, overcoming the defects of uneven distribution, limited range, and low precision.
Moreover, we propose a system for determining the weights of flood hazard factors and performing a flood analysis in this paper. Considering that few researchers have studied the impact of the instantaneity of rainfall on flood hazard, we set up four rainfall durations (3 h, 6 h, 9 h, and 12 h) according to the time scale of the TRMM 3B42 data and then carried out our flood hazard analysis. This study provides a new method for the hazard assessment of frequent flood disasters and is of great significance for the acquisition of guidelines for disaster management.

Study Area
Dazhou City, located in the eastern part of Sichuan Province between longitudes 106°39′-108°32′ E and latitudes 30°19′-32°20′ N, is one of the major cities in southwest China, as shown in Figure 1. It is called the "East Gate" of Sichuan Province, with a population of more than five million, due to its advantaged location at the junction of Sichuan Province, Chongqing City, Hubei Province, and Shaanxi Province. Moreover, it lies in the Chengdu-Chongqing Economic Belt, in the upper reaches of the Yangtze River. The terrain of Dazhou City descends from the Daba Mountain in the northeast to the hilly areas in the southwest, with elevations ranging from 2458.30 m down to 222 m. The terrain is classified into three types, i.e., mountainous areas (which account for 70.70% of Dazhou City's area), hilly areas (28.10%), and flat areas (1.20%). As the climatic boundary between north and south China, Dazhou City has a subtropical humid monsoon climate with frequent droughts and floods in summer and continuous rains in autumn. It has a forest coverage of 44.34% and an annual average precipitation of about 1200 mm. The hot season coincides with the rainy season, and the annual average temperature is between 14.7 and 17.6 °C. Due to the complex terrain, the regional climate varies greatly. The low mountains, hills, and valleys below 800 m have a mild climate and four distinct seasons; the low and middle mountain areas with an elevation of 800 to 1000 m are cool and damp; and the middle mountains above 1000 m have a long cold period with insufficient sunlight and heat resources.

Rainfall Data
The satellite rainfall product used in this study is TRMM 3B42, with a spatial resolution of 0.25° and a temporal resolution of 3 h, provided by the National Aeronautics and Space Administration (NASA) Goddard Earth Sciences (GES) Data and Information Services Center (DISC) (https://disc.gsfc.nasa.gov/datasets (accessed on 6 August 2021)) [60]. The data cover the period from 1998 to 2019 [61,62]. The original TRMM 3B42 data in HDF format were processed in MATLAB.
Rainfall gauge data of 3 h precipitation observations (from 1986 to 2012) of the seven rainfall stations over Dazhou City are provided by the China Meteorological Administration (CMA). According to the rain gauge distribution (Figure 1), the Xuanhan station located in the center was chosen as the test point. The observed rainfall from the remaining six stations is interpolated using the inverse distance weighting method (IDW) [63,64] to generate grid gauge data, which have the same spatial resolution as the grid satellite data. Moreover, the latitude and longitude coordinates of the rainfall gauges in the study area are recorded to facilitate single-point data fusion.
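The IDW step can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `idw_interpolate`, the power of 2, the station coordinates, and the rainfall values are all hypothetical, and distances are taken directly in degrees for simplicity.

```python
import numpy as np


def idw_interpolate(station_xy, station_values, grid_xy, power=2.0):
    """Estimate values at grid points by inverse distance weighting."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)

    # Distance from every grid point to every station: shape (n_grid, n_stations).
    dist = np.linalg.norm(grid_xy[:, None, :] - station_xy[None, :, :], axis=2)

    estimates = np.empty(len(grid_xy))
    for g, d in enumerate(dist):
        coincident = d == 0.0
        if coincident.any():
            # A grid point that sits on a station takes that station's value.
            estimates[g] = station_values[coincident][0]
        else:
            weights = 1.0 / d ** power
            estimates[g] = np.sum(weights * station_values) / np.sum(weights)
    return estimates


# Hypothetical stations (lon, lat) and 3 h rainfall totals in mm.
stations = [(107.0, 31.0), (107.5, 31.5), (108.0, 31.0)]
rain_mm = [10.0, 20.0, 30.0]
grid_points = [(107.0, 31.0), (107.5, 31.0)]
idw_result = idw_interpolate(stations, rain_mm, grid_points)
```

In this toy case the second grid point is equidistant from all three stations, so its estimate is simply their mean.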

Historical Flood Points
To evaluate the accuracy of the flood hazard analysis results, we collected historical flood data from 1952 to 2019 from the Dazhou Water Authority. Based on this flood information, we counted the flood disaster locations and vectorized them in ArcGIS, giving a total of 2254 historical flood points, as shown in Figure 2.

Methodology
Figure 3 shows the schematic diagram of the flood hazard analysis system in this study. Firstly, nine flood hazard factors are chosen and calculated. Secondly, four rainfall fusion methods are applied to fuse the TRMM 3B42 and gauge rainfall. Thirdly, the random forest (RF), the linear weighted sum (LWS) method, and the certain factor (CF) model are combined into a system to conduct the flood hazard analysis. Finally, the fused rainfall and the flood hazard analysis results are comprehensively evaluated.

Flood Hazard Factors Calculation
Appropriate factors are critical for a flood hazard analysis [65], and nine representative ones are chosen for this study according to previous studies [59,66-68], as shown in Table 1. They comprise three topographic factors: Digital Elevation Model (DEM), Elevation Standard Deviation (ESD), and Depression Point Density (DPD); five hydrological factors: Flow Accumulation (FA), Soil Permeability (SP), Land Use (LU), Distance to River (DR), and Topographic Wetness Index (TWI); and one precipitating factor: maximum precipitation (MAP), which is denoted as MAP-3 h, MAP-6 h, MAP-9 h, and MAP-12 h for durations of 3, 6, 9, and 12 h, respectively. The spatial distribution of these factors, except MAP, is shown in Figure 4, and the calculations are described below.
(1) Topographic factors
DEM (m): water tends to flow from areas of higher altitude to lower areas, so flood events are more probable in low, flat terrain. The DEM data used in this study come from the Geospatial Data Cloud (https://www.gscloud.cn (accessed on 6 August 2021)), with a spatial resolution of 30 m.
ESD: this factor sensitively reflects the topographic fluctuation of the study area. Areas with smaller ESD values are less conducive to flow and can easily become waterlogged. Thus, ESD is a negative factor for flood hazard. The ESD values are calculated within a circle of 0.25° radius for each grid by the Spatial Analyst Tools in ArcGIS software, based on the DEM map, and the formula is:

$$\mathrm{ESD} = \sqrt{\frac{1}{n_c} \sum_{i \in C} \left(z_i - \bar{z}\right)^2} \quad (1)$$

where $C$ is the circle with a radius of 0.25° corresponding to each grid; $i$ is a grid within the circle $C$; $z_i$ and $\bar{z}$, respectively, represent the elevation value of the i-th grid and the mean elevation of all grids within the circle $C$; and $n_c$ is the number of grids within the circle $C$.
DPD: this intuitively reflects the degree of depression filling in the region. According to the definition of a depression, the greater the DPD is, the higher the flood hazard is. We derived the DPD map of Dazhou City by using ArcGIS Spatial Analyst Tools to process the DEM data, and the formula is as follows:

$$\mathrm{DPD}_i = \frac{n_i}{S_i} \quad (2)$$

where $n_i$ is the number of depressions in the i-th grid and $S_i$ is the area of this grid.
(2) Hydrological factors
FA: this refers to the accumulation of surface runoff in each grid. A greater FA means more accumulated surface runoff, so waterlogging forms more easily. Accordingly, FA has a positive effect on flood hazard. The FA map was obtained through the flow direction and accumulation calculation with ArcGIS Hydrology Analyst Tools.
SP: the soil permeability of various soil textures is coded by numbers, which come from the Harmonized World Soil Database (HWSD) [69], as shown in Table 2. A larger SP number indicates a stronger soil infiltration capacity and a lower likelihood of water accumulation.
LU: runoff conditions differ markedly between land uses and are often distinguished by runoff coefficients ranging from 0 to 1. The values of the positive indicator LU in humid areas are always greater than those in arid areas. The coefficients follow the "Standard for Design of Building Water Supply and Drainage" (GB 50015-2019) [70] and the "Standard for Design of Outdoor Wastewater Engineering" (GB 50014-2021) [71], as shown in Table 3.
DR: the water level in the river channel is generally high during the flood season, and heavy rainfall causes it to overflow the embankment, leading to floods. The areas close to the river are susceptible to flooding and have a high degree of danger, so the DR has a negative effect on the flood hazard assessment. In this paper, we applied ArcGIS to extract the river channels in the study area and used the Euclidean distance method to calculate the distance from each grid to the rivers.
TWI: this clearly shows the change of the runoff coefficient under different terrain conditions and provides an important way to explore the spatial distribution of soil moisture. The greater the TWI is, the higher the surface runoff of the grid, and the more prone it is to floods. Thus, TWI is a positive factor. We obtained the TWI map, according to its definition formula, through the grid calculator tool in ArcGIS:

$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right) \quad (3)$$

where $\alpha$ is the catchment area corresponding to the unit contour length in each grid and $\beta$ is the slope.
(3) Precipitating factors
MAP: heavy rainfall in a short period is the most crucial factor in triggering flooding. This study set up four rainfall duration cases: 3 h, 6 h, 9 h, and 12 h. Accordingly, MAP-3 h, MAP-6 h, MAP-9 h, and MAP-12 h are selected as the flood-inducing factors for the corresponding durations. Specifically, MAP-3 h is the average of the maximum rainfall from 1986 to 2019 (the rainfall data from 1986 to 1997 are observed, the data from 1998 to 2012 are the fusion rainfall in this study, and the data from 2013 to 2019 are taken from the TRMM 3B42 dataset). MAP-6 h, MAP-9 h, and MAP-12 h are obtained by accumulating MAP-3 h.
To prevent the maximum value of any single factor from dominating the final superposition of all factors in the flood hazard analysis, the factors mentioned above are separately standardized to [0,1].
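The standardization to [0,1] can be sketched as a min-max rescaling. The text does not state the exact formula, so the following is an assumption; the function name `min_max_standardize` and the `positive` switch, which flips negative indicators such as ESD or DR so that larger values always mean higher hazard, are hypothetical.

```python
import numpy as np


def min_max_standardize(values, positive=True):
    """Rescale a factor layer to [0, 1]; flip it when the factor is a negative indicator."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = np.nanmin(values), np.nanmax(values)
    if vmax == vmin:
        # A constant layer carries no information; map it to zeros.
        return np.zeros_like(values)
    scaled = (values - vmin) / (vmax - vmin)
    # Assumption: negative indicators are inverted so that 1 is always "most hazardous".
    return scaled if positive else 1.0 - scaled


# Elevation extremes quoted for Dazhou City (m) plus their midpoint.
elevation = [222.0, 1340.15, 2458.30]
scaled_positive = min_max_standardize(elevation)
scaled_negative = min_max_standardize(elevation, positive=False)
```

Whether negative indicators are flipped at this stage or handled later through the CF values is a design choice that the text leaves open.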

Rainfall Data Fusion Methods
This study applies four frequently used fusion methods, LR, KF, GWR, and OI, to fuse the TRMM 3B42 and the observed rainfall.

Linear Regression (LR) Method
LR is a regression analysis method that uses the least squares function to establish a relationship model between one or more independent variables and a dependent variable [72,73]. Accordingly, in the linear fusion experiment of TRMM 3B42 and gauge rainfall, we first conduct a correlation analysis between the two datasets to establish their linear regression model. Then, the coefficients in the regression model are obtained by the least squares method to correct the TRMM 3B42 in Dazhou City. The basic steps are as follows.
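Assuming that the linear regression equation between TRMM 3B42 and gauge rainfall is:

$$P_g = A P_s + B \quad (4)$$

with the least squares coefficients:

$$A = r \frac{\sigma_{P_g}}{\sigma_{P_s}}, \qquad B = \bar{P}_g - A \bar{P}_s \quad (5)$$

Combining Equations (4) and (5), we obtain:

$$P_g = \bar{P}_g + r \frac{\sigma_{P_g}}{\sigma_{P_s}} \left(P_s - \bar{P}_s\right) \quad (6)$$

where $A$ and $B$ are regression coefficients; $P_g$ is gauge rainfall data; $P_s$ is TRMM 3B42 in the corresponding grid; $r$ is the correlation coefficient; $\bar{P}_g$ and $\bar{P}_s$ are the means of the gauge and TRMM series, respectively; and $\sigma_{P_g}$ and $\sigma_{P_s}$ are their mean square errors. The calculation formulas are as follows:

$$r = \frac{\sum_{i=1}^{n} \left(P_{gi} - \bar{P}_g\right)\left(P_{si} - \bar{P}_s\right)}{\sqrt{\sum_{i=1}^{n} \left(P_{gi} - \bar{P}_g\right)^2 \sum_{i=1}^{n} \left(P_{si} - \bar{P}_s\right)^2}} \quad (7)$$

$$\sigma_{P_g} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(P_{gi} - \bar{P}_g\right)^2} \quad (8)$$

$$\sigma_{P_s} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(P_{si} - \bar{P}_s\right)^2} \quad (9)$$

where $P_{gi}$ is the gauge rainfall of the i-th ground station; $P_{si}$ is the TRMM 3B42 data of the grid corresponding to the i-th ground station; and $n$ is the number of ground stations.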
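The regression correction described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names and the sample values are hypothetical, and clipping negative corrected rainfall to zero is an added assumption.

```python
import numpy as np


def fit_linear_fusion(p_s, p_g):
    """Least squares coefficients A, B of P_g = A * P_s + B from paired samples."""
    p_s = np.asarray(p_s, dtype=float)
    p_g = np.asarray(p_g, dtype=float)
    r = np.corrcoef(p_s, p_g)[0, 1]      # correlation coefficient
    a = r * p_g.std() / p_s.std()        # slope from r and the two spreads
    b = p_g.mean() - a * p_s.mean()      # intercept from the two means
    return a, b


def apply_linear_fusion(p_s, a, b):
    """Correct satellite rainfall; negative results are clipped to zero (assumption)."""
    return np.maximum(a * np.asarray(p_s, dtype=float) + b, 0.0)


# Hypothetical paired samples (mm / 3 h): satellite values and co-located gauge values.
satellite = [1.0, 2.0, 3.0, 4.0]
gauge = [3.0, 5.0, 7.0, 9.0]
coef_a, coef_b = fit_linear_fusion(satellite, gauge)
fused = apply_linear_fusion([5.0], coef_a, coef_b)
```

Because the slope is written as the correlation coefficient times the ratio of the two spreads, it does not matter whether the spreads are normalized by n or n − 1, as long as both use the same convention.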

Kalman Filter (KF) Method
KF is an optimal recursive data processing algorithm, which has been widely used in various fields, such as image recognition, fingerprint recognition, and the control of intelligent robot systems [74,75]. The KF method assumes that the optimal estimate at a point is related to the previous point and uses statistical estimation to obtain the optimal estimate recursively. The advantage of KF is that it requires little memory and runs efficiently. The time update equations in KF can be expressed as follows:

$$\hat{x}_k^- = A \hat{x}_{k-1} + B u_{k-1} \quad (10)$$

where $\hat{x}_{k-1}$ is the posterior state estimate of the point k−1; $\hat{x}_k^-$ is the prior state estimate of the point k; $A$ is the state transition matrix; $B$ is the matrix that transforms the input into a state; and $u_{k-1}$ is the input. The process noise $w_k$ is assumed to have zero mean, so it does not appear in the prediction.

$$P_k^- = A P_{k-1} A^T + Q \quad (11)$$

where $P_{k-1}$ is the posterior covariance estimate of the point k−1; $P_k^-$ is the prior covariance estimate of the point k; and $Q$ is the covariance of the process excitation noise.
As for the KF measurement update equations:

$$K_k = P_k^- H^T \left(H P_k^- H^T + R\right)^{-1} \quad (12)$$

$$\hat{x}_k = \hat{x}_k^- + K_k \left(z_k - H \hat{x}_k^-\right) \quad (13)$$

$$P_k = \left(I - K_k H\right) P_k^- \quad (14)$$

where $H$ is the transformation matrix from state variables to observations; $R$ is the covariance of the measurement noise; $K_k$ is the filter gain matrix; $\hat{x}_k$ is the posterior state estimate of the point k; $z_k$ is the input of filtering, i.e., the measured values; $P_k$ is the posterior covariance estimate of the point k; and $I$ is the identity matrix.
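The predict and update cycle can be illustrated with a scalar filter. This is a minimal sketch of the textbook recursion, not the configuration used in the study: the function name, the noise variances `q` and `r`, the initial values, and the constant test signal are all hypothetical, and the input term is omitted.

```python
import numpy as np


def kalman_filter_1d(measurements, a=1.0, h=1.0, q=0.01, r=1.0, x0=0.0, p0=1.0):
    """Scalar Kalman filter without an input term; returns estimates and final variance."""
    x_post, p_post = x0, p0
    estimates = []
    for z_k in measurements:
        # Time update: propagate the state and its variance.
        x_prior = a * x_post
        p_prior = a * p_post * a + q
        # Measurement update: blend the prediction with the new measurement.
        gain = p_prior * h / (h * p_prior * h + r)
        x_post = x_prior + gain * (z_k - h * x_prior)
        p_post = (1.0 - gain * h) * p_prior
        estimates.append(x_post)
    return np.array(estimates), p_post


# A constant hypothetical signal of 5.0 observed 50 times.
kf_estimates, kf_final_var = kalman_filter_1d([5.0] * 50)
```

Only the previous estimate and its variance are carried between steps, which is why the method needs so little memory.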

Geographically Weighted Regression (GWR) Method
GWR is a robust algorithm successfully used in fusing rainfall [56]. It assumes that the gauge rainfall $P_g$ and the satellite precipitation $P_s$ measure the same constant quantity $X$:

$$P_g = X + V_g \quad (15)$$

$$P_s = X + V_s \quad (16)$$

where $V_g$ and $V_s$ are the random errors existing in observation and detection, with $V_g \sim N(0, \sigma_g^2)$ and $V_s \sim N(0, \sigma_s^2)$. Moreover, it is assumed that the estimated value $\hat{X}$ of the actual rainfall $X$ is linearly related to $P_g$ and $P_s$, and that $\hat{X}$ is an unbiased estimate of $X$:

$$\hat{X} = w_g P_g + w_s P_s \quad (17)$$

where $w_g$ and $w_s$ are the weights of the two rainfall datasets. The estimation error $\tilde{X}$ and its mean square error $E(\tilde{X}^2)$ can be expressed as:

$$\tilde{X} = X - \hat{X} = X - w_g P_g - w_s P_s \quad (18)$$

$$E(\tilde{X}^2) = E\left[\left(X - w_g P_g - w_s P_s\right)^2\right] \quad (19)$$
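How the two weights could be chosen is not spelled out in the text. One common choice, shown here purely as an assumption, is to require the weights to sum to one (unbiasedness) and to minimize the mean square error when the two errors are uncorrelated, which gives inverse-variance weights. The function name and the numbers below are hypothetical.

```python
def fuse_two_sources(p_g, p_s, var_g, var_s):
    """Unbiased minimum-variance blend of a gauge value and a satellite value.

    Assumes the two error terms are uncorrelated with variances var_g and var_s.
    """
    w_g = var_s / (var_g + var_s)   # the more accurate source gets the larger weight
    w_s = 1.0 - w_g                 # weights sum to one, so the estimate stays unbiased
    x_hat = w_g * p_g + w_s * p_s
    mse = w_g ** 2 * var_g + w_s ** 2 * var_s
    return x_hat, w_g, w_s, mse


# Hypothetical values: gauge 10 mm (error variance 1), satellite 14 mm (error variance 3).
x_hat, w_g, w_s, mse = fuse_two_sources(10.0, 14.0, 1.0, 3.0)
```

With these numbers the gauge receives three quarters of the weight, and the blended error variance falls below that of either source alone.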

Optimal Interpolation (OI) Method
The first step of the OI method is to choose a dataset as the background field of the optimal values [76,77]. We selected the satellite rainfall data, which have high spatial coverage, and denoted them as F. The gauge rainfall is taken as the observation value due to its high accuracy, and we denoted it as O. Accordingly, the optimal rainfall, denoted as A, obtained by the optimal interpolation method can be expressed as:

$$A_k = F_k + \sum_{i} W_i \left(O_i - F_i\right) \quad (20)$$

where $k$ is the analysis point; the sum runs over the effective grids $i$ in the divided analysis range, which is a circle with a radius of 0.25°; and $W_i$ is the weight assigned to the bias between the observation and the background value. The error variance of the optimal value at the analysis point k is:

$$E_k^2 = \overline{\left(A_k - T_k\right)^2} \quad (21)$$

where $T_k$ is the actual rainfall of point k.
It is assumed that the errors of the observation and the initial background field are not correlated. Combining Equations (20)-(22) yields the weights in terms of the error variance of the initial background field and the error covariances of the initial background field and the observation field.

Linear Weighted Sum (LWS) Method
As an evaluation function method, the LWS method assigns a weight coefficient to each target according to its importance and then optimizes their linear combination [78,79]. We applied the LWS method to construct the flood hazard analysis system in this study. The specific formula of the linear weighted sum method is:

$$y = \sum_{i=1}^{n} w_i x_i \quad (24)$$

where $y$ is the comprehensive evaluation value of the system or the object being evaluated; $n$ is the number of factor variables; $x_i$ is the i-th factor; and $w_i$ is the corresponding weight coefficient.
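Applied to raster layers, the weighted sum is a per-grid dot product between the factor values and the weights. The sketch below uses two tiny hypothetical layers and made-up weights; the function name `linear_weighted_sum` is also an assumption.

```python
import numpy as np


def linear_weighted_sum(factor_layers, weights):
    """Combine factor layers of shape (n_factors, rows, cols) into one hazard index map."""
    factor_layers = np.asarray(factor_layers, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if factor_layers.shape[0] != weights.shape[0]:
        raise ValueError("one weight is required per factor layer")
    # Contract the weight vector with the first (factor) axis of the layer stack.
    return np.tensordot(weights, factor_layers, axes=1)


# Two hypothetical standardized 2 x 2 factor layers and their weights.
layers = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 1.0]],
]
hazard_index = linear_weighted_sum(layers, [0.6, 0.4])
```

If the weights sum to one and every layer lies in [0,1], the resulting index also lies in [0,1], which keeps the zoning thresholds comparable across rainfall durations.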

Certain Factor (CF) Model
The CF model is a probability function, first proposed by Shortliffe and Buchanan [80], to clarify the hierarchical state weights of various factors that affect the occurrence of an event [81,82]. It has been widely used in landslide-susceptibility zoning by combining it with GIS technology, and its formula is:

$$\mathrm{CF} = \begin{cases} \dfrac{P_a - P_s}{P_a \left(1 - P_s\right)}, & P_a \geq P_s \\[2ex] \dfrac{P_a - P_s}{P_s \left(1 - P_a\right)}, & P_a < P_s \end{cases} \quad (25)$$

where $P_a$ is the conditional probability that an event (flood) occurs in the classification state a of the related factors. In practical applications, it is often expressed as the ratio of the number of flood occurrence points to the area of the classification state a. $P_s$ is the probability of historical floods occurring in the study area; that is, the ratio of total flood disaster points to the area of the study area.
The value range of CF is [−1, 1]. CF > 0 means that the event has a high possibility of happening, and CF < 0 means that the event is less likely to happen. When CF = 1, this event must happen, whereas CF = −1 is the opposite. If CF = 0, it is impossible to judge whether the event has occurred.
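The piecewise definition translates directly into a small function. This sketch assumes that both inputs are expressed as probabilities, with the prior strictly between 0 and 1; the function name and the sample values are hypothetical.

```python
def certainty_factor(p_a, p_s):
    """CF value of one factor class from its conditional probability and the prior."""
    if not 0.0 < p_s < 1.0:
        raise ValueError("the prior probability p_s must lie strictly between 0 and 1")
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("the conditional probability p_a must lie in [0, 1]")
    if p_a >= p_s:
        # Floods are at least as frequent in this class as in the whole study area.
        return (p_a - p_s) / (p_a * (1.0 - p_s))
    # Floods are less frequent in this class than in the whole study area.
    return (p_a - p_s) / (p_s * (1.0 - p_a))


cf_prone = certainty_factor(0.4, 0.2)      # class with more floods than average
cf_safe = certainty_factor(0.1, 0.2)       # class with fewer floods than average
cf_neutral = certainty_factor(0.2, 0.2)    # class indistinguishable from the average
```

The two branches keep the result inside [−1, 1]: a class with no floods at all maps to −1, and a class where a flood is certain maps to 1.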

Flood Hazard Analysis System
Random forest (RF) is a classification method that combines multiple decision trees for effective classification through ensemble learning [26,83,84]. In this study, the RF model with default hyperparameters was applied to obtain the weights of various factors. Then, the LWS method was used to construct a flood hazard analysis system, as shown in Figure 4. Specifically, several important parameters of the RF model and their default values in the scikit-learn Python library are the number of decision trees (n_estimators = 100), the minimum number of samples required to split an internal node (min_samples_split = 2), the maximum depth of each decision tree (max_depth = None, i.e., the nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples), and the number of features to consider when looking for the best split (max_features = 'sqrt', i.e., the square root of the total number of features).
Moreover, the holdout validation method was used to evaluate the analytical performance of the RF model. As for the flood disaster data in Dazhou City, we extracted 2254 historical flood occurrence points (Figure 2), which are divided into 676 (30%) test samples and 1578 (70%) training samples. The training of the RF classification model requires at least two classes. Accordingly, 2254 non-flood occurrence points were randomly generated in the study area through ArcGIS and divided into 676 test samples and 1578 training samples. Then, the CF value of the chosen factors at each point is obtained by using the CF model. Moreover, the RF model established in Python is used to classify the flood hazard factors to obtain the weight of each factor under the four rainfall duration setups (3 h, 6 h, 9 h, and 12 h). Finally, the flood hazard analysis system is established in ArcGIS, dividing the study area's grids into five zones (low, sub-low, medium, sub-high, and high) using the Natural Break Method.
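A rough sketch of this workflow with scikit-learn is given below. It runs on synthetic data rather than the Dazhou samples: the sample size, the random CF-like values, and the artificial shift that makes the first factor informative are all assumptions, and reading `feature_importances_` as the factor weights is one plausible interpretation of the text rather than a confirmed detail.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
factor_names = ["DEM", "ESD", "DPD", "FA", "SP", "LU", "DR", "TWI", "MAP"]

# Synthetic stand-in for the CF values at flood (1) and non-flood (0) points.
n_per_class = 400
X = rng.uniform(-1.0, 1.0, size=(2 * n_per_class, len(factor_names)))
y = np.repeat([1, 0], n_per_class)
X[y == 1, 0] += 1.2  # make the first factor informative, for illustration only

# 70/30 holdout split, stratified so both classes keep the same proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Default hyperparameters; only the seed is fixed for reproducibility.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# Impurity-based importances are non-negative and sum to one.
factor_weights = dict(zip(factor_names, model.feature_importances_))
test_accuracy = model.score(X_test, y_test)  # share of correctly classified test points
```

The importances already sum to one, so they can be passed to a linear weighted sum without further normalization.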

Statistical Metrics
To evaluate the accuracy and overall errors of the fusion results of the three-hour rainfall data from 1998 to 2012, three popular metrics, the correlation coefficient (R), mean bias (MBIAS), and root-mean-square error (RMSE), are adopted in this paper. To evaluate the performance of the proposed flood hazard system, the probability of correct analysis (PCA) is used, which refers to the ratio of correctly detected floods to total observed floods:

$$\mathrm{PCA} = \frac{N_c}{N_t} \quad (29)$$

where $N_t$ is the number of all sample points and $N_c$ is the number of points with correct analysis results.
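The metrics can be computed as in the sketch below. The exact definitions used in the paper are not reproduced in this text, so the forms here are assumptions: R is the Pearson correlation, MBIAS is taken as the mean of fused minus gauge, and RMSE is the usual root of the mean squared difference. The function names and sample values are hypothetical.

```python
import numpy as np


def fusion_metrics(fused, gauge):
    """Return (R, MBIAS, RMSE) between fused rainfall and gauge rainfall."""
    fused = np.asarray(fused, dtype=float)
    gauge = np.asarray(gauge, dtype=float)
    r = np.corrcoef(fused, gauge)[0, 1]
    mbias = np.mean(fused - gauge)                 # assumed definition: mean difference
    rmse = np.sqrt(np.mean((fused - gauge) ** 2))
    return r, mbias, rmse


def probability_of_correct_analysis(predicted, observed):
    """Share of sample points whose predicted class matches the observed class."""
    predicted = np.asarray(predicted)
    observed = np.asarray(observed)
    return float(np.mean(predicted == observed))


# Hypothetical values: the fused series runs 1 mm below the gauge everywhere.
r_value, mbias_value, rmse_value = fusion_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
pca_value = probability_of_correct_analysis([1, 0, 1, 1], [1, 1, 1, 0])
```

A constant offset leaves R at one while MBIAS and RMSE both register the error, which is why the three metrics are read together.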

Performance Evaluation of Four Fusion Methods
Figure 5 shows the performance of the four fusion methods (LR, KF, GWR, and OI) in the fusion experiment of three-hour TRMM 3B42 and gauge rainfall from 1998 to 2012. The KF method has the lowest R-value, only 0.46, which means that the fusion rainfall obtained by this method has the weakest correlation with the observations. The GWR method has the highest R of 0.62, which is slightly better than the LR and OI, with an R-value of 0.56. A previous study [59] validated that the TRMM 3B42 has an R of 0.2 in this area. Considering the MBIAS between the fusion rainfall from the four methods and the gauge rainfall, the LR method has the lowest value, 0.20. The other three are clearly inferior, with a higher MBIAS of around 4.00. The MBIAS value of the OI method is slightly lower than that of GWR, and the highest value is from KF. The RMSE shows a similar pattern to the MBIAS. LR obtains the fusion rainfall with the lowest RMSE (0.10). KF, GWR, and OI have significantly higher RMSE, about 220. Therefore, the LR method is chosen for fusing the satellite and gauge rainfall in Dazhou City.

Factor Weight Values from RF Model
As for the performance of the flood risk analysis system (Section 3.2) in designing the weights for the nine factors (Section 2.2.2), the PCA values (Equation (29)) during the training period under four rainfall durations, i.e., 3 h, 6 h, 9 h, and 12 h, are 0.824, 0.816, 0.818, and 0.821, respectively. During the testing period, the PCA values are 0.802, 0.805, 0.811, and 0.789, respectively, with acceptable reductions. Overall, the system's performance is satisfactory, and, accordingly, the weight analysis results are considered reasonable.
The weight values of the nine flood hazard factors under the four rainfall durations obtained by the proposed flood hazard analysis system are shown in Figure 7. The highest weight is assigned to DEM in all four duration cases, and it is the only one of the nine factors with an importance ratio that exceeds 20%. Thus, DEM is the factor that contributes the most to the flood hazard. After DEM, the next largest contributors are DR and LU, and the importance of DR is slightly higher than that of LU. In descending order of importance, the flood hazard factors with a weight ratio of about 10% are TWI, MAP, DPD, and ESD. Their contribution to the hazard of flooding is moderate. The weights of SP and FA are relatively low, especially those of FA.

Flood Hazard Analysis
Figure 8 shows the spatial distribution of flood hazard in the 3 h, 6 h, 9 h, and 12 h rainfall duration cases in Dazhou City, based on the hazard index value obtained by the LWS method, with the weight value of each factor obtained above. The areas with high flood hazard are mainly concentrated in the south of Xuanhan, including the southern part of Tongchuan, the western part of Kaijiang, the eastern part of Qu, and the central and eastern parts of Dazhu. The relatively dangerous areas with sub-high hazard are mainly distributed in the south of Dachuan, such as the western part of Kaijiang, the southern part of Dachuan, the southeastern part of Qu, and the central and southeastern parts of Dazhu. However, the moderately dangerous areas are scattered, such as Tongchuan, the central part of Wanyuan, the southern part of Xuanhan, the mountainous areas at the junction of Qu and Dazhu, and the western part of Dachuan. The remaining areas belong to safe and relatively safe areas, mainly distributed in the eastern, northern, and northeastern parts of Xuanhan and the western and eastern parts of Wanyuan. Moreover, with increased rainfall duration, the area and distribution of each hazard zone did not change significantly. Comparing the flood hazard distribution with the various flood factors shows that the areas with high flood hazard tend to be closer to rivers, with lower terrain and land use with larger runoff coefficients. This is consistent with the weight assignment of each factor.
To verify the rationality of the flood hazard analysis results of the proposed system, we counted the number of the collected historical flood points distributed in the five flood hazard zones, as shown in Figure 9. There is little difference in the numbers of historical flood points located in the five hazard zones across the four rainfall duration cases. In all cases, the number of historical flood points in the high-hazard zone is the largest, accounting for more than 34%. It is followed by the sub-high hazard zone, with about 30% or more. This means that approximately 70% or more of the flood points are distributed in relatively high flood hazard zones. More than 93% of flood points are located in the medium-hazard area and above. The low and sub-low hazard zones have the fewest historical flood points, at less than 7%.

Discussion
(1) Applicability of the rainfall fusion method
Since rainfall has great spatial and temporal variability, it has strong regional characteristics [85,86]. Whether the method used in this paper is applicable in other regions still needs further experimental verification. Moreover, the impact of ground observation density on the performance of the fusion method remains to be verified. In addition, this study only selected two rainfall datasets, which may lead to uncertainties in the fused results. Although the linear regression method proved the best among the four fusion methods here, further experimental verification is needed before the fusion methods are applied to other areas with different gauge distributions and rainfall datasets.
(2) Impact of rainfall duration
Across the cases with different rainfall durations, the weight values of MAP and DR show an increasing trend as the rainfall duration increases, while those of DEM, LU, and SP decrease. There are no apparent changes in the weights of the other factors. These results indicate that changes in precipitation duration affect the weights of flood hazard factors, but the effect is too limited to be decisive for the ranking of factor importance. These findings are consistent with other studies [87,88], which indicate that the impact of the rainfall duration factor on the flood hazard analysis needs to be further addressed.
(3) Performance of the flood hazard analysis
We plotted the Receiver Operating Characteristic (ROC) curves of the flood hazard in the four rainfall duration cases (3 h, 6 h, 9 h, and 12 h), as shown in Figure 10. The areas under the ROC curves (AUC) for the four cases are about 0.847 (0.835-0.858), 0.853 (0.841-0.864), 0.856 (0.845-0.868), and 0.859 (0.847-0.870), respectively. All AUC values are in the range of 0.7-0.9, indicating that the proposed system performs well in predicting the flood hazard and can be used to predict whether flooding will occur in the study area.
(4) Insights for the management of flood hazard
Several previous studies have addressed end-to-end approaches that carry a scientific problem through to the practical response, such as early warning [89,90], evacuation planning [91-93], and flood memory understanding [94]. As the results in Figure 8 show, the flood hazard areas in Dazhou are mainly concentrated near the river and the northwest of Wanyuan. The lower dangerous areas are concentrated near the Qu River in Qu County and the area near the river in Dazhu. The safe areas are mainly distributed in the east, south, and southwest of Wanyuan, most of Xuanhan, and the north of Tongchuan. These results could provide managerial information for city planning, which can reduce the hazard for people and buildings to some extent. However, the resolution of this hazard map and the hazard analysis results of this study still need to be improved in future work.
In short, the flood hazard analysis system proposed in this paper performs satisfactorily in Dazhou City and is of practical value for flood hazard assessment and management.
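As an illustration of how AUC values such as those reported in item (3) can be obtained, the following minimal sketch computes the area under the ROC curve with the rank-sum (Mann-Whitney) identity. It is not the implementation used in this study: the function name `roc_auc` and the sample labels and scores are hypothetical, and the predicted hazard score is assumed to be a single number per location, with label 1 for a historical flood point and 0 for a non-flood point.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for a flooded location, 0 for a non-flooded location.
    scores: predicted hazard score for each location (higher = more hazardous).
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both flooded and non-flooded samples")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    # Fraction of (flooded, non-flooded) pairs that are ranked correctly.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical sample: four flood points and four non-flood points.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.7, 0.55, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 14 of 16 pairs are ordered correctly -> 0.875
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values in the 0.7-0.9 range are read as acceptable performance.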
(5) Limitation of this study Firstly, the satellite rainfall data used in this study have a spatial resolution of 0.25° because finer datasets, such as Integrated Multi-satellitE Retrievals for GPM (IMERG) or Global Satellite Mapping of Precipitation (GSMaP) [95], were not available for the chosen period (1998-2019). The performance of the approach at a high-resolution spatial scale needs to be further tested and analyzed in future work. In addition, to improve the ability to detect and predict flood disasters, it would be necessary to conduct the rainfall fusion at a finer temporal resolution, such as half an hour or one hour.
Secondly, we mainly optimize and improve the rainfall factor, one of the nine selected flood hazard factors, to improve the accuracy of the hazard analysis. However, this paper does not consider social and economic risk factors, which should be addressed in our future research.

Conclusions
High data quality and a reasonable weight assignment of the relevant factors are very important for flood hazard analysis. In this paper, taking Dazhou City as a case study, we obtained fused rainfall data with higher precision by applying four rainfall fusion methods and then used these data as one of the important factors in the flood hazard analysis. The following conclusions can be drawn from this study:
(1) For the fusion experiment of three-hour TRMM 3B42 and gauge rainfall from 1998 to 2012 in Dazhou City, the LR method stands out, with the lowest MBIAS (0.20) and RMSE (0.10) and an acceptable R (0.56).
(2) The RF model performs well in determining the weights of the nine hazard indicators: for the four rainfall durations (3 h, 6 h, 9 h, and 12 h), the PCA values are all greater than 0.80 during the training period and remain at about 0.8 during the testing period. In addition, the most critical flood hazard factor is DEM (topography), followed by DR, while the least important one is FA.
(3) The distribution of the flood hazard in Dazhou City, obtained by the proposed system using the higher-precision fused rainfall and the other eight flood indicators, is reasonable. More than 70% of historical flood points were in high and sub-high hazard zones, and the areas under the ROC curves were all within an acceptable range, 0.7-0.9, which further reflects the reliability of the flood hazard analysis results.
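The accuracy indicators quoted in conclusion (1) can be sketched as follows. This is a hedged illustration rather than the code used in this study: RMSE and the Pearson correlation coefficient R follow their standard definitions, whereas MBIAS is assumed here to be the mean of (fused minus gauge), since its exact definition is not restated in this section. The function name `fusion_metrics` and the sample values are hypothetical.

```python
import math


def fusion_metrics(fused, gauge):
    """Return (mbias, rmse, r) for paired fused and gauge rainfall values.

    ASSUMPTION: mbias is the mean difference (fused - gauge); the paper's
    own MBIAS definition may differ.
    """
    if len(fused) != len(gauge) or not gauge:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gauge)
    mean_f = sum(fused) / n
    mean_g = sum(gauge) / n

    mbias = mean_f - mean_g
    rmse = math.sqrt(sum((f - g) ** 2 for f, g in zip(fused, gauge)) / n)

    # Pearson correlation coefficient between fused and gauge rainfall.
    cov = sum((f - mean_f) * (g - mean_g) for f, g in zip(fused, gauge))
    var_f = sum((f - mean_f) ** 2 for f in fused)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    r = cov / math.sqrt(var_f * var_g)
    return mbias, rmse, r


# Hypothetical three-hour rainfall values (mm) at four gauge locations.
gauge = [0.0, 1.0, 2.0, 3.0]
fused = [0.5, 1.5, 2.5, 3.5]
mbias, rmse, r = fusion_metrics(fused, gauge)
print(mbias, rmse, r)  # a constant +0.5 mm offset: mbias 0.5, rmse 0.5, r 1.0
```

The example shows why the three indicators are reported together: a constant offset leaves R at 1.0 while still producing a nonzero bias and RMSE.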
This paper improves the quality of the rainfall data, a precipitating factor, to improve the accuracy of the flood hazard analysis. In addition, a reasonable and effective flood hazard analysis system is proposed to contribute to flood disaster assessment and management, especially for data-sparse areas. Moreover, the results of this paper can provide useful information for high-resolution flood risk assessment and management in Dazhou City in future work.

Figure 1. The location, DEM, rainfall stations, and river network of the study area.

Figure 2. Distribution of historical flood points in Dazhou City.

Figure 3. Schematic diagram of flood hazard analysis system.

Figure 5. Fusion data accuracy analysis of four methods.

As described in Section 2.2.1, four precipitating factors, MAP-3 h, MAP-6 h, MAP-9 h, and MAP-12 h, are obtained based on the fused rainfall using the LR method. The spatial distribution of MAP-3 h is shown in Figure 6a, and MAP-6 h, MAP-9 h, and MAP-12 h, acquired by accumulating MAP-3 h, are shown in Figure 6b-d. The maximum precipitation at the various time scales in Dazhou City is mainly concentrated in northern Wanyuan and central Xuanhan. The highest MAPs are always in northern Wanyuan. Southern Qu and Dazhu also have relatively high MAP, which becomes less pronounced as the time scale increases.

Figure 7. Weights of each flood hazard factor, as listed in Table 1.

Figure 9. Number of historical flood points in each hazard zone under four rainfall durations.

Figure 10. Receiver Operating Characteristic (ROC) curves of the flood hazard for the four rainfall duration cases.

Table 1. Nine flood hazard factors in this study.

Table 2. Soil permeability of various soil textures.

Table 3. Runoff coefficients of various land use types.