Evaluation of Flood Prediction Capability of the WRF-Hydro Model Based on Multiple Forcing Scenarios

Abstract: The Weather Research and Forecasting (WRF)-Hydro model, a physically based, fully distributed, multi-parameterization modeling system that is easy to couple with numerical weather prediction models, has potential for operational flood forecasting in small and medium catchments (SMCs). However, the model requires many input forcings, which makes it difficult to use in SMCs without adequate observed forcings. The Global Land Data Assimilation System (GLDAS), the WRF outputs and the ideal forcings generated by the WRF-Hydro model can provide all forcings required by the model for these SMCs. In this study, seven forcing scenarios were designed based on the products of GLDAS, WRF and ideal forcings, as well as the observed and merged rainfalls, to assess the performance of the WRF-Hydro model for flood simulation. The model was applied to the Chenhe catchment, a typical SMC located in midwestern China. The flood prediction capability of the WRF-Hydro model was also compared to that of the widely used Xinanjiang model. The results show that three forcing scenarios, namely the GLDAS forcings with observed rainfall, the WRF forcings with observed rainfall and the GLDAS forcings with GLDAS-merged rainfall, are the optimal input forcings for the WRF-Hydro model. Their mean root mean square errors (RMSE) are 0.18, 0.18 and 0.17 mm/h, respectively. The performance of the WRF-Hydro model driven by these three scenarios is generally comparable to that of the Xinanjiang model (RMSE = 0.17 mm/h).


Introduction
Floods are among the most common natural disasters and often cause loss of life and property [1,2]. The risk of extreme floods in the large basins of China has been substantially reduced owing to the improvement of flood forecasting skill in recent years [3]. However, flash flood forecasting and prevention in small and medium catchments (SMCs) remains an urgent problem [4,5]. Flash floods in SMCs are characterized by short routing times, which makes flood prediction difficult [6]. Additionally, flood simulation and forecasting for SMCs face further challenges because observations are sparse and field data are inadequate.
The Chenhe catchment has a typical temperate continental monsoon climate with a mean annual temperature of 12.1 °C, ranging from −1.2 °C in January to 26.5 °C in July. The catchment is prone to frequent severe floods, especially during the summer and autumn seasons (i.e., the flood season, from June 1 to October 31). Rainstorms of high intensity and short duration occur in summer, and those of low intensity and long duration occur in autumn. According to the USGS 24-category Land Use Categories, vegetation types in the catchment include Savanna (60.9%), Grassland (12.3%), Mixed Forest (8.9%) and Deciduous Broadleaf Forest (7.9%), interspersed with other types (e.g., Shrubland, Cropland, Woodland and Pasture). The surface soil is loamy according to the 16-category USGS soil classification.
The Chenhe gauging station, located at the catchment outlet, provides rainfall and discharge data. The other eight gauging stations provide only rainfall data. Observations collected from the gauging stations during the flood seasons between 2003 and 2011 are used to produce hourly rainfall and streamflow records using linear interpolation. We adopted 19 typical flood events in this study, and their properties (e.g., duration) are shown in Table 1. The spin-up period of the WRF-Hydro model is set from 1 June to the beginning of the first flood event each year; its minimum duration is around 30 days (in 2005) and its maximum is around 95 days (in 2006). Note that the initial condition of the WRF-Hydro model in the spin-up period is identical to that of the WRF model driven by the

The WRF Model
The WRF model, a new-generation mesoscale NWP model, has been widely used over the past 20 years [36,53]. It can produce high-resolution (1-10 km) simulations of meteorological variables such as rainfall [33,34].
The WRF-Hydro model is a distributed, physically based, multi-parameterization model with a combined infiltration-excess and saturation-excess runoff module. The model adopts high-resolution routing modules (i.e., overland routing, interflow routing, base-flow routing, channel routing and reservoir or lake routing) that allow multi-scale grids (i.e., LSM grids of kilometers and routing grids of hundreds of meters). Some state variables need to be passed between the LSM and routing grids while the model is running. For example, soil moisture is mapped from the LSM grids onto the routing grids. After the routing calculation on the fine-resolution grids, the redistributed soil moisture is aggregated back onto the LSM grids. The disaggregation-aggregation methodology is described in [23].
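The exchange of state variables between the two grids can be illustrated with a minimal sketch. It assumes a uniform copy of each LSM value onto its routing cells and a plain block mean on the way back; the scheme described in [23] may use weights, so this is an illustration of the idea rather than the model's actual code. The function names and the sample field are hypothetical; the factor of 4 follows from the 1 km and 250 m resolutions used in this study.

```python
import numpy as np

AGG_FACTOR = 4  # 1 km LSM cell / 250 m routing cell

def disaggregate(lsm_field, factor=AGG_FACTOR):
    """Copy each LSM cell value onto its factor x factor routing cells."""
    lsm_field = np.asarray(lsm_field, dtype=float)
    return np.repeat(np.repeat(lsm_field, factor, axis=0), factor, axis=1)

def aggregate(routing_field, factor=AGG_FACTOR):
    """Average each factor x factor block of routing cells into one LSM cell."""
    routing_field = np.asarray(routing_field, dtype=float)
    rows, cols = routing_field.shape
    blocks = routing_field.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Hypothetical 2 x 2 soil moisture field on the LSM grid (illustrative values).
soil_moisture_lsm = np.array([[0.20, 0.30], [0.25, 0.35]])
fine = disaggregate(soil_moisture_lsm)   # 8 x 8 routing grid
coarse = aggregate(fine)                 # back to the 2 x 2 LSM grid
```

Because no routing step modifies `fine` in this sketch, aggregating it returns the original LSM field; in the model, the routing calculation would redistribute moisture between the two calls.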
Canopy interception of rainfall is described by the water balance equation, and water permeation in the subsurface layers is calculated using the Richards equation [17,48]. Lateral subsurface flow, calculated as quasi-three-dimensional flow, is a function of topography, saturated soil depth and saturated hydraulic conductivity [17,48]. The ponded water comes from three sources: infiltration excess, exfiltration from the saturated soil layer and water exchange between grids. Once the ponded water depth exceeds the retention depth, the excess flows freely as surface runoff according to the Shallow Water Wave Equations [59]. A conceptual water bucket model is used to calculate the change in water storage beneath the subsurface layers. The Diffusion Wave Equations are used to describe channel routing, where the shape and roughness of each stream order are predefined but can be calibrated. Manning's equation is used in both the Shallow Water Wave Equations and the Diffusion Wave Equations to simulate friction on the land surface and in the channel. More details are available in [17].

Data and Model Settings
The coarse digital elevation and other ancillary data (e.g., land use and soil type) on the LSM grids are from the WPS, while the high-resolution terrain data required on the routing grids are provided by the Geospatial Data Cloud site, Computer Network Information Center, Chinese Academy of Sciences (http://www.gscloud.cn/). Other high-resolution fields on the routing grids, such as flow direction and channel network, were obtained with ArcGIS. Note that the channel network was extracted from the fine-scale terrain data with a stream definition threshold of 320, so that a stream forms below every 20 km² of contributing area, in accordance with the actual channel network. The spatial resolutions of the LSM and routing grids are 1 km and 250 m, respectively. The time steps of the terrain and channel routing simulations are 15 s [17]. The main physical parameterizations of the model in this study are shown in Table 3.
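The relation between the stream definition threshold and the 20 km² contributing area can be checked with a few lines of arithmetic. The sketch below assumes the threshold of 320 is counted in 250 m routing cells, which is consistent with the figures quoted above; the variable names are illustrative.

```python
ROUTING_CELL_M = 250.0        # routing grid resolution
LSM_CELL_M = 1000.0           # LSM grid resolution
STREAM_THRESHOLD_CELLS = 320  # stream definition threshold (assumed to be in routing cells)

cell_area_km2 = (ROUTING_CELL_M / 1000.0) ** 2                      # area of one routing cell
contributing_area_km2 = STREAM_THRESHOLD_CELLS * cell_area_km2      # area needed to start a stream
routing_cells_per_lsm_cell = int(LSM_CELL_M / ROUTING_CELL_M) ** 2  # routing cells nested in one LSM cell

print(cell_area_km2, contributing_area_km2, routing_cells_per_lsm_cell)
```

Each routing cell covers 0.0625 km², so 320 cells correspond to 20 km², and each 1 km LSM cell contains 16 routing cells.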

Calibration of Uncoupled WRF-Hydro Model
The calibration of the uncoupled WRF-Hydro model (version 5.0.3) was performed only for the flood seasons of the first four years (2003-2006) because of the high computational cost. For calibration, the model was driven by a single scenario: GLDAS forcings with observed areal precipitation. That is, the observed areal precipitation was obtained from rain gauge data using the Inverse Distance Weight (IDW) method [61], and the other forcings were extracted from the GLDAS product. This choice was made because other spatial interpolation methods (e.g., ordinary kriging) are not suitable for SMCs with sparse observations [24], and because the GLDAS, as a reanalysis product, performs well in evaporation simulation in China [49,50].
Table 3 (entries recovered from the original layout): distributed hydrology soil and vegetation model [12]; overland flow: D8 method [60]; baseflow: exponential storage-discharge function [17]; channel routing: diffusive wave [59].
In this study, six model parameters were calibrated: the scaling factor on subsurface layer depth (ZSOILFAC), the bucket model exponent of baseflow (GWEXP), referring soil permeability (REFKDT), the multiplier on maximum retention depth (RETDEPRTFAC), the multiplier on Manning's roughness for overland flow (OVROUGHRTFAC) and the multiplier on Manning's roughness for the channel (MANNFAC). The soil moisture in the model depends mainly on the subsurface depth, and the predefined depths may not be suitable for SMCs. ZSOILFAC is therefore introduced so that the depths of the four subsurface layers can be scaled by a common factor rather than fixed at 0.05, 0.20, 0.45 and 0.80 m as in [25], thereby influencing soil moisture. The specific meaning and function of these six parameters are shown in Table 4. The parameters were calibrated with the Manual Stepwise Approach [24,25], which selects the best value of each parameter according to an objective function.
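The IDW step used to obtain observed areal precipitation from the rain gauges can be sketched as follows. This is a generic inverse-distance interpolation, not the authors' implementation: the power of 2 is an assumption (the exponent is not stated here), and the function name, coordinates and rainfall values are hypothetical.

```python
import numpy as np

def idw(gauge_xy, gauge_values, target_xy, power=2.0):
    """Inverse-distance-weighted estimate at each target point.

    power=2.0 is an assumed exponent, not a value taken from the study.
    """
    gauge_xy = np.asarray(gauge_xy, dtype=float)
    gauge_values = np.asarray(gauge_values, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    # Distance matrix with shape (n_targets, n_gauges).
    dist = np.linalg.norm(target_xy[:, None, :] - gauge_xy[None, :, :], axis=2)
    estimates = np.empty(len(target_xy))
    for j, d in enumerate(dist):
        on_gauge = d == 0.0
        if on_gauge.any():
            # A target that coincides with a gauge takes the gauge value directly.
            estimates[j] = gauge_values[on_gauge][0]
        else:
            weights = 1.0 / d ** power
            estimates[j] = np.sum(weights * gauge_values) / np.sum(weights)
    return estimates

# Two hypothetical gauges (coordinates in km, rainfall in mm/h).
gauges = [(0.0, 0.0), (2.0, 0.0)]
rain = [10.0, 20.0]
grid_points = [(1.0, 0.0), (0.0, 0.0)]
areal = idw(gauges, rain, grid_points)
```

A grid point midway between the two gauges receives their average, and a point located on a gauge reproduces the gauge reading. The catchment-mean rainfall would then be the mean of the estimates over all grid cells inside the catchment.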
Two comprehensive objective functions, one based on percent bias (PB) and one on the Nash-Sutcliffe efficiency coefficient (NSE), were adopted following the Compromise Programming Method [62].
Here f_PB and f_NSE denote the objective functions for PB and NSE, respectively, α is the balancing factor, set to 4 in accordance with [62], n is the number of events in the calibration period and k is the event number. f_PB is used to find the optimum values of ZSOILFAC, GWEXP, REFKDT and RETDEPRTFAC, since these parameters affect flood volume, and f_NSE is used to find the optimum values of OVROUGHRTFAC and MANNFAC, since they influence the hydrograph shape. Figure 2 shows the sensitivity analysis for these six parameters, and the calibration ranges and steps of the parameters are shown in Table 5. The resulting optimum values are 0.2, 5.0, 0.7, 0.1, 0.2 and 0.8 for ZSOILFAC, GWEXP, REFKDT, RETDEPRTFAC, OVROUGHRTFAC and MANNFAC, respectively.
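The stepwise idea, tuning one parameter at a time and fixing its best value before moving to the next, can be sketched as below. This is only an illustration under stated assumptions: the objective is taken to be minimised, the toy objective stands in for f_PB or f_NSE (whose exact forms are not reproduced here), and the candidate values are hypothetical rather than the ranges of Table 5. In practice each evaluation would require a full WRF-Hydro run.

```python
def stepwise_calibrate(objective, start, candidates):
    """Tune parameters one at a time, keeping each best value before moving on."""
    best = dict(start)
    for name, values in candidates.items():
        # Evaluate the objective for every candidate of this parameter,
        # with all previously tuned parameters held at their best values.
        scores = [(objective({**best, name: value}), value) for value in values]
        best[name] = min(scores)[1]
    return best

# Toy objective (lower is better); NOT the study's f_PB or f_NSE.
toy_target = {"REFKDT": 0.7, "MANNFAC": 0.8}

def toy_objective(params):
    return sum((params[key] - toy_target[key]) ** 2 for key in toy_target)

result = stepwise_calibrate(
    toy_objective,
    start={"REFKDT": 1.0, "MANNFAC": 1.0},
    candidates={"REFKDT": [0.5, 0.7, 0.9], "MANNFAC": [0.6, 0.8, 1.0]},
)
print(result)
```

A design consequence of this approach is that the result can depend on the order in which parameters are tuned, which is one reason the volume-related and shape-related parameters are handled with separate objective functions.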

The Xinanjiang Model
The Xinanjiang model [63], a conceptual semi-distributed hydrologic model, is widely used in flood forecasting for humid and semi-humid watersheds in China [64-67]. It has four major components: runoff generation, evapotranspiration, separation of runoff components and flow concentration. Its main inputs are observed rainfall and pan evaporation [63,68]. The major model parameters are shown in Table 6. Note that the Xinanjiang model was used as a benchmark for investigating the flood prediction capability of the WRF-Hydro model, owing to its satisfactory performance and established application in the Chenhe catchment.

Forcing Scenarios Design
Choosing adequate and appropriate forcings is an essential first step in applying the WRF-Hydro model. Currently, there are three prominent methods of obtaining the forcing data. Firstly, all forcings derive directly from one source, such as the WRF outputs or the Global Forecast System [24,28,69-71]. Secondly, forcings are combined from several products [19,25,72]. For example, reference [25] acquired rainfall, air temperature, air pressure and air humidity from observation, wind speed and incoming shortwave radiation from merged products, and incoming shortwave radiation from the GLDAS. Lastly, forcings are provided by the WRF model using the fully coupled WRF-Hydro modeling system [25,26].
In this study, we adopted the second method to generate the forcings of the uncoupled WRF-Hydro model. Eight forcing scenarios (Table 7) were designed to identify the appropriate forcings and to assess the performance of the WRF-Hydro model in SMCs. The forcings required by the model are classified as rainfall and the remaining forcings. Rainfall products in this study include the IDW product, GLDAS-derived rainfall (Gr), WRF-derived rainfall (Wr), GLDAS-merged rainfall (Gm) and WRF-merged rainfall (Wm). The Successive Corrections Method [42-45] was adopted as the precipitation merging method, with the Cressman Weight Function [73] as its weight function, 5 iterations and an influence radius of 100 km. The remaining forcings comprise GLDAS, WRF and ideal forcings. The values of the ideal forcings [17] generated by the WRF-Hydro model are shown in Table 8.
Table 7 notes (as recovered): The abbreviation of each experiment; 3 All forcings in this scenario derive from the GLDAS product; 4 All forcings in this scenario derive from the WRF outputs; 5 It can be generated by the WRF-Hydro model when no forcings are available in a catchment; more detail is given in Table 8.
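The merging step can be illustrated with a minimal successive-corrections sketch using the classical Cressman weight, w = (R² − d²)/(R² + d²) for d < R and 0 otherwise, with the settings quoted above (5 iterations, R = 100 km). Several details are assumptions rather than the authors' procedure: the background value at each gauge is taken from the nearest grid point, the radius is held fixed across iterations, and all names and sample values are hypothetical.

```python
import numpy as np

def cressman_weights(dist_km, radius_km):
    """Cressman weight: (R^2 - d^2) / (R^2 + d^2) inside the radius, else 0."""
    w = (radius_km ** 2 - dist_km ** 2) / (radius_km ** 2 + dist_km ** 2)
    return np.where(dist_km < radius_km, w, 0.0)

def successive_corrections(grid_xy, background, gauge_xy, gauge_obs,
                           radius_km=100.0, n_iter=5):
    """Nudge a gridded background field toward gauge observations."""
    grid_xy = np.asarray(grid_xy, dtype=float)
    gauge_xy = np.asarray(gauge_xy, dtype=float)
    gauge_obs = np.asarray(gauge_obs, dtype=float)
    analysis = np.asarray(background, dtype=float).copy()
    # Grid-to-gauge distances, shape (n_grid, n_gauges).
    dist = np.linalg.norm(grid_xy[:, None, :] - gauge_xy[None, :, :], axis=2)
    weights = cressman_weights(dist, radius_km)
    weight_sum = weights.sum(axis=1)
    nearest = dist.argmin(axis=0)  # nearest grid point to each gauge (assumption)
    for _ in range(n_iter):
        increment = gauge_obs - analysis[nearest]  # observation minus current analysis
        correction = np.divide(weights @ increment, weight_sum,
                               out=np.zeros_like(analysis), where=weight_sum > 0)
        analysis = analysis + correction
    return analysis

# Two hypothetical grid points (km) and one gauge located on the first point.
merged = successive_corrections(
    grid_xy=[(0.0, 0.0), (500.0, 0.0)],
    background=[1.0, 2.0],
    gauge_xy=[(0.0, 0.0)],
    gauge_obs=[5.0],
)
```

In this toy case the grid point under the gauge is pulled to the observed value, while the point 500 km away lies outside the influence radius and keeps its background value.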

Evaluation Metrics
The accuracy of the rainfall, evapotranspiration (ET) and streamflow is characterized by five assessment metrics: percent bias (PB), root mean square error (RMSE), correlation coefficient (RR), Nash-Sutcliffe efficiency coefficient (NSE) and Shannon entropy (SE).
In these metrics, s_i is the simulated value at time step i, o_i is the observed value, n is the length of the time series, x is a discrete random variable recording all the different values in the grids, p(x_k) is the frequency with which the value x_k appears in the grids and N is the length of x. The PB and RMSE measure the errors of the simulated results, and the RR and NSE quantify the degree of fit between simulation and observation. The SE is regarded as a measure of spatial variability: the higher its value, the more complex the related spatial information. Note that the SE is only calculated within the Chenhe catchment from the spatially distributed rainfall or ET with 0.1 mm accuracy.
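For readers who want to reproduce the metrics, the sketch below implements their commonly used forms. These are assumptions, not the paper's exact equations: PB is written as a fraction (consistent with reported values such as −0.406), RR as the Pearson correlation, and SE with a base-2 logarithm after rounding the field to 0.1 mm. Function names and sample values are illustrative.

```python
import numpy as np

def percent_bias(sim, obs):
    """Relative volume error, as a fraction: sum(s - o) / sum(o)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return np.sum(sim - obs) / np.sum(obs)

def rmse(sim, obs):
    """Root mean square error."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return np.sqrt(np.mean((sim - obs) ** 2))

def correlation(sim, obs):
    """Pearson correlation coefficient."""
    return np.corrcoef(sim, obs)[0, 1]

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def shannon_entropy(field, precision=0.1):
    """Entropy of the distinct values in a gridded field (base-2 log assumed)."""
    binned = np.round(np.asarray(field, dtype=float).ravel() / precision).astype(int)
    _, counts = np.unique(binned, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

obs = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical observed series
doubled = 2.0 * obs                   # hypothetical simulation with 100% excess volume
print(percent_bias(doubled, obs), rmse(obs, obs), nse(obs, obs))
print(shannon_entropy([[0.0, 0.0], [1.0, 1.0]]))
```

A perfect simulation gives PB = 0, RMSE = 0 and NSE = 1, while a field with two equally frequent values has an entropy of 1 bit under the base-2 assumption.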

Results and Discussion
To evaluate the quality of precipitation, we first compared the five rainfall products (i.e., IDW, Gr, Wr, Gm and Wm) using the metrics PB, RMSE and SE. Then, the WRF-Hydro-derived simulated ET of three scenarios (i.e., G + I, W + I and I + I) was analyzed through the PB, RMSE, RR and SE to understand the impact of the different forcings (without rainfall). Finally, we compared the simulated streamflow of the eight scenarios (i.e., G + Gr, G + I, W + Wr, W + I, G + Gm, W + Wm, I + I and XAJ) at the outlet using the same metrics.

Evaluation of Five Rainfall Products
As mentioned above, we regarded the IDW product as the observed areal precipitation. The cumulative rainfall of the IDW, Gr, Wr, Gm and Wm is shown in Figure 3a, and the PB of the last four products in Figure 3b. The Gr and Wr show relatively poor performance in terms of the PB (Figure 3b). Significant negative errors are observed in 89.5% of events for the Gr, with a mean rainfall of only 44.4 mm, approximately 60% of the observed rainfall (74.7 mm). This is arguably due to information loss in the coarse-scale grids. In contrast, rainfall is overestimated in the Wr for 68.4% of events, with mean rainfall up to 97.5 mm nearly 1.

We further compared the hourly rainfall series of the four products before and after merging with observation for all events (Figure 4). As shown in Figure 4a,b, the Gr has a large negative bias (PB = −0.406), while the Wr has a positive bias (PB = 0.306) compared to observation. The Wr has a higher RMSE (1.550 mm/h) than the Gr (1.055 mm/h), confirming that the raw rainfall of the WRF model overestimates the hourly areal mean rainfall [36], even though the Wr has a finer spatial resolution. After merging with gauge data, the Gr improves, with its RMSE dropping from 1.055 to 0.026 mm/h, and the Wr also performs satisfactorily, with its RMSE decreasing from 1.550 to 0.337 mm/h. Compared with the raw products, the quality of rainfall from the GLDAS and WRF is thus improved through merging. The Gm performs better than the Wm, possibly as a result of its coarser spatial resolution.
After comparing the rainfall volumes and time series, we investigated the spatial distributions of cumulative rainfall on the highest rainfall day, e.g., September 18 for event 030916 and August 20 for event 100820 (Figure 5). The IDW product is able to capture the general features of the rain distribution (e.g., the storm center), although its quality normally depends on the density and location of the rain gauges. The Gr has a smooth spatial distribution, nearly without any peaks or depressions, and lower SE values than the others (5.67 and 6.62 for events 030916 and 100820, respectively), while the Wr has higher SE values (9.08 and 10.00 for events 030916 and 100820, respectively). After merging with observation, the volume and spatial variability of the rainfall products are improved (Figure 5b-e,g-j), owing to the assimilation of field observations. As a result, in terms of spatial distribution, the quality and reliability of the Gm and Wm exceed those of the Gr and Wr for flood simulation with the WRF-Hydro model.
To further illustrate the impact of the merging method on the spatial distributions of the rainfall products, we also calculated the PB and RMSE of the Gr, Gm, Wr and Wm based on the cumulative rainfall on the highest rainfall day at the 9 rain gauges for all events (Figure 6). The PB distributions of the Gm and Wm narrow around the zero line, and the majority of RMSE values decline from above 50 mm (68.4% and 63.2% of values for the Gr and Wr, respectively) to below 50 mm (84.2% and 68.4% for the Gm and Wm, respectively). Therefore, the accuracy and spatial variability are improved after merging with observations, at least at the 9 stations. Consequently, the IDW, Gm and Wm are more suitable than the others for hydrological simulation with the WRF-Hydro model.

Evaluation of Daily WRF-Hydro-derived ET in Three Scenarios
Besides rainfall, the remaining forcings also affect the streamflow simulation of the WRF-Hydro model. They are mainly used to calculate potential evaporation through the Penman-Monteith equation, thereby affecting the simulated ET in the model. Therefore, the differences among the forcings (without rainfall) can be quantified indirectly through an ET comparison. To make the results comparable, we analyzed the daily ET of three scenarios, G + I, W + I and I + I, using pan evaporation data collected at Heiyukou (HYK) Station. Note that the pan evaporation only provides a reference for evaluating the simulation, since it represents one-point water surface evaporation rather than actual ET. The hourly cumulative simulated ET from the model was converted into daily ET to be consistent with the observation. Figure 7a shows the daily mean ET volume of the observation, G + I, W + I and I + I, and Figure 7b shows the corresponding PB of the three scenarios at the grid containing HYK station for each event. The simulated ET of G + I has a smaller mean PB (1.063) than those of the other scenarios, confirming that the GLDAS-derived forcings (without rainfall) have good skill in simulating ET despite the relatively coarse resolution of the GLDAS products (3 h, 0.25°). The simulated ET of W + I has intermediate performance, with a mean PB of 1.690, indicating that the high-resolution forcings (without rainfall) of the WRF model (1 h, 1 km) are suitable for calculating ET in the study catchment, especially when the catchment is small and covered by only a few GLDAS grids. Scenario I + I performs poorly compared to the others in terms of the PB (6.497) and clearly overestimates ET. This is attributed to the oversimplified generalization scheme of the ideal forcings (Table 8). These results imply that the forcings (without rainfall) from the GLDAS and the WRF are the suitable ones according to the PB of the ET volume when the rainfall is identical.

We further compared the daily simulated ET series of the three scenarios with pan evaporation at HYK Station (Figure 8). The daily ET of G + I performs better than the others, with a narrower spread and smaller bias in terms of PB, RMSE and RR (Figure 8). Scenario W + I performs moderately, and most points (89.2% versus 81.9% in G + I) are above the 45° line. The PB of W + I (1.259) is more than twice that of G + I (0.511), while the RR of W + I (0.581) is close to that of G + I (0.580). The ET from scenario I + I is overestimated at nearly all points (98.8%) despite its narrower distribution (range: 2.4 mm/day). ET at the grid containing HYK station is overestimated in all three scenarios, possibly because the potential evaporation is overestimated at this grid when the model is driven by these forcings (without rainfall). Compared with scenarios W + I and I + I, scenario G + I produces the best simulated ET, and the GLDAS-derived forcings (without rainfall) can serve as good forcings for the WRF-Hydro model in the study catchment.
To examine the spatial pattern of simulated ET in the three scenarios, we took events 030916 and 100820 on the highest observed ET day (i.e., September 22 for event 030916 and August 21 for event 100820) as examples. As shown in Figure 9, the spatial distribution of ET in scenario W + I consistently has the highest SE (4.494 in 030916 and 3.953 in 100820), suggesting that the forcings (without rainfall) from the WRF model outputs capture the ET distribution. The G + I simulations have the lowest SE (2.719 in 030916 and 3.133 in 100820), arguably due to the coarse resolution of the GLDAS data. The ET is overestimated over the whole catchment when scenario I + I is used to drive the WRF-Hydro model, even though this scenario yields a moderate SE (4.091 in 030916 and 3.385 in 100820). This implies that the overestimation of ET for I + I may occur not only at Heiyukou station but also catchment-wide, due to the generation pattern of the ideal forcings (Table 8). Consequently, the forcings (without rainfall) derived from the GLDAS and WRF show good skill in ET simulation according to error, correlation and spatial variability, and are identified as good data sets for hydrological simulation with the WRF-Hydro model in the Chenhe catchment.

Evaluation of Streamflow for the Eight Scenarios
We first compared the streamflow simulations of the WRF-Hydro model among the seven forcing scenarios (G + Gr, G + I, W + Wr, W + I, G + Gm, W + Wm and I + I) (Figure 10 and Table 9), and then compared the three best results of the WRF-Hydro model (i.e., scenarios G + I, W + I and G + Gm) with the result of the Xinanjiang model (i.e., scenario XAJ) (Figures 11 and 12). Note that the WRF-Hydro model was calibrated only in scenario G + I, and the calibrated parameters were adopted in the other scenarios.

Table 9. The mean values of four metrics shown in Figure 10. Columns: G + Gr, G + I, W + Wr, W + I, G + Gm, W + Wm, I + I, XAJ.

Comparison of the Scenarios Using the WRF-Hydro Model
Scenarios G + I, W + I and G + Gm perform well, as indicated by smaller PB (0.009-0.063) and RMSE (<0.2 mm/h) and narrower spreads than the other scenarios using the WRF-Hydro model, and their performance is similar to that of scenario XAJ (Figure 10). These results indicate that scenarios G + I, W + I and G + Gm perform well and similarly when used as forcings to drive the WRF-Hydro model.
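The four metrics used throughout this comparison can be sketched as follows. This is a minimal illustration under assumed definitions, not the study's own code: PB is taken here as the relative bias of total volume, RR as the Pearson correlation coefficient, and RMSE and NSE in their usual forms. The function names are hypothetical.

```python
import math


def percent_bias(sim, obs):
    """Assumed definition: relative bias of totals, (sum(sim) - sum(obs)) / sum(obs)."""
    return (sum(sim) - sum(obs)) / sum(obs)


def rmse(sim, obs):
    """Root mean square error, in the units of the input series (e.g., mm/h)."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def correlation(sim, obs):
    """Assumed definition of RR: Pearson correlation coefficient."""
    n = len(obs)
    mean_s, mean_o = sum(sim) / n, sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_s * var_o)


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_o = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sse / spread


# Hypothetical hourly runoff series (mm/h), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [2.0, 3.0, 4.0, 5.0]
print(percent_bias(simulated, observed), rmse(simulated, observed),
      correlation(simulated, observed), nse(simulated, observed))
```

Under these definitions a constant offset leaves RR at 1 while degrading PB, RMSE and NSE, which is why the scenarios are judged on all four metrics rather than on correlation alone.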
To analyze how different forcings (without rainfall) influence the streamflow pattern, we also compared scenarios G + I, W + I and I + I (Figure 10). For scenario I + I, the RR and NSE are moderate, while 84.2% of the PB values are below 0 and the RMSE spread is wider than the corresponding values of G + I and W + I (Figure 10c,d). The underestimation of streamflow in scenario I + I (PB = −0.291) is largely due to the overestimation of ET (PB = 6.497), further revealing that different forcings (without rainfall) have a non-negligible impact on streamflow reproduction.
We further compared scenarios G + Gr, W + Wr, G + Gm and W + Wm to understand the impact of merged rainfall on the discharge simulation (Figure 10). Scenario G + Gr exhibits poor performance with higher PB and RMSE (Table 9), implying that the underestimation of Gr (PB = −0.357) is likely responsible for the discharge underestimation. Scenario W + Wr also yields a poor result and overestimates streamflow, possibly due to the large positive error of Wr (PB = 0.353). However, not only the flood volume but also the correlation and the shape of the hydrograph are improved when Gr is replaced by Gm and Wr by Wm (Figure 10b,d). The mean RMSEs of scenarios G + Gm and W + Wm are as low as 0.17 and 0.23 mm/h, respectively, compared with 0.41 and 0.74 mm/h for G + Gr and W + Wr, respectively. As a result, the merged rainfall leads to improved streamflow simulation, with the mean RMSEs reduced by 58.5% and 68.9% for G + Gm and W + Wm, respectively.
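The reported improvements follow directly from the mean RMSEs quoted above. A short check, where the helper name `relative_reduction` is introduced here only for illustration:

```python
def relative_reduction(baseline, improved):
    """Percentage reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Mean RMSEs (mm/h) as quoted in the text.
mean_rmse = {"G + Gr": 0.41, "G + Gm": 0.17, "W + Wr": 0.74, "W + Wm": 0.23}

gldas_gain = relative_reduction(mean_rmse["G + Gr"], mean_rmse["G + Gm"])
wrf_gain = relative_reduction(mean_rmse["W + Wr"], mean_rmse["W + Wm"])

print(f"G + Gm: {gldas_gain:.1f}%")  # 58.5%
print(f"W + Wm: {wrf_gain:.1f}%")    # 68.9%
```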
Many noticeable outliers (i.e., the points outside of the 5th to 95th percentiles in the box plots) are observed for three primary reasons (Figure 10). Firstly, the whole distributions are excellent, and thus some acceptable values are recognized as outliers, as for G + Gm (Figure 10b). Secondly, the large positive bias in WRF-derived rainfall causes outliers, as for W + Wr (Figure 10c). Thirdly, the overestimation of ET in scenario I + I leads to negative bias in streamflow reproduction (Figure 10d). Moreover, events with smaller peak flow (e.g., Event 040901 with a peak flow of 153 m³/s) always exhibit poor performance regardless of forcings, possibly because the spin-up period is too short to obtain a suitable initial state.
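The outlier rule described above (points outside the 5th to 95th percentiles) can be sketched as below. The interpolation method used for the percentiles in Figure 10 is not stated, so the "inclusive" method is an assumption, and the PB values are hypothetical placeholders rather than results from the study.

```python
from statistics import quantiles


def flag_outliers(values):
    """Return the values lying outside the 5th-95th percentile range."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(values, n=20, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return [v for v in values if v < low or v > high]


# Hypothetical per-event PB values, for illustration only.
pb_values = [-0.62, -0.08, -0.05, -0.03, -0.01, 0.00, 0.01, 0.02, 0.03,
             0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10, 0.11, 0.12, 0.95]
print(flag_outliers(pb_values))
```

With a tight central distribution, even moderately deviating events fall outside the whiskers, which matches the first reason given for the G + Gm outliers.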
It should be emphasized that local rain gauge observations, the global, spatially distributed GLDAS product and the FNL data driving the WRF model are generally available for SMCs or regions. Studying the characteristics of these data is therefore important for hydrometeorological simulation of SMCs with the WRF-Hydro model, especially for regions with sparse observations. Scenarios G + I, G + Gm and W + I are likely to perform well for other SMCs lacking observed forcings. Scenario W + Wm may become more suitable for these catchments when the WRF model uses appropriate parameterization with high-quality input or when data assimilation techniques are used in the WRF model.

Comparison of the WRF-Hydro and Xinanjiang Model
To evaluate the flood prediction capability of the WRF-Hydro model, we compared only the best results of the model (i.e., scenarios G + I, W + I and G + Gm) with the result of the Xinanjiang model (i.e., scenario XAJ). As shown in Figure 10a, the results of scenarios G + I, W + I and G + Gm have a narrower spread than scenario XAJ in terms of PB and RMSE (Table 9). The RR and NSE of scenario XAJ are better than those of scenarios G + I, W + I and G + Gm (Table 9). This indicates that the WRF-Hydro model simulates the flood volume well, while the Xinanjiang model performs well in describing the flood hydrograph. Figure 11 shows the rainfall, runoff depth and hydrographs of the two events, 20030916 and 20100820. After merging with observations, the total areal mean rainfall of Gm is consistent with that of the IDW product for the two cases (Figure 11a ...).
The flood hydrographs of scenarios G + I, G + Gm and W + I indicate that the WRF-Hydro model may be better at describing the rising limb of flood hydrographs. To investigate this further, the goodness of fit of the flood hydrograph in the rising and falling limbs was calculated with the NSE, where the rising limb runs from the start of the event to the flood peak time and the remainder is the falling limb (Figure 12). The WRF-Hydro model has a wider NSE distribution than the Xinanjiang model for both limbs. The rising-limb NSE exceeds 0.7 in around half of the events (52.6%, 47.4% and 63.2% for G + I, W + I and G + Gm, respectively) versus 47.4% for the Xinanjiang model (Figure 12a). Our analysis confirms that the WRF-Hydro model simulates the rising limb well in roughly half of the events. A similar NSE distribution is obtained for the falling limb, but the corresponding NSEs are closer to 1.0 for the two models (Figure 12b). The Xinanjiang model, with NSE values greater than 0.7 in 68.4% of events, has a narrower spread than the WRF-Hydro model. In fact, the low NSEs of the two models often appear in events with small peak flow, where the Xinanjiang model outperforms the WRF-Hydro model. As a result, the two models perform similarly for events with higher NSEs (i.e., NSE ≥ 0.7), while the WRF-Hydro model shows moderate results for events with lower NSEs (i.e., NSE < 0.7). This is possibly because the simulation of the WRF-Hydro model is more sensitive to the quality of inputs than that of the Xinanjiang model.
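The limb-wise evaluation can be sketched as follows. This is an illustrative reading of the description above, not the study's code: the peak time is assumed to be that of the observed hydrograph, the peak sample is assigned to the rising limb, and the function names and the example series are hypothetical.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    if spread == 0:
        raise ValueError("observations are constant; NSE is undefined")
    return 1.0 - sse / spread


def limb_nse(sim, obs):
    """Return (rising-limb NSE, falling-limb NSE), split at the observed peak."""
    # Assumption: the split uses the observed peak time, not the simulated one.
    peak = max(range(len(obs)), key=lambda i: obs[i])
    rising = nse(sim[:peak + 1], obs[:peak + 1])    # start of event to peak
    falling = nse(sim[peak + 1:], obs[peak + 1:])   # remainder of the event
    return rising, falling


# Hypothetical hourly discharge series (m^3/s), for illustration only.
observed = [1.0, 2.0, 4.0, 8.0, 6.0, 4.0, 2.0]
simulated = [1.0, 2.0, 4.0, 8.0, 5.0, 4.0, 3.0]
rising_nse, falling_nse = limb_nse(simulated, observed)
print(rising_nse, falling_nse)
```

An event would then count toward the reported percentages when its limb NSE exceeds 0.7.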
The Xinanjiang model was adopted as a benchmark to assess the flood prediction capacity of the WRF-Hydro model, and the performance difference between them stems mainly from two aspects: model structure and input-output data. On the one hand, the conceptual Xinanjiang model adopts the three-layer soil moisture model [63] to calculate ET, which aims at water balance in hydrological simulation. The saturation-excess runoff module is another characteristic of this model [63]. As a semi-distributed model, it adopts the tension water capacity curve [63] to represent the spatial inhomogeneity of soil moisture. By contrast, the WRF-Hydro model simulates ET with the Noah or Noah-MP LSM, taking account of both water and energy balance [48]. This model adopts a combined infiltration-excess and saturation-excess runoff module for runoff calculation. Orthogonal grids are used in the LSM and routing grids to represent the spatial distribution of hydrometeorological variables and parameters.
On the other hand, the inputs of the Xinanjiang model include only precipitation and pan evaporation. Its outputs contain ET, the discharge at the outlet of the watershed and the areal mean soil moisture. It is not easy for the Xinanjiang model to make use of some useful meteorological data (e.g., radiation and wind speed) or to produce a distributed simulation. The WRF-Hydro model, however, requires substantial spatial inputs, including the forcings, the underlying surface state and the corresponding parameters, and produces spatial outputs including distributed streamflow, water and energy fluxes and hydrometeorological states. This model incorporates a large amount of useful information to achieve spatial hydrological simulation, although the simulation is sensitive to the quality of inputs. In addition, it is easy to fully couple with the WRF model for operational hydrometeorological prediction. Therefore, the WRF-Hydro model has promising potential for flood forecasting in the Chenhe catchment.

Conclusions
Based on the observed and merged rainfalls, GLDAS, WRF outputs and ideal forcings, seven scenarios were designed (G + Gr, G + I, W + Wr, W + I, G + Gm, W + Wm and I + I) for driving the WRF-Hydro model to simulate floods of the Chenhe catchment. The results indicate that the WRF-Hydro model yields better results when driven by scenarios G + I, W + I or G + Gm than by the other scenarios (i.e., G + Gr, W + Wr, W + Wm and I + I).
It is not recommended to directly use the GLDAS- and WRF-derived rainfalls to simulate the floods of the Chenhe catchment. The flood simulations can be improved by using GLDAS- and WRF-merged rainfalls. The WRF-Hydro model tends to overestimate ET and consequently to underestimate streamflow when using the ideal forcings (without rainfall). However, the model produces better ET simulations when using the forcings (without rainfall) from GLDAS and WRF.
Although the performance of the WRF-Hydro and Xinanjiang models is generally comparable, the WRF-Hydro model can produce spatially distributed outputs such as evaporation, streamflow and soil moisture. The WRF-Hydro model shows promising potential for operational flood forecasting of the Chenhe catchment, and we plan to conduct more studies on the application of the WRF-Hydro model to other SMCs with different hydroclimatic patterns to further explore its suitability. Additionally, this model has the potential to extend the lead time of operational flood forecasting since it can be fully coupled with the WRF model. This study facilitates the application of the WRF-Hydro model to SMCs worldwide for hydrometeorological simulation, especially SMCs with sparse observations.