Comparing the Hydrological Responses of Conceptual and Process-Based Models with Varying Rain Gauge Density and Distribution

Precipitation provides the most crucial input for hydrological modeling. However, rain gauge networks, the most common precipitation measurement mechanisms, are sometimes sparse and inadequately distributed in practice, resulting in an imperfect representation of rainfall spatial variability. The objective of this study is to analyze the sensitivity of different model structures to the different density and distribution of rain gauges and evaluate their reliability and robustness. Based on a rain gauge network of 20 gauges in the Jinjiang River Basin, south-eastern China, this study compared the performance of two conceptual models (the hydrologic model (HYMOD) and Xinanjiang) and one process-based distributed model (the water and energy transfer between soil, plants and atmosphere model (WetSpa)) with different rain gauge distributions. The results show that the average accuracy for the three models is generally stable as the number of rain gauges decreases but is sensitive to changes in the network distribution. HYMOD has the highest calibration uncertainty, followed by Xinanjiang and WetSpa. Differing model responses are consistent with changes in network distribution, while calibration uncertainties are more related to model structures.


Introduction
Precipitation is one of the most crucial inputs in catchment runoff modeling, and measuring rainfall is essential for determining hydrological catchment response [1,2]. However, because precipitation is generated by extremely complicated, non-linear, and sensitive atmospheric physical processes [3], it shows high spatial and temporal variability at the basin scale [4][5][6][7].
Due to the importance of precipitation in modeling, prior work has evaluated the effect of uncertainty in precipitation on the response of simulated streams in hydrological models. Wilson et al. [8] found that peak runoff, total runoff volume, and peak timing are considerably influenced by the spatial distribution and precision of rainfall measurements. Singh [9] provided a detailed literature review on the influence of spatial-temporal variability in hydrological factors on rainfall runoff modeling. Sun et al. [10] found that runoff prediction errors at the catchment scale were significantly related to the representation of rainfall data spatial variability. Shen et al. [11] showed that noticeable uncertainty in stream flow and non-point source pollution modeling was caused by spatial rainfall uncertainty obtained from different precipitation interpolation methods.
Traditionally, there are three mechanisms for measuring or observing precipitation: rain gauges, weather radar, and satellite-based sensors [12][13][14]. Rain gauge networks remain the most common method for measuring precipitation [15][16][17] due to their higher accuracy in representing rainfall at their respective locations [18,19] and longer recording period for investigating long-term rainfall runoff processes [20]. Because a rain gauge provides a point measurement of precipitation, the representation of rainfall spatial variability is necessarily affected by the density and distribution of the rain gauge network. To address this relationship, Pardo-Igúzquiza [21] tried to optimize the rain gauge distribution and quantity in a given catchment. Chen et al. [22] showed that more pluviometers distributed in a catchment resulted in higher precision in areal precipitation calculation. Xu et al. [23] and Girons Lopez et al. [24] found there is a threshold value of rain gauge density, before which the representation of rainfall improves significantly with increasing rain gauge numbers.
Significant early research focused on the effect of density and distribution of rain gauge networks on catchment modeling. However, most studies were only concerned with the impact of density. Michaud and Sorooshian [25] showed that poor rain gauge densities caused an inadequate simulation of flood peak in a midsized semi-arid catchment. Chaplot et al. [26] selected rain gauges with a certain empirical distribution. Andreassian et al. [27], Anctil et al. [28], and Xu et al. [23] randomly selected several subsets of different numbers of gauges to achieve the same level of rain gauge density and evaluated the relationship between density and modeling results. Similarly, Drogue and Khediri [29] employed approximately 100 subsets with the same number of rain gauges and used the mean of the rainfall data for their evaluation.
In the study of Bardossy and Das [30], distributions of rain gauges with different densities were obtained from an optimization algorithm. Some studies have considered pluviometer density and distribution at the same time [31], while other researchers have investigated the effect of udometer distribution on conceptual models [27,29,32]. These studies have evaluated the individual responses of different process-based models [26,31], compared a couple of models [33], and even applied a neural network to rainfall-runoff modeling [28]. However, a comprehensive comparison of the performances of different models impacted by different rain gauge network distributions, especially networks with varying numbers of gauges, has yet to be presented.
Identifying the effect of different rain gauge distributions on the hydrological response of conceptual and process-based models will be helpful in (1) optimizing rain gauge networks and (2) selecting hydrological models for a given basin. Therefore, this study aims to advance the discipline by comparing the impact of rain gauge distribution on typical conceptual and process-based distributed models. Furthermore, the potential influence of uncertainty in model calibrations is investigated. The paper is organized as follows: Section 2 introduces the study area, datasets, applied models and design scenarios. Section 3 provides the simulation results in two parts: for the whole validation period and a typical month within the validation period. Section 4 discusses the results and compares them with previous research. Section 5 provides the study conclusions.

Study Area
The Jinjiang River Basin is located in the north-west Ganjiang River Basin, China, with a drainage area of 6215 km² above the Gaoan hydrological station (Figure 1). The elevation of the catchment is higher in the north-west, where most tributaries of the Jinjiang River originate, ranging from 18 to 1096 m above sea level. The Jinjiang River Basin is located in the subtropical region with a warm and humid climate and receives approximately 1300-2100 mm of precipitation per year, with average runoff reaching about 184 m³/s at Gaoan station based on measured data. As the second largest tributary of the Ganjiang River, the Jinjiang River flows into the Ganjiang River about 30 km above Nanchang City, the capital city of Jiangxi Province, China. No controlled reservoirs have been built on the main stream of the Jinjiang River, resulting in a direct threat to Nanchang City during flood periods.

Datasets
Daily precipitation data from 20 rain gauges and observed streamflow data from the Gaoan hydrological station for 2008-2013 were provided by the Jiangxi Hydrological Bureau. The distribution of the pluviometers with their names and location of the Gaoan hydrological station are shown in Figure 1. The Thiessen polygon method [34] was used to interpolate the recorded rain data in the sub-basin areas.
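The Thiessen weighting used for sub-basin rainfall can be illustrated with a minimal sketch. It approximates the polygons by assigning each grid cell of a sub-basin to its nearest gauge, which is an assumption made here for brevity rather than the geometric construction of [34]; the function names, coordinates, and rainfall values are hypothetical.

```python
import numpy as np

def thiessen_weights(gauge_xy, cell_xy):
    """Approximate Thiessen weights: share of cells whose nearest gauge is each gauge."""
    gauge_xy = np.asarray(gauge_xy, dtype=float)
    cell_xy = np.asarray(cell_xy, dtype=float)
    # Squared distance from every cell to every gauge, shape (n_cells, n_gauges)
    d2 = ((cell_xy[:, None, :] - gauge_xy[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(gauge_xy))
    return counts / counts.sum()

def areal_rainfall(weights, gauge_rain):
    """Weighted mean rainfall over the sub-basin for one time step."""
    return float(np.dot(weights, np.asarray(gauge_rain, dtype=float)))

# Hypothetical layout: two gauges and a strip of ten 1 km cells (cell centres).
gauges = [(0.0, 0.0), (4.0, 0.0)]
cells = [(x + 0.5, 0.5) for x in range(10)]
weights = thiessen_weights(gauges, cells)      # cells left of x = 2 go to gauge 0
rain_mm = areal_rainfall(weights, [10.0, 20.0])
```

Removing a gauge from the list changes the weights of its neighbours, which is how a sparser or shifted network alters the areal rainfall passed to each sub-basin.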
Other meteorological data, including temperature, wind speed, relative humidity, and solar radiation, were obtained from the China Meteorological Assimilation Driving Datasets for the SWAT (Soil and Water Assessment Tool) model (CMADS) version 1.1 [35,36]. The CMADS datasets were developed based on the China Meteorological Assimilation Land Data Assimilation System (CLADS) with a temporal resolution of 1 day and spatial resolution of 0.25° × 0.25° [37]. The datasets were tested in the Heihe River Basin, Manas River Basin, and Hunhe River Basin [37][38][39], demonstrating a reliable performance in reflecting the observed meteorological data for China.
The digital elevation map (DEM) was developed by the Data Center for Resources and Environment Sciences, Chinese Academy of Sciences (RESDC), at a resolution of 1 km × 1 km. Land-use data with the same resolution as the DEM for the basin in 2000 were developed by the Institute of Geographical Sciences and Natural Resources Research, Chinese Academy of Sciences. The 10 km × 10 km cell soil data and related soil physical data were gathered from the Institute of Soil Science, Chinese Academy of Sciences.

Models
In this study, three hydrological models were applied to quantify hydrologic responses. Each model was a typical representation of a different model type, ranging from conceptual to physically based: the hydrologic model (HYMOD) [40], Xinanjiang model (XAJ) [41], and water and energy transfer between soil, plants and atmosphere model (WetSpa) [42]. All three models were used as semi-distributed models with the same watershed subdivisions shown in Figure 1.
Each model operated on every sub-watershed separately with the same parameters while the rainfall and meteorological data were unique. The outputs from upper sub-basins were used as the inputs to the following sub-basins. Total streamflow for the entire basin was generated through all sub-basins according to hydraulic connections using the Muskingum method. Observed daily rainfall, potential evapotranspiration and runoff data at the Gaoan station were used as input data for all three models. The WetSpa model required the DEM, land use, and soil type for the study area.
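As an illustration of the routing step, the following is a minimal sketch of the standard Muskingum recursion for a single reach. The storage constant `k`, weighting factor `x`, time step `dt`, and the inflow series are hypothetical; the values actually used in the study are not given in this section.

```python
def muskingum_route(inflow, k, x, dt=1.0):
    """Route an inflow hydrograph through one reach with the Muskingum method.

    k  : storage time constant (same time unit as dt)
    x  : dimensionless weighting factor, typically between 0 and 0.5
    dt : time step
    """
    denom = 2.0 * k * (1.0 - x) + dt
    c0 = (dt - 2.0 * k * x) / denom
    c1 = (dt + 2.0 * k * x) / denom
    c2 = (2.0 * k * (1.0 - x) - dt) / denom   # c0 + c1 + c2 = 1
    outflow = [inflow[0]]                      # assume initial steady state
    for t in range(1, len(inflow)):
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[-1])
    return outflow

# Hypothetical upstream hydrograph (m3/s) with a single flood pulse.
upstream = [10.0, 10.0, 50.0, 10.0, 10.0, 10.0]
routed = muskingum_route(upstream, k=1.0, x=0.2, dt=1.0)

# A steady inflow should pass through unchanged.
steady = muskingum_route([5.0] * 5, k=1.0, x=0.2, dt=1.0)
```

In a sub-basin chain, the routed outflow of an upstream sub-basin would be added to the local runoff of the next sub-basin before routing continues downstream.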

Hydrologic Model (HYMOD)
HYMOD is a watershed-scale conceptual rainfall-runoff model based on the probability-distributed theory introduced by Moore [40]. It has been used in several rainfall runoff modeling studies [43][44][45][46]. This model describes runoff generation using a rainfall excess model and two sets of linear tanks (one slow flow tank and three identical tanks for quick flow) to reflect the routing process. The five parameters used in HYMOD are provided in Table 1; schematics for its application can be found in prior work [44,46].
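The routing structure described above (one slow tank in parallel with three identical quick-flow tanks in series) can be sketched as follows. This covers only the routing part, not the probability-distributed rainfall excess model; the parameter names `alpha`, `kq`, and `ks` follow common HYMOD usage but are assumptions here, since Table 1 is not reproduced in full.

```python
def hymod_routing(excess, alpha, kq, ks):
    """Route rainfall excess through HYMOD-style linear tanks.

    excess : rainfall excess per time step (mm)
    alpha  : fraction of excess sent to the quick-flow branch (assumed name)
    kq, ks : outflow rates of the quick and slow tanks, between 0 and 1 (assumed names)
    Returns the simulated flow series and the water left in storage.
    """
    quick = [0.0, 0.0, 0.0]   # three identical quick-flow tanks in series
    slow = 0.0                # single slow-flow tank
    flow = []
    for e in excess:
        # Slow branch: one linear reservoir
        slow += (1.0 - alpha) * e
        q_slow = ks * slow
        slow -= q_slow
        # Quick branch: outflow of each tank feeds the next one
        inflow = alpha * e
        for n in range(3):
            quick[n] += inflow
            out = kq * quick[n]
            quick[n] -= out
            inflow = out
        flow.append(q_slow + inflow)
    return flow, sum(quick) + slow

# Hypothetical single pulse of 10 mm excess followed by dry days.
pulse = [10.0, 0.0, 0.0, 0.0, 0.0]
sim_flow, remaining = hymod_routing(pulse, alpha=0.5, kq=0.5, ks=0.1)
```

Because every tank is linear, the sketch conserves mass: the routed flow plus the remaining storage equals the total excess.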

Xinanjiang Model (XAJ)
The XAJ model, developed by Zhao et al. [41,47], has been widely applied in rainfall runoff simulation, hydrologic forecasting, and other applications [48,49]. This model is more parameterized in describing the details of hydrological responses than HYMOD because it has three modules: three-layer evapotranspiration, runoff generation, and runoff routing [47,50]. XAJ describes the underlying surface of the study area as three vertical layers: upper, lower, and deep. Runoff generation, routing, and evaporation are separately performed in the three layers. Descriptions of the 15 parameters, and their ranges, are provided in Table 1; the calculation process is summarized in Li et al. [50].

Water and Energy Transfer Between Soil, Plants and Atmosphere Model (WetSpa)
WetSpa is a typical physically based distributed model initially proposed by Wang et al. [42] and further refined and applied to simulating hydrological processes by many researchers [51][52][53][54]. In this model, hydrological processes include precipitation, evapotranspiration, interception, infiltration, surface runoff, depression storage, percolation, ground water drainage, and interflow. Water and energy balances are considered within horizontal grid cells and in four vertical layers: the canopy, root layer, transmission layer, and saturated zone. Therefore, the model accounts for the high spatial variance in meteorological data, terrain, land cover, and soil across a catchment. Values of some parameters in a single grid cell, such as permeability and evapotranspiration rate, can be determined a priori from land use and soil types. Global parameters still need to be calibrated; some are presented in Table 1.

Model Calibration and Validation
Calibration is a necessary procedure for all hydrological models before simulating catchment rainfall runoff. The three described models are calibrated against 2009-2011 observed daily streamflow data with a 1-year period (2008) for spin-up. The dynamically dimensioned search algorithm (DDS) [55] is used to optimize parameters for each model. The DDS algorithm is a heuristic global random search algorithm, in which the search is performed in all dimensions of the decision space and the number of dimensions is reduced as the search continues. This process makes the DDS a very efficient optimization algorithm, which has been successfully applied to simulating rainfall runoff and predicting hydrology [56,57]. With a total of 1000 search iterations, the model parameters are optimized by maximizing the objective function, the Nash-Sutcliffe efficiency (NSE) [58], given by:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{N} (obs_i - sim_i)^2}{\sum_{i=1}^{N} (obs_i - \overline{obs})^2} \quad (1)$$
in which $obs_i$ and $sim_i$ are the observed and simulated daily runoff on the ith day, respectively, $\overline{obs}$ is the mean value of observed runoff, and N is the number of days. A value of 1 for NSE represents a perfect fit. Considering the equifinality in model calibration [59], each model is calibrated for 30 trials to achieve a better statistical representativeness of the results in the validation period.
Table 1 (excerpt). Parameter, description, and range:
Correction factor for potential evapotranspiration: 0~2
K_gi (mm), initial groundwater storage: 0~500
K_gm (mm), groundwater storage scaling factor: 0~2000
Rainfall scaling factor: 0~500
* Parameter ranges for HYMOD from Moradkhani et al. [45]. Parameter ranges for XAJ from Li et al. [50] and Lü et al. [60]. The selection and range in parameters for WetSpa from Shafii et al. [61].
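The DDS search described above can be sketched as follows. This is a simplified illustration rather than the implementation used in the study: the perturbation factor `r = 0.2`, the reflection at the parameter bounds, and the toy objective are assumptions, and in a real calibration the objective would run the hydrological model and return the NSE.

```python
import math
import random

def dds_maximize(objective, bounds, max_iter=1000, r=0.2, seed=0):
    """Greedy DDS-style search that maximizes `objective` within `bounds`."""
    rng = random.Random(seed)
    best = [rng.uniform(lo, hi) for lo, hi in bounds]
    best_val = objective(best)
    for i in range(1, max_iter + 1):
        # Probability of perturbing each dimension shrinks as the search continues
        p = 1.0 - math.log(i) / math.log(max_iter)
        chosen = [j for j in range(len(bounds)) if rng.random() < p]
        if not chosen:
            chosen = [rng.randrange(len(bounds))]   # always perturb at least one
        cand = list(best)
        for j in chosen:
            lo, hi = bounds[j]
            v = best[j] + r * (hi - lo) * rng.gauss(0.0, 1.0)
            # Reflect at the bounds; fall back to the bound if still outside
            if v < lo:
                v = lo + (lo - v)
                if v > hi:
                    v = lo
            elif v > hi:
                v = hi - (v - hi)
                if v < lo:
                    v = hi
            cand[j] = v
        val = objective(cand)
        if val >= best_val:                          # keep only improvements
            best, best_val = cand, val
    return best, best_val

# Toy objective with its maximum (0) at (1, -2); stands in for the NSE of a model run.
def toy_objective(params):
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)

toy_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_params, best_score = dds_maximize(toy_objective, toy_bounds, max_iter=1000, seed=0)
```

Repeating the call with different seeds mimics the 30 calibration trials: each trial may end at a different parameter set with a similar objective value.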
The optimized parameter values from the calibration are used to model runoff for 2012-2013. The statistical criteria used to quantitatively evaluate model performance are relative bias (BIAS), reflecting the ability to recreate the water balance; the Pearson correlation coefficient (CC), which measures the agreement between the simulated and the observed runoff; and NSE, as described above, which evaluates the goodness of the model results. The first two indexes are given by the following expressions:
$$\mathrm{BIAS} = \frac{\sum_{i=1}^{N} (sim_i - obs_i)}{\sum_{i=1}^{N} obs_i} \times 100\% \quad (2)$$
$$\mathrm{CC} = \frac{\frac{1}{N}\sum_{i=1}^{N} (obs_i - \overline{obs})(sim_i - \overline{sim})}{\sigma_{sim}\,\sigma_{obs}} \quad (3)$$
where $\sigma_{sim}$ and $\sigma_{obs}$ are the standard deviations of the simulated and observed runoff series, $\overline{sim}$ is the average value of simulated runoff, and the other symbols are the same as defined for Equation (1). The perfect value for BIAS is 0; for CC it is 1. The standard deviation σ is calculated as follows:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \overline{x})^2} \quad (4)$$
where $x_i$ is the ith value of observed or simulated daily runoff, $\overline{x}$ is the arithmetic average value of observed or simulated daily runoff, and the remaining symbols are the same as defined previously.
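The three criteria can be computed directly from paired daily series. The following is a minimal sketch, assuming BIAS is expressed in percent with positive values indicating overestimation and that standard deviations use the population form (division by N); the function names and the sample series are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 is a perfect fit."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def bias_percent(obs, sim):
    """Relative bias of total volume in percent; 0 is a perfect water balance."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))

def cc(obs, sim):
    """Pearson correlation coefficient; 1 is perfect linear agreement."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    cov = np.mean((obs - obs.mean()) * (sim - sim.mean()))
    return float(cov / (obs.std() * sim.std()))   # np.std defaults to division by N

# Hypothetical daily runoff (m3/s): the simulation is 10% too high on every day.
obs_demo = np.array([10.0, 20.0, 40.0, 30.0, 15.0])
sim_demo = 1.1 * obs_demo
```

The example illustrates why the criteria are reported together: a simulation that is uniformly 10% too high keeps a CC of 1 while BIAS and NSE both register the volume error.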

Scenario Settings
The correlations between rain data for a single gauge and streamflow data for Gaoan station are provided in Table 2. Notably, all rain gauges correlate more strongly with runoff data delayed by 1 or 2 days, even LZH, which is the udometer nearest to Gaoan station. However, no single gauge is significantly related to the streamflow data, and vice versa. To estimate the model uncertainties resulting from the accuracy of precipitation measurements, a total of 16 scenarios with varying rain gauge numbers and distributions are evaluated in this study. As presented in Table 2, the rain gauge partition is described in terms of the direction in which the barycenter of the selected udometer set lies. Most rain gauges are distributed in one direction, while at least one rain gauge is placed in the opposite direction to provide the necessary rainfall spatial variety. According to the shape of the Jinjiang River Basin, the directions are classified as central, north-east, north-west, south-east, and south-west. Given the five rain gauge distributions and four selected rain gauge quantities, i.e., 5, 10, 15, and 20, different pluviometer densities are defined in the study area. Because the full 20-gauge network forms a single scenario (20_C), this yields 1 + 5 × 3 = 16 scenarios. For each direction, rain gauges in scenarios with lower quantity are selected from those with higher quantity, i.e., the 10 rain gauges in scenario 10_C are selected from those in scenario 15_C, while udometers in 10_SE are selected from those in 15_SE. Therefore, higher quantity scenarios contain all rain gauges in lower quantity scenarios in the same direction.
Table 2. Correlation between rain and streamflow data and gauge selection for different scenarios.
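The lagged correlation between a gauge and the outlet streamflow, and the scenario labels, can be illustrated with a short sketch. The rainfall and flow series below are hypothetical (the flow simply repeats the rainfall one day later), and the helper names are assumptions; the gauge membership of each scenario is defined in Table 2 and is not reproduced here.

```python
import numpy as np

def lagged_correlation(rain, flow, lag):
    """Pearson correlation between rainfall on day t and streamflow on day t + lag."""
    rain = np.asarray(rain, dtype=float)
    flow = np.asarray(flow, dtype=float)
    if lag > 0:
        rain, flow = rain[:-lag], flow[lag:]
    return float(np.corrcoef(rain, flow)[0, 1])

def scenario_labels():
    """One full-network scenario plus three reduced sizes for each of five directions."""
    directions = ["C", "NE", "NW", "SE", "SW"]
    return ["20_C"] + [f"{n}_{d}" for d in directions for n in (15, 10, 5)]

# Hypothetical series: streamflow responds to rainfall with a one-day delay.
rain_demo = [0.0, 5.0, 0.0, 0.0, 12.0, 3.0, 0.0, 0.0, 7.0, 0.0]
flow_demo = [0.0] + rain_demo[:-1]
lag_corr = {lag: lagged_correlation(rain_demo, flow_demo, lag) for lag in (0, 1, 2)}
best_lag = max(lag_corr, key=lag_corr.get)
labels = scenario_labels()
```

With real data the correlations would be well below 1, but the same comparison across lags identifies the delay at which each gauge is most informative.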

Results
Daily streamflow reproduction performance of the three models for 16 rain gauge scenarios during the validation period is investigated in two parts. In part I, general model performance during the entire validation period (1 January 2012 to 31 December 2013) is assessed using the statistical results from the evaluating indicators described in Section 2.3.4. In part II, model performance is discussed for a single month (June 2012) with a detailed daily flow hydrograph comparison.

Model Performance in the Calibration and Validation Period
To compare model performance between HYMOD, XAJ, and WetSpa for the 16 rain gauge scenarios, the statistical criteria in the calibration and validation period are provided in Tables 3-5. Boxplots for the validation period are shown in Figure 2. Table 3 indicates stable NSE and CC values between the calibration and validation periods for HYMOD in all 16 scenarios. However, there is a clear improvement in the BIAS value from the calibration to the validation period. XAJ shows similar behavior (Table 4). WetSpa shows a slight improvement in NSE and CC values in most scenarios (Table 5). Figure 2a-c illustrates the accuracy of the streamflow simulations, as indicated by NSE. Generally, metrics for the validation period are acceptable for all three models despite different pluviometer selection scenarios; average NSE values are greater than 0.6. In more detail, results for XAJ are better than those for HYMOD and WetSpa, and the latter two are comparable in most scenarios. Notably, there are discrepancies between the NSE values for the three models. Interquartile ranges for HYMOD are significantly larger than those for XAJ, followed by those for WetSpa, which has the narrowest interquartile NSE range. WetSpa shows the best stability in model performance across calibration trials, while HYMOD has the largest uncertainty.
When rain gauge selections are taken into consideration, results for all three models perform consistently. With the reduction in rain gauge numbers (scenario 20_C, 15_C, 10_C and 5_C), the mean NSE values for all models show lower variability. The NSE amplitudes for HYMOD and XAJ are rather small, showing robustness for a simple decline in the quantity of udometers. In contrast, the values for WetSpa show greater sensitivity toward changes in rain gauge quantity.
The impacts of the spatial distribution of rain gauges show similar trends in terms of NSE metrics for all models. With the same number of rain gauges (15), all models in scenarios 15_SE and 15_NW perform comparably with those in 15_C. However, the accuracy of the results in scenario 15_SW is generally lower. Unexpectedly, model performances for 10_SE and 5_SE are the poorest among scenarios with the same number of rain gauges. Similarly, model performance worsens with a decrease in rain gauges from 15 to 5 when the rain gauges are concentrated in the north-west part of the basin (scenarios 15_NW, 10_NW, and 5_NW). Furthermore, variations in the number of rain gauges concentrated in the south-west and north-east have little impact on streamflow simulations, except for WetSpa in scenario 5_NE, which shows a dramatic reduction in both average and variation in NSE values. Similar results are found for the CC values, as illustrated in Figure 2d-f. Figure 2g-i shows the ability of the three models to recreate the water balance for the 16 rain gauge scenarios. Generally, HYMOD and XAJ exhibit better performance in water balance simulations; HYMOD overestimates and XAJ underestimates the average within about 5%. WetSpa poorly estimates water balance with a BIAS of approximately 20%. Interquartile ranges for BIAS indicate that WetSpa has the most stable model performance under multiple calibrations, while HYMOD has the least stable, as indicated previously.

Monthly Analysis for 2012
Monthly performance is another important mechanism for comparing models' responses due to different rain gauge distributions. Figure 3 illustrates the monthly average runoff and precipitation in the Jinjiang River Basin. Clearly, the wet season in 2012 ranges from March to July, resulting in a flood season with average flow above 200 m³/s. However, dry seasons are not particularly dry; most months, except for August and October, have average precipitation over 100 mm per month.
Monthly comparisons for HYMOD, XAJ, and WetSpa are shown in Figure 4. As shown in Figure 4a-c, all three models tend to simulate streamflow in the flood season better, but poorly reproduce runoff in the dry season, especially in August and October. Specifically, HYMOD shows better adaptability than XAJ and WetSpa, achieving acceptable NSE values in more months, especially in January and September. However, WetSpa performs worse than the other models in March, May, July, and November. In addition, HYMOD monthly performances show no distinct trends indicating an impact of rain gauge distributions. XAJ demonstrates higher usability in the flood season. However, it is more likely to be affected by the distribution of rain gauges. June, for example, has the highest NSE value (0.50) in scenario 15_SE, but the lowest NSE value (0) in 5_NW. Finally, WetSpa only achieves acceptable results in the flood season. NSE values in the dry season are no higher than 0.1, except for December, in which NSE values are about 0.3.
The overall deviations for all three models vary significantly from month to month (Figure 4d-f). Of the three models, HYMOD generates relatively accurate overall estimates in most months. However, it overestimates total flood volume in August and underestimates total flood volume in January, November, and December. XAJ produces results that vary between overestimates and underestimates. The WetSpa model achieves the best performance in March, but produces more than 30% overestimates from July to October, which confirms the results in Figure 4c.

Model Performance in June 2012
To investigate the performance of three models based on data precision for different precipitation values, daily time series of the simulated and observed hydrographs in a typical wet season month in the Jinjiang River basin (June 2012) are compared. June 2012 has two complete flood peaks, and the two peak flows are neither too high nor too low. Therefore, analyzing model performance in June 2012 is appropriate.
As presented in Figure 5, the accumulative precipitation in June 2012 ranges from 187.1 mm to 355.3 mm. The rain center is located in the central area of the basin captured by rain gauge LC. Therefore, three scenarios, 20_C as a benchmark, 10_SE without LC, and 5_NE with LC (see Table 2), are investigated as LC presence could be a critical factor affecting runoff modeling.
Absent the key rain gauge LC in scenario 10_SE (Figure 6d-f), the standard deviations in measured rain data on 25 June 2012 are clearly smaller than in 20_C. The remaining rain gauges do not correctly reflect the spatial variability in actual precipitation for the entire basin. All three models underestimate the second peak flow that they precisely captured in 20_C. For 5_NE, all models overestimate the second peak flow as the rainfall data from LC are assumed to represent the areal rainfall input from most sub-watersheds (Figure 6g-i).
Meanwhile, variations in the simulated second peak flow for WetSpa increase notably, and the simulated runoff time series can be divided into three groups. Most hydrographs in one group tend to appreciably overestimate the second peak flow, while those in another group tend to significantly overestimate it, e.g., XAJ. The remainder, in the third group, tend to miss the second peak flow.

Discussion
The sensitivity of HYMOD, XAJ, and WetSpa models is compared to characterize the spatial distribution of rainfall in a medium scale semi-humid watershed in southeast China. The NSE, CC, and BIAS values for 16 rain gauge distribution scenarios are used to assess the performance of hydrological models with varying rain gauge deployment locations. The HYMOD and XAJ models can be regarded as representative conceptual models with different complexity, while the WetSpa model is considered a typical process-based distributed model.

Influence of Rain Gauge Distribution
For all three models, runoff varied slightly with a decreasing number of rain gauges (from 20 to 5). This result differs from the previous studies of Andreassian et al. [27] and Xu et al. [23], which showed that model efficiency declines with loss of rainfall spatial knowledge. However, it agrees with Chaplot et al. [26], who found that model performance remains stable despite decreasing gauge concentration. One explanation is that former studies used lumped hydrological models that applied average surface rainfall as input while the SWAT used in Chaplot et al. [26] is a distributed model that takes advantage of rainfall data interpolated into each sub-catchment. In this study, all three models are applied as semi-distributed models, and thus rainfall spatial variety is captured with a limited number of rain gauges. While all models are able to cope with a restricted number of rain gauges, their distribution is more likely to result in uncertainty in simulated streamflow. These results suggest that appropriate rain gauge configuration is of higher importance than the number or density of rain gauges in a certain catchment.

Comparing Different Model Structures
Because all three models are regarded as semi-distributed models that operate on sub-basins with distributed rainfall data, differences in simulated runoff between them are likely derived from model complexity and the rationality of parameterization. The general model precision and uncertainty performance in scenario 20_C is consistent with multiple studies [52][61][62][63][64].
The larger calibration uncertainty of HYMOD and the smaller uncertainty of WetSpa can be attributed to model structure complexity. During model calibration, optimization algorithms are expected to reach optimal values. However, reaching the global optimum requires significant computing resources and time, so most calibration trials reach only local optima. According to the concept of equifinality introduced by Beven [59], a more complex model structure may yield local optima that are closer to the global optimum. HYMOD, by contrast, has a simpler structure and fewer parameters than XAJ and WetSpa, which leads to a parameter space with fewer dimensions and fewer parameter combinations that can reach the 'best' performance, i.e., the global optimum, during calibration. Most local optima perform worse than the global optimum, yet the corresponding parameter sets may still be reached during calibration. Therefore, HYMOD performance varies widely between calibration trials.
Differences in water balance performance are harder to explain. As illustrated in Figure 2h, XAJ generally tends to underestimate total water volume. Sun et al. [65] reported similar results and concluded that underestimating free water storage leads to underestimating total water volume. In addition, WetSpa systematically overestimates total runoff volume. This behavior may result from consistently overestimating low flows while underestimating high flows; because the total volume of low flows is higher than that of high flows, the net effect is an overestimate (see Figure 4c,f,g).

Detailed Discussion on Model Performance
Finally, it is important to examine the inconsistent simulation of the two floods in June 2012. As illustrated in Figure 6, observed streamflow is clearly sensitive to rainfall; for example, small flood peaks on 8 and 18 June followed relatively small rainfall events, showing that basin storage (including soil and groundwater) is rather limited and thus surface flow can be generated directly. In the simulated flow from XAJ and WetSpa, however, responses to small rainfall events are systematically underestimated. In the two major floods, the simulated streamflow rises and reaches the flood peak one day later than the observed data, but recedes to normal levels on the same day as the observed flow, or even earlier. Similar behavior appears in other flood simulations, as presented in Table 6. These deviations are likely due to overestimating the water storage capacity of the basin. Hence, in the first flood period, with rainfall concentrated in a single day, more precipitation goes into storage and generates streamflow with a longer delay, resulting in an average BIAS of −32.1%, −37.4%, and −36.3% in peak flow and −21.1%, −33.3%, and −21.3% in total volume for HYMOD, XAJ, and WetSpa, respectively (see Table 6). In the second flood period, with rainfall spread over two days, precipitation on the first day fills the water storage and thus precipitation on the following day generates surface flow directly, resulting in a better reproduction of the second peak flow. The underestimation drops markedly, to −12.2%, −2.7%, and −7.0% in peak flow and −4.1%, −14.7%, and −4.9% in total volume (shown in Table 6). Such uncertainty derived from the model structure or model parameters needs to be addressed in further studies.
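The flood-event diagnostics discussed above (peak delay, peak-flow error, and total-volume error) can be illustrated with a short sketch. It assumes daily flow series for a single event and a relative-error form of BIAS in percent; the function name and the sample values are hypothetical and do not reproduce the study's data.

```python
def flood_event_diagnostics(obs, sim):
    """Return (delay_days, peak_bias_percent, volume_bias_percent) for one event.

    obs and sim are daily flows covering the same flood period. Because both
    series share the same time step, the sum of flows is used as a proxy for
    volume (the step length cancels in the ratio).
    """
    obs_peak_day = max(range(len(obs)), key=lambda i: obs[i])
    sim_peak_day = max(range(len(sim)), key=lambda i: sim[i])
    # Positive delay means the simulated peak arrives after the observed one.
    delay_days = sim_peak_day - obs_peak_day
    peak_bias = 100.0 * (max(sim) - max(obs)) / max(obs)
    volume_bias = 100.0 * (sum(sim) - sum(obs)) / sum(obs)
    return delay_days, peak_bias, volume_bias

# Hypothetical single-peak event (m^3/s): the simulated peak is late and low.
observed_event = [100.0, 400.0, 1000.0, 500.0, 200.0]
simulated_event = [90.0, 250.0, 600.0, 700.0, 160.0]

delay, peak_bias, volume_bias = flood_event_diagnostics(observed_event, simulated_event)
print(delay, round(peak_bias, 1), round(volume_bias, 1))
```

In this made-up event the simulation peaks one day late and underestimates both peak and volume, which is the same pattern of delayed, damped response described for the first flood period.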

Conclusions
This study compared the performance and uncertainties of two conceptual models (HYMOD and XAJ) and one process-based model (WetSpa) with varying rain gauge numbers and distributions in the Jinjiang River Basin, China. Performance over the long time series and at monthly and daily scales was analyzed using several statistical indicators.
The results for all three models showed that a reduction in the number of rain gauges only resulted in worse performance when the rain gauge distribution was inhomogeneous. This observation demonstrates that an appropriate rain gauge configuration is more important than deployment density in a given catchment. Furthermore, the consistent response across the different model structures indicates that the effect of rainfall spatial variability on model performance does not depend on the model mechanism.
The analyses also show that the simpler conceptual model HYMOD suffers from greater variations during multiple calibrations, while the complex process-based model WetSpa is more stable across different calibration trials. This behavior indicates that the uncertainty in model calibration is more related to model structure than to rainfall spatial variability.
Notably, this study was conducted for a medium-scale semi-humid river basin, which is suitable for rainfall runoff simulation. Future work should build on these results by extending the analysis to study areas that encompass a range of climates, surface areas, and environmental conditions.
Table notes: 1 denotes the delay in days, i.e., the date on which the simulated flood peak occurs in the 30 trials minus the date of the observed flood peak; 2 refers to the mean BIAS calculated using the simulated flood peak volume versus the observed flood peak volume; 3 denotes the mean BIAS calculated using the simulated total flood volume versus the observed total flood volume.