Satellite-Based Evapotranspiration in Hydrological Model Calibration

Hydrological models are usually calibrated against observed streamflow (Qobs), which is not applicable for ungauged river basins. A few studies have exploited remotely sensed evapotranspiration (ETRS) for model calibration but their effectiveness on streamflow simulation remains uncertain. This paper investigates the use of ETRS in the hydrological calibration of a widely used land surface model coupled with a source–sink routing scheme and global optimization algorithm for 28 natural river basins. A baseline simulation is a setup based on the latest model developments and inputs. Sensitive parameters are determined for Qobs and ETRS-based model calibrations, respectively, through comprehensive sensitivity tests. The ETRS-based model calibration results in a mean Kling–Gupta Efficiency (KGE) value of 0.54 for streamflow simulation; 61% of the river basins have KGE > 0.5 in the validation period, which is consistent with the calibration period and provides a significant improvement over the baseline. Compared to Qobs, the ETRS calibration produces better or similar streamflow simulations in 29% of the basins, while further significant improvements are achieved when either better ET or precipitation observations are used. Furthermore, the model results show better or similar performance in 68% of the basins and outperform the baseline simulations in 90% of the river basins using model parameters from the best ETRS calibration runs. This study confirms that with reasonable precipitation input, the ETRS-based spatially distributed calibration can efficiently tune parameters for better ET and streamflow simulations. The application of ETRS for global scale hydrological model calibration promises even better streamflow accuracy as the satellite-based ETRS observations continue to improve.


Introduction
Physically-based hydrological models are an essential tool for water resources planning, drought, and flood prediction, climate change assessment, and many other precipitation-runoff related Table 1. Literature reviews on integrating ET RS in hydrological model calibration. In the "ET RS *" column, either available ET RS products or user calculated ET using a specific equation with remote sensing data input are shown. In the "Model Configuration" column, the number of parameters involved in the calibration process is also included in the bracket for each study. The study of Nijzink et al. [27] employed five different conceptual models with multiple parameters; readers are recommended to investigate the original literature if interested; we used (*) to represent this complex case in the "Model Configuration" column.

River Basin ET RS * Model Configuration Precipitation Input
Immerzeel and Droogers [16] One regulated basin, India (45,678 km 2 ) SEBAL(MODIS) Distributed SWAT model (6 to 53) Field measurements Winsemius et al. [17] One regulated basin, Zambia (15,000 km 2 ) SEBAL(MODIS) Distributed HBV model (7) Satellite estimations Zhang et al. [18] 120 basins, Australia (50 to 2000 km 2 ) P-M (MODIS) Lumped SIMHYD model (7) Field measurements Rientjes et al. [19] Seven basins, Iran (1286 to 9873 km 2 ) SEBS(MODIS) Lumped HBV model (7) Field measurements Vervoort et al. [20] Four basins, Australia (147 to 2183 km 2 ) MOD16 ET Lumped SMA model (8) Field measurements Kunnath-Poovakka et al. [21] 11 basins, Australia (62 to 1028 km 2 ) CMRSET Distributed AWRA-L model (5) Field measurements López et al. [22] One basin, Morocco (38,025 km 2 ) GLEAM ET Distributed PCR-GLOBWB model (6) Multi-Source data Tobin and Bennett [23] One basin, the United States (610 km 2 ) GLEAM ET Distributed SWAT model (16) Multi-Source data Demirel et al. [24] One basin, Denmark (2500 km 2  As a critical hydrological component, ET represents moisture lost to the atmosphere from plants and soil, and thus significantly impacts the precipitation partitioning into different water balance components, particularly for runoff estimation. Hence, hydrological model performance should benefit from better ET accuracy by calibrating model parameters against high-quality ET observations. Previous studies using the ET RS calibration method have generally not demonstrated corresponding improvements in streamflow prediction performance. The lack of progress in this area reflects the need for a systematic investigation of the effectiveness of ET RS calibration and greater attention to the uncertainties in model inputs and model structure. The uncertainties in the ET RS estimations over space and time will directly influence the efficiency of the ET RS calibration method. Error in the input data, limitation of the algorithm in representing biophysical processes, and the uncertainty of biophysical and environmental parameters will together contribute to the biases in the ET RS estimations. When precipitation uncertainty is unknown or significant for hydrological model calibration [17,22,23,27,29,35], the optimization algorithm will be forced to adjust ET to compensate for errors in the precipitation inputs, rather than reducing parametric uncertainty. The lumped or Remote Sens. 2020, 12, 428 4 of 22 oversimplified representation of hydrological processes and storages may be the main reason for poor streamflow performance [18][19][20]27]. The model structure limitation can cause serious difficulties in the attribution of the model performance to the ET RS calibration method, particularly in study areas with significant anthropogenic activities [16,17]. Furthermore, most prior studies have focused on only one small catchment, with the representativeness and transferability of the ET RS calibration results remaining unknown [23][24][25]30].
Given the significant improvement in ET RS estimation accuracy [36,37] and foreseeably higher-quality global ET products to be available in the near future, the ET RS calibration method may provide a viable approach for large scale or global hydrological modeling. For example, the need for better calibration is one of the major challenges of large scale hydrological modeling for global flood and drought prediction [38][39][40][41]. Therefore, the purpose of this study is to investigate the utility of remote sensing-based ET estimation in hydrological model calibration, emphasizing the evaluation of ET RS calibration impacts on streamflow simulation.
The following sections describe the methodology for this study, including the detailed calibration experimental design and a brief introduction of the hydrological model (Section 2); a description of the study area and datasets used (Section 3); a presentation of the major results (Section 4); followed by the conclusions of the implications of this work (Section 5).

Methodology
Previous studies involving ET RS calibration for hydrological modeling have largely adopted a lumped calibration method, whereby the modeled basin average ET is tuned toward spatially averaged ET RS observations. Relatively few studies have used the spatially distributed ET RS calibration method with all grid cells or sub-basins being treated individually [30,32,33]. The lumped ET RS calibration method relies heavily on the seasonal pattern of ET RS observations. The ET RS based spatially distributed calibration takes a further step than the lumped calibration by deriving an optimized spatially distributed model parameter set that accounts for the ET RS spatial patterns [9]. It is technically convenient to implement parallel computing for spatially distributed ET RS calibration on a per grid cell basis to reduce the computational load. For global applications and ungauged basins, this study tests the efficiency and effectiveness of the spatial ET RS calibration method based on the large-scale, semi-distributed, Variable Infiltration Capacity (VIC) model [42], coupled with a dominant river-tracing-based source-sink hydrologic routing scheme [43]. A total of 28 natural river basins (with negligible human interference) across a diversity of hydroclimate in the Pacific Northwest region are selected. The model simulations start on 1 January 1999, and end on 31 December 2013. The years 2004-2008 are used as the calibration period, while the 2009-2013 period is used for validation.

Hydrological Model
The VIC is a semi-distributed, physically-based hydrological model, and has been extensively used in studies involving water resources management, land-atmosphere interactions, and climate change [3,35,40,[44][45][46][47][48][49][50]. A key challenge in the application of the VIC model in many large-scale applications has involved the need for automated calibration. Therefore, the VIC model has been tested, improved, and optimized for many regional river basins, resulting in relatively less uncertainty in various model inputs and predictions. Wu et al. [3] developed dominant river-tracing-based streamflow and temperature model, integrated with the VIC model, and successfully produced streamflow and temperature predictions at 1/16 degree resolution for the Pacific Northwest (PNW) domain. Here, we implement a similar modeling framework as Wu et al. [3], including the same VIC model components for ET and runoff simulations, the runoff-routing model, and calibration algorithms. The VIC model balances both the water and energy fluxes at the grid cell scale at each time step. While a sub-daily time step water-and-energy balance simulation is crucial in land-atmosphere studies, satisfying surface energy budgets requires substantial computational time, but has little influence on streamflow or ET Remote Sens. 2020, 12, 428 5 of 22 simulations [51]. Therefore, we chose the VIC model running in the water balance mode at a daily time step for this study, similar to Wu et al. [3].
Based on Wu et al. [3], we set the VIC model run at 1/16-degree spatial resolution with subgrid spatial variability in vegetation and soil [48]. Vegetation parameters profoundly affect all components of the water cycle in the VIC model. The predefined subgrid vegetation parameters (e.g., land cover types, root depths) are set generally identical to those in previous studies [3,48,52], which were derived from the 1-km-resolution University of Maryland land cover database [53]. The subgrid soil parameters (e.g., bulk density, field capacity) are obtained from previous studies [3,48,52], which were derived from the 1-km-resolution Pennsylvania State University database [54]. The VIC model calibration processes for the streamflow simulation involves six parameters ( Table 2). A primary effort of calibrating the VIC model for the major river basins over the Continental United States at the 1/8 degree resolution was made by Maurer et al. [48] by matching the simulated streamflow to available observations. The resulting parameters were used in the study of Wu et al. [3] for the Columbia River Basin. Livneh et al. [52] further refined the parameters B, Dsmax, and D3 at the 1/16-degree resolution using a similar method as Troy et al. [1] from which the resulting parameter set is taken as the baseline soil information input for this study. Note: All six parameters (B, Dsmax, Ds, Ws, D2, and D3) will be used in Q obs calibration model, and four parameters (Dsmax, Ds, Ws, and D3) will be used in ET RS calibration model.

Model Calibration Experiments
A four-step workflow ( Figure 1) was developed to perform the ET RS based spatially distributed calibration in this study and evaluate the impact of the ET RS calibration on the VIC model streamflow simulations. Major uncertainties for the model streamflow simulations are contributed from the model inputs (e.g., vegetation, soil, and precipitation inputs) and the simplified model structure that generally underrepresents the complexity of human-regulated river basins. Only upstream or headwater river basins without dams are selected for this study to emphasize natural flow conditions while minimizing model uncertainty impacts from river regulation.

Updating Model Vegetation Inputs
We first evaluate and update the VIC model vegetation inputs using a satellite observation-based updated Leaf Area Index (LAI) dataset to define variations in canopy cover influencing the model ET calculations. Two widely used LAI climatology datasets are evaluated by comparing their corresponding streamflow prediction skills. The first LAI product is monthly averaged from a quarter-degree resolution monthly global LAI database [55] based on Advanced Very High-Resolution Radiometer (AVHRR) observations extending from 1981 to 1994. The AVHRR LAI record has been used for delineating vegetation cover in the VIC model in many prior studies [48,52] and is still widely used in VIC and other hydrological modeling applications [56][57][58][59][60][61][62]. The AVHRR LAI dataset is therefore referred to as the Default LAI record for this study. However, recent studies have found that temporally updated LAI inputs, rather than an outdated LAI monthly climatology, can improve VIC model performance [51,[63][64][65]. Therefore, we adopted the 8-day, 1-km resolution GLASS/MODIS LAI record [66] as an additional model LAI input in this study, which extends over the 2004 to 2013 period and is monthly averaged as the Updated LAI record. The VIC model was then configured with the studies have found that temporally updated LAI inputs, rather than an outdated LAI monthly climatology, can improve VIC model performance [51,63,64,65]. Therefore, we adopted the 8-day, 1km resolution GLASS/MODIS LAI record [66] as an additional model LAI input in this study, which extends over the 2004 to 2013 period and is monthly averaged as the Updated LAI record. The VIC model was then configured with the up-to-date vegetation information-based on the comparison of the streamflow simulations using the Default LAI and Updated LAI datasets. Figure 1. The four-step scheme used to improve VIC model performance. In Step-One, reference and baseline simulations are performed to quantitatively evaluate the impact of different LAI inputs (Default LAI and Updated LAI) on modeled streamflow. In Step-Two, parameter sensitivity analysis tests for streamflow and ET are conducted to identify the most significant parameters in the calibration process. In Step-Three, the VIC model is calibrated against Qobs and the spatial ETRS target separately. In Step-Four, streamflow predictability skill is validated using baseline soil information (BL_VAL), soil information from Qobs (Q_VAL), and the spatial ET calibration model (ET_VAL). The uncertainties of precipitation and ETRS are considered in Step-Three and Step-Four, as listed in the Scenario column.

Identifying Sensitive Parameters for Model Calibration
In the second step, we identified key model parameters influencing the resulting streamflow and spatial ET simulations. Given the up-to-date vegetation information used in the model, spatially distributed soil parameters are of critical importance. The model sensitivity tests are performed both at the river basin level and on a per grid cell basis, by separately varying each parameter equally over 100 intervals throughout the predefined parameter space ( Table 2). This setting is useful for conducting a local sensitivity analysis, providing diagnostic insights on the model's behavior around a point [67].
The Nash-Sutcliffe coefficient (NSC) [68] is used to evaluate the monthly streamflow simulation. The absolute Percent Bias (PBIAS) [69] is adopted to measure the 8-day spatial ET simulation and abbreviated as APBIAS hereafter.
where and are the monthly time series of streamflow observations and simulations respectively, and is the mean streamflow observation. and are the 8-day time series for the spatial ETRS observations and model ET simulations, respectively. The sensitivity index is then defined as the ratio between the difference and the sum of the maximum and minimum NSC

Period Experiments Scenario
Step-One Step-One Step-Three Step-Three Step-Four Step-Four Step-Two Step-Two  Figure 1. The four-step scheme used to improve VIC model performance. In Step-One, reference and baseline simulations are performed to quantitatively evaluate the impact of different LAI inputs (Default LAI and Updated LAI) on modeled streamflow. In Step-Two, parameter sensitivity analysis tests for streamflow and ET are conducted to identify the most significant parameters in the calibration process. In Step-Three, the VIC model is calibrated against Q obs and the spatial ET RS target separately. In Step-Four, streamflow predictability skill is validated using baseline soil information (BL_VAL), soil information from Q obs (Q_VAL), and the spatial ET calibration model (ET_VAL). The uncertainties of precipitation and ET RS are considered in Step-Three and Step-Four, as listed in the Scenario column.

Identifying Sensitive Parameters for Model Calibration
In the second step, we identified key model parameters influencing the resulting streamflow and spatial ET simulations. Given the up-to-date vegetation information used in the model, spatially distributed soil parameters are of critical importance. The model sensitivity tests are performed both at the river basin level and on a per grid cell basis, by separately varying each parameter equally over 100 intervals throughout the predefined parameter space (Table 2). This setting is useful for conducting a local sensitivity analysis, providing diagnostic insights on the model's behavior around a point [67].
The Nash-Sutcliffe coefficient (NSC) [68] is used to evaluate the monthly streamflow simulation. The absolute Percent Bias (PBIAS) [69] is adopted to measure the 8-day spatial ET simulation and abbreviated as APBIAS hereafter.
where Q obs and Q sim are the monthly time series of streamflow observations and simulations respectively, and µ Qobs is the mean streamflow observation. ET RS and ET sim are the 8-day time series for the spatial ET RS observations and model ET simulations, respectively. The sensitivity index is then defined as the ratio between the difference and the sum of the maximum and minimum NSC (streamflow) per catchment or APBIAS (spatial ET) per grid cell among each parameter space setting (Equation (3)). A parameter is considered as more sensitive if the sensitivity index is closer to 1 than other parameters.
Sensitivity Index(stream f low or spatial ET) = Max NSC or APBIAS − Min NSC or APBIAS Max NSC or APBIAS + Min NSC or APBIAS .

Calibration Model
In the third step, the spatial ET RS observations and Q obs calibration model are executed respectively using the Shuffled Complex Evolution (SCE-UA) global optimization method [70], which follows the modeling framework developed by Wu et al. [3]. The SCE-UA parameters (e.g., the number of complexes, the number of points in each complex) are recommended by [70] and 500 minimum iterations are set unless the parameters or objective function reach convergence prior to this threshold, within a ±1% level of further performance improvement. The objective functions of the Q obs and spatial ET RS calibration schemes are NSC (streamflow) and APBIAS (spatial ET), respectively.
The uncertainties in the ET RS and precipitation inputs were investigated; whereby, a synthetic ET calculation was derived from the simulation using the Q obs calibration optimized model parameters. A similar approach was used by [21] with favorable results. The synthetic ET results were then used for the spatially distributed model calibration, providing a reference to evaluate uncertainty contributed by the ET RS observations. A high-quality precipitation dataset is used in this study [52], and a simple correction approach is applied to further refine the precipitation product according to [35]. Specifically, annual reference precipitation at the basin scale is defined as the sum of the observed streamflow and ET RS for each hydrological year, assuming negligible annual soil water storage change for a water-budget closed river basin between hydrological years. The raw daily precipitation used in this study is then bias corrected by the annual reference precipitation. Such annual bias correction for precipitation input is proven to have positive impacts on streamflow simulations [35]. The further bias-corrected precipitation is then used to drive the model simulations for the comparison, and help understand the uncertainty contributed from the precipitation inputs.

Model Validation and Evaluation
In the fourth step, the effectiveness of the ET RS calibration on the streamflow simulation is evaluated by comparing the model results against independent observations, and through inter-comparison of outputs from different model experiments. Three model validation runs were performed for evaluating the Q obs calibration model (Q_VAL); spatial ET RS calibration model (ET_VAL); the baseline simulation model with the latest community efforts optimized soil information (BL_VAL).
The Kling-Gupta Efficiency (KGE) metric [71] was used to measure the agreement between observed and simulated monthly streamflow time series. The value of KGE ranges from negative infinity (no agreement) to one (perfect agreement).
where µ Qobs and δ Qsim are the mean and standard deviation of the streamflow observations; µ Qsim and δ Qsim are the mean and standard deviation of the streamflow simulations. We defined three general performance categories for the ET_VAL results, as improved (KGE improvement for streamflow > +10%) comparable (KGE improvement for streamflow within ± 10%), or degraded (KGE improvement for streamflow < −10% ) of their counterparts, respectively.

The Test Area and Datasets
The Columbia River Basin (CRB) encompasses approximately 668,000 km 2 of the Pacific Northwest region of North America including portions of the Rocky Mountains in British Columbia, Canada. The CRB encompasses a wide range of physical and ecological environments ranging from semiarid and desert areas of the central plateaus to relatively wet forests in the Cascade Mountains [72]. The Columbia River drains the CRB and represents the fourth largest river in the continental United States. The water resources in the CRB have been extensively exploited in the past 70 years to meet Remote Sens. 2020, 12, 428 8 of 22 various needs such as hydropower production, flood control, agricultural withdrawals, instream flow requirements for fish, and recreational needs [73].
River gauges in the CRB domain maintained by the U.S. Geological Survey (USGS) are selected for this study and used for model calibration and validation. Three general criteria were used for stream gauge selection, including (1) having continuous monthly discharge records from January 2004 to December 2013; (2) being identified as "reference gauges" (least-disturbed by human activities) in the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES-II) database [74,75]; and (3) having dimensional consistency with the dominant river tracing (DRT) hydrography dataset defined at 1/16-degree spatial resolution [76]. The DRT river network upscaling algorithm, utilized 1-km-resolution HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) baseline hydrography inputs, and successfully preserves critical sub-grid scale basin attributes affecting runoff and river flows [77,78]. To meet the requirements for criteria (3) above, the absolute relative difference between drainage areas derived from the DRT datasets and reported by the USGS has to be less than 20%, while visual inspection is also used to ensure geolocation accuracy. As a result, a total of 28 sub-basins in the CRB were selected for this study with sizes ranging from 325.70 km 2 to 14,331.70 km 2 ( Figure 2). Detailed characteristics of the 28 basins can be found in Table 3.
The Columbia River Basin (CRB) encompasses approximately 668,000 km 2 of the Pacific Northwest region of North America including portions of the Rocky Mountains in British Columbia, Canada. The CRB encompasses a wide range of physical and ecological environments ranging from semiarid and desert areas of the central plateaus to relatively wet forests in the Cascade Mountains [72]. The Columbia River drains the CRB and represents the fourth largest river in the continental United States. The water resources in the CRB have been extensively exploited in the past 70 years to meet various needs such as hydropower production, flood control, agricultural withdrawals, instream flow requirements for fish, and recreational needs [73].
River gauges in the CRB domain maintained by the U.S. Geological Survey (USGS) are selected for this study and used for model calibration and validation. Three general criteria were used for stream gauge selection, including (1) having continuous monthly discharge records from January 2004 to December 2013; (2) being identified as "reference gauges" (least-disturbed by human activities) in the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES-II) database [74,75]; and (3) having dimensional consistency with the dominant river tracing (DRT) hydrography dataset defined at 1/16-degree spatial resolution [76]. The DRT river network upscaling algorithm, utilized 1-km-resolution HydroSHEDS (Hydrological data and maps based on SHuttle Elevation Derivatives at multiple Scales) baseline hydrography inputs, and successfully preserves critical subgrid scale basin attributes affecting runoff and river flows [77,78]. To meet the requirements for criteria (3) above, the absolute relative difference between drainage areas derived from the DRT datasets and reported by the USGS has to be less than 20%, while visual inspection is also used to ensure geolocation accuracy. As a result, a total of 28 sub-basins in the CRB were selected for this study with sizes ranging from 325.70 km 2 to 14,331.70 km 2 (Figure 2). Detailed characteristics of the 28 basins can be found in Table 3.   A recently updated gridded meteorological dataset for the Continental United States by Livneh et al. [52] is adopted for this study, which is an extended work of Maurer et al. [48] but has longer temporal coverage (1950 to 2013) and finer spatial resolution (1/16-degree). The quality-controlled dataset contains gridded precipitation, daily maximum and minimum temperature aggregated from point observations at National Weather Service Cooperative Observer (NWS COOP) stations, and surface winds from the National Centers for Environmental Prediction (NCEP)/ National Centers for Atmospheric Research (NCAR) reanalysis products. The gridded precipitation data are interpolated using the synergraphic mapping system (SYMAP) [79] and are scaled to match the Parameter-Elevation Regressions on Independent Slopes Model (PRISM) 1981-2010 precipitation climatology for topographic correction.
The widely used 8-day, 1-km resolution MODIS (MODerate resolution imaging spectroradiometer) MOD16 ET RS operational product [12] was used in this study. The MOD16 ET algorithm uses a modified Penman-Monteith algorithm driven by daily meteorological inputs from the NASA Global Modeling and Assimilation Office (GMAO) reanalysis, along with MODIS derived land cover, albedo, LAI, and enhanced vegetation index observations for global ET mapping and monitoring. The quality of the MOD16 ET record has been validated using tower eddy covariance-based daily ET measurements over North America [12,[80][81][82]. Like other ET RS estimations, validation results of MOD16 products showed positive and negative bias in different places. Several studies have also used the MOD16 ET record for hydrological model calibration [20,27,30,31] and data assimilation [83]. For this investigation, the MOD16 ET record was geo-referenced and spatially aggregated to the 1/16-degree resolution PNW model grid.

The Baseline Simulation
The hydrological model produced generally improved streamflow simulations in 68% (19 out of 28) of the test river basins when using the Updated LAI (mean KGE: 0.52) rather than the Default LAI (mean KGE: 0.45) as model inputs (Figure 3a). The differences in numerical streamflow modeling between the Default LAI and Updated LAI derived results can be large for some river basins. For river basin No.28, for example, Evergreen Needleleaf forest presents the highest LAI (Figure 3b,c), followed by Mixed Cover and Woodland vegetation types from both Default LAI and Updated LAI records. Significant discrepancies exist between the two LAI products, especially for the Evergreen Needleleaf class. The Default LAI fails to capture monthly variations in Evergreen Needleleaf forest, but generally captures similar monthly variations as the Updated LAI record for other land cover types, but with significant value differences, particularly for Mixed Cover. The LAI values are smaller and more spatially variable in the Updated LAI record than in the Default LAI for all the three major land cover classes, which is consistent with previous studies [84,85]. Lower LAI reduces transpiration from vegetation and evaporative losses from canopy interception, which promotes wetter soil conditions and greater surface and sub-surface runoff in the model [85].
aggregated to the 1/16-degree resolution PNW model grid.

The Baseline Simulation
The hydrological model produced generally improved streamflow simulations in 68% (19 out of 28) of the test river basins when using the Updated LAI (mean KGE: 0.52) rather than the Default LAI (mean KGE: 0.45) as model inputs (Figure 3a). The differences in numerical streamflow modeling between the Default LAI and Updated LAI derived results can be large for some river basins. For river basin No.28, for example, Evergreen Needleleaf forest presents the highest LAI (Figure 3b,c), followed by Mixed Cover and Woodland vegetation types from both Default LAI and Updated LAI records. Significant discrepancies exist between the two LAI products, especially for the Evergreen Needleleaf class. The Default LAI fails to capture monthly variations in Evergreen Needleleaf forest, but generally captures similar monthly variations as the Updated LAI record for other land cover types, but with significant value differences, particularly for Mixed Cover. The LAI values are smaller and more spatially variable in the Updated LAI record than in the Default LAI for all the three major land cover classes, which is consistent with previous studies [84,85]. Lower LAI reduces transpiration from vegetation and evaporative losses from canopy interception, which promotes wetter soil conditions and greater surface and sub-surface runoff in the model [85]. Using a more temporally updated LAI record and more heterogeneous LAI data results in a marked improvement in the model streamflow predictions, as the seasonal LAI record is significant in the role of partitioning rainfall to runoff and ET in the VIC module. All of the following sections are based on model simulations using the Updated LAI record as a key input, allowing this study to build on recent advances in landscape parameterization.

Sensitivity Analysis of Soil Parameters
Model sensitivity tests were performed to identify the important parameters affecting both streamflow and the spatial ET simulations. Figure 4a suggests that all six of the soil parameters examined ( Table 2) are essential for accurate streamflow simulations across all 28 study catchments, though the relative importance of the parameters varies across basins. However, the Ws and Ds parameters show the most significant individual influence on ET at the river basin scale (Figure 4b). The Dsmax and D3 parameters tend to have lower model sensitivity but are more effective than parameter D2. The model ET seems to show little sensitivity to parameter B in most of the test river basins.
points are at the top of the Ws and Ds plots, indicating that the model ET calculations are highly sensitive to both parameters for all land cover classes (Figure 4c). The model is also sensitive to the other four parameters, but the sensitivity is more geolocation dependent. However, there were no significant differences in the model ET sensitivities to the model parameters among the four main landcover classes. However, the D2 parameter showed relatively less impact on the model ET calculation in woodlands due to the relatively thinner soil layer represented by this land cover class. Therefore, we allowed all six model parameters to vary in the Qobs calibration consistent with the majority of previous studies, while assuming B and D2 have no substantial impact on spatial ETRS calibration. Further research is warranted to investigate the validity of this assumption for other areas.  Plots showing the sensitivity of the six model parameters at each individual grid cell in the four landscape groups are presented in Figure 4c; these results show the relative model sensitivity to the six parameters, which is consistent with the river basin scale results (Figure 4b). Most of the grid cell points are at the top of the Ws and Ds plots, indicating that the model ET calculations are highly sensitive to both parameters for all land cover classes (Figure 4c). The model is also sensitive to the other four parameters, but the sensitivity is more geolocation dependent. However, there were no significant differences in the model ET sensitivities to the model parameters among the four main landcover classes. However, the D2 parameter showed relatively less impact on the model ET calculation in woodlands due to the relatively thinner soil layer represented by this land cover class. Therefore, we allowed all six model parameters to vary in the Q obs calibration consistent with the majority of previous studies, while assuming B and D2 have no substantial impact on spatial ET RS calibration. Further research is warranted to investigate the validity of this assumption for other areas.

Uncertainty in the Calibrated Model Parameters
The model parameter ensembles paired with the NSC (streamflow) values from Q obs calibration are shown in Figure 5, six significant basins are selected to show the uncertainties in the calibrated model parameters against Q obs . For each of the six selected river basins, grey points denote all 500 evolutions in the Q obs calibration runtime; the 25 black points in each plot refer to equifinal simulations with NSC values very close to each other (within 1% difference), while the red point denotes the best streamflow prediction. The distribution of black points in the figure plots shows that Q obs calibration can yield very good streamflow predictions in most cases, with generally narrow ranges in the model parameter values. However, a lack of convergence in the calibrated parameters can be found in some river basins, indicating that "seemingly right" answers could be produced for the wrong reasons [30].
For example, at river basin No.16, all six model parameters except Ds, show large variations, indicating the low quality of model inputs (precipitation in particular), which may give the optimization algorithm more freedom to fit the objective function. model parameters against Qobs. For each of the six selected river basins, grey points denote all 500 evolutions in the Qobs calibration runtime; the 25 black points in each plot refer to equifinal simulations with NSC values very close to each other (within 1% difference), while the red point denotes the best streamflow prediction. The distribution of black points in the figure plots shows that Qobs calibration can yield very good streamflow predictions in most cases, with generally narrow ranges in the model parameter values. However, a lack of convergence in the calibrated parameters can be found in some river basins, indicating that "seemingly right" answers could be produced for the wrong reasons [30]. For example, at river basin No.16, all six model parameters except Ds, show large variations, indicating the low quality of model inputs (precipitation in particular), which may give the optimization algorithm more freedom to fit the objective function. Keeping water balance closure can help ensure that a model with parameters constrained by reliable ETRS observations yields favorable runoff and streamflow predictions [17]. As shown in Figure 6, a number of river basins (e.g., No. 2,5,7,9,10,15,16) show that improved ET prediction propagates to more accurate streamflow simulations. The model parameter set resulting in the best streamflow prediction (green star) is not (even close to) the one with the best ET simulation in many cases (except No. 15). This reflects both the limitations in the ETRS observations used for model ET calibration and the precipitation inputs. Effective spatial calibration of the parameter fields will be constrained as long as the errors in ETRS and the precipitation inputs remain larger than the variability in the calibrating parameters [32]. However, in all river basins, the model streamflow performance (i.e., KGE metrics) increases or decreases consistently with the gradient in ET bias. The calibration scheme steadily modulates the interplay of the errors in precipitation and ETRS, resulting in progressive changes of KGE values for streamflow simulation. For instance, if precipitation is biased high and ETRS biased low, a larger positive bias in ET would lead to a high KGE value for the streamflow simulation, while a successful ET calibration based on a relatively low biased ETRS would not necessarily produce a favorable streamflow simulation KGE result. Keeping water balance closure can help ensure that a model with parameters constrained by reliable ET RS observations yields favorable runoff and streamflow predictions [17]. As shown in Figure 6, a number of river basins (e.g., No. 2,5,7,9,10,15,16) show that improved ET prediction propagates to more accurate streamflow simulations. The model parameter set resulting in the best streamflow prediction (green star) is not (even close to) the one with the best ET simulation in many cases (except No. 15). This reflects both the limitations in the ET RS observations used for model ET calibration and the precipitation inputs. Effective spatial calibration of the parameter fields will be constrained as long as the errors in ET RS and the precipitation inputs remain larger than the variability in the calibrating parameters [32]. However, in all river basins, the model streamflow performance (i.e., KGE metrics) increases or decreases consistently with the gradient in ET bias. The calibration scheme steadily modulates the interplay of the errors in precipitation and ET RS , resulting in progressive changes of KGE values for streamflow simulation. For instance, if precipitation is biased high and ET RS biased low, a larger positive bias in ET would lead to a high KGE value for the streamflow simulation, while a successful ET calibration based on a relatively low biased ET RS would not necessarily produce a favorable streamflow simulation KGE result.
As shown in Figure 6, when the ET and streamflow simulations simultaneously achieve good calibration performance (i.e., KGE for streamflow greater than 0.7), APBIAS for ET is less than 10% and the KGE value is very close to the best run (green star) result, which would also indicate good quality ET RS and precipitation inputs (e.g., the No. 15 river basin). On the other hand, if both the best run (green star) and corresponding streamflow simulation using the optimized model ET have a negative KGE value, the precipitation input may be considered as very low quality (see No.9 and 13). All of the rest (90%) of the test river basins in Figure 6 achieve positive KGE values with a mean of 0.50 for the model streamflow simulations after the ET calibration, indicating that the ET RS observations and precipitation inputs are of reasonable quality for most of the study basins. However, the distance between the actual (red point) and best run (green star) KGE results indicates potential room for further model improvements using better quality precipitation and/or ET RS records.
calibration performance (i.e., KGE for streamflow greater than 0.7), APBIAS for ET is less than 10% and the KGE value is very close to the best run (green star) result, which would also indicate good quality ETRS and precipitation inputs (e.g., the No. 15 river basin). On the other hand, if both the best run (green star) and corresponding streamflow simulation using the optimized model ET have a negative KGE value, the precipitation input may be considered as very low quality (see No.9 and 13). All of the rest (90%) of the test river basins in Figure 6 achieve positive KGE values with a mean of 0.50 for the model streamflow simulations after the ET calibration, indicating that the ETRS observations and precipitation inputs are of reasonable quality for most of the study basins. However, the distance between the actual (red point) and best run (green star) KGE results indicates potential room for further model improvements using better quality precipitation and/or ETRS records. Figure 6. The uncertainty associated with the parameters in the spatial ETRS calibration scheme and its propagation to streamflow simulations (in Step-Three). For each basin, the APBIAS (spatial ET)* (X-axis) is defined as the sum of the APBIAS (spatial ET) of all grid cells in each iteration, then sorted offline in ascending order (from smallest to largest bias between simulated ET and ETRS). The grey, black, and red points share similar references as Figure 5 but show spatial ET simulation performance. The green stars denote the locations of the best streamflow simulation performance.
It is worth noting that incorporating the spatial distribution of the estimated parameters in the model calibration is a distinct advantage of the spatial ETRS calibration, which is considered a major challenge for large-domain hydrological modeling [9]. As shown in Figure 7, taking river basin No.28 as an example, the spatial ETRS optimized parameters show large spatial variability, while the Dsmax and Ds spatial distributions are similar and generally congruent with the pattern of soil texture (bulk density) across the catchment. Figure 6. The uncertainty associated with the parameters in the spatial ET RS calibration scheme and its propagation to streamflow simulations (in Step-Three). For each basin, the APBIAS (spatial ET)* (X-axis) is defined as the sum of the APBIAS (spatial ET) of all grid cells in each iteration, then sorted offline in ascending order (from smallest to largest bias between simulated ET and ET RS ). The grey, black, and red points share similar references as Figure 5 but show spatial ET simulation performance. The green stars denote the locations of the best streamflow simulation performance.
It is worth noting that incorporating the spatial distribution of the estimated parameters in the model calibration is a distinct advantage of the spatial ET RS calibration, which is considered a major challenge for large-domain hydrological modeling [9]. As shown in Figure 7, taking river basin No.28 as an example, the spatial ET RS optimized parameters show large spatial variability, while the Dsmax and Ds spatial distributions are similar and generally congruent with the pattern of soil texture (bulk density) across the catchment.

Validation and Evaluation
As shown in Table 4, in the "Raw Precipitation" setup, KGE metrics from the validation simulations derived using model parameters from the ETRS calibration range from -0.11 to 0.95 (mean: 0.54). Approximately 61% of the study basins have KGE > 0.5, which is consistent with the Figure 7. The distributed soil parameters obtained from the spatial ET RS calibration model for selected Basin No.28 (in Step-Three) are depicted in (a-e) The "bulk density" from the defined soil information is used to represent soil texture.

Validation and Evaluation
As shown in Table 4, in the "Raw Precipitation" setup, KGE metrics from the validation simulations derived using model parameters from the ET RS calibration range from −0.11 to 0.95 (mean: 0.54). Approximately 61% of the study basins have KGE > 0.5, which is consistent with the calibration period results. Compared to the baseline validation, BL_VAL, in which the KGE values are between −0.12 and 0.85 (mean: 0.53). Table 4. The KGE metrics of streamflow predictability in BL_VAL, ET_VAL, and Q_VAL setups over the 28 study basins (involved in Step-Four). The basins No. 16,20,22,25,26, and 28 are illustrated in Figure 9 and highlighted below (in bold). The "MOD16 ET*" represents the optimum parameter set retrieved offline when observed and simulated streamflow fit best (green star in Figure 6). The ET_VAL results show significant improvements in streamflow simulation, with 50% and 25% of the river basins showing improved and comparable performance, respectively. When compared to the Q_VAL, the ET_VAL results show improved and comparable performance for 18% and 11% of river basins, respectively, while 71% of the river basins show degraded performance (Figure 8a).

Raw Precipitation
Using synthetic ET data for the model calibration causes significant improvements in model performance, with improved or comparable performance to BL_VAL for 64% and 11% of the study basins, respectively (Figure 8b). Thus, relatively closer model performance to Q_VAL is achieved using synthetic ET for the calibration, with a total of 43% of the river basins showing improved (14%) or comparable (29%) performance to Q_VAL. Using simply corrected precipitation in the ET RS calibration and validation, the streamflow simulations are also improved significantly over the ET_VAL results, with 39% of the study basins showing improved (21%) or comparable (18%) KGE metrics against Q_VAL (Figure 8c). Surprisingly, the simply corrected precipitation causes further marked improvement in the model performance, with 96% of the total river basins showing improved (71%) and comparable (25%) performance than the BL_VAL results. Further improvements in the model performance for both ET and streamflow simulations are also expected with better quality ET or precipitation inputs as shown in Figure 8b Raw precipitation input is used to drive the calibration and validation schemes, and the ET_VAL uses the parameter set obtained when MOD16 ET and simulated ET fit to the global optimum (red points in Figure 6). (b) same as (a) but MOD16 ET is replaced by synthetic ET. (c) same as (a) but adjusted precipitation is used as the meteorological forcings of the model. (d) same as (a) but the optimum parameter set is retrieved offline when observed and simulated streamflow fit best (green star in Figure 6).
As shown in Figure 9, six river basins were selected to illustrate the ET_VAL simulated monthly hydrograph comparisons with Q_VAL, including two improved (No. 20 Figure 6). (b) same as (a) but MOD16 ET is replaced by synthetic ET. (c) same as (a) but adjusted precipitation is used as the meteorological forcings of the model. (d) same as (a) but the optimum parameter set is retrieved offline when observed and simulated streamflow fit best (green star in Figure 6). Furthermore, almost all the best model performances (in terms of streamflow) are achieved by using the parameter sets from the best KGE runs (i.e., green stars in Figure 6) using the ET RS calibration without involving streamflow observations. The best model runs represent 89% and 4% of the total study basins with improved and comparable performance to BL_VAL, respectively, and only 7% of the river basins with degraded metrics (Figure 8d). Interestingly, the ET_VAL results appear to be even better than the Q_VAL results, with 29% and 39% of the study basins showing improved and comparable KGE values, respectively. These results promote additional questions regarding how the best model runs can perform better than Q_VAL given the same maximum time of optimizing evolutions for the calibrations, and why the ET RS calibration approach can achieve very good streamflow simulations more efficiently than the Q obs calibration. Such improvements are likely due to the spatial distribution of ET RS , which may help the calibration scheme efficiently tune the model ET calculations at each grid cell toward optimal magnitudes, thereby improving the spatial distribution of runoff relative to the Q obs based model calibration. Understanding the internal processing structure to identify the best model run and optimal outputs under a diversity of climate and landscape conditions remains challenging and is warranted in future studies.
As shown in Figure 9, six river basins were selected to illustrate the ET_VAL simulated monthly hydrograph comparisons with Q_VAL, including two improved (No.20 and No.22), two comparable (No. 16 and No.26), and two degraded (No. 25 and No.28) basins from the adjusted precipitation setup (see Figure 8c). In the CRB domain, the snow melts in spring and accumulates in winter; thus, seasonal variations of modeled hydrograph are all well captured without consideration of parameter optimization method. For Basin No. 20 Figure 8c where adjusted precipitation setup is used). The X-axis is the monthly time series from 2009 to 2013; the Y-axis is the monthly observed and simulated streamflow.

Conclusions
In this study, we hypothesized that hydrological models calibrated against improved ETRS observations would lead to better streamflow predictions. A modeling framework developed by Wu, et al. [3], including the widely used Variable Infiltration Capacity (VIC) model [42], the dominant river-tracing-based source-sink hydrologic routing model [43] and the SCE-UA global optimization algorithm [70] was adopted for model calibration, validation, and evaluation for 28 natural river basins in the Pacific Northwest region of the continental United States. The widely used MODIS MOD16 operational ET product [12] provided a spatially and temporally dynamic ET observational record (ETRS) for this study. A four-step workflow was designed to evaluate the impacts of the ETRS based spatially distributed model calibration for the streamflow simulations, and to clarify how the model parameters determine reasonable ET estimates coherent with other water budget components including precipitation, soil moisture, and runoff.
The baseline model simulations for this study used the latest VIC model inputs including precipitation, vegetation, and soil datasets. The resulting model simulations showed significant improvement in ET and streamflow simulations using a recently updated Leaf Area Index (LAI) climatology [66] as a critical input. Comprehensive model sensitivity tests were performed indicating a set of six critical model parameters for calibrating against Qobs, and four parameters for calibrating against ETRS ( Table 2). Qobs based lumped calibration and ETRS based spatially distributed calibration were designed for this study, while uncertainties of precipitation and ET were considered to improve  Figure 8c where adjusted precipitation setup is used). The X-axis is the monthly time series from 2009 to 2013; the Y-axis is the monthly observed and simulated streamflow.

Conclusions
In this study, we hypothesized that hydrological models calibrated against improved ET RS observations would lead to better streamflow predictions. A modeling framework developed by Wu et al. [3], including the widely used Variable Infiltration Capacity (VIC) model [42], the dominant river-tracing-based source-sink hydrologic routing model [43] and the SCE-UA global optimization algorithm [70] was adopted for model calibration, validation, and evaluation for 28 natural river basins in the Pacific Northwest region of the continental United States. The widely used MODIS MOD16 operational ET product [12] provided a spatially and temporally dynamic ET observational record (ET RS ) for this study. A four-step workflow was designed to evaluate the impacts of the ET RS based spatially distributed model calibration for the streamflow simulations, and to clarify how the model parameters determine reasonable ET estimates coherent with other water budget components including precipitation, soil moisture, and runoff.
The baseline model simulations for this study used the latest VIC model inputs including precipitation, vegetation, and soil datasets. The resulting model simulations showed significant improvement in ET and streamflow simulations using a recently updated Leaf Area Index (LAI) climatology [66] as a critical input. Comprehensive model sensitivity tests were performed indicating a set of six critical model parameters for calibrating against Q obs , and four parameters for calibrating against ET RS (Table 2). Q obs based lumped calibration and ET RS based spatially distributed calibration were designed for this study, while uncertainties of precipitation and ET were considered to improve model performances (Figure 1). The validation results show that the model using the optimized parameters from the ET RS calibration results in a mean KGE value of 0.54, while 61% of the 28 natural river basins tested had KGE values exceeding 0.5. Importantly, the ET RS calibration scheme not only resulted in significant improvements over the baseline model, with 75% of the basins showing improved or comparable KGE metrics, but also resulted in consistent performance between model calibration and validation periods. Our results also indicate that either improved ET observations (indicated from a synthetic ET calibration experiment) or better precipitation inputs (through further bias correction) can provide further improvements to the effectiveness of the ET based calibration for better streamflow simulation. Compared to the Q obs calibration-based results, the synthetic ET based calibration produces better or comparable performance in 43% of the study basins, while the ET RS calibration using further annual bias-corrected precipitation produced better or comparable performance in 39% of the basins.
Interestingly, using the parameter sets from the best model calibration runs also produced the best overall model performance, with 93% and 68% of the study basins showing improved or close performance compared to the baseline simulations and the Q obs calibration-based results, respectively. Given reasonably accurate precipitation and ET RS records, the spatially distributed calibration scheme can be efficient in parameter optimization and effective in tuning the model for accurate streamflow calculations. However, if the precipitation inputs have a significant bias, the model calibration may still provide accurate streamflow simulations, but at the expense of degraded ET performance as the model ET parameters may be adversely adjusted to compensate for the bias in precipitation.
Our validation and evaluation results show that the ET RS calibration can efficiently tune the relevant model parameters for better ET and streamflow simulations within their physically meaningful ranges and with reasonable spatial distribution (e.g., consistent with soil texture patterns), which is likely due to the positive contribution from the spatial distribution information of ET RS . In gauged river basins with reliable Q obs data, the utility of ET RS in hydrological model calibration can further improve model performance by providing additional observational information to constrain the calibration. Moreover, our results demonstrate efficient and effective model calibrations using ET observations from global satellite records. These results are expected to be particularly beneficial for global-scale applications and natural basins lacking river gauge networks. The relative effectiveness of the model calibrations is expected to improve further as the quality of the ET RS observations continue to improve. A similar model calibration framework may apply to other satellite-based hydrological observations including soil moisture, surface inundation, snow cover, and river discharge. Further studies are needed to test the value of these observations for enabling further model calibration and performance enhancements.