Improving the Applicability of Hydrologic Models for Food–Energy–Water Nexus Studies Using Remote Sensing Data

: Food, energy, and water (FEW) nexus studies require reliable estimates of water availability, use, and demand. In this regard, spatially distributed hydrologic models are widely used to estimate not only streamﬂow (SF) but also di ﬀ erent components of the water balance such as evapotranspiration (ET), soil moisture (SM), and groundwater. For such studies, the traditional calibration approach of using SF observations is inadequate. To address this, we use state-of-the-art global remote sensing-based estimates of ET and SM with a multivariate calibration methodology to improve the applicability of a widely used spatially distributed hydrologic model (Noah-MP) for FEW nexus studies. Speciﬁcally, we conduct univariate and multivariate calibration experiments in the Mississippi river basin with ET, SM, and SF to understand the trade-o ﬀ s in accurately simulating ET, SM, and SF simultaneously. Results from univariate calibration with just SF reveal that increased accuracy in SF at the cost of degrading the spatio-temporal accuracy of ET and SM, which is essential for FEW nexus studies. We show that multivariate calibration helps preserve the accuracy of all the components involved in calibration. The study emphasizes the importance of multiple sources of information, especially from satellite remote sensing, for improving FEW nexus studies.


Introduction
Understanding and predicting the linkages in the food, energy, and water (FEW) nexus requires reliable hydrologic modeling tools [1]. Specifically, spatially distributed hydrologic models which simulate different water balance components are widely used [2][3][4]. In addition to streamflow (SF), evapotranspiration (ET) and soil moisture (SM) are the most important hydrologic variables required for accurately characterizing the FEW nexus. This is especially true for studies that deal with the connections between water availability, food production, and electricity usage. Another important hydrologic modeling consideration for FEW nexus studies is the spatial scale at which different water balance components are simulated. In contrast to a typical hydrologic study which requires estimates of the hydrologic cycle at the catchment or watershed scale, FEW nexus studies may need high resolution estimates at the grid-scale. For example, a study which analyzed the interactions between crops, energy, and water availability in the Central Valley in California needed spatially distributed groundwater changes and ET estimates to determine the water footprint for different crops across the study region [5]. Therefore, the establishment of an accurate hydrologic model for multiple water balance components is imperative for reliable FEW nexus studies.
A widely used method for improving the performance of distributed hydrologic models is calibration using observations of a single output variable (either SF or ET or SM). However, recent studies have shown that calibrating a hydrologic model with only one variable adversely impacts the accuracy of other variables. A calibration study [6] showed that calibrating a distributed hydrologic model with only SF reduces the accuracy of both ET and SM. Another study [7] showed that calibrating a hydrologic model with only SM negatively affects the accuracy of the corresponding SF simulation (compared to SF-calibrated model results). In a similar study which used remotely sensed ET instead of SM for calibrating a hydrologic model, streamflow simulation improved compared to the base model but not compared to a SF-calibrated model [8]. In addition, it is seen that using land surface temperature (a proxy for ET) combined with SF for calibration reduces the error in ET (by 8%) but results in an increase in SF error (by 6%) [9]. The findings of another study [10], which considered the effect of calibrating a large-scale hydrologic model with both ET and SM estimates, is in line with the aforementioned studies.
All these studies point towards the existence of a trade-off relationship among the simulated water balance components wherein accurate simulation of one component, through model calibration, comes at the cost of degrading the representation of other components. This issue with single objective calibration can be explained by the fact that it is impossible to locate a single parameter set that can enable the hydrologic model to simulate all the fluxes accurately [11]. In other words, multiple model parameter sets are seen to be 'behavioral' in terms of simulating hydrologic fluxes, leading to the problem of equifinality [12]. In this regard, a number of distinct, yet complementary, solutions have been proposed in the form of generalized likelihood uncertainty (GLUE) [13] and multi-objective calibration [14]. The study [14] proposed a theoretical framework for dealing with multiple error metrics and multiple output fluxes at multiple sites in which objective functions are aggregated based on various criteria.
In this study, we aim develop and test a Pareto optimality-based multivariate calibration approach to improve the applicability of a spatially distributed hydrologic model for FEW nexus studies which need accurate simulation of multiple water balance components (specifically ET, SM, and SF). Specifically, we combine (1) the above-discussed trade-off relationship among the model-simulated water balance components, specifically ET, SM, and SF and (2) global remote sensing-based estimates of hydrologic variable, specifically ET and SM [15]. In this study, we address the following research questions: (1) what is the impact of calibrating a spatially distributed hydrologic model with a sparse network of gauges on the spatio-temporal representation of ET and SM? Specifically, to what extent does univariate calibration of hydrologic with ET, SM, or SF estimates other hydrologic fluxes? (2) Does multivariate calibration help preserve both the accuracy of streamflow and the spatio-temporal representation of ET and SM? Specifically, can a parameter set be identified that can simulate all the hydrologic fluxes with reasonable accuracy? We also discuss the implications of multivariate calibration of hydrologic models for FEW nexus studies.

Study Area
To simulate large-scale FEW nexus studies, we choose the Mississippi river basin in the United States as the study region. The basin covers an area of approximately 3.3 million sq. km. and six USGS HUC-2 (Hydrologic Unit Code) sub-basins ( Figure 1). The average temperature over the basin is approximately 12 • C and the annual average annual rainfall is estimated as 800 mm [16]. The study [16] also classifies the Ohio and Tennessee regions as wet regions, the Missouri sub-basin as dry, and the Upper Mississippi region as a transitional region between wet and dry. As our study is a calibration experiment that aims to analyze the effects of univariate and multivariate calibration, we use historical data from 2004 for calibration and data from 2005 and 2006 for validation. There is a specific reason for selecting one year of data for calibration. The hydrologic model (Noah-MP) used in the study is Remote Sens. 2020, 12, 599 3 of 16 computationally expensive. Therefore, carrying out multiple univariate and multivariate calibration scenarios for a longer length of time would be infeasible. Hence, we choose to limit the calibration period to one year. Also, note that the study is a calibration experiment which aims to compare different scenarios and as such it does not aim to produce the best Noah-MP model setup for the study region.
Remote Sens. 2020, 12, 599 3 of 17 to limit the calibration period to one year. Also, note that the study is a calibration experiment which aims to compare different scenarios and as such it does not aim to produce the best Noah-MP model setup for the study region.

Observational Data
For the calibration of the hydrologic model, we use computed runoff for the six HUC-2 basins sourced from USGS. For calibration of the model with remotely sensed ET, we select the Global Land Evaporation Amsterdam Model (GLEAM) [17] dataset at monthly timestep, based on the findings of [18]. Finally, for soil moisture estimates we use the European Space Agency-Climate Change Institute (ESA-CCI) dataset [19]. We present a brief evaluation of the selected ET (GLEAM) and SM (ESA-CCI) datasets in Appendix A.

Hydrologic Model Setup
We select the Noah-MP (multi-parameterization) land surface model (LSM) [20] as the spatially distributed hydrologic model of choice. The Noah-MP model is driven through NASA's Land Information System (LIS) [21]. The Noah-MP model incorporates a dynamic groundwater model, an improved vegetation canopy model, and a new snow pack model into the original Noah model. [16] provide a detailed description of the model and evaluate the model over the Mississippi river basin. All the static input datasets required for setting up Noah-MP model are sourced from NASA's LIS data portal (https://portal.nccs.nasa.gov/lisdata). The important static input datasets are land cover (from USGS), soil texture (from USDA's STATSGO), and elevation (from GTOPO30). Additional variables such as albedo, greenness fraction, and temperature are sourced from NCEP reanalysis. The Noah-MP model also requires dynamic input datasets such as precipitation, air temperature, surface pressure, specific humidity, and radiation. These variables were sourced from the Global Data Assimilation System (GDAS) of the NOAA. The model is setup for the study region at a spatial resolution of 0.25 o × 0.25 o . The model is spun-up for a period of 68 years by looping through the year 2003 until the storages (groundwater and SM) reach equilibrium. Details of the Noah-MP physics options selected in this study are presented in Table 1.
The Noah-MP model contains approximately 210 soft (present in accessible user-defined tables) and hard-coded parameters (present in the model source code). In this study we focus only on the

Observational Data
For the calibration of the hydrologic model, we use computed runoff for the six HUC-2 basins sourced from USGS. For calibration of the model with remotely sensed ET, we select the Global Land Evaporation Amsterdam Model (GLEAM) [17] dataset at monthly timestep, based on the findings of [18]. Finally, for soil moisture estimates we use the European Space Agency-Climate Change Institute (ESA-CCI) dataset [19]. We present a brief evaluation of the selected ET (GLEAM) and SM (ESA-CCI) datasets in Appendix A.

Hydrologic Model Setup
We select the Noah-MP (multi-parameterization) land surface model (LSM) [20] as the spatially distributed hydrologic model of choice. The Noah-MP model is driven through NASA's Land Information System (LIS) [21]. The Noah-MP model incorporates a dynamic groundwater model, an improved vegetation canopy model, and a new snow pack model into the original Noah model. [16] provide a detailed description of the model and evaluate the model over the Mississippi river basin. All the static input datasets required for setting up Noah-MP model are sourced from NASA's LIS data portal (https://portal.nccs.nasa.gov/lisdata). The important static input datasets are land cover (from USGS), soil texture (from USDA's STATSGO), and elevation (from GTOPO30). Additional variables such as albedo, greenness fraction, and temperature are sourced from NCEP reanalysis. The Noah-MP model also requires dynamic input datasets such as precipitation, air temperature, surface pressure, specific humidity, and radiation. These variables were sourced from the Global Data Assimilation System (GDAS) of the NOAA. The model is setup for the study region at a spatial resolution of 0.25 • × 0.25 • . The model is spun-up for a period of 68 years by looping through the year 2003 until the storages (groundwater and SM) reach equilibrium. Details of the Noah-MP physics options selected in this study are presented in Table 1. The Noah-MP model contains approximately 210 soft (present in accessible user-defined tables) and hard-coded parameters (present in the model source code). In this study we focus only on the soft-coded parameters. The model output is seen to be sensitive to about two-thirds of the 71 soft-coded parameters [22]. We select five parameters which are the most sensitive [22] to the three model outputs considered in this study (ET, SM, and SF). The selected parameters are hydraulic conductivity at saturation for silt clay loamy soil (REFDK), the surface infiltration factor (REFKDT), the exponent in the Brooks-Corey equation (BB), soil porosity (MAXSMC), and hydraulic conductivity at saturation (SATDK). Of the five parameters, BB, MAXSMC, and SATDK depend on soil texture. As there are 10 soil texture classes, the total number of parameters which are calibrated in this study is 38. Details of the parameters selected for calibration is given in Table 2. The number in the brackets represents the internal Noah-MP model code for the selected physics option.

Univariate Calibration
To quantify the effect of univariate calibration of the Noah-MP hydrologic model on the accuracy of the other fluxes, we first define the calibration objective. In this study we select the Root Mean Square Error (RMSE) metric as the objective to minimize. RMSE is a widely used measure of the difference between observed and modeled fluxes which assumes that measurements errors are homoscedastic [14]. The defined objective (Equation (1) [30]. The DDS algorithm is implemented using the Optimization Software Toolkit for Research Involving Computation Heuristics (OSTRICH) toolkit.
where var = hydrologic variable considered for calibration (ET, SM, or SF), N = total number of grid cells (for ET and SM) or basins (for SF), T = total time period in months (12 months), Mod = modeled quantity (ET, SM, or SF) at time t and grid cell (for ET and SM) or basin (for SF) n, Obs = observed quantity (ET, SM, or SF) at time t and grid cell (for ET and SM) or basin (for SF) n. DDS is a heuristic global search algorithm that dynamically scales the dimension of the calibration problem based on the specified number of function evaluations. At the start, the algorithm searches for a global optimum but reduces to a local search as the iterations approach the specified budget. It is considered to be more efficient for high-dimension watershed model calibration problems, compared to other popular algorithms such as the Shuffled Complex Evolution-University of Arizona (SCE-UA) algorithm [31]. In this study, computational efficiency of the DDS algorithm is the primary reason for its selection. As the study compares different calibration cases, the focus of the calibration is on achieving a reasonable RMSE value within a limited computational budget. For the univariate calibration cases (ET, SM, and SF), the maximum function evaluations are set to 15,000. On a workstation with 16 processors, the parallel implementation of DDS, through DDS, required around 13 days to complete each case, with each Noah-MP model run taking around 20 min. The DDS algorithm is initialized with random parameter values and all parameters are considered to be uniformly distributed between the specified bounds. In other words, the calibration is carried out without imposing any prior knowledge of either the initial parameter values or the distribution of the parameters.

Multivariate Calibration
To analyze whether multivariate calibration can help preserve the spatio-temporal accuracy of all the hydrologic fluxes, we consider the following calibration cases: (1) ET-SM in which only ET and SM data from GLEAM and ESA-CCI are used for calibration, (2) ET-SF in which ET from GLEAM and SF from USGS are used, (3) SM-SF in which only soil moisture and streamflow variables are calibrated, and (4) ET-SM-SF in which all three fluxes are calibrated. In this study, multivariate calibration is cast as a Pareto optimal problem which assumes a trade-off relationship among accurate simulation of ET, SM, and SF. Similar to univariate calibration, we use the RMSE error defined in Equation (1) as the calibration objective. For example, the objective functions in the ET-SM are to minimize RMSE ET and RMSE SM error metrics. The Pareto optimal curves showing the trade-off relationship in each of the four enumerated cases are derived using a Multi-Algorithm Genetically Adaptive Multiobjective (AMALGAM) optimization algorithm [32].
Similar to the selection of DDS, the main reason for selecting AMALGAM for deriving the Pareto fronts is computational efficiency. AMALGAM combines the strengths of multiple optimization algorithms to improve the speed and efficiency of finding Pareto-optimal solutions for multi-objective optimization problems [32]. In this study's implementation, four search algorithms are run simultaneously in AMALGAM: differential evolution [33], particle swarm optimization [34], adaptive Metropolis [35], and NSGA-II [36]. In AMALGAM, offspring creation is adaptive; the best performing algorithms in the present generation are weighted more in the creation of offspring for the next generation. In the present study, the population size is set to 120 and the number of generations is set to 100 resulting in a computational 12,000 model runs. Similar to DDS, no prior knowledge of the 38 parameters except their range is assumed. The sampling strategy used in AMALGAM to derive the initial population is Latin Hypercube. On a 16-processor workstation, each of the four multi-objective calibration cases required 10 days to converge to the Pareto-optimal solution.

Methods Used for Analysis of Results
We first validate the Noah-MP hydrologic model using results from the univariate calibration cases (ET, SM, and SF) for the validation period of 2005 and 2006 ( Figure 2). We then quantify the impact of univariate calibration on the representation of water balance components other than the one considered for calibration. Finally, we test the effectiveness of multivariate calibration for overcoming the drawbacks of univariate calibration and improving the simulation of the water balance components important for FEW nexus studies considered in this study (ET, SM, and SF).
For the univariate and multivariate calibration cases, the temporal accuracy is analyzed using bar plots of the RMSE error metric of the basin-averaged monthly ET (top panel), SM (middle panel), and SF (bottom panel) for the seven calibrated scenarios (Figure 3). The RMSE values are scaled by the maximum RMSE among the seven calibration scenarios to allow comparison among the different fluxes that have different units. For the multivariate calibration scenarios (ET-SM, ET-SF, SM-SF, and ET-SM-SF) ten samples are drawn from the Pareto front and the basin-average RMSE is calculated. Spatial scaled-RMSE plots (Figure 4), variograms, and correlograms ( Figure 5) calculated for average ET and SM values at each grid cell are used to quantify the effect of calibration on the spatial accuracy of ET and SM. For the multivariate calibration cases, the point that best represents the trade-off between the objectives was selected from the Pareto front to plot the correlograms and variograms. The equations used to calculate the variograms and correlograms are presented in Appendix B.

Validation of the Noah-MP Hydrologic Model
The Noah-MP

Univariate Calibration
We first address the research question: does calibration of a distributed hydrologic model with a sparse network of streamflow gauges adversely affect ET and SM? From the bottom panel of Figure 3, it is clear that for SF, the model performs best when the calibration objective is SF (blue bar). Comparing the SF-calibrated ET with ET-calibrated ET (top panel, green) and SF-calibrated SM with SM-calibrated SM (middle panel, orange), it is also evident that this increased accuracy in SF comes at the cost of reduced accuracy in simulating ET (top panel, blue) and SM (middle panel, blue). The trade-off in SM accuracy is relatively higher compared to the reduction in ET performance. When the Noah-MP model is calibrated with only SF, the spatial representation of ET and SM is adversely affected (column three in Figure 4) when compared with ET-calibrated ET (first row, first column) and SM-calibrated SM (second row, second column). While the model seems to perform worse in the Lower Mississippi and Arkansas-White-Red sub-catchments for ET, the problem regions for SM seem to be the Ohio and Tennessee regions (both wet regions). The problems in the spatial structure of SF-calibrated ET and SM are clearer in the correlogram and variogram of ET and SM ( Figure 5). For simulation of the ET variable, the correlogram of SF-calibrated ET (blue) matches the observed correlogram (black) up to a distance of 20 grid cells but significantly deviates after that. The variogram of SF-calibrated ET (blue) shows a much higher deviation from the observed variogram (black) with semivariance values between 300 and 400

Univariate Calibration
We first address the research question: does calibration of a distributed hydrologic model with a sparse network of streamflow gauges adversely affect ET and SM? From the bottom panel of Figure 3, it is clear that for SF, the model performs best when the calibration objective is SF (blue bar). Comparing the SF-calibrated ET with ET-calibrated ET (top panel, green) and SF-calibrated SM with SM-calibrated SM (middle panel, orange), it is also evident that this increased accuracy in SF comes at the cost of reduced accuracy in simulating ET (top panel, blue) and SM (middle panel, blue). The trade-off in SM accuracy is relatively higher compared to the reduction in ET performance. When the Noah-MP model is calibrated with only SF, the spatial representation of ET and SM is adversely affected (column three in Figure 4) when compared with ET-calibrated ET (first row, first column) and SM-calibrated SM (second row, second column). While the model seems to perform worse in the Lower Mississippi and Arkansas-White-Red sub-catchments for ET, the problem regions for SM seem to be the Ohio and Tennessee regions (both wet regions). The problems in the spatial structure of SF-calibrated ET and SM are clearer in the correlogram and variogram of ET and SM ( Figure 5). For simulation of the ET variable, the correlogram of SF-calibrated ET (blue) matches the observed correlogram (black) up to a distance of 20 grid cells but significantly deviates after that. The variogram of SF-calibrated ET (blue) shows a much higher deviation from the observed variogram (black) with semivariance values between 300 and 400 compared to the observed range of 20 to 200. For SM, both the correlogram and the variogram of SF-calibrated SM (blue lines) exhibit significant deviations from the observations. Next, we extend the analysis presented above to test whether calibration of hydrologic models with just one water balance component, in general, adversely affects the simulation of other water balance components. From Figure 3, it is fairly evident that accurate representation of any one component (ET, SM, or SF) comes at the cost of degrading the accurate representation of other water balance components. For example, ET-calibrated ET (top panel, green) has the least RMSE but corresponding SM (middle panel, green) is much higher compared to SM-calibrated SM (middle panel, orange). Also the SF simulated from ET-calibrated model is worse than SF-calibrated SF. The same conclusion can be drawn for SM. This fact is reflected in the spatial RMSE plot (Figure 4) as well, where ET-calibrated ET (first row, first column) shows good performance across the Mississippi river basin but the corresponding SM (second row, first column) performs poorly compared to SM-calibrated SM (second row, second column). The variogram of ET ( Figure 5) shows that ET-calibrated ET (green lines) closely follows the observed correlogram and variograms (black lines) but the corresponding ET-calibrated SM shows significant deviation from the observed SM variogram. The correlogram of SM paints a slightly different picture, with ET-calibrated SM (green) seemingly better than SM-calibrated SM. This could be because of the nature of ESA-CCI soil moisture, which has a number of missing values due to the fact that there are a number of days in which the satellite does not pass over certain areas of the Mississippi basin, perhaps leading to erroneous spatial representation. However, we note that the spatial RMSE map of SM-calibrated SM (orange) is very close to the observed variogram (black) but this comes at the cost of reducing the accuracy of SM-calibrated ET. Barring the discrepancy in the correlogram of SM, we see that univariate calibration adversely affects the spatial and temporal accuracy of water balance not considered for calibration.
Remote Sens. 2020, 12, 599 8 of 17 compared to the observed range of 20 to 200. For SM, both the correlogram and the variogram of SF-calibrated SM (blue lines) exhibit significant deviations from the observations. Next, we extend the analysis presented above to test whether calibration of hydrologic models with just one water balance component, in general, adversely affects the simulation of other water balance components. From Figure 3, it is fairly evident that accurate representation of any one component (ET, SM, or SF) comes at the cost of degrading the accurate representation of other water balance components. For example, ET-calibrated ET (top panel, green) has the least RMSE but corresponding SM (middle panel, green) is much higher compared to SM-calibrated SM (middle panel, orange). Also the SF simulated from ET-calibrated model is worse than SF-calibrated SF. The same conclusion can be drawn for SM. This fact is reflected in the spatial RMSE plot (Figure 4) as well, where ET-calibrated ET (first row, first column) shows good performance across the Mississippi river basin but the corresponding SM (second row, first column) performs poorly compared to SM-calibrated SM (second row, second column). The variogram of ET ( Figure 5) shows that ET-calibrated ET (green lines) closely follows the observed correlogram and variograms (black lines) but the corresponding ET-calibrated SM shows significant deviation from the observed SM variogram. The correlogram of SM paints a slightly different picture, with ET-calibrated SM (green) seemingly better than SM-calibrated SM. This could be because of the nature of ESA-CCI soil moisture, which has a number of missing values due to the fact that there are a number of days in which the satellite does not pass over certain areas of the Mississippi basin, perhaps leading to erroneous spatial representation. However, we note that the spatial RMSE map of SM-calibrated SM (orange) is very close to the observed variogram (black) but this comes at the cost of reducing the accuracy of SM-calibrated ET. Barring the discrepancy in the correlogram of SM, we see that univariate calibration adversely affects the spatial and temporal accuracy of water balance not considered for calibration. RMSE values for ET, SM, and SF are scaled by the maximum RMSE across the seven calibration cases. The maximum values are provided in the header of each panel within square brackets. The multivariate calibration results are derived by taking 10 samples from different points on the Pareto fronts. We also present KGE values for each of the calibration cases above each barplot. For multivariate calibration cases, only the range of KGE values from the selected Pareto-optimal parameter sets are shown.   ET-SM, 5) ET-SF, 6) SM-SF, and 7) ET-SM-SF as calibration objectives (x-axis in all three panels). RMSE values for ET, SM, and SF are scaled by the maximum RMSE across the seven calibration cases. The maximum values are provided in the header of each panel within square brackets. The multivariate calibration results are derived by taking 10 samples from different points on the Pareto fronts. We also present KGE values for each of the calibration cases above each barplot. For multivariate calibration cases, only the range of KGE values from the selected Pareto-optimal parameter sets are shown.

Multivariate Calibration
Can multivariate calibration help overcome the drawbacks of univariate calibration? We first analyze whether including ET and/or SM along with SF in calibration helps preserve the accuracy of both SF and ET and/or SM. When only ET is considered along with SF (ET-SF calibration case), some of the ET-SF-calibrated SF RMSE values (red bars, bottom panel, Figure 3) are slightly worse than SF-calibrated ET (blue bar, bottom panel) but the corresponding ET-SF-calibrated ET values (red bard, top panel) are considerably lesser than SF-calibrated ET values (blue bar, top panel). This shows that multivariate calibration (in this case ET-SF) trades-off accuracy in simulation of SF to improve the accuracy of ET. However, as expected, the lack of consideration of SM in the calibration results in a much higher RMSE value for the corresponding ET-SF-calibrated SM (red bars, middle panel). The inclusion of ET in calibration also seems to have improved the spatial RMSE, especially in the Missouri region of other Mississippi basin (first row, fifth column, Figure 4) compared to just SF-calibrated ET (first row, third column). The improvement in the RMSE values is not reflected in either the correlogram or the variogram of ET-SF-calibrated ET (red line, top panel, Figure 5). However, note that the correlograms and variograms are calculated for just one point in the Pareto front. It is safe to assume that there may be points on the Pareto front that could perform better as far as the correlogram or variogram is concerned. When only SM is included with SF, accuracy in SF simulation is traded for better simulation of SM. This is clearly evident in Figure 3, where all RMSE values of SM-SF-calibrated SF (brown bars, bottom panel) are worse than SF-calibrated SF (blue bar, bottom panel) but the corresponding SM-SF-calibrated SM (brown bars, middle panel) simulations perform much better than SF-calibrated SM (blue bar, middle panel). As seen before, this improved performance in SM comes at the cost of a slight reduction in SF accuracy, but the accuracy in ET suffers substantially (brown bars, top panel, Figure 3). The spatial RMSE plots, correlograms, and variograms paint a similar picture. Across the Mississippi basin, SM-SF-calibrated SM (second row, sixth column, Figure 4) performs much better than just SF-calibrated SM (second row, third column) but slightly worse than the SM-calibrated SM (second row, second column).
When both ET and SM are included with SF in calibrated (ET-SM-SF case), the derived Pareto surface has points which trade-offs, within acceptable limits, the accuracy in SF for better simulation of both ET and SM. In Figure 3, most RMSE values of ET-SM-SF-calibrated SF (yellow bars, bottom panel) are worse than SF-calibrated SF (blue bar, bottom panel) but some of the corresponding RMSE values for ET and SM are much better (in particular, the sixth yellow bar in top and middle panel). This improvement is more evident in the spatial RMSE plot (Figure 4) in which both the ET-SM-SF-calibrated ET (first row, last column) and ET-SM-SF-calibrated SM (second row, last column) show improvements over SF-calibrated ET and SM (third column). In the variogram of ET ( Figure 5), ET-SM-SF-calibrated ET (yellow line) shows improvement over SF-calibrated ET (blue line) and even ET-SF-calibrated ET (red line). The SM variogram shows similar improvement. Although this analysis of results has been carried out from the perspective of SF, the conclusions drawn for SF hold true for both ET and SM. In other words, multivariate calibration can discover parameter sets which can simulate all the water balance components (in this case ET, SM, and SF) to a reasonable degree of accuracy, unlike univariate calibration.

Discussion
The inadequacy of calibrating hydrologic models with only SF for FEW nexus studies is clear from the results of our study. SF-calibration not only affects the basin-averaged values of other components such as ET or SM, but also, more importantly, results in an improper spatial representation of ET and SM. This behavior of hydrologic models, wherein the parameters are overfitted to accurately simulate only streamflow, is well-known. Several studies have shown that streamflow, being an integrative response of the entire watershed, cannot adequately inform the spatial representation of ET and SM [37,38]. Recent studies have shown that using the spatial structure of ET and SM from remote sensing data along with streamflow data in calibration can produce physically consistent estimates of all the components [9,39,40]. In this regard, our study agrees with the findings of the hydrologic modeling community, with the spatial RMSE plots (Figure 4), variograms, and correlograms ( Figure 5) showing that considering ET and SM moisture improves their spatial structure. However, in this study, we go beyond quantifying and addressing the impacts of only streamflow calibration by also focusing on the ill-effects of univariate calibration, in general, for hydrologic and FEW nexus studies. In doing so, we find that using soil moisture data for calibration has the most adverse impact on the accuracy of the other variables. This could be due to the fact that the parameters that are sensitive to soil moisture may not be sensitive to other variables. This is in contrast to the findings of [7], who found out that incorporating near-surface remote sensing-based estimates of soil moisture and streamflow in hydrologic model calibration positively impacts streamflow simulation. On the other hand, ET-calibration yields much better results for SF and ET simulation but the corresponding SM representation is poor. This is expected as the parameters which influence SF also impact ET (more than SM). This highlights the need for better, physically interpretable, parameterization of hydrologic models.
Multivariate calibration of hydrologic models is generally carried out by combining different hydrologic variables into a single objective function by using weights [9,40,41]. Such an approach makes it difficult to analyze the trade-offs in accuracy among the different variable involved. In this study, we present a Pareto optimality-based calibration strategy for multivariate calibration which helps in systematically quantifying the improvements in simultaneously simulating multiple water balance components. This is especially true for FEW nexus studies which would need more than just ET, SM, and SF variables, and the approach presented here can help in systematically introducing new datasets such as total water storage change and groundwater information, among others. The advantage of such an approach is apparent in Figure 3, where the progressive introduction of variables (for example, SF, ET-SF, and ET-SM-SF) changes the RMSE of different variables of interest. These changes can be systematically analyzed for a particular hydrologic model (in this case, Noah-MP) and relevant calibration strategies for the specific FEW nexus study can be devised. For the Noah-MP model used in this study, the progressive inclusion of different variables lead to the following calibration strategies for different FEW nexus studies: • ET: If the nexus study only requires ET as the primary input (for example, calculation of water demand), then calibration of a hydrologic model using only observed ET gives the best results in terms of spatial and temporal accuracy. However, increased accuracy in ET comes at the cost of degrading SM and SF estimates. In the absence of SF estimates, calibration with observed ET offers the best alternative for reliably simulating SF [42,43]. ET-SM-SF: Incorporating all the water balance components (ET, SM, SF) for calibration provides the best compromise solution to preserve the accuracies in simulating each of the three components.
Finally we discuss whether using remotely sensed ET and SM for calibration lead to reliable streamflow simulations. This is important for FEW nexus studies in data-scarce regions where in situ measurements of SF are not available. If better SF simulation is the sole objective, combining ET and SM for calibration (ET-SM-calibrated case) produces slightly better results than just considering ET (compare purple and green bars in bottom panel of Figure 3). However, if the intention of the model is to preserve the accuracy of other variables (ET and SM), as is the case in FEW nexus studies, then combining both ET and SM in a multivariate calibration setup produces better results (compare purple, green, and orange bars in top and bottom panel). By comparing ET-calibrated SF (green bar, bottom panel) and SM-calibrated SF (orange bar, bottom panel), it is clear that ET can inform SF simulation better than SM. In most large scale FEW nexus studies it is important to simulate the spatial distribution of ET and SM with reasonable accuracy. For this purpose, combining both ET and SM in calibration produces much better spatial structure rather than univariate calibration (compare 4th column with first and second columns in Figure 4, also purple lines with black lines in Figure 5).

Conclusions
In this study, we presented the results of a calibration experiment in the Mississippi river basin aimed at understanding the applicability of large scale spatially distributed hydrologic models for FEW nexus studies. The primary objective of the study was to quantify the impact of using estimates of a single water balance component on the other water balance components. For this, we analyzed the results from seven different univariate (ET, SM, and SF) and multivariate calibration (ET-SM, ET-SF, SM-SF, and ET-SM-SF) scenarios. We have shown that the univariate calibration does indeed affect the simulation of water balance components that are not included in calibration, and as such is not suitable for FEW nexus studies. In addition, we have shown that a trade-off relationship exists among the simulated water balance components, which is best characterized by a Pareto front or surface. This led us to use multivariate calibration as an approach to improve the applicability of hydrologic models for FEW nexus studies.
Although the theoretical underpinnings of multivariate calibration have been well-established over the last two decades, the applicability of these methods was hindered by the lack of estimates of hydrologic variables other than streamflow. The availability of remote sensing data and powerful optimization algorithms enabled us to quantify the trade-off relationship among accurate simulation of different water balance components. Our results show that multivariate calibration approach used in this study has the potential in improving hydrologic models for FEW nexus studies. With multivariate calibration, the hydrologic models are able to (1) accurately represent the spatial structure of the hydrologic variables other than streamflow (ET and SM) and (2) preserve the accuracy with which streamflow is simulated. In addition, the study adds to an increasing body of evidence in favor of using satellite-based remote sensing data for improving hydrologic models using different approaches such as calibration and data assimilation. This is especially important within the framework of food-energy-water nexus studies.
Finally, we briefly enumerate the limitations of the study and possible future directions. The conclusions drawn above are a result of the chosen hydrologic model (Noah-MP), physics options, parameters, input data, and meteorological forcing. For a different setup the trade-offs may have different characteristics. However, the multivariate calibration approach of combining different remote sensing-based dataset can be applied for any type of hydrologic model and FEW nexus study. Future work involves using the Gravity Recovery and Climate Experiment (GRACE)-based estimates of total water storage (TWS), which could perform better than just near-surface soil moisture (top 5 cm) used in the study. Also, we have considered only a sparse network of streamflow gauges, primarily to mimic large scale FEW nexus studies which deal with data sparsity, but it is worthwhile to explore if a dense network of streamflow gauges could lead to better ET and SM simulations.