Hydrological Model Calibration with Streamflow and Remote Sensing Based Evapotranspiration Data in a Data Poor Basin

Conventional calibration methods adopted in hydrological modelling are based on streamflow data measured at certain river sections. However, streamflow measurements are usually sparse and, in such instances, remote-sensing-based products may be used as an additional dataset(s) in hydrological model calibration. This study compares two main calibration approaches: (a) single variable calibration with streamflow and evapotranspiration separately, and (b) multi-variable calibration with both variables together. Here, we used remote sensing-based evapotranspiration data from Global Land Evaporation: the Amsterdam Model (GLEAM ET), and measured streamflow at four stations to calibrate a Soil and Water Assessment Tool (SWAT) and evaluate the performances for Chindwin Basin, Myanmar. Our results showed that when one variable (either streamflow or evapotranspiration) is used for calibration, it led to good performance with respect to the calibration variable but resulted in reduced performance in the other variable. In the multi-variable calibration using both streamflow and evapotranspiration, reasonable results were obtained for both variables. For example, at the basin outlet, the best NSEs (Nash-Sutcliffe Efficiencies) of streamflow and evapotranspiration on monthly time series are, respectively, 0.98 and 0.59 in the calibration with streamflow alone, and 0.69 and 0.73 in the calibration with evapotranspiration alone. Whereas, in the multi-variable calibration, the NSEs at the basin outlet are 0.97 and 0.64 for streamflow and evapotranspiration, respectively. The results suggest that the GLEAM ET data, together with streamflow data, can be used for model calibration in the study region as the simulation results show reasonable performance for streamflow with an NSE > 0.85. Results also show that many different sets of parameter values (‘good parameter sets’) can produce results comparable to the best parameter set.


Introduction
Hydrological models are widely used to understand catchment processes, and to predict their behavior under different forcing conditions as these model predictions are useful for water resource planning and management at the river basin scale. However, these predictions are jeopardized, when limited spatial data and hydro-meteorological observations are available in a river basin [1]. Most of the river basins in the world are ungauged or poorly gauged [2]. Research interest on hydrological simulations for ungauged or poorly gauged catchments has increased recently, due to the use of distributed modelling, innovative scientific approaches used in model calibration and deriving model inputs from earth observations to assess numerous water-related problems. As an example, an initiative taken by the International Association for Hydrological Sciences (IAHS) on Prediction in Ungauged Basin (PUB) was aimed in achieving reliable prediction in hydrological practice [2].
Hydrological models are usually calibrated by iteratively adjusting the model parameters to obtain a reasonable comparison between simulated and observed hydrographs at a few locations or just at the outlet of the study catchment [32]. Streamflow measurements are point data, and calibrating with only point data at the catchment outlet does not guarantee that spatially varying water balance components such as evapotranspiration are simulated accurately. Another problem in hydrological model calibration is that more than one set of parameter values may result equally good model performance. This issue is commonly referred to as equifinality or non-uniqueness [33,34]. Fenicia et al. (2007) [35] also recognized that model calibration with one variable may risk overfitting of parameter. One way to overcome some of these calibration issues is to calibrate using multi-variables (e.g., streamflow, evapotranspiration, and snow cover), and at multi-sites (e.g., using streamflow data from various streams (or sub-catchments)) within the catchment [36][37][38][39]. With the increase in the availability of global datasets, the use of two or more variables in hydrological model calibration has received considerable attention. For example, a number of studies have used remote-sensing (RS)-based evapotranspiration (ET) [40][41][42][43][44][45][46], soil moisture [42,[47][48][49][50], snow cover and glacier mass balance [51,52], and satellite-based land surface temperature (LST) [53] for hydrological model calibration.
Among the aforementioned variables, RS-based ET can be used to constrain the hydrological modelling parameters related to water balance [1,44,45]. Immerzeel and Droogers (2008) [40] used evapotranspiration data derived from MODIS satellite images for calibration of Soil Water Assessment Tool (SWAT) in data-scarce Krishna Basin in India. Their results indicate that spatially distributed ET data at monthly temporal resolution provides a promising alternative to reduce equifinality resulted from lumping the hydrological processes together by traditional calibration with limitedly available streamflow gauge data. Rientjes et al. (2013) [43] used both streamflow and evapotranspiration to calibrate the HBV (Hydrologiska Byråns Vattenbalansavdelning) model for Karkheh Basin in Iran. Their results indicate that multivariable calibration enabled the reproduction of the catchment water balance. However, the results also suggested that streamflow data is also required to identify model simulation errors in gauged or ungauged systems. Using a distributed hydrology-soil-vegetation model (DHSVM) in Jinhua basin in China, Pan et al. (2018) [54] also mentioned that streamflow was simulated well in the single variable calibration, but not evapotranspiration, and that multivariables calibration showed more reasonable estimation of both streamflow and evapotranspiration simulated. Furthermore, in an application of GLEAM ET and ESA CCI (European Space Agency-Climate Change Initiative) surface soil moisture data for calibration of PCR-GLOBWB (PCRaster GLOBal Water Balance) in Oum er Rbia Basin in Morocco, López et al. (2017) [42] showed that multivariable calibration may help to solve the problem of over-parameterization and equifinality. Ha et al. (2018) [55] used four global ET products (later merging them as one ET product) and leaf area index (LAI) to calibrate SWAT model in ungauged the Day basin in Vietnam and showed reasonable estimation of water balance components and crop coefficients. They suggested that the above method is feasible to calibrate soil, vegetation and eco-hydrological parameters of SWAT, when the basin is ungauged and having complex water distribution.
Despite the aforementioned studies employing RS based ET data, the potential of remote-sensing--based evapotranspiration data for hydrological model calibration has not been fully explored. This is particularly the case in data poor regions (e.g., many river basins in South-east Asia), where any alternative source of data may bring an added value. In this study, the potential of the RS-based ET data is assessed for calibration in two ways. First, as single variable calibration, in which the hydrological model is calibrated on streamflow and ET separately, but in each case the model performance is analyzed on both variables. Second, as multivariable calibration in which both streamflow and ET are used in calibration together. Moreover, in both cases, the aim of the calibration is not limited to find the 'best parameter set', but also to identify a range of parameter sets, called hereafter 'good parameter sets', that can result in model performances similar or comparable to that of the best parameter set.

Study Area
The Chindwin basin, which is a major sub-basin of the Irrawaddy basin, was used in this study. It is located in the north-western part of Myanmar, and has the drainage area of 110,926 km 2 with its 1100 km long main river. The elevation in the basin varies from about 3800 m above mean sea level (MSL) in the northern forested mountainous region to about 100 m above MSL in the southern part ( Figure 1a). More than 80% of the Chindwin basin area is covered with forest. The rest of the basin is primarily covered by cropland, sparsely vegetated or bare land, and water bodies. According to the FAO soil map [56], Acrisols and Cambisols are the dominant soil types found in this river basin (Figure 1b). The Chindwin basin experiences subtropical and tropical climates [57]. Based on the available data from 2001 to 2010 from the meteorological stations (shown in Figure 1), the mean annual rainfall in the basin varies from 3900 mm (Hkamti station in the northern part) to 770 mm (Monywa station in the southern part), and the monthly average temperature varies from 17 • C to 40 • C ( Figure 1c). The area average annual runoff is~1200 mm at the basin outlet (Monywa Station).

Datasets Used
The geo-spatial data used in this study include: (1) 90 m resolution Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM) digital topographic data [14], (2) 300 m resolution land-use dataset in 2009 (GlobCover 2009) from the European Space Agency [15], and (3) 7 km resolution soil data (FAO Soil) from the UN Food and Agriculture Organization [56]. The precipitation data used in this study comprised spatially interpolated daily precipitation data from 2001 to 2010. Other meteorological data (for the same period) used included: temperature (at Hkamti, Homalin, Hakha, and Monywa), relative humidity, solar radiation, wind speed, and daily streamflow data at four stations (Hkamti, Homalin, Kalewa, and Monywa). Model calibration was carried at the aforementioned four stations (Figure 1a). More information on model forcing data can be found in Sirisena et al. (2018) [58].

Datasets Used
The geo-spatial data used in this study include: 1) 90 m resolution Digital Elevation Model (DEM) from the Shuttle Radar Topography Mission (SRTM) digital topographic data [14], 2) 300 m resolution land-use dataset in 2009 (GlobCover 2009) from the European Space Agency [15], and 3) 7 km resolution soil data (FAO Soil) from the UN Food and Agriculture Organization [56]. The precipitation data used in this study comprised spatially interpolated daily precipitation data from 2001 to 2010. Other meteorological data (for the same period) used included: temperature (at Hkamti, Homalin, Hakha, and Monywa), relative humidity, solar radiation, wind speed, and daily streamflow data at four stations (Hkamti, Homalin, Kalewa, and Monywa). Model calibration was carried at the aforementioned four stations (Figure 1a). More information on model forcing data can be found in Sirisena et al. (2018) [58].
The RS-based ET data were obtained from the Global Land Evaporation Amsterdam Model (GLEAM) [26,59]. GLEAM version 3.0b (GLEAM_v3.0b) driven only by satellite-based data (radiation, air temperature, precipitation, surface soil moisture, vegetation optical depth) spanning over 13 years (2003-2015) at daily time scale with a spatial resolution of 0.25° × 0.25°. Actual evapotranspiration presented in GLEAM is estimated as the sum of different terrestrial evaporation components such as transpiration, bare-soil evaporation, interception loss, open-water evaporation, and snow sublimation [26]. The GLEAM ET data for the study area indicates that, the monthly average evapotranspiration over the basin varies from 46 mm (minimum) in January to 121 mm (maximum) in July (Figure 2a). The average annual evapotranspiration over the basin is 1009 mm with a standard deviation of 24 mm. The annual evapotranspiration generally decreases from upstream to downstream ranging from 1106 mm to 790 mm (Figure 2b), similar to the precipitation distribution in the basin (Section 2.1). The highest evapotranspiration occurs in the Hkamti sub-basin The RS-based ET data were obtained from the Global Land Evaporation Amsterdam Model (GLEAM) [26,59]. GLEAM version 3.0b (GLEAM_v3.0b) driven only by satellite-based data (radiation, air temperature, precipitation, surface soil moisture, vegetation optical depth) spanning over 13 years (2003-2015) at daily time scale with a spatial resolution of 0.25 • × 0.25 • . Actual evapotranspiration presented in GLEAM is estimated as the sum of different terrestrial evaporation components such as transpiration, bare-soil evaporation, interception loss, open-water evaporation, and snow sublimation [26]. The GLEAM ET data for the study area indicates that, the monthly average evapotranspiration over the basin varies from 46 mm (minimum) in January to 121 mm (maximum) in July (Figure 2a). The average annual evapotranspiration over the basin is 1009 mm with a standard deviation of 24 mm. The annual evapotranspiration generally decreases from upstream to downstream ranging from 1106 mm to 790 mm (Figure 2b), similar to the precipitation distribution in the basin (Section 2.1). The highest evapotranspiration occurs in the Hkamti sub-basin (sub-basin 1), where most of the land cover is forest. The most downstream of the basin is covered by residential areas, crop lands and mining activities.
Remote Sens. 2020, 12, x FOR PEER REVIEW 5 of 26 (sub-basin 1), where most of the land cover is forest. The most downstream of the basin is covered by residential areas, crop lands and mining activities.

Model Setup
The Soil and Water Assessment Tool (SWAT) was used to build a hydrological model of the basin. SWAT requires topographic, land use, soil and weather data to run hydrological simulations. The DEM data were used for stream network generation and basin delineation, which is needed to set up a SWAT model. The land-use, soil data and slope classification derived from the DEM were used for creating Hydrological Response Units (HRUs) within the delineated sub-basins. HRUs are the primary computational units of SWAT, in which the simulations are governed by the soil water balance equation [60,61].
Here the Chindwin Basin was subdivided into nine sub-basins (Figure 2b), and 202 HRUs. The Hargreaves method [61], which requires only temperature data, was used to estimate potential evapotranspiration. Surface runoff was estimated using the Soil Conservation Services-Curve Number (SCS-CN) method, and channel routing was based on the variable storage method. SWAT simulations were performed at a daily time step for the 2001-2010 period, with a warm-up period of one year to allow the model to establish a stable initial state.
Monthly time series of streamflow from four gauging stations and the GLEAM based evapotranspiration from nine sub-basins were used for calibration. Based on the data availability, calibration for streamflow was performed for 2002-2010, while calibration for evapotranspiration was done for 2003-2010. Selection of model parameters for calibration was based on their sensitivity to streamflow and evapotranspiration. The initial ranges of the selected 22 parameters for calibration are presented in Table S1 in Supplementary material. Model calibration was carried out in several

Model Setup
The Soil and Water Assessment Tool (SWAT) was used to build a hydrological model of the basin. SWAT requires topographic, land use, soil and weather data to run hydrological simulations. The DEM data were used for stream network generation and basin delineation, which is needed to set up a SWAT model. The land-use, soil data and slope classification derived from the DEM were used for creating Hydrological Response Units (HRUs) within the delineated sub-basins. HRUs are the primary computational units of SWAT, in which the simulations are governed by the soil water balance equation [60,61].
Here the Chindwin Basin was subdivided into nine sub-basins (Figure 2b), and 202 HRUs. The Hargreaves method [61], which requires only temperature data, was used to estimate potential evapotranspiration. Surface runoff was estimated using the Soil Conservation Services-Curve Number (SCS-CN) method, and channel routing was based on the variable storage method. SWAT simulations were performed at a daily time step for the 2001-2010 period, with a warm-up period of one year to allow the model to establish a stable initial state.
Monthly time series of streamflow from four gauging stations and the GLEAM based evapotranspiration from nine sub-basins were used for calibration. Based on the data availability, calibration for streamflow was performed for 2002-2010, while calibration for evapotranspiration was done for 2003-2010. Selection of model parameters for calibration was based on their sensitivity to streamflow and evapotranspiration. The initial ranges of the selected 22 parameters for calibration are presented in Table S1 in Supplementary Material. Model calibration was carried out in several iterations, with each iteration consisting of 2000 simulations with parameter sets generated using the Latin Hypercube Sampling technique [71]. To go from one iteration to the next, SUFI-2 suggests new ranges for parameter values based on their performance in the current iteration. Occasionally, Remote Sens. 2020, 12, 3768 6 of 24 SUFI-2 suggested parameter ranges needed to be adjusted when they were beyond the acceptable parameter ranges of SWAT. The Modified Nash-Sutcliffe Efficiency (MNSE, Equation (1) [62]) was used as an objective measure of model calibration performance or objective function. When there is no significant improvement in the MNSE between two successive iterations (<0.02), the calibration process is terminated.
where, Q o and Q s are observed and simulated streamflow, respectively and Q o is mean observed streamflow. Similarly, MNSE of ET is calculated based on ET GLEAM , ET s and ET GLEAM . The four streamflow gauging stations (used for calibration) are in series, from upstream to downstream: Hkamti, Homalin, Kalewa and Monywa (Figure 1a). The drainage area for these stations are from sub-basin 1 for Hkamti (27,439 km 2 ), 1 and 2 for Homalin (43,370 km 2 ), 1 to 4 for Kalewa (73,495 km 2 ), and 1 to 9 for Monywa (110,926 km 2 ) ( Figure 2b). The calibration was carried out from upstream to downstream by considering one station at a time. Once the calibration is completed for one gauging station, the parameters of the sub-basins corresponding to the gauging station were fixed to their best-fitted values, and the calibration for the next station downstream was undertaken.
Two calibration schemes were used in this study: single variable calibration and multivariable calibration. In the single variable calibration, the model was calibrated for streamflow and evapotranspiration separately. In multivariable calibration, the calibration was carried out for streamflow and evapotranspiration together using a multivariable objective function. The multivariable objective function, here referred to as overall MNSE, is defined as the weighted sum of the MNSEs with respect to streamflow and evapotranspiration.
The multivariable calibration requires weights to be defined for each variable. Different approaches have been tried in a number of previous studies to specify weights to the multivariable objective function. For example, Abbaspour et al. (2007) and Rostamian et al. (2008) [72,73] applied weights inversely proportional to the variance of the observed data used in calibration. Rientjes et al. (2013) and Rode et al. (2007) [43,74] used the coefficient of variation, i.e., standard deviation divided by the mean, instead of the variance. However, assigning equal weights is by far the most commonly adopted approach (e.g., Franco and Bonumá 2017 and Rajib et al. 2016 [41,48]), probably due to the lack of sufficient justification for other approaches. In this study, different weight combinations varying from 0 to 1 (with the sum of two weights equals to 1) with an interval of 0.05 were tested. When the weights for streamflow are between 0.4 and 0.75 and for ET between 0.6 and 0.25 are used, we found reasonable model performance for both streamflow and evapotranspiration. Therefore, here too equal weights were adopted for both variables.
As said earlier, in this study, the calibration aim is not just to look for one 'best parameter set', but also a range of 'good parameter sets' that can produce model results comparable to the best parameter set. This is also one reason why we chose not to split the limited available dataset into calibration and validation. The assumption here is that the parameter ranges defined by the 'good parameter sets' found in the calibration also comprise of parameter sets that can produce similar level of model performance in validation. This assumption can be considered reasonable as long as the inter-annual variability in the validation period is not too different from that in the calibration period. Thus we used all the eight years of data (2003-2010) for which both streamflow and ET are available for calibration, because splitting eight years into calibration and validation would represent only a narrow band of a long-term hydrological cycle in the catchment. However, when the model performance is evaluated on two variables (streamflow and evapotranspiration) as in this study, the model result on the second variable can also be viewed as model validation. Similar examples can be found in several publications that utilized remote-sensing based ET data for hydrological model calibration, e.g., [40,44,46,50,54]. To find the 'good parameter set', we adopted a method described by Finger et al. (2011) [51], where 100 simulations with the highest model performance were used. Although, only MNSE was used in objective function, the performance of calibrated results were also assessed with other commonly used efficiency criteria such as NSE (Nash-Sutcliffe Efficiency), PBIAS (Percentage of BIAS), and R 2 (coefficient of determination).

Estimation of Uncertainty in Model Parameters
Model input data, model structure, parameters, and calibration data are the main sources of model prediction uncertainties associated with hydrological modelling [75,76]. In the SUFI-2 approach used in this study, all sources of model prediction uncertainties are expressed by parameter uncertainty [62,71]. Various parameters such as CN2 (SCS Curve Number), GW_DELAY (Ground Water Delay time), ESCO (Soil Evaporation Compensation factor), and EPCO (Plant uptake Compensation factor) (22 parameters listed in Table S1) related to streamflow and evapotranspiration computations were used for model calibration. Based on the global sensitivity analysis [62], the five most sensitive parameters with respect to simulating for streamflow and evapotranspiration separately were selected to analyze the uncertainty of parameters in each iteration.
SUFI-2 uses the Latin Hypercube Sampling technique (LHS), which generates possible sets of parameter values from the given parameter ranges. To determine the extent of uncertainty associated with each of the model parameter, the normalized uncertainty scoring method [48,77] was applied, in which parameter values are normalized between 0 and 100 using Equation (2).
where P n is the normalized uncertainty score of a parameter of one simulation, P b is the parameter value of the same simulation, and U l and L l are upper and lower limits of the corresponding parameter, respectively. The minimum and maximum of the normalized scores represent the uncertainty range associated with that particular parameter. The broader the range of uncertainty scores, the higher the parameter uncertainty.

Calibration with a Single Variable
Note that all the model performance analysis presented in this study are based on monthly time series data. When adopting the single variable calibration approach, in general, model performance with respect to a variable with which it is calibrated against (e.g., streamflow) improves while the performance with respect to another variable (e.g., evapotranspiration) decreases ( Figure 3). However, here the calibration with streamflow alone resulted in a reasonable performance in both streamflow and evapotranspiration on the monthly time series data. This is particularly true with respect to R 2 . For example, in the fifth iteration of Hkamti station, the NSE of streamflow varies between 0.74 and 0.94, whereas, NSE of evapotranspiration varies between 0.30 and 0.55. At the same station and same iteration, PBIAS of streamflow and evapotranspiration varies from −20.0% to −6.7%, and −5.2% to −15.6%, respectively, while R 2 varies between 0.82 and 0.95 for streamflow, and between 0.8 and 0.85 for evapotranspiration. In contrast, calibration with evapotranspiration alone shows considerably low performance for streamflow. For the same example mentioned above, in the calibration with evapotranspiration alone, the NSE of streamflow ranges between −0.51 and 0.19, and NSE of evapotranspiration varies between 0.32 and 0.72. PBIAS values of streamflow and evapotranspiration vary between −60% to −12.9%, and −15.6% to −5.2%, respectively, while R 2 varies between 0 and 0.31 for streamflow, and 0.8 and 0.85 for evapotranspiration (under the same conditions considered above).   However, when calibration is carried out at successive stations (starting at Hkamti, then Homalin, Kalewa, and ending at Monywa), the variability of performance reduces considerably for both variables (streamflow and evapotranspiration, Figure 3). This is not surprising because the upstream station is already calibrated for streamflow and evapotranspiration separately. In successive iterations, model simulations show approximately similar performance for both variables for calibration with evapotranspiration alone, but always with better values for evapotranspiration than for streamflow. However, in both calibration approaches, NSE is better for streamflow compared to evapotranspiration. For example, in the fifth iteration of Homalin, the NSE of streamflow varies from 0.91 to 0.95, and from 0.55 to 0.7 for single variable calibration with streamflow and evapotranspiration, respectively (Figure 3a). For the same iteration at the same station, weighted NSE values of evapotranspiration (from sub-basin 1 and 2) vary between 0.27 and 0.41, and between 0.67 and 0.72 in the two calibration approaches (streamflow alone and evapotranspiration alone). In contrast, streamflow shows higher bias: −15% to −11% and −28% to 21% for calibration with streamflow and evapotranspiration, respectively, whereas the bias of evapotranspiration changes between −14% to −9%, and between −5% to 4%, respectively (Figure 3b). However, R 2 is greater than 0.77 for streamflow, and evapotranspiration in both calibration approaches (Figure 3c), implying that both variables preserve the temporal characteristics of observed data.
To assess the variation of the model performance belongs to best model simulation with respect to number of iteration, we analyzed the NSE as a representative model performance indicator of the best simulation in each iteration. In successive calibration iterations, an increase in the NSE of the calibration variable (either streamflow or evapotranspiration) is not always accompanied by a decline in the performance of the other variable (Figure 4a). For example, in the calibration with streamflow alone, the NSE of evapotranspiration increased when the NSE of streamflow also increased, and subsequently decreased in next iteration(s) at all stations but Kalewa station (Figure 4a). Similar results were also obtained in calibration with evapotranspiration alone (Figure 4b). This may be because different parameter combinations also could simultaneously increase the performances of both variables.

Calibration with Multiple-variables
In the multivariable calibration approach using streamflow and GLEAM ET, successive iterations improve the performance of evapotranspiration predictions compared to that of streamflow at all stations except the Hkamti station (

Calibration with Multiple-variables
In the multivariable calibration approach using streamflow and GLEAM ET, successive iterations improve the performance of evapotranspiration predictions compared to that of streamflow at all stations except the Hkamti station ( Figure 5). For example, in the final iteration at Homalin, Kalewa, and Monywa, NSE values of streamflow vary between 0.86 and 0.92, 0.92 and 0.96, and 0.95 and 0.97, respectively. During the same simulation, the NSE values of evapotranspiration for the same stations vary between 0.46 to 0.66, 0.53 to 0.63, and 0.63 to 0.67, respectively (Figure 5a). In addition, PBIAS values of evapotranspiration vary between −6% and +7%, indicating under and over-estimations of evapotranspiration in all three above mentioned stations. However, model results always underestimated streamflow at all but the Monywa station (Figure 5b). With respect to R 2 , model calibrations show good performances (>0.77) for both streamflow and evapotranspiration at all the stations (Figure 5c). This behavior implies that simulated streamflow and evapotranspiration adequately capture the temporal variability in the observed data. Compared to single variable calibration, multivariable calibration also reduced biases in both streamflow and evapotranspiration.

Model perfroamnce with 'Best Parameter Set'
When considering the best performing simulation in the last iteration for both single variable calibrations, results with streamflow alone show better performance in dry seasons (i.e., during low flows) than in monsoon seasons (high flows) (Figure 6a). Notably, the flows at Hkamti and Homalin are under estimated during high flow periods (June to August). This mismatch in high flows may be In the best performed simulation in every successive iterations of multivariable calibration, the NSE of streamflow shows an increase in most of the cases, whereas that of evapotranspiration shows a reduction (Figure 4c). However, these decreases are small (e.g., from 0.62 to 0.61 at Kalewa station, and from 0.68 to 0.64 at Monywa station). In multivariable calibration, NSEs of evapotranspiration reduce at all but Kalewa station for every successive iteration (Figure 4c). Performance of streamflow does not always increase with the reduction of evapotranspiration performances. For example, at Homalin station, NSE of streamflow increases from 0.89 to 0.92 in the second iteration and reached 0.91 by the third iteration. Thus, a clear pattern in the performance of individual variables is not identified.

Model Perfroamnce with 'Best Parameter Set'
When considering the best performing simulation in the last iteration for both single variable calibrations, results with streamflow alone show better performance in dry seasons (i.e., during low flows) than in monsoon seasons (high flows) (Figure 6a). Notably, the flows at Hkamti and Homalin are under estimated during high flow periods (June to August). This mismatch in high flows may be due to the poor spatial representation of rainfall over sub-basins. Model calibration with evapotranspiration alone under-estimates high flows and over-estimates the low flows at all stations for the entire simulation period.
When the calibration with evapotranspiration alone is considered, GLEAM ET and simulated ET show a well-matched seasonal variation of evapotranspiration in all the sub-basins (Figure 6b). However, in terms of magnitude or bias, model estimated ET is lower than the GLEAM ET during the dry period (December to February). In contrast, the model estimated ET is higher than GLEAM ET in sub-basins 6, 8, and 9 during the monsoon period. The calibration with streamflow alone under-estimates evapotranspiration in all sub-basins during the dry period, and over-estimates the same during the monsoon period particularly in sub-basins 1, 6, 8, and 9. Overall, calibration with streamflow alone appears to be capable of reproducing the observed evapotranspiration (in this case the GLEAM ET) at an acceptable level.
In multivariable calibration, simulated streamflow shows similar results obtained from calibration with streamflow alone (Figure 6a). However, high flows (during July to August) are less than the same resulted in calibration with streamflow alone. High flows are further underestimated in multivariable calibration. In contrast, simulated evapotranspiration at basin-level shows reasonably good performance similar to results obtained from calibration only with evapotranspiration ( Figure 6b). Therefore, multivariable calibration shows reasonable estimations for both variables (i.e., streamflow and evapotranspiration).
Model performances corresponding to the best simulation at each of the station are presented in Table 1. Overall, in each performance indicator of streamflow, there is no significant difference among four stations. Among all sub-basins, the least accurate performances of evapotranspiration are for sub-basin 6, 8 and 9. These results imply that multivariable calibration is able to reproduce the observed streamflows better than evapotranspiration. However, model simulations produced a reasonably good estimation of evapotranspiration as well.  In general, the multivariable calibration with streamflow and GLEAM ET showed reasonably good streamflow estimations (NSE > 0.85). However, understandably, that calibration does not result in producing better streamflow performance than the results obtained from single variable calibration with streamflow only (Figure 4c, Figure 6a and Table 1). On the other hand, this multivariable calibration showed still sounds better for evapotranspiration as that from the single variable calibration with evapotranspiration only and better than the single variable calibration with streamflow only (Figure 6b).  Figure 7 presents the selected five global parameters (CN2, GW_DELAY, ALPHA_BF (Base flow Alpha Factor), SOL_AWC (Available Water Capacity in the Soil layer), and ESCO, details of parameters are in Table S1 in Supplementary Materials) and corresponding model performance for the last iteration of Hkamti station taking as an example station to discuss the variation of model performances with 'good parameter set'.

Model Perfroamnce with 'Good Parameter Sets'
Remote Sens. 2020, 12, x FOR PEER REVIEW 17 of 26 and R 2 ) of streamflow and evapotranspiration vary within a narrow range (Figure 7), implying 'good parameter sets' can reproduce similar model results.

Parameter Sensitivity and Uncertainty
The most sensitive model parameters for streamflow and evapotranspiration were found to be different. Figure 8 shows the five most sensitive model parameters of each variable (streamflow and evapotranspiration) with respective MNSE values. GW_DELAY, RCHRG_DP (Deep aquifer Percolation fraction), ALPHA_BF, SLSOIL (Slope length for lateral subsurface flow), and LAT_TIME (Lateral flow travel time) are identified as the five most sensitive parameters for streamflow, whereas same parameters for evapotranspiration are SOL_BD (Moist bulk density of soil layer), SOL_Z (Depth from soil surface to bottom of layer), ESCO, EPCO, and SOL_AWC (refer Table S1 for definitions of parameters). Variations of MNSE for evapotranspiration-based calibration was found to be less than that for streamflow based calibration. In the multivariable calibration, weighted average MNSE values show less variation than individual streamflow and evapotranspiration calibrations. A considerable variation of parameter values are shown for ALPHA_BF and SOL_BD in evapotranspiration based model calibration. Results also indicate that, except SLSOL (subsurface lateral flow) and SOL_Z (depth of soil layer from surface), final parameter ranges in the last iteration of multivariable calibration are within the ranges of single variable calibration. The above two parameters directly correspond to changes in soil water balance.
Except for SLSOL, all the above identified sensitive parameters of streamflow are mostly related to the base flow (or groundwater flow) SLSOL parameter is related to the subsurface lateral flow. In In single variable calibration, four parameters show high variability for calibration with streamflow than with evapotranspiration, only ALPHA_BF shows high variability for calibration with evapotranspiration. Among these five parameters, CN2, GW_DELAY and ALPHA_BF are more sensitive to streamflow, whereas SOL_AWC and ESCO are more sensitive to evapotranspiration. In contrast, model performances show low variability except for performances of streamflow under calibration only with evapotranspiration ( Figure 7). This implies that a range of parameter sets (100 sets considered here) can produce similar model performances of streamflow and evapotranspiration when those variables are calibrated separately. Therefore, these parameter ranges are very likely to show a similar level of performance with a dataset not used in calibration (i.e., validation dataset), unless inter/ intra-annual variability of the new dataset is considerably higher than calibration dataset.
In multivariable calibration, all the selected five global parameters show high variabilities (based on 100 'good parameter sets' considered) compared to the variabilities associated with the single-variable calibration (Figure 7). These best performed simulations have resulted from different combinations of calibration parameter sets (100 'good parameter sets'), in which parameter value vary within a high range. Similar to single variable calibration, model performances (NSE, PBIAS, and R 2 ) of streamflow and evapotranspiration vary within a narrow range (Figure 7), implying 'good parameter sets' can reproduce similar model results.

Parameter Sensitivity and Uncertainty
The most sensitive model parameters for streamflow and evapotranspiration were found to be different. Figure 8 shows the five most sensitive model parameters of each variable (streamflow and evapotranspiration) with respective MNSE values. GW_DELAY, RCHRG_DP (Deep aquifer Percolation fraction), ALPHA_BF, SLSOIL (Slope length for lateral subsurface flow), and LAT_TIME (Lateral flow travel time) are identified as the five most sensitive parameters for streamflow, whereas same parameters for evapotranspiration are SOL_BD (Moist bulk density of soil layer), SOL_Z (Depth from soil surface to bottom of layer), ESCO, EPCO, and SOL_AWC (refer Table S1 for definitions of parameters). Variations of MNSE for evapotranspiration-based calibration was found to be less than that for streamflow based calibration. In the multivariable calibration, weighted average MNSE values show less variation than individual streamflow and evapotranspiration calibrations. A considerable variation of parameter values are shown for ALPHA_BF and SOL_BD in evapotranspiration based model calibration. Results also indicate that, except SLSOL (subsurface lateral flow) and SOL_Z (depth of soil layer from surface), final parameter ranges in the last iteration of multivariable calibration are within the ranges of single variable calibration. The above two parameters directly correspond to changes in soil water balance.
Except for SLSOL, all the above identified sensitive parameters of streamflow are mostly related to the base flow (or groundwater flow) SLSOL parameter is related to the subsurface lateral flow. In model calibration with evapotranspiration alone, all the sensitive parameters are related to soil properties except for EPCO, which is the plant uptake compensation factor [61]. Parameter values resulting in a reasonable MNSE (> 0.5) are scattered within their initial ranges (for initial ranges, please refer to Table S1 in Supplementary Material). This scattered parameter values indicate that there are more than one combination of parameter values that can reproduce similar outputs.
For the multivariable calibration, each of the model parameter uncertainty range was examined in terms of normalized uncertainty scores (Equation (2)) that range from 0 to 100 (Figure 9). Uncertainty scores presented here were calculated from the last iteration of multivariable calibration. The analysis shows that uncertainty scores vary greatly among different parameters and stations for the same parameter. Parameters which are most sensitive to streamflow show higher uncertainty than the parameters sensitive to evapotranspiration except for SOL_BD. However, among the five most sensitive parameters for streamflow, SLSOIL and LAT_TIME show considerably less uncertainty at all the stations. In contrast, only ESCO, one of the most five sensitive parameters for evapotranspiration, shows considerably less uncertainty. Furthermore, uncertainty ranges for SOL_AWC are minimal at Homalin and Kalewa stations, whereas their maximum variations are found at Hkamti and Monywa stations.
Remote Sens. 2020, 12, x FOR PEER REVIEW 18 of 26 model calibration with evapotranspiration alone, all the sensitive parameters are related to soil properties except for EPCO, which is the plant uptake compensation factor [61]. Parameter values resulting in a reasonable MNSE (> 0.5) are scattered within their initial ranges (for initial ranges, please refer to Table S1 in Supplementary material). This scattered parameter values indicate that there are more than one combination of parameter values that can reproduce similar outputs.  . Parameter values or absolute changes versus objective function MNSE for each iteration with 2000 simulations for calibration with a single variable; blue dots represent calibration based on streamflow, red dots represent calibration based on evapotranspiration, and black dots for multivariable calibration. Y-axis is a MNSE value ranging from 0 to 1 and the X-axis represents the changes of parameter value or absolute change of a corresponding parameter. 'v_' denotes replacement of the existing parameter value (e.g., v_GW_DELAY.gw), 'a_' denotes adding a fix value to existing value (e.g., a_SOL_BD.sol). The first five parameters correspond to streamflow and the last five to evapotranspiration. than the parameters sensitive to evapotranspiration except for SOL_BD. However, among the five most sensitive parameters for streamflow, SLSOIL and LAT_TIME show considerably less uncertainty at all the stations. In contrast, only ESCO, one of the most five sensitive parameters for evapotranspiration, shows considerably less uncertainty. Furthermore, uncertainty ranges for SOL_AWC are minimal at Homalin and Kalewa stations, whereas their maximum variations are found at Hkamti and Monywa stations.

Discussion
Results presented in this study are another application among a limited number of studies that have used GLEAM ET data to support hydrological model calibration in the data-scarce Chindwin River Basin in Myanmar. In general, under the three calibration approaches used, streamflow shows the best and worst performance in calibration only with streamflow and evapotranspiration, respectively, and vice versa in the calibration performance for evapotranspiration. Both variables show slightly lower performance in multivariable calibration than the individual calibrations with streamflow and evapotranspiration. In some of the sub-basins (e.g., sub-basins 2 to 4), simulated ET does not properly represent the GLEAM-ET during dry seasons (Figure 6b) under all three calibration approaches. In contrast, sub-basins 6, 8, and 9 overestimate ET during high flow periods (June to September) for the calibrations with streamflow and multivariable calibration. These differences can be attributed to the precipitation over the basins. Similar findings were obtained in a study on the Karkheh River Basin in Iran by Rientjes et al. (2013) [43]. They found that streamflow and ET obtain good performances when the variables are calibrated separately. However, for model calibration with both streamflow and ET, simulated ETs of most of the sub-basins show good agreement with SEBS-ET (satellite-based surface energy balance system -actual evapotranspiration). Immerzeel and Droogers (2008) [40] also could not find any improvement in streamflow simulation under the model calibration with satellite-based evapotranspiration from time series of MODIS images of the Upper Bhima catchment in south India. López et al. (2017) [42] also found that model calibration with GLEAM ET and soil moisture data shows a reasonable streamflow estimation (with NSE values varying from 0.5 to 0.75). However, better model performance was obtained when it was calibrated with in-situ streamflow data.
Tobin and Bennett (2017) [44] concluded that although there is no performance improvement in streamflow simulation, GLEAM ET data can be used to constrain the evapotranspiration parameters in the SWAT applications. They considered 16 major SWAT parameters, in which five are attributable to evapotranspiration (ESCO, EPCO, CANMX (Maximum canopy storage), GW_REVAP (Groundwater "revap" coefficient), and REVAPMN (Threshold water depth in shallow aquifer for "revap" or percolation to occur), refer to Table S1 for details). Here, ESCO and CANMX were identified as the most sensitive parameters for model calibration. Immerzeel and Droogers (2008) [40] found that actual ET is more sensitive to the groundwater (GW_REVAP) and meteorological (monthly rainfall increment--RFINC) parameters than soil (SOL_AWC) and land-use parameters (maximum plant leaf area index--BLAI). In our analysis, SOL_BD, SOL_Z, ESCO, EPCO, and SOL_AWC are found to be the most sensitive parameters for evapotranspiration. The analysis of 'good parameter sets' and resultant model performances imply that many combinations of model parameters can reproduce the streamflow and evapotranspiration comparable to same obtained from 'the best parameter set'. Furthermore, Winsemius et al. (2009) [1] suggested that satellite-based ET and soil moisture information are required in hydrological model calibration to reduce parameter uncertainties and constrain model parameters within physically realistic ranges.
Even though we discussed only the parameter uncertainty, there are other sources of uncertainty in hydrological modelling, such as model input data, model structure, and observed data used for calibration [75,76]. Precipitation, the primary input data used in this analysis was derived from interpolated gauge data of six stations, which are not well distributed over the basin. Therefore, we believe that precipitation data contributes to prediction uncertainty. In terms of observed streamflow data, Harmel et al. (2009) [78] stated that the typical uncertainty in measured streamflow varies by ±7 − 23%. Di Baldassarre and Montanari (2009) [79] stated that the average error of measured streamflow data was 25.6% for the Po River, Italy. With respect to ET, the accuracy of remote sensing-based products may vary significantly over the different regions due to climatic variability, topography, and land cover [47]. For example, Trambauer et al. (2014) [3] have compared eight different ET products over the African continent and summarized that some products show good consistency among them in some areas and diverge in other areas of the continent. Another analysis by Tobin and Bennett (2017) [44] showed that simulated ET matches with GLEAM-TMPA (GLEAM -Tropical Rainfall Measurement (TRMM) Mission Multi-satellite Precipitation Analysis) better than GLEAM-CMORPH (GLEAM -Climate Prediction Center MORPHing technique). Therefore, these RS-based ET products may possess considerable biases. Thus, further information is required to analyze and discuss the uncertainty in model outputs.

Conclusions
This study evaluated the use of measured streamflow and RS-based ET data to calibrate a hydrological model of the Chindwin Basin, Myanmar. The model was developed with SWAT and two calibration approaches were tested: single variable and multiple variable calibration, and identified and discussed 'good parameter sets', which can produce similar or comparable results that obtained from 'the best parameter set'. RS-based ET data used in the model calibration were obtained from the Global Land Evaporation: Amsterdam Model (GLEAM ET), and measured streamflow were obtained at four gauging stations in the basin.
In general, this study indicates that GLEAM ET data, together with streamflow data, offers good potential for hydrological model calibration in the study region as the simulation results show a good performance for streamflow (with an NSE > 0.85 on monthly time series), while maintaining a reasonable performance for evapotranspiration (with an NSE > 0.61). Moreover, the results of single variable calibration with GLEAM ET indicate that even in the absence of streamflow data (i.e., ungauged basin), the model would have produced streamflow NSE values of more than 0.69 for three out of four stations with PBIAS varying between −2.6% and −23%. The only exception to the above behavior was found at the Hkamti station with NSE of 0.16 and PBIAS of −22%. It is noted that the Hkamti station is the uppermost station of the four and hence its calibration is independent of the other three stations. In contrast, the other three stations (Homalin, Kalewa, and Monywa) are all downstream of Hkamti and in series. Thus, their calibration results are influenced by the calibration results of up-stream station(s).
Results also indicate that there can be many different sets of parameter values ('good parameter sets') can produce similar results, which can be obtained from 'the best parameter set'. The analysis of calibration parameters suggests that the parameter sensitivity and their values change among different calibration set-ups, and uncertainty ranges of parameters may vary among both different parameters and stations.
This study provides valuable insights on hydrological modelling, in the context of using remote-sensing based and multiple data sources for model calibration, which are particularly useful for data poor basins. There is, however, room for further research on multivariable calibration in distributed hydrological modelling with the use of globally available datasets such as evapotranspiration, snow cover, and soil moisture data together with traditionally used streamflow data. Such a calibration approach could lead to better representation of hydrological responses of particularly, ungauged basins, as this approach would enable reasonable parameter estimation across the basin.