Benefits of Combining Satellite-Derived Snow Cover Data and Discharge Data to Calibrate a Glaciated Catchment in Sub-Arctic Iceland

Abstract: The benefits of fractional snow cover area, as an additional dataset for calibration, were evaluated for an Icelandic catchment with a low degree of glaciation and limited data. For this purpose, a Hydrological Predictions for the Environment (HYPE) model was calibrated for the Geithellnaá catchment in south-east Iceland using daily discharge (Q) data and satellite-retrieved MODIS snow cover (SC) images, in a multi-dataset calibration (MDC) approach. By comparing model results using only daily discharge data with results obtained using both datasets, the value of SC data for model calibration was identified. Including SC data improved the performance of daily discharge simulations by 7% and fractional snow cover area simulations by 11%, compared with single-dataset calibration (SDC) using only the daily discharge dataset. These results indicate that MDC improves the overall performance of the HYPE model, confirming previous findings. Therefore, MDC could improve discharge simulations in areas with extra sources of uncertainty, such as glaciers and snow cover. Since the change in fractional snow cover area was more accurate when MDC was applied, it can be concluded that MDC would also provide more realistic projections when calibrated parameter sets are extrapolated to different situations.


Background
Hydrological models are essential for a range of important functions for society, including freshwater resource management, flood prediction, and hydropower development. Limited availability and poor quality of hydrological data pose a challenge in achieving satisfactory performance of hydrological models [1,2]. Calibrating hydrological models for glaciated areas is particularly challenging, as ice and snowmelt represent an additional source of uncertainty [3][4][5]. Models are continually being updated and improved to overcome these and other obstacles, through various approaches to calibration and validation [6]. This study assessed the benefits of including satellite-derived data as an additional dataset for calibration, a process known as multi-dataset calibration (MDC). To this end, a case study was conducted for which a hydrological model was calibrated using both daily discharge data and daily discharge data in combination with fractional snow cover area data. Compared to other studies examining MDC approaches, this study focused on a remote region with limited hydrological and meteorological data, with a small fraction of glaciation and high fluctuation of snow cover area in the catchment. Potential advantages of MDC in discharge simulation were examined and the implications for hydrological projections under changing conditions were assessed.
[...] does not account for the performance of the various components in the runoff generation process. Hence, further research should be conducted on integrating parameter regionalization and MDC to calibrate a conceptual model.
In high alpine areas, fractional snow cover area has proven to be a valuable complementary dataset to daily discharge data to improve the calibration of models, because snow and ice melt contributes to river discharge [22,23,28]. Previous studies have shown that the degree of glaciation in a catchment influences the importance of accurate representation of the glacial melt process in hydrological models [33]. Therefore, the value of adding fractional snow cover area as an additional calibration dataset also depends on the amount of glaciation in a catchment. For example, Finger et al. [23] conducted a study using the MDC approach with snow cover images for three medium glaciated catchments and found that the improvement in performance was greatest for the catchment with the lowest degree of glaciation. The catchment examined in the present study, the Geithellnaá catchment in south-east Iceland, also has a low degree of glaciation, with 4.83% of the catchment covered by the Þrándarjökull glacier. However, snow cover area is highly variable throughout the seasons, with the Geithellnaá catchment fully covered in snow during winter and almost zero snow cover during summer. Hence, we tested the hypothesis that the fractional snow cover area can be valuable as an additional calibration dataset.
As many catchments around the world lack sufficient ground observations or data of satisfactory quality, earth observations or re-analyzed products from satellites are increasingly being integrated into hydrological models [2]. In this study, Moderate Resolution Imaging Spectroradiometer (MODIS) snow cover data were used to generate fractional snow cover area data for the study area. Previous studies have found earth observations to be suitable as an additional dataset when applying MDC [34]. Examples of additional datasets used for MDC include snow cover [20], glacier mass balance [35][36][37], a combination of those two [22,23,28], groundwater level [11], and soil saturation [38]. In the present study, the potential of fractional snow cover area as an additional dataset for calibration was evaluated in a catchment with a low degree of glaciation and limited data (the Geithellnaá valley), using the Hydrological Predictions for the Environment (HYPE) model [39], calibrated using discharge data and MODIS snow cover images. The objective was to assess the benefits of calibrating the model against the two datasets, to improve the internal consistency and overall performance of the model. Compared to previous studies, this study focused on a region with limited data and a catchment with only a small fraction covered by a glacier but a great fluctuation in the amount of snow cover throughout the seasons. The fluctuation in snow-covered area is important, as the objective of the study was to assess the benefits of including satellite-derived snow cover data as an additional calibration dataset. Additionally, due to the remote nature of the catchment, data availability is limited in various ways. The study area has one operational discharge gauging station, while there are four meteorological stations in the vicinity of the catchment. Surface composition data are available from the Corine database [40], but data on the underlying soil layers are nonexistent. Furthermore, the abundance of cloud cover and long winter nights in sub-arctic Iceland limit the use of satellite images, compared to other regions.
One of the major benefits of MDC is that it can help to ensure that several relevant variables of a modeled catchment are simulated realistically. Seibert and McDonnell [11,18] claim that MDC can help ensure that a model provides the "right answers for the right reasons", a claim confirmed by several other studies [22,41] and for most areas in the world, as reviewed by Helmert et al. [42]. Based on these previous studies and the modeling structure in HYPE, we expected discharge and snow cover observations to improve HYPE model performance in the study catchment.
HYPE is a semi-distributed conceptual model developed by the Swedish Meteorological and Hydrological Institute (SMHI) between 2005 and 2007. It is designed to simulate discharge and model flow paths of nutrients in water [39]. HYPE has since been applied in some countries outside Sweden [43]. It is similar to HBV and several other conceptual models in the sense that it uses lumped sub-catchments and a concept similar to hydrological response units (HRUs) to characterize these sub-catchments. HYPE was chosen for this study due to its versatility in performance criteria and calibration methods [39], and the option to include additional calibration datasets. As HYPE is similar in setup to several conceptual models like HBV, we believe that this paper is of interest to the wider hydrological modelling community. The remainder of this paper is structured as follows: In Section 1.2, the study area is described. Section 2 provides an overview of materials and methods. Results regarding the overall model performance, discharge simulations, and fractional snow cover area simulations are discussed, in that order, in Section 3. The implications of the results obtained are discussed in Section 4, together with the limitations of the study and recommendations for future research. Finally, some conclusions based on the results and discussion are presented in Section 5.

Study Area
The Geithellnaá catchment is located in south-east Iceland, east of the Vatnajökull icecap (Figure 1). The valley was created due to erosion by retreating glaciers after the last Ice Age and has a long, narrow shape, with steep slopes in upper regions and flat lower plains. The National Energy Authority of Iceland (NEA) has classified Geithellnaá as "direct snowmelt runoff" with little to no groundwater contribution. The main river, also named Geithellnaá, collects water from the highlands and a small icecap, Þrándarjökull, which is partly located in the valley [44]. The highest peak in the catchment is also the highest point of Þrándarjökull and reaches an elevation of 1231 m a.s.l. The glacier accounts for 4.8% of the valley area and approximately half the catchment is covered by vegetation, mainly on the flat lower plains. Accordingly, it can be assumed that glacier melt and snowmelt contribute significantly to the runoff in Geithellnaá. The River Geithellnaá drains a 190 km² watershed, which hosts only three farms near the outlet [44]. The mean annual precipitation is 2680 mm, while the mean annual temperature is 1.1 °C in the upper reaches and 3.9 °C at sea level [45]. Daily temperature and precipitation data for the present analysis were interpolated from four meteorological stations in the vicinity of the study catchment, using the Harmonie model operated by the Icelandic Meteorological Office [45][46][47]. Monthly mean precipitation and temperature reveal a typical sub-arctic seasonal pattern, with monthly mean precipitation of over 10 mm day⁻¹ in October and less than 5 mm day⁻¹ in May-June, and mean monthly air temperatures of −5 °C in winter and around 5 °C in summer [29]. The regional weather is characterized by wet ocean air coming from the east [29]. There are three gauging stations in the catchment: V149, V256, and V277. This study relied only on discharge data from gauging station V149, which is located at the estuary of River Geithellnaá. More detailed meteorological data are presented in Finger (2018) [29].

Materials and Methods
This section presents the methods and data used in this study. First, an overview of the input data is given, followed by a description of the model, current model configuration, calibration-parameter selection, performance criteria and, finally, calibration methods.

Input Data and Calibration Data
Temperature and precipitation time series were created by averaging interpolated gridded Harmonie datasets, which comprise re-analyzed weather predictions [45], for the period 2000-2017. The Harmonie datasets were selected over other datasets as their temporal coverage overlaps most with the other observed datasets (the data can be accessed through a linked repository in MyGeohub, DOI:10.13019/G5BM-SD63). Table 1 shows mean observed daily precipitation and temperature during the whole calibration period and per year of the validation period. Continuous discharge data for calibration and validation purposes were only available from gauging station V149, operated by the Icelandic Meteorological Office (IMO). It was decided to use data from this station due to its downstream location (Figure 1) and sufficient data availability (continuously available for the periods 1972-2000 and 2005-2015). The two other gauging stations in the catchment, V256 and V277, are manual stations with only a few sporadic discharge measurements.
Fractional snow cover area data were computed from gridded data, using the MOD10A1 dataset, version V006 [48]. These snow data consist of satellite images obtained with MODIS using two satellites, Terra and Aqua [48]. Both the Terra and Aqua datasets, which are available from 2000 and 2002 onwards, respectively, were used. Gridded data with 500-m resolution were averaged for the whole catchment. Fractional snow cover area was calculated by dividing the snow-covered area by the whole area of the catchment. Due to frequent cloud-covered days, it was decided that only days with less than 40% cloud cover would be used for calibration and validation purposes. This resulted in an average of 65 days per year with usable data, or 18% of days.
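The cloud screening and the snow fraction described above can be illustrated with a minimal Python sketch. The pixel labels ("snow", "land", "cloud"), the function name, and the toy scenes are hypothetical and do not reproduce the MODIS product encoding; how cloudy pixels entered the fraction in the study is not stated, so dividing by the full pixel count is an assumption based on the phrase "the whole area of the catchment".

```python
def fractional_snow_cover(pixels, max_cloud_fraction=0.40):
    """Return the snow-covered fraction of one daily scene, or None if too cloudy.

    pixels: list of labels for all catchment pixels ("snow", "land" or "cloud").
    The labels are hypothetical; real MODIS grids use numeric codes.
    """
    total = len(pixels)
    if total == 0:
        return None
    cloud_fraction = pixels.count("cloud") / total
    # Keep only days with less than 40% cloud cover, as stated in the text.
    if cloud_fraction >= max_cloud_fraction:
        return None
    # Assumption: snow-covered area divided by the whole catchment area.
    return pixels.count("snow") / total


# Two toy scenes of ten pixels each (illustrative values only).
day_clear = ["snow"] * 6 + ["land"] * 3 + ["cloud"] * 1
day_cloudy = ["snow"] * 3 + ["land"] * 2 + ["cloud"] * 5

fsc_clear = fractional_snow_cover(day_clear)    # usable day
fsc_cloudy = fractional_snow_cover(day_cloudy)  # rejected day
```

With 65 usable days per year, 65/365 gives roughly 18% of days, which matches the figure reported above.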

Model Description
HYPE uses lumped sub-catchments, all of which are defined by area and divided into soil type and land use combinations (SLCs), a concept similar to HRUs. A more detailed description of the HYPE model can be found in Lindström et al. [39] and on SMHI.net [43]. For the creation of sub-catchments in the Geithellnaá catchment, we used a digital elevation model (DEM) with resolution 10 m obtained from the National Land Survey of Iceland [47]. With these raster data, sub-catchments were created in ArcMap (v.10.2). After tests with varying numbers of sub-catchments, it was found that a minimal number of sub-catchments led to the highest performance of the model. Hence, the area was divided into three sub-catchments, with the outlets located at the three gauging stations. Information provided to the model about the sub-catchments included area, elevation (mean and standard deviation), soil type and land use characteristics, and downstream sub-catchments. For each sub-catchment, a water balance was calculated for each time step, including precipitation, evaporation based on temperature, storage in snow, water, and soil, inflow from upstream catchments, and outflow. Each of these processes was simulated based on various parameters and aspects of the model setup. Various sub-models can be added to the structure to determine how specific processes are generated, but if a sub-model is included, separate parameters incorporated in this process have to be defined (see SMHI.net [43]).
For the creation of SLCs, soil data were obtained from the Icelandic Agricultural Research Institute [49], and five different soil classes were identified within the catchment, including "glacier". For land cover data, the EU Corine Land Cover dataset (Version 18.5.1) was used [40]. Nine different land cover types were initially identified within the catchment and were reclassified to six, including "water" (lake) and "glacier". SLCs were created by overlaying the soil type and land cover layers. Based on the five soil classes and six land cover types identified, this led to a potential total of 30 different SLCs. However, both the land and soil type classes included "glacier", and "water" was one of the land cover types. Glaciers and lakes are simulated as separate classes in HYPE, existing as only one layer without sub-surface water flow. Hence, only 18 combinations of SLCs were possible, 16 of which were found in the catchment. For each sub-catchment, the coverage of each SLC was expressed as a fraction. SLCs do not require information about elevation or geographical location in the sub-catchment, but, for each SLC, soil depth has to be defined in the model setup. Glacier mass is calculated per time step, using a coefficient and exponent parameter for the area-volume relation and a factor for the density of the ice (see SMHI.net [43]).
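The SLC counts in this paragraph can be checked with a short Python sketch. The class names are placeholders, and the counting rule (regular soil classes crossed with regular land cover types, plus one glacier class and one water class) is an assumption about how the figure of 18 was obtained.

```python
from itertools import product

# Placeholder names: five soil classes and six land cover types, as in the text.
soil_classes = ["soil_1", "soil_2", "soil_3", "soil_4", "glacier"]
land_covers = ["cover_1", "cover_2", "cover_3", "cover_4", "water", "glacier"]

# Plain overlay of both layers: every soil class with every land cover type.
all_pairs = list(product(soil_classes, land_covers))

# Assumption: glacier and water are single special classes without a soil
# profile, so they are removed from the overlay and counted once each.
regular_pairs = [
    (soil, cover)
    for soil, cover in all_pairs
    if soil != "glacier" and cover not in ("glacier", "water")
]
special_classes = ["glacier", "water"]

n_potential = len(all_pairs)
n_possible = len(regular_pairs) + len(special_classes)
```

Under this counting rule, `n_potential` is 30 and `n_possible` is 18, in line with the numbers given above.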
Fractional snow cover area was calculated per land use type for every sub-catchment. There are two different functions in HYPE, for the snow accumulation and ablation phases. During the snow accumulation phase, fractional snow cover is a function of snow water equivalent (snow) and the maximum snow cover equivalent during winter (fscmax) [43,50]. Snow water equivalent (snow) is calculated per time step and is dependent on snowmelt and snowfall. Snowmelt occurs when the observed temperature is above the threshold temperature (ttmp) and the amount is determined by the snowmelt parameter (cmlt). Similarly, snowfall occurs on days with precipitation when air temperature is below the threshold temperature (ttmp). Both cmlt and ttmp are land-use-dependent, meaning they vary with land use type. As soon as a threshold value is reached for fractional snow cover (fscmax-fsclim), the model switches to the ablation phase and uses a function that represents snow cover during winter. Fractional snow cover is then calculated based on snow water equivalent, maximum snowpack during winter (snowmax), and a distribution factor. The distribution factor is governed by the parameters fscdistexp, fscdist0, and fscdist1, and is dependent on the standard deviation of elevation in the sub-catchment. In this study the maximum value, fscmax, was set at 1, following observed data, while minimum snow cover, fscmin, was set at 0.048, as this is the fractional area of the glacier.
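The role of ttmp and cmlt in the snow water equivalent update can be sketched with a generic degree-day step in Python. This is an illustration under stated assumptions, not the HYPE source code: the linear melt term cmlt × (T − ttmp), the default parameter values, and the function name are assumptions, and the fractional snow cover functions themselves are not reproduced.

```python
def step_snowpack(swe, precip, temp, ttmp=0.0, cmlt=3.0):
    """Advance snow water equivalent (mm) by one daily time step.

    swe: current snow water equivalent (mm)
    precip: daily precipitation (mm)
    temp: daily air temperature (deg C)
    ttmp, cmlt: threshold temperature and melt factor (illustrative values)
    Returns (new_swe, melt).
    """
    if temp < ttmp:
        # Below the threshold: precipitation falls as snow, no melt.
        snowfall, melt = precip, 0.0
    else:
        # At or above the threshold: precipitation is rain and is not stored;
        # melt follows an assumed degree-day relation, limited by the snowpack.
        snowfall = 0.0
        melt = min(swe, cmlt * (temp - ttmp))
    return swe + snowfall - melt, melt


cold_day = step_snowpack(10.0, 5.0, -2.0)   # accumulation
warm_day = step_snowpack(10.0, 0.0, 2.0)    # partial melt
hot_day = step_snowpack(2.0, 0.0, 5.0)      # snowpack melts out
```

Because both parameters are land-use-dependent in the study setup, such a step would be evaluated separately for each land use type.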

Parameter Selection and Calibration
In total, 74 parameters were included in the HYPE setup for this study. Initially, 40 parameters were assessed for sensitivity, to test which parameter values had the most influence on the performance of the model (Figure A1 in Appendix A). These 40 parameters were tested because they are incorporated in the general flow simulation and not related to any sub-models included in the model structure. Those sub-models and the other 34 parameters were defined separately, based on different studies or default values suggested by SMHI [43]. The sensitivity of the 40 general parameters was assessed using the Monte Carlo approach with 50,000 runs. Based on the 100 parameter sets with the highest Nash-Sutcliffe efficiency (NSE) value, six parameters were selected for calibration, as model performance proved most sensitive to variation in the value of these parameters (Figure A1). Figure 2 shows the selected parameters and the distribution of the values of the 100 best parameter sets for the calibration. The selected parameters showed more convergence within their given range than other parameters (Figure A1), indicating that variation in the value of these six parameters influenced model performance. These six parameters were further calibrated, applying MDC and single dataset calibration (SDC), again using the Monte Carlo approach with 50,000 runs. The values of the other parameters were based on other studies and values suggested as common or default by SMHI in the model description [43].
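The Monte Carlo procedure described here (random parameter sets, ranked by a performance score, best 100 retained) can be sketched in Python as follows. Uniform sampling, the parameter ranges, and the toy score function are assumptions for illustration; in the study each run is a full HYPE simulation and 50,000 runs were used.

```python
import random


def monte_carlo_search(score, ranges, n_runs, n_keep, seed=0):
    """Sample parameter sets uniformly at random and keep the best n_keep."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        runs.append((score(params), params))
    # Highest score first; sort on the score only so dicts are never compared.
    runs.sort(key=lambda run: run[0], reverse=True)
    return runs[:n_keep]


# Hypothetical ranges for three of the calibrated parameters (illustrative only).
RANGES = {"ttpd": (0.0, 3.0), "ttpi": (0.0, 3.0), "pcaddg": (0.0, 1.0)}


def toy_score(params):
    """Stand-in for a model run: an NSE-like score that peaks at 1.0."""
    return 1.0 - (params["ttpd"] - 1.0) ** 2 - (params["pcaddg"] - 0.3) ** 2


# 5,000 runs keep the example fast; the study used 50,000.
best = monte_carlo_search(toy_score, RANGES, n_runs=5000, n_keep=100)
```

In this toy setup the score ignores ttpi, so its values in `best` stay spread over the whole range, whereas ttpd and pcaddg cluster near their optima. This is the kind of convergence that was used above to identify sensitive parameters.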
None of the parameters selected for final calibration have a direct influence on HYPE simulation of snow cover area (Table 2). However, parameters ttpd and ttpi are correcting parameters for the threshold temperature for snowmelt and snowfall, and thus influence the snowpack. Parameter pcaddg influences precipitation distribution based on elevation, which in turn influences fractional snow cover area. One of the parameters included (ratcorr) is region-dependent and was hence calibrated per sub-catchment, also known as parameter regionalization.

Performance Criteria
Model performance was evaluated using an overall performance criterion based on the average of two separate criteria, one for discharge simulations and one for fractional snow cover area. HYPE offers a range of performance criteria to cater for various variables that can be included (a full list can be found at SMHI.net [43]). Here, Nash-Sutcliffe Efficiency (NSE) was used to evaluate discharge simulations and normalized root mean square error (NE) was chosen for the fractional snow cover area. Average NSE was calculated for the whole Geithellnaá catchment as [51]:

$$\mathrm{NSE} = 1 - \frac{\sum_{j=1}^{m_i} \left(Q_{sim,j} - Q_{obs,j}\right)^2}{\sum_{j=1}^{m_i} \left(Q_{obs,j} - \overline{Q}_{obs}\right)^2}$$

where Qobs is observed discharge, Qsim is modeled discharge, the overbar denotes the mean of the observed series, and mi is the number of values in a time series of data from a gauging station.
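For readers who want to reproduce the discharge criterion outside HYPE, the standard Nash-Sutcliffe efficiency can be written in a few lines of Python. The function name and the toy series are hypothetical; the sketch assumes paired daily values without gaps.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated against observed values."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((s - o) ** 2 for o, s in zip(obs, sim))
    variance_obs = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / variance_obs


# Toy discharge series (illustrative values only).
q_obs = [1.0, 2.0, 3.0, 4.0]
nse_perfect = nse(q_obs, q_obs)               # simulation equals observation
nse_mean_only = nse(q_obs, [2.5, 2.5, 2.5, 2.5])  # simulation equals observed mean
```

A perfect simulation gives NSE = 1, while a simulation that only reproduces the observed mean gives NSE = 0, which is why higher values indicate better performance.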
To evaluate simulations of fractional snow cover area, the mean NE value was calculated for all sub-catchments following the HYPE definition [43], from observed fractional snow cover (Fscobs), simulated fractional snow cover (Fscsim), and the number of values in the time series of each sub-catchment (mi). To determine overall model performance, NSE and NE values were averaged. NE should be minimized, meaning that better performance of the simulation leads to a lower NE value. For NSE the best score is 1, meaning that better performance leads to a higher value [43,51]. To combine the two performance criteria with opposing maximum scores, the following formula was used:

$$\text{Overall performance} = x \cdot \mathrm{NSE} - y \cdot \mathrm{NE}$$

where x and y refer to the weights given to NSE and NE, respectively. HYPE allows the user to assign weights to each evaluation criterion when using multiple criteria, with the sum equaling 1. For this study, both performance criteria were given a weight of 0.5. We decided to weight the NSE and NE scores equally, as we deemed correct simulation of discharge and fractional snow cover area to be equally important when applying the MDC approach.
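A normalized root mean square error for fractional snow cover can be sketched in Python as below. The normalization by the maximum observed value is an assumption made for this illustration; the exact normalization used by HYPE should be taken from the model documentation [43]. The function name and the toy series are hypothetical.

```python
import math


def nrmse(obs, sim):
    """Root mean square error divided by the maximum observed value.

    ASSUMPTION: normalization by max(obs); other conventions (mean or range
    of the observations) exist and may differ from the HYPE definition.
    """
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return rmse / max(obs)


# Toy fractional snow cover series for one sub-catchment (illustrative only).
fsc_obs = [1.0, 0.8, 0.4, 0.1]
fsc_sim = [1.0, 0.6, 0.4, 0.3]
ne = nrmse(fsc_obs, fsc_sim)
```

Unlike NSE, this criterion is zero for a perfect simulation and grows with the error, so it has to be minimized during calibration.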

Calibration Using SDC and MDC
In order to evaluate SDC simulations, only the NSE values were considered, as only daily discharge was used as calibration dataset. To evaluate MDC simulations, NE and NSE values were averaged, as both daily discharge and fractional snow cover area data were used as calibration sets. Only the parameters listed in Table 2 were calibrated, while all other parameters were kept the same in both MDC and SDC. For the validation phase, the optimal parameter sets obtained during the calibration phase were used to simulate discharge and fractional snow cover area. The overall performance and the performance of discharge and fractional snow cover area simulations were compared.
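The difference between the two selection rules can be made concrete with a small Python sketch. The three candidate runs are invented for illustration, and the combined score x·NSE − y·NE with x = y = 0.5 is an assumed form of the equally weighted criterion (NSE is maximized, NE is minimized).

```python
# Hypothetical Monte Carlo runs with their two criteria (illustrative values).
runs = [
    {"id": "A", "nse": 0.80, "ne": 0.40},
    {"id": "B", "nse": 0.78, "ne": 0.22},
    {"id": "C", "nse": 0.70, "ne": 0.20},
]


def sdc_score(run):
    """Single-dataset calibration: discharge performance only."""
    return run["nse"]


def mdc_score(run, x=0.5, y=0.5):
    """Multi-dataset calibration: assumed weighted combination of NSE and NE."""
    return x * run["nse"] - y * run["ne"]


best_sdc = max(runs, key=sdc_score)
best_mdc = max(runs, key=mdc_score)
```

In this toy example SDC selects run A, which has the highest NSE but a poor snow cover fit, whereas MDC selects run B, which gives up a little discharge performance for a much better representation of snow cover.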

Results
In this section, overall model performance is presented first, followed by the discharge simulations and fractional snow cover area simulations, respectively.

Overall Performance
The results showed that MDC using discharge data and fractional snow cover led to 17% higher overall performance of the model than SDC during the validation phase (January 2011-December 2015). The largest relative improvement was found in the fractional snow cover area simulations, for which NE decreased by 12% with MDC, from 0.25 to 0.22 (Table 3, lowest column). The largest absolute improvement with the inclusion of fractional snow cover area for calibration was found in the discharge simulations, with an increase in NSE from 0.73 to 0.78.
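The percentages reported here can be retraced from the rounded criterion values with a short Python calculation. The overall score is computed as 0.5·NSE − 0.5·NE, which is an assumption about how the equally weighted criterion combines a score to be maximized with an error to be minimized; the variable names are hypothetical.

```python
def relative_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / abs(old) * 100.0


# Rounded validation-period values reported in the text.
nse_sdc, nse_mdc = 0.73, 0.78
ne_sdc, ne_mdc = 0.25, 0.22

nse_gain = relative_change(nse_sdc, nse_mdc)       # discharge improvement
ne_reduction = -relative_change(ne_sdc, ne_mdc)    # snow cover error reduction

# Assumed overall criterion with equal weights of 0.5.
overall_sdc = 0.5 * nse_sdc - 0.5 * ne_sdc
overall_mdc = 0.5 * nse_mdc - 0.5 * ne_mdc
overall_gain = relative_change(overall_sdc, overall_mdc)
```

With these inputs the discharge improvement is about 7%, the reduction in NE is about 12%, and the overall score rises from about 0.24 to 0.28, i.e., by roughly 17%, consistent with the values given above.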

Discharge Simulations
During the validation period, the MDC method resulted in more realistic simulations of daily discharge, with an average NSE value of 0.78, compared with 0.73 with SDC, an improvement of 7%. Moreover, the parameter sets created using MDC led to higher NSE values during each individual year of the validation period (Table 3). Figure 3 shows the NSE values for the 10 highest performing parameter sets when considering the overall performance criterion. Meteorological conditions, and changes in them, appear to influence the ability of the model to simulate discharge correctly. During years when temperature and precipitation were similar to the calibration period, such as 2011 and 2015, model performance was high and SDC led to almost equally realistic discharge simulations (Figure 3). Conversely, for those years in the validation period when meteorological conditions deviated most from the calibration period, model performance was lowest, and the difference in performance between simulations created with MDC and SDC was greatest (Figure 3, Table 1). During the years with the lowest performance, 2012 and 2013, both temperature and precipitation were low on average compared to the calibration period, except for several peaks in temperature in summer (Table 1). Since model performance was nevertheless higher for the MDC simulations, this suggests that including fractional snow cover data leads to model calibrations that are less susceptible to changing conditions. It should be noted that, despite the increase in performance when applying MDC, the discharge simulations during 2013 were not of sufficient quality, with a mean NSE value of 0.34 for the best 10 parameter sets (Figure 3). Figure 4 shows a time series of meteorological conditions and discharge simulations in 2014, which was taken as a representative year for the results during the validation period. The results show that discharge is high when either precipitation or temperature is high: the graph displays discharge peaks during January and October/November, when precipitation is high, and in June/July, when observed temperatures are relatively high (Figure 4).

Fractional Snow Cover Area Simulations
Fractional snow cover area simulations created with HYPE were compared to MODIS snow cover data. Fractional snow cover area was modeled more realistically when MDC was applied during the overall validation period (Table 3) and for each year during this period (Table 4). During 2012, 2014 and 2015, fractional snow cover area simulations were most realistic. The performance of fractional snow cover simulations was lower during 2011 and 2013, and, corresponding to the discharge simulations, the difference in performance between SDC and MDC was largest for these years. Interesting in this case is the year 2012, for which the performance of discharge simulations was relatively low and the difference in performance between MDC and SDC was greatest. Fractional snow cover area simulations were relatively realistic for the MDC simulations but less so for the SDC simulations, compared to other years of the validation period. A similar observation can be made for 2011, when the difference in the performance of fractional snow cover simulation was the same, and there was a difference in performance for discharge simulations despite overall model performance being high that year. These results imply, together with the observation that MDC is less susceptible to changes in meteorological conditions, that MDC simulations are more likely to be right for the right reasons.
Figure 5 shows the meteorological conditions and fractional snow cover area time series for 2014. The simulations with MDC and SDC largely follow the same trend. There are two periods with noticeable differences: the snowmelt period (February-April), when the MDC simulations show an earlier onset of snowmelt, and the snow build-up period (October-December). This is reflected in the discharge simulations (Figure 4), where there is a delay in discharge, leading to an overestimation during the period of February to April.

Discussion
The results obtained indicated that including satellite-derived fractional snow cover data as an additional dataset for calibration helps to improve hydrological models in glaciated areas with limited data. Calibration with both datasets led to more realistic simulations of fractional snow cover and discharge for almost every year of the validation period, compared with SDC (Tables 3 and 4, Figure 3). The difference in accuracy between SDC and MDC was particularly apparent during years when model performance was low, such as 2012, 2013, and 2014 in the validation period. Data requirements for the additional dataset are low, as shown in this study, where only 18% of days could be used for fractional snow cover observation data. The improved overall model performance with MDC supports earlier findings (e.g., [1,20,22,23,31]).
Besides improving discharge simulations, MDC could be particularly valuable when extrapolating parameter sets, for example to ungauged catchments or to make discharge projections. Parameter regionalization has been proven to be useful when extrapolating constrained parameter sets to ungauged areas [28]. However, the technique becomes challenging when applied to catchments with high uncertainty, for example with diverse vegetation, porous underground layers, and glaciated or snow-covered areas [30,35]. The results of the present study show that MDC could be useful in overcoming this challenge, since fractional snow cover area simulations created with MDC were consistently more accurate during the validation period. This indicates that some internal processes related to snow cover were simulated less accurately when calibrating towards daily discharge only. In other words, the SDC simulations were right for the wrong reasons. Model performance when simulating discharge could be influenced by changing circumstances, such as different meteorological conditions or physical characteristics when applied in another catchment. Indeed, as concluded in the results section, during years in the validation period when meteorological conditions deviated most from the calibration period, the difference in performance between simulations created with MDC and SDC was greatest (Figures 4 and 5, Table 1). It can be concluded that applying MDC leads to model calibrations that are less susceptible to changing conditions. Hence MDC could be valuable when extrapolating model setups, particularly in combination with parameter regionalization as in this study, for example for ungauged catchments or climate change projections. In this study, it was not possible to carry out actual climate projections, due to limited temporal data coverage.
The degree of glaciation in a catchment influences the quality of discharge simulations, mainly through the difference in performance between SDC and MDC. Finger et al. [23] conducted a study using MDC with snow cover images for three separate catchments and found that the improvement in performance was highest for the catchment with the lowest degree of glaciation. When a smaller fraction of the catchment is glaciated, the fractional snow cover area in the catchment is more variable throughout the year. Geithellnaá has a relatively low degree of glaciation (Figure 1). Therefore, based on our results and those in previous studies, it can be concluded that applying MDC with fractional snow cover area is more beneficial for the overall performance in low to medium glaciated catchments. In highly glaciated catchments, glacier mass balance would be more appropriate as an additional dataset for calibration purposes, as glacier mass balance has been shown to contain important information that could improve the reliability of simulations [52]. MDC can also improve discharge simulations in non-glaciated catchments, as shown in various studies (e.g., [1,12,20]). For regions with seasonal snowfall, MDC could be applied using fractional snow cover area as an additional dataset, while in areas in warmer climates other variables, such as groundwater level or soil saturation, could be used. It could be argued that MDC has more benefits in glaciated areas or regions with seasonal snow coverage, as snow and ice are an additional source of water, and hence add uncertainty [3][4][5].
While calibrating with MDC can improve model performance, several issues might arise that can compromise the quality of the simulations, as noted by Freer et al. [53]. First, as the input data may include uncertainty, researchers risk degrading the quality of their simulations and the ability of the model to represent a system when including additional datasets [53]. In this study, the frequent cloud cover was a major source of uncertainty regarding the observed fractional snow cover data. Despite this uncertainty, the overall model performance improved with MDC and the model also simulated discharge better, indicating that the snow cover data were of sufficient quality. Second, the chosen additional dataset should be selected carefully, to be commensurate with the model setup [53]. The additional variable used for discharge calibration should have a significant influence on runoff generation throughout the year. The results obtained in the present and previous studies (e.g., [54]) indicate that accurate modeling of snow cover is important when attempting to model discharge in glaciated catchments. In this study, when the performance of fractional snow cover simulations was low, as in 2013, discharge simulations were also modeled less accurately. Given the importance of accurate modeling of snow cover in our study setting in Iceland, fractional snow cover can be considered an appropriate choice of an additional dataset for calibration. Hence, our results confirm that the additional dataset should be chosen with care.
The results obtained in this study indicate that HYPE is a suitable model for use with the MDC method and that including an additional dataset is advisable for users of the open-access version of HYPE. HYPE is versatile and allows for easy inclusion of additional datasets and variables. Performance criteria can be averaged to evaluate the overall performance, making it easy to calibrate towards multiple datasets at once. One drawback of HYPE is the limited choice of performance criteria available for use in the calibration process. For example, in this study it was initially decided to use mean absolute error (MAE) to evaluate the performance of fractional snow cover area simulations, but this option was not available in the version of HYPE used [55,56].
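The idea of averaging performance criteria across datasets can be illustrated with a minimal Python sketch. It is not HYPE's own interface: the function names, the equal default weighting, and the use of the Nash-Sutcliffe efficiency for both discharge and snow cover are assumptions made only for illustration. Days without a snow cover observation (for example, cloudy days) are represented by `None` and skipped.

```python
from statistics import fmean


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency; pairs whose observation is None are skipped."""
    pairs = [(o, s) for o, s in zip(observed, simulated) if o is not None]
    if not pairs:
        raise ValueError("no valid observations")
    mean_obs = fmean(o for o, _ in pairs)
    denom = sum((o - mean_obs) ** 2 for o, _ in pairs)
    if denom == 0:
        raise ValueError("observations have zero variance")
    numer = sum((o - s) ** 2 for o, s in pairs)
    return 1.0 - numer / denom


def combined_score(q_obs, q_sim, sc_obs, sc_sim, weight_q=0.5):
    """Weighted average of the discharge and snow cover criteria (hypothetical)."""
    return weight_q * nse(q_obs, q_sim) + (1.0 - weight_q) * nse(sc_obs, sc_sim)


# Illustrative values only, not data from the study.
q_obs = [1.0, 2.0, 3.0, 4.0]
q_sim = [1.1, 1.9, 3.2, 3.8]
sc_obs = [0.9, None, 0.6, 0.2]   # None marks a day without a usable image
sc_sim = [0.8, 0.7, 0.5, 0.3]
print(combined_score(q_obs, q_sim, sc_obs, sc_sim))
```

A parameter set that scores well on discharge alone but poorly on snow cover receives a lower combined score, which is how a second dataset constrains the calibration.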

Uncertainty, Limitations, and Future Research
Simulation results are only as reliable as the inputs, parameter estimates, and underlying assumptions used in the model [7]. Here, MODIS snow cover data, which were frequently obscured by clouds and included 40% cloud cover, were used for calibration purposes. An additional limitation is that snow cover fractions were computed for the whole catchment, while HYPE requires input per sub-catchment for additional observation data. Hence, every sub-catchment was assigned the whole-catchment average, which deviates from reality. Furthermore, the precipitation and temperature data were based on modeled interpolated data, as there were no meteorological stations within the catchment. Both the interpolation method and the averaging of spatial data add some uncertainty.
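The two snow cover limitations described above can be made concrete with a short Python sketch. It is an assumption-laden illustration, not the processing chain used in the study: the pixel labels, the function names, and the cloud threshold are hypothetical, and the default of 0.4 merely echoes the 40% figure mentioned above, whose exact use is not specified in this section.

```python
SNOW, LAND, CLOUD = "snow", "land", "cloud"


def fractional_snow_cover(pixels, max_cloud_fraction=0.4):
    """Snow fraction of the cloud-free pixels, or None if the scene is too cloudy.

    `pixels` is a flat list of class labels for one day over the whole catchment.
    """
    if not pixels:
        return None
    n_cloud = sum(1 for p in pixels if p == CLOUD)
    if n_cloud / len(pixels) > max_cloud_fraction:
        return None  # treated as a day without a usable observation
    n_clear = len(pixels) - n_cloud
    if n_clear == 0:
        return None
    n_snow = sum(1 for p in pixels if p == SNOW)
    return n_snow / n_clear


def assign_to_subcatchments(catchment_value, subcatchment_ids):
    """Give every sub-catchment the same whole-catchment value."""
    return {sid: catchment_value for sid in subcatchment_ids}


# Illustrative scene: 3 snow, 3 land, and 4 cloud pixels.
scene = [SNOW] * 3 + [LAND] * 3 + [CLOUD] * 4
fsc = fractional_snow_cover(scene)
print(assign_to_subcatchments(fsc, [1, 2, 3]))
```

Because every sub-catchment receives the same value, any elevation-driven contrast in snow cover between sub-catchments is lost, which is the deviation from reality noted above.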
Based on the results and conclusions presented here, some recommendations can be made for future research. This study had limited time coverage due to the availability of observed data. Ultimately, the benefits of MDC for hydrological projections should be studied over a longer period, to test whether MDC does indeed lead to improved hydrological projections under climate change. Furthermore, a study using multiple catchments could allow a more in-depth analysis of the benefits of MDC. As MDC was found to lead to satisfactory results, further studies can employ other options for additional calibration datasets. The present study could be repeated for non-glaciated catchments with temporary snow cover in winter, to assess the usefulness of fractional snow cover data in such cases. For glaciated catchments, the application of MDC for discharge projections under future climate change should be examined for various types of models, as in Etter et al. [22].

Conclusions
This study examined the possible benefits of using satellite-derived snow cover area data as an additional calibration dataset (MDC) in a low-glaciated catchment with limited data. A case study was carried out using the Swedish HYPE model to simulate daily discharge in the Geithellnaá catchment in south-east Iceland. The results revealed that simulations using MDC with snow cover area as an additional calibration dataset led to 17% better overall model performance than simulations using only discharge observations, or Single Dataset Calibration (SDC). Both discharge and fractional snow cover area were modeled more realistically when applying MDC, with the performance of discharge simulations created with MDC improving by 7% and that of fractional snow cover area simulations by 12%. This improvement was found despite the limited number of days with fractional snow cover area observations, due to the abundant cloud cover in Iceland. We concluded that, for this case study, the MDC simulations were likely to be right for the right reasons. The performance of the model with SDC decreased most during years in the validation period when meteorological conditions differed most from the calibration period. The fractional snow cover area simulations created with MDC were more realistic for every year of the validation period than those created with SDC. This indicates that internal hydrological processes in the study area are likely to be more realistic when modeled using MDC. Since the MDC results are more likely to be right for the right reasons, it can be concluded that including satellite-derived data as an additional dataset for calibration has potential when working in low-glaciated catchments with limited data. For example, MDC with fractional snow cover area could be valuable when extrapolating constrained parameter sets to ungauged catchments or when making hydrological projections in remote, glaciated areas with limited data.
Finally, we concluded that the data requirements for additional calibration datasets are low, as only 18% of the days had usable snow cover area observations.

Acknowledgments:
We would like to extend our gratitude to the Icelandic Meteorological Office for supplying the data and much useful advice and feedback. We would also like to thank the Swedish Meteorological and Hydrological Institute for making HYPE freely available.

Conflicts of Interest:
The authors declare no conflict of interest.

Data Availability:
The data used in this study can be accessed through a linked repository in MyGeohub (doi:10.13019/G5BM-SD63).

Appendix A
Figure A1 shows the distribution of values for all 40 parameters that were tested for sensitivity. They were included as they are related to the main flow generation process in the model, and not to any sub-models such as snowmelt or glacier melt. All 40 parameters were tested in a large Monte Carlo analysis, with 50,000 runs. Figure A1 shows the distribution of values for the 100 highest performing parameter sets for each individual parameter. For many parameters, the distribution showed a "normal" shape, with the mean roughly in the middle of the range of possible values for that parameter, the first and third quartile at around one quarter and three-quarters of the range, and the minimum and maximum at the ends of the range. Other parameters showed clear convergence within their set range of possible values in the Monte Carlo analysis (e.g., ssrrate (panel D) and preccorr (panel F)). Convergence showed that the model was sensitive to variation in the value of a parameter, as the best performing 100 parameter sets showed some consistency as regards the value of this parameter. For the six parameters used in the final calibration of the model, the convergence in model values was strong (see Figure 2 in main text).
It should be noted that for the final calibration, only six parameters were included in a Monte Carlo analysis with 50,000 runs. For all other parameters presented in Figure A1, values suggested by the creators of HYPE or in other literature were used. We realize that 50,000 runs is low for Monte Carlo analysis with 40 parameters, but this initial test was intended solely to determine which model parameters were most important for model performance, not for model calibration purposes.
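The screening logic described in this appendix can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the study's setup: the objective function, the parameter names, the uniform sampling, and the run count are all hypothetical, and a real application would call the hydrological model inside `objective`. Sensitivity is summarized here as the interquartile range of the best parameter sets relative to the sampled range; a value near 0.5 means the best sets are spread across the whole range (no convergence), whereas a much smaller value indicates convergence.

```python
import random
from statistics import quantiles


def monte_carlo_screen(objective, bounds, n_runs=5000, n_best=100, seed=0):
    """Sample parameters uniformly, keep the best runs, and report their spread."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        runs.append((objective(params), params))
    # Higher objective values are treated as better performance.
    runs.sort(key=lambda run: run[0], reverse=True)
    best = [params for _, params in runs[:n_best]]
    spread = {}
    for name, (lo, hi) in bounds.items():
        q1, _, q3 = quantiles([p[name] for p in best], n=4)
        spread[name] = (q3 - q1) / (hi - lo)
    return spread


def toy_objective(params):
    """Hypothetical score that depends on 'a' only; 'b' has no influence."""
    return -(params["a"] - 0.3) ** 2


spread = monte_carlo_screen(toy_objective, {"a": (0.0, 1.0), "b": (0.0, 1.0)})
print(spread)
```

In this toy case the relative spread of `a` is far smaller than that of `b`, mirroring the contrast between converging parameters and those whose best values remain scattered over the full range.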