On the Benefits of Bias Correction Techniques for Streamflow Simulation in Complex Terrain Catchments: A Case-Study for the Chitral River Basin in Pakistan

: This work evaluates the suitability of linear scaling (LS) and empirical quantile mapping (EQM) bias correction methods to generate present and future hydrometeorological variables (precipitation, temperature, and streamflow) over the Chitral River Basin, in the Hindukush region of Pakistan. In particular, LS and EQM are applied to correct the high-resolution statistically downscaled dataset, NEX-GDDP, which comprises 21 state-of-the-art general circulation models (GCMs) from the coupled model intercomparison project phase 5 (CMIP5). Raw and bias-corrected NEX-GDDP simulations are used to force the (previously calibrated and validated) HBV-light hydrological model to generate long-term (up to 2100) streamflow projections over the catchment. Our results indicate that using the raw NEX-GDDP leads to substantial errors (as compared to observations) in the mean and extreme streamflow regimes. Nevertheless, the application of LS and EQM solves these problems, yielding much more realistic and plausible streamflow projections for the XXI century.


Introduction
General circulation models (GCMs) are the mostly widely used tools to generate long-term projections of different climate variables, such as precipitation and temperature, on global scales. However, the spatial resolution of these GCMs is too coarse yet (around 50 km in the most recent models) [1,2], which strongly limits their usability for impact studies, which typically require local-scale information [3][4][5][6][7][8][9]. Indeed, their usefulness is especially questionable in regions with complex terrain characteristics, where most of the GCMs are not able to adequately resolve several factors directly affecting the local climate (e.g., topography, net radiation, or water vapor fluxes) [10][11][12][13].
For the particular case of hydrology, even though the coarse outputs from GCMs can be used directly in impact studies, applications at the basin and/or catchment levels normally rely on some form of downscaling (either statistical or dynamical), which allows for the better incorporation of local-scale features of relevance for the region under study, such as complex topography or sub grid processes that are not represented by the GCMs [3,4,14].
In this context, statistical downscaling (SD) aims to build empirical relationships linking the large-scale circulation (e.g., geopotential height, winds, and humidity at different vertical levels) to the local scale (e.g., precipitation at one particular site) [15,16]. Differently, dynamical downscaling (DD) is based on the use of the high-resolution regional climate model (RCMs), which is nested to the coarser GCMs over a limited area [4]. Both SD and DD are widely recognized as complementary approaches leading to overall similar results in many situations (see, e.g., [17] for a detailed discussion on this topic). Here, we use SD, which is computationally cheaper than DD.
An ensemble of 21 state-of-the-art GCMs from the fifth coupled model intercomparison project (CMIP5) have been downscaled to a high-resolution (0.25° × 0.25°) common grid and made publicly available by the National Aeronautics and Space Administration (NASA), through a product called the NASA Earth Exchange (NEX) Global Daily Downscaled Projections (GDDP) dataset ( [18], (https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp (Access date; 20-07-2021)). This dataset (NEX-GDDP, hereafter), which is based on a SD technique known as bias-corrected spatial disaggregation (BCSD) [18][19][20][21], provides daily precipitation and the maximum and minimum temperatures globally from 1950 to 2005. NEX-GDDP has rapidly become one of the most popular datasets within the climate community, and its performance over different regions of the world has been already assessed in numerous previous studies. For instance, [22] investigated the future changes of precipitation at the local scale over China using NEX-GDDP and reported a "certain" reliability over the Han River, emphasizing the potential usefulness of this dataset for climate change impact studies at watershed scales. The author of [23] reported that, although NEX-GDDP offers potential improvements in reproducing monthly temperature and precipitation over China, it still exhibits region-dependent systematic errors. The author of [24] reported this dataset to be well in agreement with the observations on a monthly basis over southeast Asia; however, they also found significant biases, which were predominantly region-specific. The author of [25] used NEX-GDDP over India and reported the existence of some biases (e.g., significant overestimation and underestimation of temperature and precipitation, respectively). The author of [26] showed the existence of significant biases in this dataset for some regions in India.
Despite this extensive body of research, only a few studies (to-date) have focused on the assessment of the adequacy of NEX-GDDP downscaled data for hydrological applications. The present work aims to fill this gap, building on the premise that the convincing replication of historic streamflow conditions is the least of the requirements to be fulfilled by any hydrological model in which downscaled climatic data are used as inputs [20,27]. In other words, if a hydrological model is unable to replicate historic streamflow conditions in an efficient manner, it will most likely continue to perform inadequately in the future (e.g., [28][29][30]).
Moreover, note that having high-quality climate information is one of the key requirements for the assessment of long-term variations in the hydrologic cycle of a region [2]. For instance, small biases in the climate-driving variables can lead to significant changes in the dynamics of the hydrological system being studied [31]. As such, it is wellestablished that the direct use of GCM outputs (which typically incorporate important systematic biases) as inputs to drive the hydrological models is unreasonable and of little utility [32,33]. To overcome this issue, bias correction techniques are usually applied to improve the reliability of the raw GCM outputs [2,34].
Owing to the importance of bias correction, this work investigates the applicability of the NEX-GDDP dataset to simulate hydrometeorological conditions in Chitral River Basin (CRB), Pakistan. To do so, we first assess the biases that are present in this highresolution dataset over this complex terrain region. Subsequently, linear scaling (LS) and empirical quantile mapping (EQM) methods are applied to correct the NEX-GDDP meteorological data that are used as inputs to a (previously calibrated and validated) hydrological model, in order to produce local streamflow projections, up to the end of the 21st century. This allows us to assess the potential impacts that different bias correction strategies may have on the projected hydrological regime of the catchment.

Study Area and Data Description
The CRB is located in Pakistan and is one of the river basins of the Hindukush-Karakoram-Himalayan (HKH) region ( Figure 1). It is characterized by complex orographic features and drains an area of more than 11,000 km 2 , and its streamflow is predominantly fed by snow and glacier melt [35]. The river water supports domestic activities, and irrigation is dependent on it. It plays a key role in the socio-economic development of the region. Note, also, that the CRB is considered one of the most complex terrain catchments of High Mountain Asia (HMA). was retrieved from http://srtm.csi.cgiar.org (Access date; 02-02-2021) and covers the entire region of study. Finally, the NEX-GDDP dataset, which contains daily precipitation and maximum and minimum temperature values from 21 GCMs included in the CMIP5 project [18][19][20][21] was obtained from https://ds.nccs.nasa.gov/thredds/catalog/bypass/NEX-GDDP/catalog.html (Access date; 20-07-2021). These 21 GCMs have been downscaled using the bias-correction spatial disaggregation (BCSD) method [36], providing highresolution simulations for historic  and future  periods. For the latter, two different greenhouse gas emission scenarios (representative concentration pathways (RCPs)) were considered, namely RCP 4.5 and RCP 8.5, respectively.

Hydrological Modeling
The Hydrologiska Byråns Vattenbalansavdelning (HBV) hydrological model [37,38] was used in this study. Streamflow was simulated using the modified HBV-light (version # 4.0.0.23) [39]. The input data (streamflow, precipitation, and temperature) used for hydrological modeling was sourced from hydrological and climate stations in the catchment. Once the model was fed with all the required input variables, it was calibrated using a genetic algorithm and Powell optimization (GAP; [40]). For the calibration of the HBV-light model, thirteen years were selected, 1995-2007, leaving five years, 2008-2012, for validation. The year 1994 was used as a spin-up period. However, the objective function maximizes the NSE. Even though the NSE has a quadratic nature that favors the model's ability to simulate high flows [41], it can also approach optimum values, due to periodicity. Nonetheless, different statistical measures, including Nash-Sutcliffe efficiency (NSE), co-efficient of determination (R2), and percent bias, were used to assess the model's performance. NSE was used as an objective function to calibrate the model. For further details about calibration and validation of HBV-light, the interested readers are referred to the following sources: [4,[42][43][44].

Bias Correction
Observed precipitation and temperature are typically misrepresented by GCMs when they are evaluated at local spatial scales. Particularly, systematic biases may arise [9,45], making the application of some form of correction to adjust their raw outputs [2,46] towards the corresponding observations necessary. Here, we use two popular bias correction techniques to adjust the NEX-GDDP towards the available gauge meteorological data, namely linear scaling (LS) and empirical quantile mapping (EQM). The common period for NEX-GDDP and observations, 1995-2005, is used to this aim. Both LS and EQM are separately applied to each of the calendar months, based on daily data. That is, we built twelve independent statistical models. In all cases, the correction factors derived in the training phase for the period 1995-2005 were then applied to correct for the future simulations from NEX-GDDP for the period 2040-2100. In order to avoid the artificial skill that may arise from model overfitting, a leave-one-year-out cross-validation strategy [47] was adopted, in which each year was separately considered for testing, whilst the remaining ones were kept for fitting/calibration.

Linear Scaling (LS)
Linear scaling (LS) is the simplest bias correction technique and has been applied in numerous studies, with overall good results [48][49][50]. This method aims to match the mean of the simulated values with that of observed ones [50]. A multiplicative term is typically used to correct precipitation, whilst an additive term is used for temperature (Equations (1) and (2), respectively) [51].
where and stand for precipitation and temperature, M indicates the month of interest (1 stands for January, 2 for February, etc.), points to the NEX-GDDP data, indicates observed station data, and means corrected (bias).

Empirical Quantile Mapping (EQM)
Empirical quantile mapping (EQM) aims to adjust the simulated empirical cumulative distribution function (CDF) towards the observed one [52,53], using a number of quantiles [54]. Before adding each quantile to the observed quantile to create a revised calibrated CDF for the future period with a demonstrated climate change signal, the changes in the CDF are further rescaled, depending on the CDF during calibration or historical period [55]. Here, we use the particular implementation described in [56].

Assessing the Impacts of Bias Correction on Hydrometeorological Projections
In order to assess the reliability of NEX-GDDP over the region of study, we first analyzed the annual cycle of precipitation and temperature over the CRB for the reference period 1995-2005, as provided by the observations, raw, and LS-and EQM-corrected NEX-GDDP (Section 3.2).
Afterwards, raw and LS-and EQM-corrected NEX-GDDP models were fed into HBV-light to simulate the streamflow characteristics of the CRB for the mid-future (2040-2069) and far future (2070-2099) periods, under both RCP 4.5 and RCP 8.5 emission scenarios. Again, we focused on mean monthly streamflow, 7-day minimum and maximum streamflow, 10th and 90th percentiles of streamflow, and mean monthly precipitation and temperature, which allowed us to assess the impacts of bias correction on future hydrometeorological projections (Section 3.3 and 3.4)

Calibration and Validation
As shown in Figure 2, the performance of the HBV-light was very good for the calibration period (1995)(1996)(1997)(1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007), with an NSE value of 0.91, a PBIAS of 3.7 %, and an R 2 value of 0.91. These good results were found to be consistent for the validation period (2008-2012), reaching an NSE value of 0.81, a PBIAS of −2 % and a R 2 value of 0.82. Overall, the HBV-light was able to accurately simulate historic streamflow over the CRB. Bottom panels: measure of fit (including the R 2 coefficient) between daily observed and HBV-light simulated streamflow during the calibration (left) and validation (right) periods. Figure 3 shows that considerable monthly biases exist between raw NEX-GDDP model outputs and observations for the reproduction of the annual precipitation cycle over the CRB. Note that the observed precipitation is substantially higher in NEX-GDDP models over the central part of the year (May to September), but also in December and January for the period 1995-2005 (left panel). However, these biases were considerably reduced in the LS-and EQM-corrected NEX-GDDP models.

Impacts of Bias Correction on Simulated Observed Hydrometeorological Conditions
The observed temperature was significantly lower in the wet period (June-September), according to the raw NEX-GDDP models, for the period 1995-2005 (center panel). Differently, the observed temperatures were substantially higher for November-April (dry and early wet season). Nevertheless, LS-and EQM-corrected NEX-GDDP provided unbiased results throughout the entire year.
Even though the raw NEX-GDDP models considered in this study are already biascorrected and provide a high resolution, they still exhibit significant biases. One potential explanation for this negative behavior is that the BCSD is applied based on spatially aggregated reference data, which typically misrepresent important local events, especially in regions with complex topography, such as Hindukush. Therefore, to remove these biases, the above-described LS and EQM methods were applied. Both LS and EQM performed well in simulating observed precipitation and temperature; however, LS has a better impact on simulated observed precipitation than the EQM. The HBV-light model was forced with the observed data, raw, LS-corrected and EQM-corrected NEX-GDDP data, to simulate streamflow for the reference period 1995-2005 (right panel). As compared to the observed data, the raw NEX-GDDP led to significantly lower streamflow in the wet season (from May to late September). For the rest of year, however, raw NEX-GDDP yielded higher than the observed streamflow. Nonetheless, when LS-and EQM-corrected NEX-GDDP data were used as inputs to the HBV-light model, the simulated streamflow closely resembled the observed values throughout the entire year.
Other extreme aspects, such as 7-day minimum and maximum streamflow, and 10th and 90th percentile streamflow, were also shown to be highly sensitive to the driving input data used (Figure 4). In particular, the simulations provided by the HBV-light model, when it was forced with raw NEX-GDDP data, presented substantial deviations, as compared to the observed values. Differently, the simulations driven with LS-and EQM-corrected NEX-GDDP data showed consistent similarity with the observations.  Figure 5 shows the projected annual cycle for precipitation in the CRB, as provided by the raw and LS-and EQM-corrected NEX-GDDP models for the mid-future (2040-2069) under the RCP 4.5 and RCP 8.5 emission scenarios. As compared to raw NEX-GDDP, both LS and EQM data introduce modifications in the temporal pattern. In particular, as compared to the raw NEX-GDDP models, both bias correction methods lead to considerably lower precipitation in July and August and substantially higher precipitation in October, both for RCP 4.5 and RCP 8.5. Note that the LM and EQM methods yielded similar projections, which kept compatibility with the annual cycle drawn from the observations during the reference period ( Figure 3). Likewise, for the case of temperature (Figure 6), the LS and EQM methods were also shown to similarly modify the signal projected by raw NEX-GDDP models. Particularly, both approaches correct the cold values exhibited by the raw NEX-GDDP models, which are far from the observed regime throughout the year (Figure 3). Finally, Figure 7 shows the results obtained when the calibrated hydrological model, HBV-light, was forced with the raw and LS-and EQM-corrected NEX-GDDP models to project mid-future streamflow in the CRB for the period 2040-2069, under the RCP 4.5 and RCP 8.5 emission scenarios. As for precipitation and temperature, the streamflow projected by raw NEX-GDDP did not seem realistic at all, when compared to the observed one ( Figure 3). However, the LS-and EQM-corrected models were shown to produce similar, realistic future annual cycles for streamflow. In particular, as compared to raw NEX-GDDP models, the bias-corrected ones projected higher streamflow values during the wet season (June-August) and low streamflow values during the rest of the year, under both the RCP 4.5 and RCP 8.5 emission scenarios.   Figure 9 illustrates 10th and 90th percentile streamflow for mid-future period (2040-2069), under the RCP 4.5 and RCP 8.5 emission scenarios. The raw NEX-GDDP models were shown to provide higher values than LS-and EQM-corrected ones for 7-day minimum streamflow. On the contrary, the 7day maximum streamflow simulated using raw NEX-GDDP models was projected to be lower than the one obtained using simulated with LS-and EQM-corrected NEX-GDDP models. Likewise, the 10th and 90th percentile streamflow, simulated with raw NEX-GDDP models, were also projected to be considerably lower in the wet period and higher for rest of the year, as compared to the values obtained with bias-corrected data, which seem much more realistic.  All these results warn on the direct use of global datasets, such as NEX-GDDP, for impact studies at local scales, especially in regions with complex terrain characteristics, such as CRB. In particular, the biases (with respect to gauge-based observations) exhibited by this dataset for precipitation and temperate can originate unrealistic future projections of different hydrological characteristics, from the mean to the extremes. Nevertheless, our findings emphasize the importance of bias correction techniques in coping with this issue, providing realistic future scenarios and contributing, therefore, to minimizing misadaptation. Figure 10 shows the projected annual cycle for precipitation in the CRB, as provided by raw and LS-and EQM-corrected NEX-GDDP models for the far future (2070-2099), under the RCP 4.5 and RCP 8.5 emission scenarios. Both LS and EQM data lead to modified temporal patterns, as compared to raw NEX-GDDP. Particularly, projected precipitation was significantly lower in July and August with both bias correction methods and was substantially higher in October and slightly higher in November. Likewise, the mid-future precipitation projections, i.e., the precipitation projected for the far future with both bias correction methods, depicts the compatibility with annual cycle of precipitation exhibited from observations during the reference period ( Figure 3). Moreover, for the case of temperature (Figure 11), both bias correction methods depicted modified signals, as compared to raw NEX-GDDP models, too. Similar to midfuture period, both bias correction methods corrected the cold values shown by the raw NEX-GDDP models, which were in noteworthy contrast to the observed regime throughout the year (Figure 3). Figure 11. As Figure 6, but for the period 2070-2099. Figure 12 shows the results obtained when the calibrated hydrological model HBVlight was forced with raw and LS-and EQM-corrected NEX-GDDP models to project far future streamflow in the CRB for the period 2070-2099, under the RCP 4.5 and RCP 8.5 emission scenarios. It is clearly depicted that the streamflow projected by raw NEX-GDDP is not realistic, as compared to the observed streamflow ( Figure 3). However, the LS-and EQM-corrected models are shown to produce similar, realistic far future annual cycles for streamflow. Particularly, higher streamflow values were projected with bias-corrected models during the wet season (June-August), as compared to raw NEX-GDDP models, and low streamflow values were projected during the rest of the year under both RCP 4.5 and RCP 8.5 emission scenarios. A relatively higher streamflow was projected with the raw NEX-GDDP models in the wet period under the RCP 8.5 emission scenario, as compared to the streamflow projected under the RCP 4.5 emission scenario; similarly, relatively higher streamflow was projected with LS-and EQM-corrected NEX-GDDP models in the wet period under the RCP 8.5 emission scenario, as compared to the RCP 4.5 emission scenario, depicting a noticeable impact of the extreme greenhouse gas emission scenario on the projected streamflow.  Figures 13 and 14, respectively. The raw NEX-GDDP models depict higher values for 7-day minimum streamflow than the LS-and EQM-corrected ones. On the other hand, the 7-day maximum streamflow, simulated using raw NEX-GDDP, was lower than the one obtained using the LS-and EQM-corrected NEX-GDDP models. In addition, the 10th and 90th percentile streamflow, simulated with the raw NEX-GDDP models, was also projected to be substantially lower in the wet period and higher for rest of the year, as compared to the values obtained with bias-corrected data, which seem much more realistic.  Like mid-future hydrometeorological projections, the results for the far future also negate the direct use of the high-resolution NEX-GDDP dataset for impact studies at local scales, particularly in regions with complex terrain characteristics, and accentuate the need of bias correction methods for removing the (region dependent) biases present in the raw NEX-GDDP models.

Conclusions
The Chitral River Basin (CRB) is one of the most important catchments within the Hindukush region of Pakistan. Its particularly complex terrain and large climate variability (which is linked to different monsoonal systems) play a key role in defining the hydrometeorological characteristics of the catchment and have a strong influence on water demand (including domestic and irrigation uses), hydropower generation, and food security, affecting millions of people living downstream. As such, it is crucial to develop adequate strategies that allow us to produce reliable long-term streamflow projections over the region. In this context, a key requirement is to have access to highresolution, fine quality climatic information of temperature and precipitation, which are used as inputs for the hydrological models employed in impact studies [57][58][59]. In fact, climate models are generally selected based on the so-called "past performance approach", i.e., on their ability to efficiently simulate the historical observed conditions [60,61]. Based on this premise, this study evaluates the adequacy of a widely used, highresolution dataset, called NEX-GDDP, which comprises 21 state-of-the-art CMIP5 climate models, which have been statistically downscaled to a 0.25° × 0.25° regular grid for hydrological modeling in the CRB. Our results highlight that this dataset presents important biases for precipitation and temperature over the area of study, which, in turn, leads to unrealistic streamflow simulations, as revealed by comparison with the local observed data over a recent historical period. In order to cope with this issue, we also assessed the potential benefits of two different bias correction techniques, namely linear scaling (LS) and empirical quantile mapping (EQM), which are first shown to efficiently remove the mean errors found in the NEXGDDP models.
Then, the raw and LS-and EQM-corrected NEXGDDP data were used to feed the HBV-light hydrological model, which was used (after proper calibration/validation) to produce streamflow projections over the CRB for up to the end of 21st century, under two different emission scenarios, RCP 4.5 and RCP 8.5. The resulting projections strongly differed when the meteorological input variables (precipitation and temperature) were (not) bias-corrected, particularly in terms of the distribution of the annual cycle and extreme regimes.
In this study, both bias correction methods performed well in simulating reference streamflow for the CRB. As compared to the simplest bias correction strategy, LS, more sophisticated alternatives, such as EQM, do not seem to provide a clear advantage when used to correct the precipitation and temperature used as inputs for the hydrological modeling, leading to similar results, both in terms of the mean and extreme streamflow regimes. It should also be noted that the objective function for the hydrological model calibration could influence this result [41]. As demonstrated by [2], bias correction methods, which have performed well in simulating reference conditions, are expected to perform equally well in changing climatic conditions. In general, given that the BC approaches are designed to correct certain features of climate models, such as geographical, multivariate, temporal, and marginal aspects [55,62], it is clear that the biascorrected climate change signal is accurately representative of the elements under consideration. Despite the stationarity issue, bias correction techniques are more likely to provide realistic estimates of the future climate than the raw model outputs, which had already exhibited a (relatively) poor performance over a recent historical reference period.
Therefore, this work warns on the use of raw NEXGDDP data for long-term hydrological modeling in complex terrain regions, where unrealistic future conditions could be obtained. Nevertheless, this important issue can be alleviated by using simple techniques, such as LS or EQM, to reduce the systematic errors present in forcing climate data. The findings of this study may extend beyond the NEX-GDDP dataset (used in this study) to other datasets that have low temporal or spatial resolution and may lead to misleading results in impact modeling. Finally, against the critique of [62], a multivariatebiased correction technique could be adopted to maintain the dependency structure between different variables used for hydrological simulations [55,63].

Conflicts of interest:
The authors declare no conflict of interest.