Metrics Assessment and Streamﬂow Modeling under Changing Climate in a Data-Scarce Heterogeneous Region: A Case Study of the Kabul River Basin

: Due to many uncertainties in hydrological data and modeling, the ﬁndings are frequently regarded as unreliable, especially in heterogeneous catchments such as the Kabul River Basin (KRB). Besides, statistical methods to assess the performance of the models have also been called into doubt in several studies. We evaluated the performance of the Soil and Water Assessment Tool (SWAT) model by statistical indicators including the Kling-Gupta efﬁciency (KGE), Nash–Sutcliffe efﬁciency (NSE), and the coefﬁcient of determination (R 2 ) at single and multi-outlets in the KRB and assessed the streamﬂow under changing climate scenarios i.e., Representative Concentration Pathways (RCP) 4.5 and 8.5 (2020–2045). Because of the heterogeneous nature of the KRB, NSE and R 2 performed poorly at multi-outlets. However, the KGE , as the basic objective function, fared much better at single-outlet. We conclude that KGE is the most crucial metric for streamﬂow evaluation in heterogeneous basins. Similarly, the mean and maximum annual streamﬂow is projected to decrease by 15.2–15.6% and 17.2–41.8% under the RCP 4.5 and 8.5, respectively.


Introduction
The analysis of rainfall-runoff relationships provides essential information necessary for developing plans for infrastructure development and adaptive water management under changing climate. Several tools and methods are used for simulating the rainfallrunoff under a data-scarce environment [1,2]. However, most of these tools have their limitations in the context of large river basins due to the lack of data as well as the high cost of obtaining the required datasets and low economic potential, especially in the arid regions [3]. The Hydrologic Engineering Center's River Analysis System (HEC-RAS) [4] is used for rainfall-runoff estimation. In such a case, the computed flow goes to the lowest elevation point within the watershed rather than following the flow path of the stream or river [5]. Similarly, the Revitalized Flood Hydrograph (ReFH) [6] is also used for rainfallrunoff modeling, but it has limitations in simulating peak flow in large catchments [7]. The Hydrologic Engineering Center's Hydrologic Modeling System (HEC-HMS) models can also be used for rainfall-runoff modeling, although, among others, the snow accumulation and snowmelt are not yet incorporated [8]. Among all these approaches, the spatially distributed and physically based Soil and Water Assessment Tool (SWAT) is considered the most reliable because of its simulating capability of rainfall-runoff, groundwater flow, and channel routing [9]. However, its comparative performance has not been sufficiently evaluated at different physio-climatically heterogeneous spatial scales. Few studies show that the SWAT model's performance at multi-outlets is poorer than that of a single outlet [10].
In addition, at the spatial scale, few studies exist about the SWAT model's evaluation using additional simultaneous performance measures.
The performance of hydrological models is affected by a variety of parameters and indicators. Therefore, a single statistical metric may not be suitable for evaluating physicallybased models [11]. In the case of large heterogeneous river basins, care must be taken in choosing the performance metrics. Akhtar et al. [12] used Nash-Sutcliffe efficiency (NSE) for evaluating the SWAT model's performance at multi-outlets in the Kabul River Basin (KRB), whereby NSE performed poorly in the case of seasonal and low streamflow. Similarly, Zhang et al. [13] and Pfannerstill et al. [14] highlighted NSE's poor performance in low streamflow conditions. Ref. [15] reports a rather high NSE, in the range of 0.86-0.89, while still underestimating the river discharge at four stations of the Ivaí River basin.
With the use of distinct sets of sensitive parameter values, the multisite calibration methodology under a physio-climatically heterogeneous environment shows discrepancies in the hydrological behavior of the basin's upstream and downstream sections [16]. Therefore, the model's performance at different outlets may be compromised, thereby balancing the targeted threshold of the objective function during the multisite calibration approach [12]. Chilkoti et al. [17] found similar limitations in the objective functions based on conventional evaluation statistics used between actual and simulated flows of NSE, PBIAS, and logNSE. Willmot [18] states that traditional ways of evaluating geographic models through statistical comparisons of simulated and observed data are largely criticized; it is being proposed that statistical comparisons using R, coefficient of determination (R 2 ), and their tests for statistical significance are not enough. That is why the Root Mean Square Error (RMSE), its relevant measures, and the agreement index are being proposed to evaluate geographical models. The proposed index of agreement is used as the degree of predicting the model's error. A score of 1 indicates complete fitness, whereas 0 indicates no fitness between the observed and simulated values. Because the index of agreement has not been utilized extensively in the literature, a more comprehensive range of reported values is not present [19]. According to Legates and McCabe [20], employing statistics like R and R 2 for correlating observed and simulated data is not appropriate because these are extremely sensitive to extreme values or outliers and insensitive to proportional differences between the two different data sets.
Pushpalatha et al. [21] suggest the use of NSE computed on inverse flows and argues for its suitability under extremely low-flow conditions as compared to the NSE calculated on logarithm flows since it is not sensitive to high-flow values. According to Pushpalata et al. [21], the NSE computed on inverse flows focuses on the lowest 20% of flows, whereas the NSE on natural flows focuses on the largest 20% of flows. Contrary to other studies, Osuch et al. [22] and Garcia et al. [23] argue that the mean of the Kling-Gupta efficiency (KGE) applied to the streamflow and the inverse of the streamflow is sufficient to meet all evaluation requirements, but the KGE applied to the streamflow does not qualify the assessment criteria for modeling low-flow indices while using the conceptually continuous rainfall-runoff models. Santhi et al. [24] suggest using prediction efficiency (P e ), which measures the probability distribution of simulated and observed data fitness. Since its use is rare, its respective value range is seldom used to base models' performance decisions on [19]. Therefore, care must be taken to use any of these metrics, e.g., NSE, KGE, PBIAS, RSR, LogNSE and VE, etc. while considering the diversity in streamflow, e.g., the simultaneous calibration of low and high flow rates [12][13][14]. When it comes to the KGE, the propensity to underestimate flow variability is not as significant as with the NSE [25,26]. Therefore, Chilkoti et al. [17] recommend KGE to tackle the discrepancies of low and high flow rates at physio-climatically heterogeneous conditions, as it incorporates the bias, variability, and standard deviation, which may better assess the model's performance.
Another essential point in evaluating the performance of the SWAT model, besides other metrics, is the p-factor and r-factor. The p-factor shows the measured data (%) bracketed by the 95% prediction limit during the calibration process. Its values close to 1 indicate a very high performance and efficient model and vice versa. The r-factor, ranging from 0 to infinity, is the average width of the 95% prediction uncertainty (95PPU) band divided by the standard deviation of the measured data. The p-and r-factors are firmly connected, indicating that a greater p-factor can only be obtained at the expense of a greater r-factor; a simulation with a p-factor of 1 and an r-factor of 0 matches perfectly to observed data [27]. Both of these factors, though extremely important to be considered, are often ignored in evaluating the SWAT model's performance, and the model's evaluation is merely based on the value limits of a few known metrics such as R 2 , NSE, VE, and KGE, etc. [28,29].
Snowmelt from the Hindukush and Himalayan mountains provides over 80% of Afghanistan's water resources [30] for the country's 32.9 million population [31]. The country's river basins are experiencing data scarcity as a result of long-term insecurity and political turmoil [12], and water-relevant institutions are under-resourced in terms of the technical capacities needed for executing sound water management plans [32]. Furthermore, as the country's population grows, so does the demand for water resources amid the growing impacts of climate change on snow cover and precipitation patterns, resulting in increased intersectoral competition among diverse water consumers [32]. According to Moisello et al. [33] runoff variance is subject to the effects of climate change and human activities in the watershed, therefore, water storage operations in lakes and reservoirs shall be carried out in such a way that does not a pose risk of flooding in riparian regions. According to Doummar et al. [34], the integrity of global river systems is regularly jeopardized by the pressures of human population and development trends, therefore, an integrated water resources management (IWRM) approach is required to meet the future demands of water.
According to Akhtar et al. [12], until the end of 2030, streamflow in most KRB tributaries will fall by a maximum of 5% and 8.5% under Representative Concentration Pathways (RCP) 4.5 and 8.5 scenarios, respectively. Recently, the country's political turmoil has halted practically every sector, including academic institutions and service-delivery ministries, and as a result, research into water resources and climate change implications has been largely disregarded. Therefore, we conducted a study in the Alingar watershed, the Kabul River Basin (KRB) to (a) assess the performance of the SWAT model at single and multioutlets on comparative bases, (b) evaluate different statistical metrics used for calibration and validation of the SWAT model, and (c) analyzed the impacts of climate change on streamflow from 2020-2045. The importance of this study also lies in the fact that the Alingar watershed contributes about 10% of the mean annual streamflow (i.e., 61.5 m 3 /s, calculated as of the base period (2000-2019)) of the Dakah station (i.e., 614 m 3 /s), the outlet of the KRB at the downstream. This study not only provides detailed information about the potential impacts of climate change on water resources in a data-scarce environment, but also highlights the relevant metrics that could be considered while dealing with hydrological models and simulating future scenarios in heterogeneous watersheds.

Study Region
This research was carried out in the Alingar watershed, part of the Kabul River basin (KRB), which is highly heterogeneous in terms of substantial elevation differences between upstream and downstream. The snowmelt's significant contribution to runoff is therefore prone to the impacts of climate change. This watershed is a typical example of challenging water management tasks due to data-scarce conditions versus the vital need for water allocation considering climate change adaptation, etc. The total area of this basin is about 6239 km 2 , which is further divided into 13 sub-catchments ( Figure 1). The elevation range of the study region is from 601 to 5399 m above mean sea level (m.a.s.l.). Based on the Climate Forecast System Reanalysis Survey (CFSR), the mean annual precipitation received at this watershed during the baseline (2000-2019) is around 634 ± 177 mm, while the mean annual temperature observed is about 5.7 • C. The Afghanistan land cover atlas [35] shows that forest and shrubs cover about 31%, rangeland, built-up area, and barren land cover 64%, glaciers cover 0.27%, water bodies cover 1%, and irrigated agriculture covers only 3% of the total area of the watershed. The major rivers flowing in this watershed are the Alingar and Alishing rivers. The tributaries of Alingar are Noralam, Kolman Dara, Kolam, Poshal, Chomar, and Bugulchi rivers, whereas the tributaries of the Alishing river are Doab and Karak rivers. Both the Alingar and Alishing rivers meet at Mihtarlam (Laghman province); the streamflow gauging station is installed at Pul-e-Qarghayi on the Laghman river.

Description of the SWAT Model
The SWAT is a commonly used semi-distributed and physically-based continuoustime simulation model for quantifying streamflow [36], nutrient transport [37], surface runoff [38], and water demand [39]. The United States Department of Agriculture (USDA) developed and maintains this model regularly, and it works in daily time steps. This model has been used globally in small and medium watersheds [40] to large mountainous watersheds [24]. SWAT has also proven to be very effective in simulating the eventual effects of climate change (CC) scenarios on streamflow [41], erosion [42], and nutrient transport [43], etc. The mathematical expression of the water balance equation, which is the core driving engine of the SWAT model, is given below in Equation (1) [44]: where SW t represents the final soil water content (mm), SW 0 represents the initial soil water content on the day i (mm), and t represents the time (days). The amount of precipitation on the day i (mm) is R day , the amount of surface runoff on the day i is shown by Q surf (mm), E a is the sum of evapotranspiration (mm) on the day i and Q gw is the amount of return flow on the day i (mm).

Key Input Data and Methodological Processes Used in the SWAT Model
SWAT is a data-intensive model used for streamflow and other biophysical processes' simulation in diverse, complex and heterogeneous watersheds. The key datasets used in the SWAT model are the digital elevation model, land use and land cover map, soil map, meteorological and streamflow data. The key processes including data preparation, parametrization, discretization of the subbasins, sensitivity analysis, performance evaluation of metrics and simulation of climate change scenarios etc. has been summarized in the methodological framework ( Figure 2). The summary of the key data used in the SWAT model is given below in Table 1.

Digital Elevation Model
A digital elevation model (DEM) is a very important and crucial spatial input in the SWAT model used to automatically extract topographic parameters such as slope, elevation, relative relief, aspect, slope-length, etc. A high spatial resolution DEM can be used as the best alternative for conventional surveys and manual evaluation of the topographic conditions of a watershed [47]. Tan et al. [48] investigated the DEM at spatial resolutions ranging from 20 to 500 m and revealed that the DEM's spatial resolution significantly affects the quality of streamflow computation. Dixon and Earls [49] demonstrated that the DEM's spatial resolution affects the precision of streamflow estimation through the SWAT model. Moreover, in larger heterogeneous, mountainous, and diversified river basins, the use of higher resolution DEM could be more genuine in watershed delineation to higher accuracy [50]. Therefore, we used an ALOS PALSAR DEM of 12.5 m resolution derived from the Alaska Satellite Facility (ASF) [51]. The elevation across the study region varies between 601-5399 m.a.s.l. (Figure 1). While delineating, 13 sub-basins were formed with the minimum threshold for the sub-basin area of 2000 ha. Using the DEM data, four land slope classes were created i.e 0-15, 15-25, 25-35, and 35-100%. Around 77% of the watershed area was covered by the slope class >35, almost 8% was covered by the slope class 25-35, 6% was covered by the slope class 15-25 and 8.5% was covered by the slope class 0-15% ( Figure 3).

Land Use and Land Cover Map
Changes in land use and land cover are critical to the hydrologic cycle [52]. The spatial distribution of different land covers within a watershed is a significant aspect of water balance and is important in inducing the water flux. According to Akhtar et al. [53], there was just a 2% change in the KRB's agricultural land area from 2003 to 2013, which may not have had a significant influence on the research region's hydrology. Therefore, we used the FAO land cover map (30 m) [35], which is based on satellite imagery from SPOT-4 and the Global Land Survey (GLS) Landsat Thematic Mapper. The land cover map was resampled to 12.5 m. Around 15 land cover types covered the whole study area; among them, rangeland covers about 54%, forest and shrubs cover approximately 31.5%, while irrigated agricultural land covers around 3% of the study region ( Figure 4).

Soil Data
The SWAT model's hydrological response to the soil properties, reliability of the soil and land use data, as well as the pre-processing procedures used on the geographically dispersed data, is critical to the SWAT model since its internal aggregation procedures have a substantial impact on the original map's fragmentation and resolution [54]. A comparative analysis of two different resolution soil maps for the SWAT model [55] showed that soil maps with higher resolution yielded better R 2 and NSE for simulating streamflow and water quality. We used the FAO/UNESCO map [45] in this study and resampled it to 12.5 km to cover the smallest sub-basin. Four dominant soil types represented the entire study region ( Figure 5). The major soil type (Lithosols 2), a combination of Lithosols, Cambisols, and Rankers, covers approximately 55% of the study area, Haplic Xerosols cover approximately 29%, Lithosols (1), a combination of Lithosols and Xerosols, covers 13%, and Gleysols covers approximately 3%.

Meteorological Data
Hydrological models require meteorological data to simulate any relevant operation. While this information may be plentiful in some parts of the world, most watersheds and river basins in Afghanistan lack regular meteorological data. The Alingar watershed is one such example where the observed data is limited both in space and time [56]. The two precipitation gauges installed at the extreme outlets at Pul-e-Islamabad and Pul-e-Qarghayi do not represent the upstream's meteorological conditions, which receive massive precipitation and experience minimum temperatures. Therefore, the use of a global dataset is the only available option in such data-scarce watersheds. The study of Tolera et al. [57] shows that CFSR performed relatively better in the smaller watershed (i.e., Keleta) than in the larger watershed (i.e., Melka Kunture). Therefore, we also used the daily CFSR data of a spatial resolution of 19.2 km while covering the period of 2000-2019; the data was retrieved from the climate engine in CSV (comma-separated values) format for further processing [46].

The SWAT Model's Calibration, Validation, and Uncertainty Analysis
To diagnose the most sensitive parameters in the streamflow modeling with the SWAT model, the Uncertainty in Sequential Uncertainty Fitting (SUFI-2) algorithm in SWAT Calibration and Uncertainty Procedures (SWAT-CUP) was utilized, which generates Latin Hypercube samples for sensitivity analyses. For calibrating the SWAT model for those watersheds which are snow-fed, it is important to consider snow relevant parameters, e.g., Snowfall temperature ( • C) (SFTMP), Snowmelt base temperature ( • C) (SMTMP), Maximum snowmelt factor (SMFMX, mm of water/ • C day −1 ), Minimum snowmelt factor (SMFMN, mm of water/ • C day −1 ) and Snowpack temperature lag factor (TIMP), and runoff relevant parameters, e.g., the soil permeability, land use, and antecedent soil-water conditions that all influence the SCS curve number, soil water content (SOL_AWC) as well as soil evaporation compensation factor (ESCO) [58]. We used the SUFI-2 algorithm in the SWAT-CUP to calibrate, validate, and analyze the sensitivity of the SWAT model [27]. The SWAT model's simulation accuracy was assessed using the monthly observed streamflow data collected from the Pul-e-Qarghayi station. Based on the limited streamflow data availability, the calibration period was 2008-2013, and the validation period was 2014-2019. Via an iterative method, the SUFI-2 algorithm calculates the range of the 95PPU and brackets the actual streamflow data points within the band, which is called the r-factor [59]. The algorithm uses two main metrics to determine the goodness of fit, i.e., (1) the p-factor, which ranges from 0 to 1, and (2) the r-factor, which is the average 95PPU-band's width ratio to the observed variable's standard deviation [59]. After obtaining appropriate KGE, NSE, and R 2 , the parameter ranges thus yielded in the last iteration were considered the calibrated parameters for the validation of the SWAT model.

Evaluation of Different Statistical Metrics
While evaluating hydrological models, the performance criteria are used as a single number to show the similarities between simulated and observed streamflow [25]. In this study, we used the KGE [25] as the objective function with the SUFI-2 algorithm for automatic calibration. The minimum threshold of the objective function (i.e., the KGE) for the behavioral solutions was set to be 0.50. The mathematical expression of these metric indicators is given below: where the linear correlation between actual and simulated data is expressed by r, the metric of the streamflow variability error is denoted by α, and the bias is shown by β.
The performance of the model is considered reasonable when −0.41 < KGE < 1 [60]. Similarly, NSE [61] and R 2 are the byproducts automatically calculated by the SWAT-CUP model by comparing the observed and simulated streamflow. The NSE is calculated by using Equation (3), as given below: where Q obs denotes the actual streamflow, Q sim denotes the simulated streamflow and Q obs denotes the observed mean streamflow and n is the total number of observations. NSE is varying between −∞ and 1, when the NSE = 1, it is considered to be the optimal simulation. Values between 0 and 1 are typically considered acceptable levels of performance; however, a value below 0 means that the mean observed streamflow is a better predictor than the simulated streamflow, showing unsatisfactory performance. Similarly, the degree of collinearity between observed and simulated streamflow is denoted by R 2 . It indicates the fraction of the variance in observed streamflow that the model explains. The value of R 2 ranges from 0 to 1, with higher values suggesting less error variation, and values larger than 0.5 are normally regarded as acceptable [24]. The coefficient of determination (R 2 ) is calculated using the following equation: where n is the total number of observations, Q obs and Q sim show the observed and simulated streamflow, and Q obs and Q sim and are the observed and simulated mean streamflow, respectively.

Analysis of the Streamflow under Changing Climate
The CORDEX (Coordinated Regional Downscaling Experiment) provides the expected changes in numerous variables under the climate change (CC) scenarios [62]. The data (50 km spatial resolution) was retrieved using the regional circulation model (RegCM4-4) from the South Asia domain of the CORDEX. For analyzing the impacts of CC on streamflow from 2020 to 2045, the daily maximum and minimum temperature and precipitation data were downloaded from the CORDEX domain. The study region is expected to receive mean annual precipitation of 774 ± 229 and 789 ± 203 mm during 2020-2045, respectively, under the RCP 4.5 and 8.5. Similarly, the mean annual temperature under the RCP 4.5 and 8.5 during 2020-2045 is predicted to be 5.5 and 5.8 • C, respectively ( Figure 6). The future streamflow was analyzed in annual (i.e., minimum, and mean) and monthly time steps to diagnose major shifts at seasonal scales to relate these temporal changes at the water supply side to periods of peak demand for agricultural purposes.

Results and Discussion
During the calibration procedure, the highest absolute t-Stat value is attributed to the highest sensitivity of that specific parameter during calibration [27]. In the present study, the sensitivity analysis tool in the SWAT-CUP model identified eight parameters as being the most sensitive (Table 2). Following the identification of the sensitive parameters, we calibrated their values while the default model values of other parameters were retained throughout the calibration process. The streamflow was sensitive to maximum snowmelt factor (SMFMX), minimum snowmelt factor (SMFMN), the soil conservation service (SCS)' runoff curve number (CN2), soil evaporation compensation factor (ESCO), an available water capacity of the soil layer (SOL_AWC), snowfall temperature (SFTMP), Snowmelt base temperature (SMTMP), and lag factor for snowpack temperature (TIMP). The KGE, the selected objective function, acquired during iterations for the SWAT model's calibration at the Pul-e-Qarghayi outlet was 0.72, NSE was 0.59, and R 2 was 0.60, which can be classified as "moderate" to "good according to Thiemig" [63]. During validation of the SWAT model (after considering the new values of the sensitive parameters) the KGE increased to 0.89, while NSE and R 2 increased to 0.98 each (Figure 7).

Performance Evaluation of the Statistical Metrics
During iterations for calibration of the SWAT model at the Pul-e-Qarghayi outlet, the KGE achieved was 0.72, the NSE was 0.59, and R 2 was 0.60 ( Figure 7). The mean annual simulated streamflow during calibration was 54.53 ± 53.69 m 3 /s, while for the same period the mean annual observed streamflow was 50.93 ± 63.06 m 3 /s. During the calibration, the p-and r-factors were 0.86 and 1.58, respectively. During calibration, there has been considerable improvement in the p-factor (i.e., 0.92) which shows that 92% of the observed data points have been successfully bracketed during the simulations. In contrast, the r-factor reduced from 1.58 (during calibration) to 0.83 (during validation), which shows considerable model development in simulating the streamflow at monthly time intervals. As a result, the KGE improved from 0.72 (during calibration) to 0.89 (during validation). Similarly, the NSE increased from 0.59 (during calibration) to 0.98 (during validation), reflecting a considerable development in the model's prediction of the streamflow. Improvement has been observed in the RSR value, which reduced from 0.64 (during calibration) to 0.14 during validation. During the validation period, the mean annual simulated and observed streamflow were 73.81 ± 115.40 and 81.56 ± 121.58, respectively, which are close to each other. Further statistical details are found in Table 3. The mean annual simulated and observed streamflow increased by around 35% and 29% in the validation period compared to the calibration period, respectively; this increase was caused by a 123% and 41% increase in total monthly precipitation, especially in the months of February and March. Note: p-factor refers to the percentage of observations covered by the 95PPU, r-factor depicts the thickness of the 95PPU, R 2 , is the coefficient of determination, NSE is the Nash-Sutcliffe Efficiency, PBIAS is the Percent bias, KGE is the Kling-Gupta Efficiency, RSR is the ratio of the RMSE and standard deviation, VOL_FR is the fraction of the overall water balance which is predicted, Mean annual SSF refers to mean annual simulated streamflow, Mean annual OSF refers to mean annual observed streamflow, STD_S refers to Simulated standard deviation and STD_O refers to Observed standard deviation,.
According to Thiemig [63], the simulations' performance can be grouped as intermediate when 0.75 > KGE > 0.5 and good when KGE > 0.75; according to Moriasi et al. [19], the scatter plots shown in Figure 8 depict a satisfactory result for calibration where 0.50 < NSE < 0.65, whereas for validation the NSE values are very good, i.e., 0.75 < NSE < 1.00. According to Knoben et al. [60], if the mean streamflow was used as a benchmark, model output in the range of −0.41 < KGE ≤ 1 may be considered "reasonable" as the model outperforms the benchmark; based on this, the KGE was 0.72 during calibration and 0.89 during validation, which rationalizes the model's performance as acceptable. A SWAT model was also used in Pakistan at the Simly Dam watershed, and while evaluating the model's performance, an NSE of 0.85 and 0.79 for validation were yielded for calibration and validation periods respectively [64]. Another similar experiment by Shrestha et al. [65] shows that R 2 values were 0.67 and 0.75 while NSE values were 0.63 and 0.74 during calibration (1984)(1985)(1986)(1987)(1988)(1989)(1990)(1991)(1992)(1993) and validation periods (1994)(1995)(1996)(1997)(1998)(1999)(2000)(2001), respectively [65]. Zare et al. [66] evaluated soil water content through the SWAT model for Southern Saskatchewan (SK, Canada) and achieved NSE of 0.62 and 0.78, during calibration and validation (of streamflow) respectively. Son et al. [67] calibrated and validated the SWAT model with streamflow in the Nam Rom catchment (Vietnam), and the NSE achieved during calibration and validation was 0.76 and 0.64.

Temporal Analysis of the Variation in Streamflow under Changing Climate from 2020 to 2045
The mean annual streamflow during the base period (2000-2019) was 61.48 ± 69.0 m 3 /s, while it was 52.12 ± 48.3, and 51.87 ± 37.3 m 3 /s under the RCP 4.5 and 8.5 (2020-2045), respectively ( Figure 9). There was not much difference between the mean annual streamflow under both CC scenarios; however, we recorded high variability under the based period and RCP 4.5. The statistical evaluation of the annual streamflow shows that by 2045, approximately a 15 and 16% reduction in the mean annual streamflow is expected under RCP 4.5 and 8.5, respectively.  Table 4 summarizes the mean annual variation in the streamflow compared to the base period; the annual minimum streamflow under scenarios RCP 4.5 and 8.5 are expected to increase by 48.2 and 125.6%, respectively. The annual maximum streamflow is projected to decrease by 17.2 and 41.8%, respectively. Similarly, the annual mean streamflow is projected to decrease by 15.2 and 15.6%, respectively.  Figure 10 depicts that the peak hydrograph under the base period is usually in June when increased temperature melts down most of the accumulated snow on the upstream heights of the watershed. Under the CC scenarios, the peak hydrograph is expected to split into two peaks, one formed in Mar-Apr which indicates earlier snowmelt, and the second one in August, which is due to an increase in summer or Monsoon rains contrary to the base period, which is adding to the peak hydrograph in August. The comparative analysis of the monthly streamflow under the RCP 4.5 shows that streamflow in January, February, and March increased by 48, 30, and 74%, respectively. With 262 and 207%, the highest increase is projected for October and November, respectively. Under the same scenario (i.e., RCP 4.5), the mean streamflow in May and June is expected to decrease by 78 and 87%, respectively. Likewise, under the RCP 8.5, streamflow in November, December and January is expected to increase by 240, 226, and 126%, respectively; this change can be explained by a partial shift from snowfall to rain (as a consequence of higher temperature) or by snow melting periods in winter. Conversely, the highest decrease is projected for May and June by 71 and 66%, respectively. Overall, the shift in hydrograph is similar under both scenarios.

The Impact of Spatial Scale on the Performance of the SWAT Model
The calibration of the SWAT model for multi outlets, especially in large river basins under highly heterogenic conditions of land use, soil types, climate, and elevation differences, is quite challenging [68]. This is because if one outlet is being calibrated for a targeted objective function, any increase in any variable may equally influence the streamflow (by decreasing or increasing the hydrograph peak), actual evapotranspiration, and runoff generation as well as other water balance components [69]. Therefore, as a standard iteration exercise, it is challenging to achieve a common range of sensitive parameters, which boosts the calibration results equally [70,71]. Therefore, in this study, we used the KGE, which incorporates the model's bias, temporal fitness, and variability instead of NSE, which underestimates the variability in the simulated streamflow [25]. Moreover, to evaluate the SWAT model's performance at a single outlet in the Alingar watershed, i.e., Pul-e-Qarghayi, the results were compared to the SWAT model's performance at multi-outlets, which included Pul-e-Qarghayi together with five other outlets reported in Akhtar et al. [12], while ensuring that there are no storage structures or large diversions upstream of the specified outlets that might affect the volume of river discharge. Table 5 reveals that the KGE (as the chosen objective function) yielded a value of 0.72 (during calibration) and 0.89 (during validation), which according to Thiemig [63] can be classified as intermediate to good performance. During calibration, the NSE achieved was 0.59, which suddenly increased to 0.98 (during validation) and indicated a perfect correspondence between the observed and simulated streamflow per the criteria suggested by Moriasi et al. [19] and highlighted estimation error variance almost equal to 2%. The SWAT model's calibration for a single outlet improved the model's performance by increasing the objective function and other relevant performance metrics and confined the list of sensitive parameters that are very much specific to the physiographic conditions of the rather smaller watershed with a single outlet. Table 6 shows that the fitted values for the common parameters under both single and multi-outlets-based calibration have considerably changed. The reason behind this is that any parametric values under multi outlets are strongly driven by the spatial heterogeneity of parameters, errors incurring during the acquisition of hydrological data, and errors resulting from optimizing the model's behavior that reach a compromise between the different outlets [69]. Besides, calibrating multi-outlets at spatially large and heterogenic watersheds is computationally intensive [72]. Likewise, the sensitivity ranking has also changed with a greater difference. The number of parameters used in the calibration phase was reduced while calibrating at a single outlet, and simulation performance was also improved by selecting spatially homogenous parameters. The results of this study are in line with a similar study from the Yellow River, where the comparative performance did not change, but the number of sensitive parameters did [10]. * r means an existing parameter value is multiplied by (1+ a given value), ** v__ means the existing parameter value is to be replaced by a given value. SOL_BD refers to soil bulk density (g/cm 3 ), ALPHA_BF is baseflow alpha factor (1/day), GW_DELAY shows groundwater delay time (days), GWQMN is the threshold depth of water in the shallow aquifer required for return flow to occur (mm H 2 O), REVAPMN is the threshold water depth in shallow aquifer for "revap" or percolation to occur (mm), GWQMN is threshold depth of water in the shallow aquifer needed for return flow to occur (mm H 2 O), EPCO is the plant uptake compensation factor, CH_N2 is the hydraulic conductivity of the main channel (mm/h) and SURLAG is the surface runoff lag coefficient.

Conclusions
The performance of NSE, as an objective function, was poor for both single and multi-outlets. However, the KGE performed better when used as an objective function for streamflow estimation by using the SWAT model. The NSE values improved many times as the KGE value increased. The KGE is recommended as the best performance indicator and an alternative to NSE for multi-outlets simulation, especially under a heterogeneous environment. The KGE is based upon the balanced optimization of the model's bias, temporal fitness, and variability. While using the KGE, low values were well expressed for correlation and bias present in the data compared to the NSE, which didn't perform well, especially under low streamflow conditions.
The SWAT model's performance is compromised at multi outlets in large river basins; this is more evident especially under physiographically and climatically large and heterogeneous river basins. Therefore, to overcome these shortcomings during model evaluation, it is recommended to test the SWAT model's performance at the individual outlet. Moreover, simulating multi-outlets is computationally intensive to achieve the desired objective function for enhanced performance of hydrological modeling. For hydrological models' performance in regions with higher heterogeneity in climatic and physiographic conditions, land cover, cropping pattern, streamflow, etc., using the KGE as a performance metric in combination with R 2 , p-and r-factors enhances the SWAT model's simulation performance.
The modeling of CC scenarios demonstrates that the mean annual streamflow under RCPs 4.5 and 8.5 is not much different from each other. Under these scenarios, the model project will see a 15.2% to 15.6% decline in mean annual streamflow by 2045 under the RCPs 4.5 and 8.5, respectively. The results of Akhtar et al. [73] have also demonstrated the decreasing trend in groundwater storage not only in the KRB but also across the entire Indus Basin; the groundwater storage is directly linked to the amount of surface water flow as well as anthropogenic interventions that hamper the groundwater recharge as well as accelerate unsustainable groundwater withdrawal. The mean monthly minimum streamflow is anticipated to rise by 48.2% and 125.6% under the RCP 4.5 and 8.5, respectively; the monthly maximum streamflow is expected to fall by 17.2% and 41.8%, respectively. Furthermore, the peak hydrograph is expected to split into two peaks: one in March-April, indicating earlier snowmelt, and the second peak in August, indicating that the increase in summer or Monsoon rains, contrary to the base period, is contributing to the peak hydrograph in August. The information on a simulated considerable decrease of streamflow in the period from May until July and the increase from August until December can support water management in conceiving adaptation strategies in terms of (i) re-arranging irrigation schedules (and eventually even cropping patterns) towards lower water availability in the beginning and middle of the cropping season and higher water availability in the late season, and (ii) considering the introduction of facilities to store a part of the increased runoff at the end of and after the vegetation period for utilization in agriculture during demand periods taking into account the effects of changing low flow conditions on ecology.