Benchmarking Real-Time Streamﬂow Forecast Skill in the Himalayan Region

: Improving decision-making in various areas of water policy and management (e.g., ﬂood and drought preparedness, reservoir operation and hydropower generation) requires skillful streamﬂow forecasts. Despite the recent advances in hydrometeorological prediction, real-time streamﬂow forecasting over the Himalayas remains a critical issue and challenge, especially with complex basin physiography, shifting weather patterns and sparse and biased in-situ hydrometeorological monitoring data. In this study, we demonstrate the utility of low-complexity data-driven persistence-based approaches for skillful streamﬂow forecasting in the Himalayan country Nepal. The selected approaches are: (1) simple persistence, (2) streamﬂow climatology and (3) anomaly persistence. We generated the streamﬂow forecasts for 65 stream gauge stations across Nepal for short-to-medium range forecast lead times (1 to 12 days). The selected gauge stations were monitored by the Department of Hydrology and Meteorology (DHM) Nepal, and they represent a wide range of basin size, from ~17 to ~54,100 km 2 . We ﬁnd that the performance of persistence-based forecasting approaches depends highly upon the lead time, ﬂow threshold, basin size and ﬂow regime. Overall, the persistence-based forecast results demonstrate higher forecast skill in snow-fed rivers over intermittent ones, moderate ﬂows over extreme ones and larger basins over smaller ones. The streamﬂow forecast skill obtained in this study can serve as a benchmark (reference) for the evaluation of many operational forecasting systems over the Himalayas.


Introduction
Skillful streamflow forecasts are critically important in improving decision-making in water-related policy and management (e.g., flood and drought preparedness, reservoir operation and hydropower generation). Real-time streamflow forecasting over the Himalayas remains a critical challenge. (e.g., [1,2]) because of complex basin physiography, shifting weather patterns and sparse distribution of hydrometeorological monitoring stations. The verification of streamflow forecasts can provide ensemble of resampled historical rainfall [21], for operational forecast skill verification. Despite some inherent predictability of hydrologic systems due to coupling with the weather and climate system, streamflow shows some likelihood of repetition at seasonal to daily time scales, and hence some inherent predictability [19,20,22]. Den Dool [23] and Fraedrich and Ziehmann-Schlumbohm [24] iterate that only forecasts depicting skill better than persistence can handle the forecast of the time derivative. The degree of persistence, however, might demonstrate variation with location, time of the year and the system studied [23,25]. Den Dool [23] indicated that processes with a long-time scale show higher skill with persistence forecasts, which makes persistence a hard-to-beat method for many complex mechanistic models. Though some authors (e.g., [26]) attribute the streamflow persistence to the carryover storage of water in lakes, and below the land surface, the comprehensive evaluation of persistence-based forecasts was lacking until recently. Palash et al. [1], motivated by the concept of requisite simplicity, showed the utility of a simple linear flood-forecasting system for the Ganges, Brahmaputra and Meghna Rivers, using streamflow persistence as the mechanism of prediction. Recently, Ghimire and Krajewski [19] and Krajewski et al. [27] showed, through a comprehensive study of 140 mid-western agricultural watersheds in the United States, that persistence-based forecast skills show strong dependence with the basin scale, while weaker but non-negligible dependence with the properties of the river network.
In the context of a lack of comprehensive evaluation of persistence-based reference forecasts in the data-scarce Himalayan region, their quantitative assessment can provide the benchmark across scales for any operational forecasting system (e.g., DHM's forecasting system). This paper specifically explores the following questions: (1) what is the utility of data-driven persistence-based approaches for skillful streamflow forecasting in the Himalayan region? (2) Which forecast conditions, such as lead time, flow threshold, basin size and flow regime (e.g., perennial/snow-fed and intermittent), benefit potential increase in forecast skill? We organize the paper as follows: In Section 2, we discuss the materials and methods used in this study. Section 3 presents results of this study and Section 4 discusses these findings. In Section 5, we summarize and present some conclusions and limitations of this work.

Study Area and Data
Our domain of interest for this study was Nepal, which is in the central part of Himalayan mountain range. The total area of Nepal is about 147,520 km 2 , located approximately between 80 • 03 -88 • 12 E and 26 • 21 -30 • 26 N ( Figure 1). The topographic elevation ranges from 8848m (elevation of Mt. Everest) in the north to 70m in the south above mean sea level featuring diverse climatic conditions varying from polar to tropical [28]. Forest (39.1%), agriculture (29.8%), barren (10.7%) and snow/glaciers (8.2%) are the major land cover types in Nepal [29]. Most of the climate variability in Nepal is attributed to the high reliefs of river catchments e.g., [4]. The climate in Nepal is dominated by a southwestern monsoon (June-September) that originates from the Bay of Bengal and about 80% of precipitation in Nepal occurs only in the monsoon season [30]. The months of October-November occasionally experience post-monsoon rainfall, November-March typically remains dry, and April-May experiences pre-monsoon rainfall [4,5,9,31]. The summer monsoon is more active in the eastern part of Nepal as the monsoon enters and departs from that region, and the winter monsoon is more active in the western region because of the influence of western disturbances [28]. The average annual precipitation (1990-2010) ranges from~270 mm to~5500 mm [32]. The consequent runoff during the monsoon accounts for 70%-90% of the annual water balance, as shown in [33].
Forecasting 2020, 2 FOR PEER REVIEW 4 Figure 1. Location of the study domain, Nepal. The white patches in the terrain map (not to scale) depict the snow cover (Nepal Himalaya). The land cover is from Uddin et al. [29]. The red dots represent stream gauge stations monitored by the Department of Hydrology and Meteorology (DHM), Nepal.
Most rivers in Nepal drain from north to south and eventually to the Ganges River in India. The rivers in Nepal can be broadly categorized into five major river systems: Karnali, Narayani, Koshi, Mahakali, and the southern and Mahabharat rivers [33]. The former four river systems are combinations of rain-and snow-fed rivers, while the latter river system mostly comprise mid-size rivers, such as the Mechi, Kankai, Kamala, Bagmati, Babai and West Rapti, mostly fed by rainfall and characterized by frequent flash floods [33]. The Karnali, Gandaki and Koshi river systems highlighted in Figure 1 represent about two-thirds of the area of Nepal. These river systems show variability in their hydroclimatic conditions, in addition to the topographic characteristics such as average basin slope and snow cover. In this study, we used historic daily streamflow data at stream gauges stations (red dots in Figure 1) monitored by the DHM across Nepal. Daily streamflow data is the highestresolution data publicly available from the DHM. It corresponds to the mean values of three readings recorded daily at 8 AM, 12 noon and 4 PM, local time. The streamflow data across stations span between 1962 and 2010, with varying record lengths. Thirty-eight stations have more than 30 years of historical daily streamflow records ( Figure A1, Appendix A). We screened the stream gauge stations based on the availability criteria of at least 10 years of streamflow records and at least fair quality data categorized by DHM. The stations that are not shown within the three major river systems correspond to the rivers of the southern and Mahabharat river systems. In total, we used streamflow data from 65 unregulated stream gauge stations. The corresponding basin size varies Figure 1. Location of the study domain, Nepal. The white patches in the terrain map (not to scale) depict the snow cover (Nepal Himalaya). The land cover is from Uddin et al. [29]. The red dots represent stream gauge stations monitored by the Department of Hydrology and Meteorology (DHM), Nepal.
Most rivers in Nepal drain from north to south and eventually to the Ganges River in India. The rivers in Nepal can be broadly categorized into five major river systems: Karnali, Narayani, Koshi, Mahakali, and the southern and Mahabharat rivers [33]. The former four river systems are combinations of rain-and snow-fed rivers, while the latter river system mostly comprise mid-size rivers, such as the Mechi, Kankai, Kamala, Bagmati, Babai and West Rapti, mostly fed by rainfall and characterized by frequent flash floods [33]. The Karnali, Gandaki and Koshi river systems highlighted in Figure 1 represent about two-thirds of the area of Nepal. These river systems show variability in their hydroclimatic conditions, in addition to the topographic characteristics such as average basin slope and snow cover. In this study, we used historic daily streamflow data at stream gauges stations (red dots in Figure 1) monitored by the DHM across Nepal. Daily streamflow data is the highest-resolution data publicly available from the DHM. It corresponds to the mean values of three readings recorded daily at 8 AM, 12 noon and 4 PM, local time. The streamflow data across stations span between 1962 and 2010, with varying record lengths. Thirty-eight stations have more than 30 years of historical daily streamflow records ( Figure A1, Appendix A). We screened the stream gauge stations based on the availability criteria of at least 10 years of streamflow records and at least fair quality data categorized by DHM. The stations that are not shown within the three major river systems correspond to the rivers of the southern and Mahabharat river systems. In total, we used streamflow data from 65 unregulated stream gauge stations. The corresponding basin size varies from~17 to~54,100 km 2 , with more than half of the stations monitoring catchments greater than 1000 km 2 ( Figure A1, Appendix A). Note that the hydropower projects built in some of these rivers are run-off-river schemes, which typically do not store the water.

Experimental Design
Several persistence-based forecast approaches have been discussed in den Dool [19], Wu and Dickinson [23] and Ghimire and Krajewski [25]. Here, we present three of these approaches to generate the reference (benchmark) forecasts in the time domain and evaluate them; these are simple persistence, streamflow climatology and anomaly persistence. Our experimental setup mostly follows the methods outlined in Ghimire and Krajewski [19]. If t represents the time of observation of streamflow Q(t), the forecast for lead time ∆t is given by Note that the forecasts are generated for the entire streamflow time-series in the records for 65 stream gauge stations in Nepal. While considering the entire time-series, we account for the interannual variability of streamflow.
Here, we use the "climatology" as the average of streamflow on record at time t. The corresponding forecast for lead time ∆t is given by where Q(t + ∆t) refers to the streamflow at time t + ∆t from previous years of record and n refers to the number of years in record. The anomaly persistence forecast scheme assumes that streamflow anomalies persist over the lead time ∆t. The anomalies are computed with respect to the climatology. The anomalies at t and t + ∆t i.e., Q (t) and Q (t + ∆t), respectively, are computed as where clim(t) and clim(t + ∆t) are climatology forecasts obtained from Equation (2) at t and t + ∆t, respectively. The Equations (3) and (4) yield the forecast at t + ∆t as The forecasts computed in Equations (1)-(5) use the entire streamflow time-series in record, i.e., at least 10 years of streamflow data. Here, we are also interested in exploring the forecast skill associated with the direct runoff component of streamflow. In other words, we computed forecast skill from persistence-based approaches for the rainfall-runoff events. Acknowledging the fact that the separation of runoff components from baseflow is not easy, we used the automated separation technique used by the United States Geological Survey (USGS) called the hydrograph separation program (HYSEP) [34]. The HYSEP employs three methods to separate storm flow from baseflow in the hydrograph: fixed interval, sliding interval, and local minimum. The duration of the surface runoff is computed empirically by where K is the number of days after which the surface runoff stops and A is the upstream drainage area of the basin in sq. miles [34]. The interval used to separate the storm flow is 2K*, which should be the nearest odd integer to 2K. The hydrograph separation starts one interval, i.e., 2K* days, before the start of the date selected for the start of the separation, while it ends 2K* days after the selected date. For further details of the separation technique, refer to Sloto and Crouse [34]. Figure A2 of the Appendix A shows the demonstration of storm flow separation across two river basins (small and large scales) using the sliding interval method.

Evaluation Metrics
We used several standard hydrologic forecast evaluation measures to evaluate the forecasts described in Section 2.2. The metrics we computed were Kling-Gupta efficiency (KGE), mean absolute error (MAE), normalized MAE (nMAE), timing of the hydrographs (T H ) and peak difference (PD). The KGE comprises three components: Pearson's correlation coefficient (r), variance ratio (α) and mean ratio (β) (refer to [35]). It is given by µ o . σ f and σ o correspond to the standard deviations of forecasts and observed streamflow, respectively. µ f and µ o correspond to the means of forecasts and observed streamflow, respectively. The ideal value of KGE is equal to 1, which can be obtained when all three components attain values of 1. For example, if the values of α and β are close to 1, the corresponding KGE is dominated by the correlation component. By construction, the KGE associated with the persistence-based forecasts is dominated by correlation. When the mean of the observations is used as a benchmark, the frontier of the KGE separating between the good model and the bad model is equal to −0.41 see [36]. In our discussion, we primarily focus on the KGE.
We compute the MAE associated with the forecasts as below: where Q f is the streamflow forecast, Q o is the observation and N is the number of data pairs. For fair comparison across the basin scales, we normalize MAE by the upstream drainage area to compute nMAE. The hydrograph timing (T H ) is computed as the number of hours a hydrograph requires to be shifted so that the cross-correlation between the forecasts and the observations is maximized. By construction, T H is close to the lead time of the forecast. The peak difference (PD) in percentage is computed as where peak f and peak o are peaks of the forecast and the observations, respectively.

Results
In this section, we present key results from our experiment evaluating persistence-based reference forecasts at 65 stream gauge stations in the Himalayan region in Nepal. First, we present results for three major river systems (basin-wise forecasts), and then for the entire region (regional forecasts).

Basin-Wise Forecasts
Our notion behind exploring basin-wise forecast skill is due to the variability of snow/glacier cover, storage conditions and hydroclimatic conditions across three major river basins. In Figure 2, we show the variability of the KGE metric with drainage area for three major river systems of Nepal (Karnali, Narayani and Koshi) across forecast lead times of 1, 3, 6 and 12-days. The result corresponds to the simple persistence forecasts. For simple persistence forecasts, the KGE is dominated by the correlation component r. For all three basins across most of the stations, KGE > 0.8 for the lead time of 1-day. The stronger relationship between the KGE and basin size emerges with the increasing forecast lead times. Note that the pattern is not as strong in the Koshi river basin, but the KGE values are relatively higher across basin scales. In addition, there are not many smaller monitored basins in the Koshi river system to exactly decide on the pattern relative to other two river systems. The overall results across three river systems clearly show a strong spatio-temporal dependence of simple persistence-based forecast skill.

Regional Streamflow Forecasts
The results from basin-wise forecasts show that no distinct pattern of forecast skill emerges across major river systems in Nepal. Therefore, we decided to pool all stations from three major river systems, including the mid-sized river stations from other river basins (see Figure 1), and refer to their forecasts as regional forecasts. In Figure 3, we present results in terms of two skill metrics, i.e., KGE (first row) and nMAE (second row), based on simple persistence forecasts (Equation (1)). The relationship between the KGE and basin size is more prominent for pooled stations across lead times of the forecast. There are, however, few stations which diverge from the overall pattern of the KGE observed in the basin-wise forecast evaluation. We explore the forecast performance associated with these stations separately (see Figure A3, Appendix A). We find that their forecast skills show more variability and decay much faster with forecast lead times. In addition, these stations are mostly midsized river basins associated with the intermittent flow regime. We present the detailed discussion on the origin of their forecast skill later. Note that basins > 1000 km 2 show relatively higher forecast skill (KGE~0.7) even at the lead time of 12 days. The values of nMAE also depict similar dependence with basin size as the KGE. The values of nMAE show a much stronger relationship with basin scale at shorter lead times, while starting to diverge at longer lead times particularly for small basin scales of size < 1000 km 2 . By construction, PD for the simple persistence forecast is close to 0. In addition, by construction, PD for the climatology-based forecasts does not vary with lead times (see Figure A4, Appendix A). The corresponding median value of PD across basins in Nepal is about −76% (underestimation of the peak). For anomaly persistence, however, the median values of PD across basins range from about −1.5% to −3% at lead times of 1 day to 12 days, respectively (see Figure A4, Appendix A). Note that climatology-based forecasts show sizable a dependence of PD on basin size, while this is not apparent for anomaly persistence-based forecasts.

Regional Streamflow Forecasts
The results from basin-wise forecasts show that no distinct pattern of forecast skill emerges across major river systems in Nepal. Therefore, we decided to pool all stations from three major river systems, including the mid-sized river stations from other river basins (see Figure 1), and refer to their forecasts as regional forecasts. In Figure 3, we present results in terms of two skill metrics, i.e., KGE (first row) and nMAE (second row), based on simple persistence forecasts (Equation (1)). The relationship between the KGE and basin size is more prominent for pooled stations across lead times of the forecast. There are, however, few stations which diverge from the overall pattern of the KGE observed in the basin-wise forecast evaluation. We explore the forecast performance associated with these stations separately (see Figure A3, Appendix A). We find that their forecast skills show more variability and decay much faster with forecast lead times. In addition, these stations are mostly mid-sized river basins associated with the intermittent flow regime. We present the detailed discussion on the origin of their forecast skill later. Note that basins > 1000 km 2 show relatively higher forecast skill (KGE~0.7) even at the lead time of 12 days. The values of nMAE also depict similar dependence with basin size as the KGE. The values of nMAE show a much stronger relationship with basin scale at shorter lead times, while starting to diverge at longer lead times particularly for small basin scales of size < 1000 km 2 . By construction, PD for the simple persistence forecast is close to 0. In addition, by construction, PD for the climatology-based forecasts does not vary with lead times (see Figure A4, Appendix A). The corresponding median value of PD across basins in Nepal is about −76% (underestimation of the peak). For anomaly persistence, however, the median values of PD across basins range from about −1.5% to −3% at lead times of 1 day to 12 days, respectively (see Figure A4, Appendix A). Note that climatology-based forecasts show sizable a dependence of PD on basin size, while this is not apparent for anomaly persistence-based forecasts. In this section, we evaluate the forecast performance associated with various reference forecast schemes described in Section 2. Here, we present the one-to-one comparison of forecast skill in terms of KGE (first row) and nMAE (second row) between these schemes in Figure 4. Apparently, persistence-based forecasts outperform climatology-based forecasts at shorter lead times across all basin scales. Both simple persistence and climatology-based forecasts perform similarly for longer lead times. Anomaly persistence, however, performs similarly with simple persistence for shorter lead times, while it outperforms simple persistence for longer lead times at smaller basin scales (also see Figure 5). Note that climatology is an integral component of anomaly persistence forecasts. Therefore, the improved skill of anomaly persistence at longer lead times and smaller basin scales could be attributed to the improved performance of climatology-based forecasts at longer lead times.  In this section, we evaluate the forecast performance associated with various reference forecast schemes described in Section 2. Here, we present the one-to-one comparison of forecast skill in terms of KGE (first row) and nMAE (second row) between these schemes in Figure 4. Apparently, persistence-based forecasts outperform climatology-based forecasts at shorter lead times across all basin scales. Both simple persistence and climatology-based forecasts perform similarly for longer lead times. Anomaly persistence, however, performs similarly with simple persistence for shorter lead times, while it outperforms simple persistence for longer lead times at smaller basin scales (also see Figure 5). Note that climatology is an integral component of anomaly persistence forecasts. Therefore, the improved skill of anomaly persistence at longer lead times and smaller basin scales could be attributed to the improved performance of climatology-based forecasts at longer lead times. In this section, we evaluate the forecast performance associated with various reference forecast schemes described in Section 2. Here, we present the one-to-one comparison of forecast skill in terms of KGE (first row) and nMAE (second row) between these schemes in Figure 4. Apparently, persistence-based forecasts outperform climatology-based forecasts at shorter lead times across all basin scales. Both simple persistence and climatology-based forecasts perform similarly for longer lead times. Anomaly persistence, however, performs similarly with simple persistence for shorter lead times, while it outperforms simple persistence for longer lead times at smaller basin scales (also see Figure 5). Note that climatology is an integral component of anomaly persistence forecasts. Therefore, the improved skill of anomaly persistence at longer lead times and smaller basin scales could be attributed to the improved performance of climatology-based forecasts at longer lead times.   From the real-time streamflow forecasting point of view, it would be more informative to explore how KGE evolves over time across basin scales. In Figure 5, we show that the evolution of KGE skill is associated with three methods across four nested basins in the Karnali river system (refer inset). In addition to the apparent strong dependence of KGE with basin scales, the temporal evolutive pattern of KGE shows variability across basin scales. For example, the smallest basin (A~159 km 2 ) depicts the sharp drop in the KGE metric up to lead time of four days and remains stable from thereon. However, as the basin size increases, the rate of decrease of the KGE metric becomes slower. The other important observation is that up to about four days of lead time, both persistence-based forecast schemes outperform climatology-based forecasts. After four days of lead time, the difference between climatology and anomaly persistence-based forecast skill is negligible. Note, however, that the difference in KGE between simple persistence and other two forecast schemes shows systematic decrease with the increasing basin scales.
The results presented above are associated with the entire streamflow time-series. In other words, the forecast skills computed originate from the contribution of both baseflow and stormflow. However, the intrinsic question is whether we can achieve similar forecast performance for the storm flow (i.e., direct runoff component of hydrographs). In Figure 6, we present the forecast performance based on simple persistence for the direct runoff obtained using HYSEP described in Section 2. As we demonstrate through stormflow hydrographs in Figure A5, Appendix A, at a lead time of one day, the simple persistence-based forecasts are essentially delayed by one day. In other words, the timing of forecast hydrographs and the timing of the extreme event peak are delayed by one day, while preserving the magnitude of the peak of the hydrograph. Note the damping of smaller rainfallrunoff event stormflows in the smaller nested basin (see a, Figure A5, Appendix A) by the river network aggregation process at the larger basin (see b, Figure A5, Appendix A). The resulting pattern of both KGE and nMAE with basin scales is similar to the one presented before. Notably, reasonable KGE could be achieved for a one-day-ahead stormflow forecast, while showing significant decline for longer lead times. Note that the basins of size > 1000 km 2 still show KGE > 0.3 for 12 days ahead forecast, illustrating the potential of using it as reference forecast for the evaluation of stormflow forecasts particularly at short-range. From the real-time streamflow forecasting point of view, it would be more informative to explore how KGE evolves over time across basin scales. In Figure 5, we show that the evolution of KGE skill is associated with three methods across four nested basins in the Karnali river system (refer inset). In addition to the apparent strong dependence of KGE with basin scales, the temporal evolutive pattern of KGE shows variability across basin scales. For example, the smallest basin (A~159 km 2 ) depicts the sharp drop in the KGE metric up to lead time of four days and remains stable from thereon. However, as the basin size increases, the rate of decrease of the KGE metric becomes slower. The other important observation is that up to about four days of lead time, both persistence-based forecast schemes outperform climatology-based forecasts. After four days of lead time, the difference between climatology and anomaly persistence-based forecast skill is negligible. Note, however, that the difference in KGE between simple persistence and other two forecast schemes shows systematic decrease with the increasing basin scales.
The results presented above are associated with the entire streamflow time-series. In other words, the forecast skills computed originate from the contribution of both baseflow and stormflow. However, the intrinsic question is whether we can achieve similar forecast performance for the storm flow (i.e., direct runoff component of hydrographs). In Figure 6, we present the forecast performance based on simple persistence for the direct runoff obtained using HYSEP described in Section 2. As we demonstrate through stormflow hydrographs in Figure A5, Appendix A, at a lead time of one day, the simple persistence-based forecasts are essentially delayed by one day. In other words, the timing of forecast hydrographs and the timing of the extreme event peak are delayed by one day, while preserving the magnitude of the peak of the hydrograph. Note the damping of smaller rainfall-runoff event stormflows in the smaller nested basin (see a, Figure A5, Appendix A) by the river network aggregation process at the larger basin (see b, Figure A5, Appendix A). The resulting pattern of both KGE and nMAE with basin scales is similar to the one presented before. Notably, reasonable KGE could be achieved for a one-day-ahead stormflow forecast, while showing significant decline for longer lead times. Note that the basins of size > 1000 km 2 still show KGE > 0.3 for 12 days ahead forecast, illustrating the potential of using it as reference forecast for the evaluation of stormflow forecasts particularly at short-range. As we highlighted in Section 2, our region predominantly contains rivers in the perennial flow regime (snow/glacier fed). There are, however, some rivers originating in the southern plains of Nepal which are predominantly in the intermittent flow regime. Since we established the fact that forecast skill shows strong spatial scale dependence, it would be of interest to the forecasting community to explore the dependence on the flow regimes. The ideal comparison would be to evaluate forecasts conditional on the basin scale. We identified four basins of a size of ~2500 km 2 ; two perennial river basins in northern Nepal and the other two intermittent river basins in southern Nepal (see inset of Figure 7). Figure 7 demonstrates the evolution of KGE (simple persistence-based) with lead times for these four basins. A distinct pattern of KGE evolution emerges. The perennial rivers depict higher forecast performance with gradual decline in the forecast performance with forecast horizon (lead time) while the intermittent rivers show sharp decline in the forecast performance with forecast horizon. Since intermittent rivers are mostly driven by rainfall-runoff events, their forecast skill evolution shows similar behavior elucidated by the results in Figure 6. Note a sudden spike of KGE for the Babai river at four days of lead time. Though it is hard to point out explicitly the reason, one could attribute it to the ability of four days ahead forecast to capture two consecutive historic floods (separated by 4 days) in the year 2015. As we highlighted in Section 2, our region predominantly contains rivers in the perennial flow regime (snow/glacier fed). There are, however, some rivers originating in the southern plains of Nepal which are predominantly in the intermittent flow regime. Since we established the fact that forecast skill shows strong spatial scale dependence, it would be of interest to the forecasting community to explore the dependence on the flow regimes. The ideal comparison would be to evaluate forecasts conditional on the basin scale. We identified four basins of a size of~2500 km 2 ; two perennial river basins in northern Nepal and the other two intermittent river basins in southern Nepal (see inset of Figure 7). Figure 7 demonstrates the evolution of KGE (simple persistence-based) with lead times for these four basins. A distinct pattern of KGE evolution emerges. The perennial rivers depict higher forecast performance with gradual decline in the forecast performance with forecast horizon (lead time) while the intermittent rivers show sharp decline in the forecast performance with forecast horizon. Since intermittent rivers are mostly driven by rainfall-runoff events, their forecast skill evolution shows similar behavior elucidated by the results in Figure 6. Note a sudden spike of KGE for the Babai river at four days of lead time. Though it is hard to point out explicitly the reason, one could attribute it to the ability of four days ahead forecast to capture two consecutive historic floods (separated by 4 days) in the year 2015.
It is extremely important for forecasters to achieve good predictability of higher flow quantiles for flood preparedness and mitigation efforts. For the same four basins, we evaluate the forecast performance associated with the different streamflow quantiles. We refer to flow quantiles as flow threshold. Figure 8 shows the evolution of KGE across lead times for various flow quantiles. KGE reported here is computed for the streamflow forecasts exceeding the corresponding flow quantile. Note that as the flow quantile increases, the sample size used to compute the forecast skill systematically decreases. Given that we evaluate the forecast skill using the continuous streamflow time-series in record, we consider the sample size to be enough for the evaluation. Clearly, the KGE for the intermittent rivers (lower panel) shows sharp decline for the higher flow quantiles. The perennial rivers (upper panel), however, depict much better forecast performance even for the higher flow quantiles at longer lead times. However, of the two, the Marshyangdi basin at Bhakundebesi, shows relatively faster decline in the performance of flows exceeding 60th percentile. As Ghimire and Krajewski [19] highlighted, this variability in KGE could be explained by the difference in their river network geometries, typically explained by the network width function see [37,38] for detail. Explicit attribution is beyond the scope of this paper. horizon (lead time) while the intermittent rivers show sharp decline in the forecast performance with forecast horizon. Since intermittent rivers are mostly driven by rainfall-runoff events, their forecast skill evolution shows similar behavior elucidated by the results in Figure 6. Note a sudden spike of KGE for the Babai river at four days of lead time. Though it is hard to point out explicitly the reason, one could attribute it to the ability of four days ahead forecast to capture two consecutive historic floods (separated by 4 days) in the year 2015.  It is extremely important for forecasters to achieve good predictability of higher flow quantiles for flood preparedness and mitigation efforts. For the same four basins, we evaluate the forecast performance associated with the different streamflow quantiles. We refer to flow quantiles as flow threshold. Figure 8 shows the evolution of KGE across lead times for various flow quantiles. KGE reported here is computed for the streamflow forecasts exceeding the corresponding flow quantile. Note that as the flow quantile increases, the sample size used to compute the forecast skill systematically decreases. Given that we evaluate the forecast skill using the continuous streamflow time-series in record, we consider the sample size to be enough for the evaluation. Clearly, the KGE for the intermittent rivers (lower panel) shows sharp decline for the higher flow quantiles. The perennial rivers (upper panel), however, depict much better forecast performance even for the higher flow quantiles at longer lead times. However, of the two, the Marshyangdi basin at Bhakundebesi, shows relatively faster decline in the performance of flows exceeding 60th percentile. As Ghimire and Krajewski [19] highlighted, this variability in KGE could be explained by the difference in their river network geometries, typically explained by the network width function see [37,38] for detail. Explicit attribution is beyond the scope of this paper.

Discussion
A clear signature emerging out of our persistence-based streamflow forecasts skill is its strong spatio-temporal dependence. Our results substantiate it further. The results from Figures 3 and 6 demonstrate that significant contribution to the persistence-based forecast skill is from the baseflow contribution. In other words, the contribution from snowpack (snow/glacier), groundwater and

Discussion
A clear signature emerging out of our persistence-based streamflow forecasts skill is its strong spatio-temporal dependence. Our results substantiate it further. The results from Figures 3 and 6 demonstrate that significant contribution to the persistence-based forecast skill is from the baseflow contribution. In other words, the contribution from snowpack (snow/glacier), groundwater and storage in the contributing catchments plays an important role in the streamflow hydrographs, particularly for the perennial rivers in the Himalayan region of Nepal. Typically, the streamflow predictability at small basin scales is more tied to the rainfall than the large basin scales. We highlighted this fact through our results for the intermittent rivers. The perennial rivers, however, receive sizable contributions from the base flow relative to the rainfall even in the small basin scales, hence the relatively good forecast skill. Moreover, the persistence-based forecast derives skill from the memory of the system, in this case river catchments. The larger the basin scales, the longer the memory of the catchments, and hence the long-range persistence. We show from our results that higher forecast skill (KGE > 0.7) could be achieved for a basin size larger than 1000 km 2 , even at a longer forecast horizon. Another important contribution to the skillful forecasts of larger basins at longer lead times is from the water transport in the river network, where the streamflow aggregation process controls the shape of the streamflow fluctuations [27,[39][40][41]. In addition to the land surface (e.g., catchment) memory, the streamflow predictability has a strong connection to the persistence in the land surface initial hydrologic conditions (e.g., soil moisture, groundwater, current streamflow and snowpack) [42][43][44], providing the main source for the skillful streamflow forecasts.
The dependence of forecast skill depicted in this study with the basin size and forecast lead times is consistent with the results presented in Ghimire and Krajewski [19]. Given a large number of glacierized basins in the region, as opposed to agricultural watersheds used in the study of Ghimire and Krajewski [19], more stations depict higher streamflow forecast skill. For instance, at a lead time of one day, we show a median value of KGE of about 0.92 in the region, compared to the median value of KGE of about 0.78 presented in their study. Therefore, our results demonstrate a clear implication of persistence-based forecasts for the real-time streamflow forecasting in the Himalayan region. Our illustration of reasonably good streamflow forecast skill across spatial and temporal scales provides a reliable benchmark (reference) to evaluate the efficacy of the operational real-time streamflow forecasting. Particularly for the larger basin scales, we are able to show that it is difficult for many operational mechanistic hydrologic models to outperform persistence-based forecast skills. Any operational forecasting scheme that can aptly depict forecast skill better than the three-reference forecast schemes presented in this study, can be considered skillful. We are able to show quantitatively that it can serve as a skillful reference for the evaluation of real-time flood forecasts (higher flow quantiles) up to the medium range for perennial rivers while up to the short-range for the intermittent rivers. Our findings clearly show that persistence-based forecast scheme, anomaly persistence in particular, could provide skillful reference forecasts for the evaluation of current operational streamflow forecasting systems in the Himalayan country of Nepal. Moreover, it could provide guidance to the flood related decision-making process, especially in the short-medium forecast range.

Conclusions
In this study, we explored the utility of persistence-based forecasting schemes to benchmark the real-time streamflow forecasting system in the Himalayan region of Nepal. We used the daily streamflow data at 65 stream gauge stations monitored by the DHM, Nepal, to generate the forecasts and evaluate the associated skills in the short-to-medium forecast horizon. To this end, we employed three reference forecast schemes: (i) simple persistence, (ii) streamflow climatology and (iii) anomaly persistence. Based on the results from this study, we could reach at following conclusions: Persistence-based forecast skill shows strong dependence with the basin scale and forecast lead time. Anomaly persistence forecasts outperform others at small basin scales and longer lead times and hence can be a better selection for benchmarking the real-time streamflow forecasting system in the Himalayan region of Nepal. The forecast skill shows strong dependence with the flow regime and flow threshold. The verification results show higher forecast skill for perennial rivers over intermittent rivers and moderate flow quantiles over high flow quantiles.
This study highlights the fact that a persistence-based forecast scheme is difficult to outperform using many mechanistic hydrologic models, particularly for larger basin scales in the Himalayan region. The findings from this study have implications for evaluating real-time streamflow forecasts from the operational forecasting system across basin scales and lead times in this region. Moreover, it provides insights on designing the streamflow monitoring network for future applications.
Our study, however, is not without limitations. We did not account for the measurement uncertainty associated with streamflow observations. We considered the published streamflow data to be within the acceptable limit of observational uncertainty. We did not account for the uncertainty associated with the rating curves. Moreover, we could not perform the sub-daily forecast skill evaluation due to the unavailability of published sub-daily streamflow observations. We expect that the analysis using sub-daily streamflow data will not lead to a significantly different inference regarding the forecasting performance of persistence-based systems, which we could explore further in our future research.