Improving Seasonal Forecasts for Basin Scale Hydrological Applications

Seasonal forecasting is a fast-growing climate prediction application that puts into practice the latest improvements in climate modeling research. Skillful seasonal forecasts can substantially aid practical applications and productive sectors by reducing weather-related risks, such as those affecting water availability. In this study, two operational seasonal forecasting systems are tested in a watershed on the island of Crete that is important for water resources. Hindcast precipitation and temperature data from the European Centre for Medium-Range Weather Forecasts (ECMWF) System 4 and the Met Office GloSea5 systems are tested for their forecast skill up to seven months ahead. The data of both systems are downscaled and bias-corrected against observations. Different correction methods are applied and evaluated. The post-processed data from these methods are used as input to the hydrological model HYPE to provide streamflow forecasts. Results show that a prior adjustment of the two systems' precipitation and temperature may improve their forecast skill. Adjusted GloSea5 forecasts provide slightly better estimates than the corresponding forecasts based on System 4. Both systems provide a skillful ensemble streamflow prediction for one month ahead, with the skill decreasing rapidly beyond that. Updating the initial state of HYPE reduces the variability of the ensemble flow predictions and improves the skill, but only up to two months ahead. Finally, the two systems were tested for their ability to capture a limited number of historical streamflow drought events, with indications that GloSea5 has slightly better skill.


Introduction
Seasonal forecasting has advanced considerably in the last decade. With improved skill, seasonal forecasts can provide valuable information to management authorities. Exploiting them can lead to better anticipation of water- and climate-related risks in the near future and to improved preparedness. At a regional scale, examples include the management of European hydrological extremes [1,2] and the European Forest Fire Information System (EFFIS) [3], as well as the triggering of risk reduction and relief actions in flood-prone areas [4]. Along with these advances, a growing body of recent research assesses forecast skill at regional or watershed level [5][6][7][8]. Seasonal streamflow forecasting is challenging both for climate forecast systems and for the processing methods applied before the hydrological simulation. Marco et al. [9] assessed seasonal forecasts of the inflow and outflow of a reservoir in northeastern Spain and found that the skill in inflow prediction is limited to the first month of the forecast. In a pan-European study, Pechlivanidis et al. [7] analyzed the skill of streamflow forecasts across a large number of European sub-basins using the E-HYPE hydrological model and found that the forecasting system performs well in estimating water volumes in the majority of the tested basins. In this study, the precipitation and temperature forecasts of two operational seasonal forecasting systems, ECMWF System 4 and Met Office GloSea5, are tested in a watershed on Crete that is important for water resources; different bias adjustment methods applied prior to the hydrological simulation are evaluated, as well as the effect of the initialization of the hydrological model state. Finally, the results are assessed for their skill in predicting the streamflow drought state and a small number of historical streamflow drought events.

Hindcast Data
Daily precipitation and temperature hindcast data from the ECMWF S4 [10] and the UK Met Office's GloSea5 system [48] were used in this study. The data were available for 1981-2015 for ECMWF S4 and for 1996-2010 for GloSea5. The data were bilinearly interpolated to the sub-basin level, as this is considered a straightforward methodology that does not add redundant information to the results [49], while it takes into account information from the surrounding grid cells, which reinforces the robustness of the forecast. The skill of the forecast data is evaluated against observations before and after bias adjustment. The forecast data are rearranged as shown in Figure 1 to create seamless lagged ensemble time series with similar lead-time characteristics [48]. This pre-processing step serves two purposes: first, to determine the forecast skill at different lead times, which is one of the objectives of this study; and second, to apply bias adjustment separately at each lead time between one and seven months (LT1 to LT7), as different lead times of the forecast exhibit different bias characteristics. Figure 2 shows the average bias of the raw forecast data per lead time relative to the basin-average observations. For the adjustment, two strategies were applied. The first adjusts each forecast realization individually, while the second adjusts all forecast realizations together. The former assumes that the potential biases are realization-dependent, so each realization is treated independently (hereafter referred to as independent realization correction, IRC). This gives the adjustment method the flexibility to correct potentially unrealistic values in a single member without affecting the correction of the other ensemble members. On the other hand, it can be argued that, while the forecast systems' biases are time dependent (Figure 2), they are ensemble independent, which motivates the joint adjustment of all members.
In the joint approach, ensemble members that exhibit particular characteristics, e.g., wetter conditions, retain their relative difference from the rest of the members. However, this may trigger more false alarms for a risk-averse decision maker. In this study, tests were conducted in the case study watershed to assess the performance of each strategy in adjusting the precipitation and temperature biases.
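The lead-time rearrangement can be illustrated with a minimal Python sketch. Because the exact layout of Figure 1 is not reproduced here, it assumes that a forecast is issued every month and covers seven monthly blocks; the continuous series for lead time k then takes, for each target month, block k of the forecast issued k - 1 months earlier. The function `lead_time_series` and the nested-list data layout are hypothetical, and monthly values stand in for the daily data used in the study.

```python
from typing import Dict, List

# Hypothetical layout: forecasts[init_month] holds 7 lead-month blocks,
# each block holding one value per ensemble member.
Forecasts = Dict[int, List[List[float]]]

N_LEADS = 7  # lead times LT1..LT7


def lead_time_series(forecasts: Forecasts, lead: int) -> Dict[int, List[float]]:
    """Continuous ensemble series for one lead time (1-based).

    The value for target month t is block (lead - 1) of the forecast
    issued in month t - (lead - 1).
    """
    if not 1 <= lead <= N_LEADS:
        raise ValueError("lead must be in 1..7")
    series = {}
    for init, blocks in forecasts.items():
        series[init + lead - 1] = blocks[lead - 1]
    return dict(sorted(series.items()))


if __name__ == "__main__":
    # Synthetic forecasts for months 0, 1, 2; value = 100 * init + lead,
    # second member offset by 0.5, so each value's origin is visible.
    demo = {
        init: [[100.0 * init + lt, 100.0 * init + lt + 0.5] for lt in range(1, N_LEADS + 1)]
        for init in range(3)
    }
    print(lead_time_series(demo, lead=2))
    # {1: [2.0, 2.5], 2: [102.0, 102.5], 3: [202.0, 202.5]}
```

Each lead-time series can then be bias-adjusted and evaluated on its own, which is what allows lead-dependent biases to be treated separately.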

Bias Adjustment
A quantile mapping (QM) method and a simpler multiplicative/additive bias correction method were used to adjust the biases of the precipitation and temperature forecasts. The QM is based on the well-established multi-segment statistical bias correction of daily Global Climate Model (GCM) and Regional Climate Model (RCM) precipitation and temperature variables. Technical details of these methodologies can be found in [50,51]. They have mainly been used in a series of climate change impact studies [52][53][54][55][56][57][58]. Additionally, less intrusive multiplicative/additive corrections were tested for precipitation and temperature: the temperature was corrected by adding the difference between the observed and the forecast monthly mean, and the precipitation was multiplied by the ratio of the observed to the forecast monthly mean. Equations (1) and (2) describe the methodology.
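$$T^{c}_{i} = T^{r}_{i} + \left(T^{O}_{M_i} - T^{r}_{M_i}\right) \qquad (1)$$

$$P^{c}_{i} = P^{r}_{i} \, \frac{P^{O}_{M_i}}{P^{r}_{M_i}} \qquad (2)$$

where $T^{c}_{i}$ and $P^{c}_{i}$ are the daily corrected values, $T^{r}_{i}$ and $P^{r}_{i}$ are the raw forecast data, $T^{O}_{M_i}$ and $P^{O}_{M_i}$ are the observed means of the calendar month $M_i$ to which day $i$ belongs, and $T^{r}_{M_i}$ and $P^{r}_{M_i}$ are the respective raw forecast means of that calendar month.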
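Equations (1) and (2) translate directly into code. The Python/NumPy sketch below is illustrative only: the helper names are hypothetical, each calendar-month mean is taken over all available days of that month, and the guard against a zero raw monthly mean is an assumption not discussed in the text.

```python
import numpy as np


def monthly_means(values, months):
    """Mean of `values` for each calendar month (1-12) present in `months`."""
    values, months = np.asarray(values, float), np.asarray(months)
    return {int(m): float(values[months == m].mean()) for m in np.unique(months)}


def su_temperature(raw, raw_months, obs, obs_months):
    """Eq. (1): add the observed-minus-forecast calendar-month mean difference."""
    raw = np.asarray(raw, float)
    raw_mean = monthly_means(raw, raw_months)
    obs_mean = monthly_means(obs, obs_months)
    shift = np.array([obs_mean[int(m)] - raw_mean[int(m)] for m in raw_months])
    return raw + shift


def su_precipitation(raw, raw_months, obs, obs_months):
    """Eq. (2): scale by the observed-to-forecast calendar-month mean ratio."""
    raw = np.asarray(raw, float)
    raw_mean = monthly_means(raw, raw_months)
    obs_mean = monthly_means(obs, obs_months)
    # Assumption: a month whose raw mean is zero is left unchanged.
    factor = np.array([
        obs_mean[int(m)] / raw_mean[int(m)] if raw_mean[int(m)] > 0 else 1.0
        for m in raw_months
    ])
    return raw * factor
```

By construction, the corrected series reproduces the observed calendar-month mean, while the day-to-day anomalies of the raw forecast are kept (additively for temperature, proportionally for precipitation).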
The method is described in [59], where it is referred to as a "simple unbiasing" (SU) method. Both methodologies are applied separately for each calendar month. The two adjustment methods (QM and SU) and the two adjustment strategies (individual or joint member correction) were combined and applied to the daily data as shown in Figure 3.
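The difference between the two adjustment strategies can be sketched with a plain empirical quantile mapping. Note that this is a generic textbook QM, not the multi-segment method of [50,51], and it does not treat dry days separately; the function names and the (members, days) array layout are hypothetical. With `strategy="irc"` each member is mapped using its own raw distribution, whereas with `strategy="joint"` all members share one pooled raw distribution per calendar month.

```python
import numpy as np


def empirical_qm(values, raw_ref, obs_ref):
    """Map `values` to the observed value at the same empirical quantile.

    Generic empirical quantile mapping; values outside the reference
    range are clamped to its ends by np.interp.
    """
    raw_sorted = np.sort(np.asarray(raw_ref, float))
    obs_sorted = np.sort(np.asarray(obs_ref, float))
    p_raw = np.linspace(0.0, 1.0, raw_sorted.size)  # plotting positions, raw sample
    p_obs = np.linspace(0.0, 1.0, obs_sorted.size)  # plotting positions, observed sample
    q = np.interp(values, raw_sorted, p_raw)        # raw value -> quantile
    return np.interp(q, p_obs, obs_sorted)          # quantile -> observed value


def adjust_ensemble(ens, ens_months, obs, obs_months, strategy="irc"):
    """Quantile-map an ensemble of shape (members, days), month by month.

    strategy="irc":   each member is mapped with its own raw distribution.
    strategy="joint": all members share one pooled raw distribution.
    """
    if strategy not in ("irc", "joint"):
        raise ValueError("strategy must be 'irc' or 'joint'")
    ens = np.asarray(ens, float)
    obs = np.asarray(obs, float)
    ens_months, obs_months = np.asarray(ens_months), np.asarray(obs_months)
    out = np.empty_like(ens)
    for m in np.unique(ens_months):
        days = ens_months == m
        obs_m = obs[obs_months == m]
        pooled = ens[:, days].ravel()
        for k in range(ens.shape[0]):
            ref = ens[k, days] if strategy == "irc" else pooled
            out[k, days] = empirical_qm(ens[k, days], ref, obs_m)
    return out
```

With a dry-biased and a wet-biased member, IRC maps both onto the observed distribution, whereas the joint strategy keeps the wet member wetter than the dry one, which is the trade-off between the two strategies discussed for the hindcast data.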

Hydrological Model HYPE
The hydrological simulations were performed with the hydrological model HYPE [60], which simulates water flow from precipitation through the soil to the river outlet. HYPE has been assessed in seasonal forecast applications at basin scale in Sweden [61], in the Niger River basin [62] and at pan-European scale [63]. The HYPE model (version 5.2) was calibrated and validated using observed precipitation and temperature data. The model considers different hydrological response units (HRUs), which are treated in a lumped manner within the model.

Performance Estimation
The hydrological model was calibrated and validated using the Nash-Sutcliffe efficiency coefficient (NSE) [64] and the Kling-Gupta efficiency (KGE) [65]. Both metrics have a perfect value of 1, while values below zero indicate that the simulation has no skill. The forecast hydrological simulations were assessed using the KGE and the continuous ranked probability skill score (CRPSS) [66]. The CRPSS is based on the continuous ranked probability score (CRPS), which evaluates the probabilistic skill of the forecast by averaging the skill over a range of threshold levels for which exceedance probabilities are computed [67]. The CRPSS expresses the CRPS of the forecast relative to that of a reference forecast: a perfect score equals 1, while zero or negative values indicate no skill. The evaluation is performed at daily and monthly timescales to reveal how the skill depends on the timescale.
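The NSE, the CRPS and the CRPSS can be sketched in Python as follows. The ensemble CRPS uses the standard kernel form, the mean of |x_i - y| minus half the mean of |x_i - x_j|, which equals the integral over all thresholds of the squared difference between the ensemble and observed cumulative distributions. The choice of reference forecast for the CRPSS (for example, a climatological ensemble) is an assumption here, since it is not specified in this section, and the function names are hypothetical.

```python
import numpy as np


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect, <= 0 is no better than the observed mean."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def crps_ensemble(members, obs):
    """Time-averaged CRPS of an ensemble with shape (n_members, n_times)."""
    x = np.asarray(members, float)
    y = np.asarray(obs, float)
    # Mean absolute distance between members and the observation, per time step
    spread_to_obs = np.mean(np.abs(x - y[None, :]), axis=0)
    # Half the mean absolute distance between all member pairs, per time step
    spread_within = 0.5 * np.mean(np.abs(x[:, None, :] - x[None, :, :]), axis=(0, 1))
    return float(np.mean(spread_to_obs - spread_within))


def crpss(fc_members, ref_members, obs):
    """CRPSS = 1 - CRPS_forecast / CRPS_reference (1 is perfect, <= 0 no skill)."""
    return 1.0 - crps_ensemble(fc_members, obs) / crps_ensemble(ref_members, obs)
```

For a single-member "ensemble" the CRPS reduces to the mean absolute error, which is a convenient sanity check.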

Case Study Area
The island of Crete covers more than 6% of the area of Greece and is the fifth largest island in the Mediterranean. The island's climate is dry sub-humid Mediterranean, with warm, dry summers and cold, humid winters. More than 40% of the annual precipitation occurs in the winter months. The average precipitation on the island ranges from 440 mm/year in the eastern part to more than 2000 mm/year in the western mountainous areas. Moreover, orographic effects tend to increase both the frequency and the intensity of winter precipitation [11,55,68,69]. The Messara valley is located in the south-central part of Crete and covers an area of 400 km² (Figure 3). About 250 km² of the valley is cultivated, while the remaining area (higher ground) is used for livestock. Agriculture on the Messara plain has a significant impact on the water resources and ecosystem services of the area, as it substantially increases water demand. The economy of the region is based on agriculture, with intensive cultivation mainly of olive trees, grapes, citrus and greenhouse vegetables. The overexploitation of the aquifer has reduced water availability, as groundwater is a major resource for irrigation. Nonetheless, the Faneromeni dam, located in the northeastern part of the valley, stores the runoff of the Koutsoulidis watershed for irrigation. Since its construction, the dam has served as a major water resource for the summer period, and because the Messara valley often experiences dry years, the dam-stored water mitigates the water deficit. The dry hydrological year 2016-2017 (1 September 2016-31 August 2017) in the Messara region, combined with ineffective water management policy, led to a major reduction of the water stored in the Faneromeni dam (Figure 4).

In this work, the Koutsoulidis catchment was divided into two sub-basins: one for the higher grounds of the watershed (1200 m a.s.l.), for a better representation of snow accumulation and snowmelt, and one for the lower grounds (440 m a.s.l.).
The HRUs were determined from the nine most common land use classes of the Corine Land Cover 2000 (Table 1). Observed precipitation, temperature and discharge data for the hydrological years between 1973 and 1993 were split into two independent 10-year sets for calibration and validation. The HYPE model was calibrated using its built-in differential evolution Markov chain method. A thorough description of the HYPE model can be found on the SMHI site (http://www.smhi.net/hype/wiki/doku.php?id=start:hype_model_description).

Forecast Precipitation and Temperature Performance
First, the skill of the two forecast systems was assessed. The CRPSS was evaluated for the two sub-basins, with and without data adjustment, to assess the skill of the forecast variables. Figure 5 shows the daily CRPSS before and after the different adjustments. For ECMWF precipitation, the QM methods performed better, increasing the CRPSS at least for the first lead-time month (LT1), while the SU degraded the skill for LT2 and beyond. In both cases and sub-basins, the corrected forecasts remained unskillful. For GloSea5, the QM methods provided a significant improvement of the skill for the first three LTs, but in LT5 to LT7 the SU methods slightly outperformed them. Moreover, the skill increased with lead time for both the SU and QM methods, following the behaviour of the raw data. Even after these improvements, the skill remained low (CRPSS below 0.1). For ECMWF temperature, the results are comparable with those for precipitation: all tested methods deteriorated the skill compared to the raw data, especially beyond LT1. As an exception, the SU IRC of the lower sub-basin showed less deterioration. For GloSea5 temperature, all methods provided a similar increase in skill for all LTs. In all the above cases, the individual realization adjustment provided similar results to the joint realization adjustment.
Water 2018, 10, x FOR PEER REVIEW 7 of 17
Additionally, the CRPSS was assessed on the monthly precipitation and temperature aggregates (Figure 6). The aggregation was performed after the adjustment of the daily data. The results differ between the two systems and among the data treatment methods. For most treatments of the ECMWF data, and for both variables, the skill decreases relative to that of the raw data. As an exception, S4 temperature for the lower sub-basin was slightly improved by the SU methods. On the other hand, the GloSea5 forecast data benefit considerably from the correction procedures, especially QM, which performed best in the first and second lead-time months. Finally, the individual realization adjustment showed almost no difference from the joint realization adjustment.

Hydrological Skill of the Forecast Systems
The S4 data for 1981-2004 and the GloSea5 data for 1996-2004 were used as input to the calibrated HYPE model. Two types of hydrological simulation were performed. The first was a continuous simulation of the lagged ensemble time series of the different lead times, while the second updated the initial hydrological conditions of the model to those of the reference run before simulating each seven-month-long forecast. The updated initial conditions were snow water, soil water and water in the river. The former approach reveals the theoretical skill of the modelling cascade of climate forecast, post-processing and hydrological simulation, while the latter reveals the potential skill of the hydrological forecast in an operational system. The difference in skill between the two approaches shows the effect of the initial conditions on the hydrological forecast system. First, the CRPSS was estimated for the monthly aggregated runoff. S4 and GloSea5 provided similar results, with the first lead-time month having the highest CRPSS and the skill decreasing from the second month onwards (Figure 8). The QM methods provided marginally better results for S4, while SU performed better for GloSea5. Nevertheless, the CRPSS was negative at all lead times for both systems in the continuous simulations. The initial condition update provided a noteworthy improvement in the first three lead-time months of the forecast for S4 and in the first two for GloSea5.
The hydrological results were also assessed in terms of KGE relative to the reference run. Seven-day runoff was used, and the results showed that in the first month of the forecast the skill is considerable, with the KGE ranging between 0.4 and 0.6 for S4 and between 0.3 and 0.6 for GloSea5. After the first month, the KGE of both systems dropped to 0.2-0.4 (Figure 9). Regarding the adjustment methodologies, the four treatments gave consistent results for S4, with the SU methods performing slightly better; moreover, the IRC methods show less spread in the results. On the other hand, the GloSea5-based simulations showed a high uniformity among the adjustment treatments, with slightly better performance of the QM methods.
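A minimal sketch of the seven-day KGE evaluation against the reference run, assuming non-overlapping seven-day means and the original KGE formulation (correlation, variability ratio and bias ratio); whether [65] refers to this or to a modified variant is not stated here, and the function names are hypothetical. Computing one KGE per ensemble member gives a range of scores like the ones reported above.

```python
import numpy as np


def weekly_means(x):
    """Non-overlapping 7-day means along the last axis (a trailing partial week is dropped)."""
    x = np.asarray(x, float)
    n = (x.shape[-1] // 7) * 7
    return x[..., :n].reshape(*x.shape[:-1], -1, 7).mean(axis=-1)


def kge(sim, ref):
    """Kling-Gupta efficiency from correlation, variability ratio and bias ratio."""
    sim, ref = np.asarray(sim, float), np.asarray(ref, float)
    r = np.corrcoef(sim, ref)[0, 1]   # linear correlation
    alpha = sim.std() / ref.std()     # variability ratio
    beta = sim.mean() / ref.mean()    # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


def kge_per_member(members, ref_daily):
    """KGE of each member's weekly runoff against the weekly reference run.

    members: array of shape (n_members, n_days); ref_daily: shape (n_days,).
    """
    ref_w = weekly_means(ref_daily)
    return np.array([kge(m, ref_w) for m in weekly_means(members)])
```

A member that doubles the reference flow keeps a perfect correlation but is penalised twice, through alpha and beta, giving KGE = 1 - sqrt(2).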
Updating the initial conditions of the hydrological model significantly improved the forecast ability of both systems, especially in the first and second lead months. It also reduced the ensemble variability over the entire forecast span.
Beyond the performance of the hydrological simulation, the results were assessed for their skill in reproducing the historical dry hydrological years. The reference runoff simulation was aggregated on a hydrological-year basis into three categories (wet, normal and dry), based on the 33rd and 66th percentiles of the annual runoff. The same percentiles were used to categorize the forecast-based hydrological simulations. The proportion of correctly forecast categories was then estimated. Figure 10a-d shows this correct state detection rate for S4 and GloSea5. The shaded area marks 33%, the rate that a random choice among the three categories would achieve, i.e., the no-skill level. Without the initial state update (Figure 10, blue bar graphs), S4 shows only a marginal ability to capture the drought state, while GloSea5 detects the drought state better in the first lead month. The initial state update greatly enhances the skill of both systems, with the detection rate reaching 60% for the first lead month. Beyond the first month, S4 skill decreases significantly, while GloSea5 retains good skill (~45%) up to the fourth month of the forecast. The systems were also assessed for their ability to forecast a drought state at least as severe as the one that actually occurred, since this information is helpful for risk-averse decision making (Figure 10e-h). A forecast with no skill would already score about 67% on this measure, because 6 of the 9 forecast-reference category pairs satisfy it (assuming independent and equally likely terciles). Without the initial conditions update, both systems therefore show limited skill (about 70-75% for S4 and about 60-65% for GloSea5). However, as in the previous results, updating the initial conditions largely improves the forecasts for the first two lead months for S4 and for the first five for GloSea5.
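The drought-state evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' code. The function names are hypothetical, and `np.percentile` with its default linear interpolation stands in for whatever percentile definition the study used. Values equal to a threshold are assigned to the drier category. "At least as severe" is read as a forecast category that is equal to or drier than the reference one.

```python
import numpy as np

def tercile_category(values, thresholds):
    """Map values to 0 = dry, 1 = normal, 2 = wet given (p33, p66) thresholds."""
    p33, p66 = thresholds
    values = np.asarray(values, dtype=float)
    # Values equal to a threshold are assigned to the drier category
    return np.where(values <= p33, 0, np.where(values <= p66, 1, 2))

def drought_state_scores(reference, forecast):
    """Return (hit rate, rate of forecasts at least as dry as the reference).

    reference, forecast: annual (hydrological-year) runoff, one value per year.
    """
    reference = np.asarray(reference, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Tercile thresholds come from the reference simulation only and are
    # reused to categorize the forecast-based simulation
    thresholds = np.percentile(reference, [33, 66])
    ref_cat = tercile_category(reference, thresholds)
    fc_cat = tercile_category(forecast, thresholds)
    hit_rate = np.mean(fc_cat == ref_cat)
    # Same category as, or a drier one than, the category that actually occurred
    at_least_as_dry = np.mean(fc_cat <= ref_cat)
    return float(hit_rate), float(at_least_as_dry)
```

In practice, both scores would be computed separately for each lead month and each pre-processing variant. This would give one bar per lead month, as in Figure 10.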
As a last stage of assessment, the ability of the hydrological simulations to forecast specific historical streamflow drought events was estimated. Four drought events were considered, in the hydrological years 1985-1986, 1989-1990, 1990-1991 and 1999-2000 (Figure 11). All four droughts fall within the S4 hindcast period, whereas GloSea5 covers only the last event. S4 shows significant skill one month ahead in all cases but performs poorly at longer lead times. Results varied considerably across the forecast data pre-processing variants. There was no clear pattern as to which combination of adjustment and treatment methods provides the best forecast. For example, the 1985-1986 event was better predicted when the initial conditions of the hydrological model were updated. In the 1989-1990 case, by contrast, the hydrological prediction without the initial conditions update performed better. For all drought events, the prediction was also inaccurate at many lead times. The 1999-2000 drought, covered by both systems, served as a benchmark. GloSea5 predicted this event well, especially with the QM methods, for which successful forecasts extended up to five months ahead. S4 showed poor skill for this event, with successful predictions limited mainly to the first month.
Figure 11. (a) Historical runoff droughts as simulated by the reference simulation; (b) ability of S4 and GloSea5 to capture the major drought events of hydrological years 1985-1986, 1989-1990, 1990-1991 and 1999-2000 at different lead times.
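Checking whether a particular drought event was captured at each lead time can be sketched in the same spirit. The paper does not specify these details, so the sketch makes two hypothetical assumptions. First, the forecast-driven runoff of the event year is available as an array with one row per lead month and one column per ensemble member. Second, an event counts as captured when the ensemble median falls in the dry tercile of the reference simulation.

```python
import numpy as np

def detected_leads(runoff_by_lead, dry_threshold):
    """Return the lead months (1-based) at which a drought event is flagged.

    runoff_by_lead: hypothetical array of shape (n_leads, n_members) holding the
        forecast-driven annual runoff of the event year, one row per lead month.
    dry_threshold: 33rd percentile of the reference annual runoff.
    """
    runoff = np.asarray(runoff_by_lead, dtype=float)
    # Summarize each lead month's ensemble by its median (an assumption)
    medians = np.median(runoff, axis=1)
    # Lead months whose ensemble median falls in the dry tercile count as detections
    return [lead + 1 for lead, value in enumerate(medians) if value <= dry_threshold]
```

The check could be repeated for every event and pre-processing variant to build a lead-time table. The variants are QM or scaling, individual or joint adjustment, and runs with or without the initial state update.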

Discussion
This study investigates how well two operational seasonal forecast systems can provide useful seasonal discharge forecasts for a typical Mediterranean watershed. Two kinds of downscaling and bias adjustment were assessed: quantile mapping (QM) and a simple additive/multiplicative approach. Two adjustment strategies were also tested: adjusting each forecast realization individually, and adjusting all realizations jointly. Additionally, the study assessed how the initial hydrological model state affects forecast skill. The adjusted data were used to drive the HYPE hydrological model and simulate the watershed surface runoff. Forecast skill was assessed for precipitation and temperature as well as for the simulated flow.
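The two adjustment families and the two ways of treating the ensemble can be illustrated with a short sketch. It is a simplified, assumed implementation rather than the one used in this study. Empirical quantile mapping is approximated by linear interpolation between 101 matching quantiles of the forecast and observed climatologies. The additive and multiplicative variants match only the climatological mean. In practice, each correction would be fitted separately for each calendar month and lead time. The hypothetical `joint` flag switches between fitting one correction per realization and one correction on the pooled realizations.

```python
import numpy as np

def quantile_map(values, fc_clim, obs_clim, n_q=101):
    """Empirical quantile mapping via interpolation between matched quantiles."""
    probs = np.linspace(0.0, 1.0, n_q)
    fc_q = np.quantile(fc_clim, probs)    # forecast climatology quantiles (flattened)
    obs_q = np.quantile(obs_clim, probs)  # observed climatology quantiles
    # Values outside the calibration range are clamped to the extreme quantiles
    return np.interp(values, fc_q, obs_q)

def scale_additive(values, fc_clim, obs_clim):
    """Additive mean correction (commonly used for temperature)."""
    return values + (np.mean(obs_clim) - np.mean(fc_clim))

def scale_multiplicative(values, fc_clim, obs_clim):
    """Multiplicative mean correction (commonly used for precipitation)."""
    return values * (np.mean(obs_clim) / np.mean(fc_clim))

def adjust_ensemble(hindcast, obs, method, joint):
    """Adjust a hindcast of shape (n_years, n_members) towards observations (n_years,).

    joint=False fits one correction per realization; joint=True fits a single
    correction on all realizations pooled together.
    """
    hindcast = np.asarray(hindcast, dtype=float)
    obs = np.asarray(obs, dtype=float)
    adjusted = np.empty_like(hindcast)
    for m in range(hindcast.shape[1]):
        # Climatology used to fit the correction: this member only, or all members
        fc_clim = hindcast if joint else hindcast[:, m]
        adjusted[:, m] = method(hindcast[:, m], fc_clim, obs)
    return adjusted
```

As an example, take observations `obs` and two members offset by +2 and +4. Individual additive scaling maps both members exactly onto `obs`. Joint scaling removes only the pooled bias of +3, leaving the members at `obs - 1` and `obs + 1` and so preserving their spread.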
Regarding the precipitation and temperature adjustment, a key finding is that the adjustment does not necessarily improve the forecast skill, even though it improves climatological statistics such as the mean and variability. This is explained by the nature of the adjustment methods. They shift the climatological statistics of the forecasts towards the observations but leave the year-to-year sequence of forecast anomalies essentially unchanged, and therefore do not alter the forecast skill. Where the skill for precipitation and temperature did increase, the QM methods performed as well as or better than the SU method. Another important outcome is that the joint realization adjustment provided skill similar to the individual realization adjustment in the vast majority of experiments. This held for precipitation and temperature as well as for discharge. The upper and lower sub-basins showed similar skill, which was partly expected given the similarity of the forecast data in the two sub-basins.
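The argument that these adjustments leave the forecast skill largely unchanged can be checked numerically. The sketch below uses synthetic data (not data from this study). It assumes a rank correlation as the skill measure. A multiplicative mean scaling and an in-sample quantile mapping are both increasing transformations of the forecast series. They correct the mean (and, for quantile mapping, the wider distribution) without changing the year-to-year ordering of the forecasts, and hence without changing a rank-based skill score.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation, assuming no tied values."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

rng = np.random.default_rng(42)
obs = rng.gamma(2.0, 30.0, size=30)             # synthetic "observed" seasonal totals
fc = 0.5 * obs + rng.gamma(2.0, 10.0, size=30)  # synthetic biased, noisy forecast

# Multiplicative scaling: forecast mean matched to the observed mean
fc_scaled = fc * obs.mean() / fc.mean()

# In-sample empirical quantile mapping onto the observed distribution
probs = np.linspace(0.0, 1.0, 101)
fc_qm = np.interp(fc, np.quantile(fc, probs), np.quantile(obs, probs))

for name, series in [("raw", fc), ("scaled", fc_scaled), ("QM", fc_qm)]:
    print(f"{name:>6}: bias = {series.mean() - obs.mean():7.2f}, "
          f"rank correlation = {spearman(series, obs):.3f}")
```

Skill measures that depend on absolute errors or on ensemble spread, such as the CRPS, can still change after adjustment. The sketch therefore illustrates the reasoning for correlation-type skill only.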
The discharge forecast skill of the two systems was also assessed against the reference simulation. The first and second forecast months showed noteworthy skill, which then decreased rapidly with increasing lead time. ECMWF showed greater consistency among the results of the different adjustment methodologies, while GloSea5 showed slightly better results up to five months ahead.
Furthermore, the ability of the two systems to forecast the streamflow drought state was assessed, revealing slightly better skill for GloSea5. Updating the initial hydrological state was also shown to greatly improve the drought state forecast. For the four past runoff drought events, ECMWF showed limited predictability, while GloSea5 performed better in the single event for which it could be evaluated.
Attributing the skill differences between the two systems is difficult given the small extent of the study region. Nonetheless, GloSea5 has been found to skillfully predict large-scale modes of variability such as the winter North Atlantic Oscillation, the Arctic Oscillation and the El Niño-Southern Oscillation. These modes are strongly correlated with year-to-year precipitation and temperature variations [48,70,71]. Both forecast systems are under continuous development and are delivering increasingly skillful predictions. S4 has already been upgraded to System 5, whose upgrades include increased spatial resolution, a factor shown to correlate positively with performance in climate simulations [72].
Seasonal forecasting has great potential to provide valuable water resource information that can support climate change adaptation plans. The results of this study confirm the potential usefulness of seasonal forecasts in the decision-making chain. Nonetheless, forecast skill still needs to improve, especially at longer lead times. This matters particularly in the Mediterranean environment, where precipitation is highly seasonal. There, forecasts would be more useful if they could skillfully predict the forthcoming wet season conditions from the beginning of the dry season (7-9 months earlier). This would allow more time to adopt water-saving measures. Finally, this study does not aim to advocate a specific prediction system or pre-processing methodology, as each has its strengths and weaknesses. Because the case study region is small, the findings cannot be readily extrapolated beyond it. Nonetheless, they indicate the current level of seasonal forecast skill at the local decision-making scale for an eastern Mediterranean island.