Quantile-based hydrological modelling

Predictive uncertainty in hydrological modelling is quantified by using post-processing or Bayesian-based methods. The former methods are not straightforward and the latter ones are not distribution-free (i.e. assumptions on the probability distribution of the hydrological model's output are necessary). To alleviate possible limitations related to these specific attributes, in this work we propose the calibration of the hydrological model by using the quantile loss function. By following this methodological approach, one can directly simulate pre-specified quantiles of the predictive distribution of streamflow. As a proof of concept, we apply our method in the frameworks of three hydrological models to 511 river basins in contiguous US. We illustrate the predictive quantiles and show how an honest assessment of the predictive performance of the hydrological models can be made by using proper scoring rules. We believe that our method can help towards advancing the field of hydrological uncertainty.


Introduction
One purpose of hydrological models is to provide predictions of streamflow [1]. A splitsample scheme is implemented to assess the quality of predictions. The hydrological model is calibrated in the first segment of the sample, while the second segment is used for validation (testing) [2,3]. The validation procedure should be applied by using information not available during the calibration procedure, while the model's performance can be assessed using some quantitative criteria (see [3] for guidelines on hydrological model validation). A key component of a hydrological model is its objective function, also called loss function, cost function or scoring function. The choice of the objective function depends on the purpose of modelling, which in most cases is to provide predictions close to the observations [1]. To this end, objective functions, such as the squared error or the Nash-Sutcliffe efficiency [4], are implemented.
Besides predicting the observations as closely (i.e. accurately) as possible, quantifying predictive uncertainty is also of high interest [5,6]. Predictive uncertainty is defined here as the uncertainty of the hydrological predictions (simulations) [7] and can be quantified by a probability function. Quantification of uncertainty of hydrological predictions is closely related to the 20 th unsolved problem in hydrology: "How can we disentangle and reduce model structural/parameter/input uncertainty in hydrological prediction?" [8].
Numerous methods have been proposed and implemented for quantifying predictive uncertainty by providing predictive quantiles or the full predictive distribution, including: a. Post-processing (two-stage) techniques using stochastic or regression-based methods (including machine learning ones). These techniques model residual errors.
Post-processing techniques are popular in forecasting applications, in which a deterministic (point) forecast by the hydrological model is delivered and the modeller conditions the predictand on the forecast as well as past input and output observations to obtain a predictive distribution [19]. In simulation mode, post-processing techniques are also applicable. Joint inference techniques, are popular in hydrological simulation and skip the two-stage procedure, e.g. using Bayesian methods [16]. In both cases (simulation and forecasting), the predictive distribution of the output [50, p.22] is delivered and can be used to quantify the predictive uncertainty.
Of special interest in hydrological modelling is the prediction of specific low or high quantiles of streamflow. This problem can be formed as the one in which the modeller receives a directive in the form of a specific statistical functional, i.e. a quantile of the predictive distribution [51]. In this case, it is critical that the objective function "is consistent for it, in the sense that the expected score is minimized when following the directive" ( [51], see the definition of consistency of scoring functions in Methods Section 2).
Following the above conversation, the aim of this work is to solve a problem that we define as follows: "A hydrological modeller receives a directive to predict a quantile of the distribution of streamflow. How could she/he adapt its model to do so?". To this end, we propose the use of the quantile loss [52] as the objective function of the hydrological model at the requested (by the end user) quantile levels.
We note here that compared to the previous mainstream approaches to quantifying predictive uncertainty we differentiate in the following (while implications are discussed in detail in Discussion and Conclusions Sections 6 and 7): a. Post-processing approaches: The quantile loss function is directly implemented by the hydrological model. On the other hand, post-processing methods model the residuals of the fitted hydrological model (that have been obtained by implementing a squared error type loss function), at a second stage (i.e. after applying the hydrological models), using a statistical or machine learning method.
b. Monte Carlo of Bayesian joint inference approaches: These methods are applied directly to the hydrological model; therefore, there is some resemblance with our approach. In this case, the differences of the two approaches (Bayesian joint inference and our proposed approach) are identical with those identified in the statistical literature regarding the differences between Bayesian statistics and quantile regression modelling.
These differences are thoroughly discussed in Discussion Section 6.
Statistical modelling based on the quantile loss function is met frequently in the statistical literature and is well received by practitioners, e.g. in the optimization of linearin-parameters models (see e.g. the book by [53]), neural networks [54], random forests (see e.g. the review by [55]) and boosting algorithms (see e.g. the review by [56]). Therefore, we believe that it will also be of practical interest to hydrologists. Calibration of a hydrological model with the quantile loss function has also been proposed by [57], [58] in the context of model selection and model structure deficiency assessment, however the value of the quantile loss function for predictive uncertainty assessment has been minimally examined, while here we frame assessments of predictive uncertainty in a formal framework.
The remainder of the paper is structured as follows. The methods, with emphasis on concepts related to the quantile loss function, are presented in Section 2. In the same Section, the suite of the three lumped GR (Génie Rural) hydrological models [59] used in the study is briefly presented. The data used to apply the hydrological models are presented in Section 3. In brief, this data originates from 511 river basins in the CONUS (contiguous US). The dataset description is followed by a summary and implementation directives for the proposed framework, which are provided in Section 4. The results of the application of the hydrological models are presented in Section 5. Lastly, the paper closes with the Discussion and Conclusions Sections 6 and 7, respectively.

Methods
Here, we present the concept of the quantile loss function whose use is proposed in the manuscript for supporting the application of a hydrological model for directly predicting the predictive quantiles of interest. An outline of the properties of this loss function is also provided, followed by a brief presentation of the three lumped GR hydrological models.

Quantile loss function
A real-valued random variable x might be characterized by its distribution function F defined by: Then, F -1 (a) is defined by: F -1 (a) is referred to as the a th quantile of x, while inf denotes the infimum of a set of real numbers. For instance, F -1 (1/2) is the median or 0.50 th quantile. In regression modelling, one minimizes the sum of absolute errors to estimate the median of the conditional distribution. The natural question then is "are there analogs for regression of the other quantiles?" [53, p.5]. The idea, elaborated by [52] is to apply the quantile loss function defined by eq. (3), instead of using the absolute error function: Here 1(•) denotes the indicator function, x is the materialization of the variable x, a is the quantile level of interest, and r is the respective predictive quantile. For a = 1/2, eq.
(3) reduces to: which is half the absolute error function. Obviously, when a sample is provided, the objective is to minimize the average score, i.e. the quantile loss function averaged over a fixed set of observations, equal to ∑iL(ri; xi, a)/n, i ∈ {1, …, n}.
The quantile loss function, defined by eq. (3), is positive and negatively oriented, i.e.
the objective is to minimize it. Figure

Theoretical properties of the quantile loss function
We note that linear-in-parameters quantile regression (or simply quantile regression when referring to non-linear models) is based on the minimization of the quantile loss function; therefore, in what follows, we refer to both the properties of quantile regression and quantile loss. A history of concepts related to quantile regression can be found in [60].
Following [51] and the intuitive explanation by [61], if a modeller "is asked to report a certain functional of the predictive distribution, then a key requirement on the loss function is to be strictly consistent, in the sense that the expected loss or score is uniquely minimized if the directive asked for is followed" [51]. It is proved [51] that the quantile loss function is consistent for the quantiles of the predictive distribution.
In the context of probabilistic predictions, scoring rules are used to quantify predictive performance [62]. "A scoring rule is proper if truth telling is an optimal strategy in expectation" [63]. Scoring rules support the modeller in being honest about the assessment of his predictions [62]. It is proved that the quantile loss function is a proper scoring rule [62].
Moving from the concepts of consistency and propriety, the quantile loss function has been the focus of intensive research, including optimization algorithms for parameter estimation in quantile regression settings [64,65], goodness-of-fit processes [66], decomposition in reliability, resolution and uncertainty [67], and more.

Hydrological models
The Génie Rural GR4J, GR5J and GR6J daily lumped hydrological models were used in the study [59]. These models are widely used in hydrology, while their detailed description is out of the scope of the study. Their modular implementation in R programming language allows the minimization of a customized loss function.
The GR hydrological models have evolved over time from the daily GR3J model with three parameters [68] to the GR4J model with four parameters [69], and the GR5J and GR6J models with five and six parameters, respectively [70]. Other versions of the GR models are available at the monthly [71] and annual [72] time scales. The early history of the GR models can be found in [59].
The estimation of the parameters of the hydrological model is done using the Michel's [73] optimization algorithm, while the here models are calibrated and evaluated using the quantile loss function. The GR models have some interesting properties that differentiate them from one another. The additional parameter of the GR5J model with respect to the GR4J model considers more complex inter-catchment water exchanges, while the GR6J model offers improvements in the simulation of low flows [59]. Presentations of the parameters of the models can be found in [69], and [70].

Data
We applied our method to daily hydrometeorological data from 511 small-to mediumsized river basins. These river basins represent most climate types over CONUS, and their changes due to human influences are minimal. The data and detailed information about the river basins are available in the CAMELS dataset, which is fully described in [74,75,76,77,78,79]; see Figure 2 for the geographical locations of the river basins. Daily time series of minimum and maximum temperatures and precipitation are available for each river basin [79]. The same applies for daily runoff time series. Here, we focus on the 34- year period 1980-2013. The same 34-year period and river basins have been previously examined in the hydrological post-processing study by [33], as for them the available time series records are complete. Mean daily temperatures were computed for each river basin by averaging its minimum and maximum daily temperatures, separately for each day included in the 34-year period. Lastly, a time series of daily potential evapotranspiration (PET) were estimated for each river basin by applying the Oudin's formula [80] to the previously obtained daily mean temperature time series.

Implementation and key components
Here, we present the key components and steps of the conducted large-scale application, thereby summarizing Sections 2 and 3. We applied three hydrological models, specifically the GR4J, GR5J and GR6J ones. The inputs to these models are daily precipitation and In what follows, the period 1998-2013 will be used for model testing purposes, while all results refer to the validation period.

Results
To facilitate the desired understanding of the outputs of the quantile-based hydrological models, in Figure 3 we present the predicted streamflow at an arbitrary river basin and for an arbitrary two-year period. We note that the models simulated streamflow in the  The most frequently used objective functions in hydrological modelling are those of squared-error type. To perceive the differences between the squared error and the L(r; x, 1/2) loss functions, in Figure 4 we present the average score L(r; x, 1/2) in the test period at each river basin for the predictions issued by the GR4J model and specifically for the case that this model is calibrated by using the L(r; x, 1/2) loss function (x-axis) in comparison to the squared error function (y-axis). It is evident that when the calibration of the hydrological model is based on the L(r; x, 1/2) loss function, one receives lower average scores in the test period compared to the case in which the calibration is based on the squared error loss function; recall also that the quantile loss function is negatively oriented; therefore the lower the score the better the prediction. On the other hand, it is evident that the average scores do not differ dramatically. This is reasonable, given that both loss functions aim at simulating closely the observed streamflow. hand, the GR6J model is 1.55% worse compared to the GR4J model (see Figure 5b). It is interesting to know if and how much each model improves over the performance of the benchmark at each quantile level. Figure 6 presents the median relative scores averaged over all river basins for the GR5J and GR6J models against the GR4J model. The performance of each model depends on the quantile level. For instance, the GR5J model improves over the performance of the GR4J model for quantile levels lower than 0.500, but its performance deteriorates for higher quantile levels. Figure 6. Heatmap of the median of the relative improvements summarizing the results for the 511 river basins for the performance of the GR5J and GR6J models against the performance of the GR4J hydrological model in terms of quantile scores.
Quantile scores are scale-dependent; therefore, intuition about the results across river basins, presented as made above (i.e. in terms of average scores), is somewhat limited. On the contrary, the coverage of the predictions issued by each model at each quantile level is a scale-independent measure. This measure counts the percentage of observations that are lower compared to the simulations at a specified quantile level. For instance, when simulating streamflow at quantile level 0.025, 2.50% of the observations will be lower than the simulated ones for a perfect model.
The median coverages of the predictions issued at all the examined river basins by the three models are provided in Figure 7. The simulations at high quantile levels seem to be better compared to those at the lower quantile levels. This could be attributed to the fact that hydrological models are less skilled to simulate low and intermittent flows. Although coverages are intuitive, we note here that they are not consistent for reporting a certain quantile of the predictive distribution of streamflow. Note that it is sufficient to recall, at this point, the definition of consistency (see Section 2.2). Therefore, they cannot support the modeller in being honest in the assessment of the hydrological models, in the sense explained in Section 2.2, and one should use proper scoring rules (e.g. the quantile score) to provide honest assessments. As an illustrative example, consider the case of obtaining predictive intervals (e.g. by combining predictive quantiles at 0.025 and 0.975 levels). One needs to have exact coverage, but also to obtain small-width intervals [82]; therefore, he has to rely on proper scoring rules. Figure 7. Heatmap of the median of the coverages of the predictions issued by the GR4J, GR5J and GR6J hydrological models at varying quantile levels summarizing the results for the 511 river basins.

Discussion
From a conceptual point of view, the differences between Bayesian approaches and our methodological approach to quantifying predicative uncertainty in hydrological modelling can be summarized by knowledge available in the field of statistics, as detailed in [83]. In particular, quantile regression (and, therefore, our methodological approach as well) is suitable in the following situations: a. When one is interested in events at the "limits of probability".
b. When the conditional distribution does not follow a known distribution.
c. Possible presence of many outliers of the dependent variable (recall also that median regression is more robust compared to mean regression in the presence of outliers).
d. Presence of heteroscedasticity.
Obviously, streamflow attributes can be placed in the above situations; therefore, quantile-based hydrological modelling is an appropriate choice. Furthermore, compared to Bayesian approaches, quantile-based hydrological modelling is faster, as it practically constitutes an optimization problem.
On the other hand, some drawbacks of our methodological approach are the following: a. Estimating the parameters of the model is harder compared to Gaussian regression.
b. Inference on the parameters (e.g. the computation of confidence intervals) is complicated.
c. Possible presence of quantile crossing, i.e. estimated quantiles at higher levels might be lower than respective quantiles at lower levels.
d. The full conditional distribution is not available, although the computation of multiple quantiles can substitute predictive distributions [61]. In this case, a drawback of the method is that it requires the estimation of a high number of set of model parameters (one set for each quantile).
Compared to adopting post-processing approaches, changing directly the objective function is more straightforward; therefore, the predictions issued by the proposed method could be more interpretable. In particular, post-processing approaches resemble boosting algorithms with the difference that the base models of the former (i.e. the hydrological model and the post-processors) might be strong (i.e. they might provide good predictions), albeit they do not implement the loss function of interest. Given that boosting algorithms are designed to boost performance, it might be possible that postprocessing will yield better or equivalent performance in practice. This could be investigated in the future by conducting independent large experiments (see e.g. the discussions in [84,85,86,87] on the value of big data experiments).
We also note that hydrological modellers are usually interested in low or high quantiles of streamflow. Although, the model's structure can be customized to target such quantities, it is frequent to combine the model with a loss function suitable for modelling means (e.g. the squared error function or the Nash-Sutcliffe efficiency [88,89]). In such cases, one should not expect to obtain reliable estimates of quantiles of the predictive distribution, but rather customized estimates of the mean flow at low or high flow conditions.
Reporting some point predictions according to some vague request is a common practice that should be avoided, while furthermore it is not meaningful to evaluate the predictions using a set of scoring functions [51]. One should disclose the scoring function to the modeller or request a specific functional of the predictive distribution [51]. For instance, if the modeller receives a directive to report the mean functional, she/he should implement the squared error loss function which is consistent for this functional.
Obviously, results could be reported for alternative scores, but they will be irrelevant for the task. Therefore, the comparison between the absolute error function and the squared error loss functions reported in Figure 4 is done for illustration purposes and is not intended to prove that one function is better with respect to the other.
Regarding the relative performance of the hydrological models at different quantile levels, more complex models provided higher performances for the 0.50 quantile although at other quantile levels a deterioration is observed with the exception of GR5J model for quantile level lower than 0.50. The reason for the performances of the models is unclear; and could be investigated in a following study. However, it should be noted that GR hydrological models were designed with a focus on the mean functional; with more complex models delivering better predictions, however these properties do not necessarily transfer to other quantile levels.
The quantile loss function can substitute the squared error function (or its equivalents) in hydrological models tailored to deliver forecasts; therefore, one can obtain directly quantile forecasts by a single model or post-process quantile simulations in the data assimilation procedure. Finally, similar methodological themes to those proposed in this work, including several ones for issuing point [90] and probabilistic [91] predictions of hydrological signatures, could be provided by exclusively using hydrological models, instead of relying on data driven-ones. Other similar themes for improved quantile-based predictions are those combining multiple hydrological models and more [30].

Conclusions
In this paper, we proposed a new methodological approach to quantifying predictive uncertainty in hydrological modelling. The key idea is to calibrate the hydrological model by using the quantile loss function. By doing so, one can directly simulate the targeted quantiles of the predictive distribution of streamflow. Compared to post-processing techniques, our approach is straightforward and, compared to Bayesian techniques, it is distribution-free and faster.
To demonstrate the effectiveness of the new method and its wide applicability far from computational limitations, we applied it to 511 river basins in CONUS in the wider frameworks of three hydrological models. With this large-scale application, we showed how the performances of different hydrological models can be compared with respect to simulating predictive quantiles at pre-defined levels. Considering its simplicity, large applicability and other advantages, we believe that our method opens new avenues in the field of hydrological modelling, with its possible extensions including those quantifying uncertainty when predicting streamflow at ungauged basins.

Conflicts of interest:
The authors declare no conflict of interest.

Author contributions:
The authors contributed equally to the work.
Acknowledgements: HT is sincerely grateful and appreciative to the Journal's award committee for having been awarded the "Water 2021 Best Paper Award" and for having been invited to submit an article to the Journal. This work has been submitted in response to this invitation.

Appendix A Used software
The computations and visualizations were conducted in R Programming Language [92] by using the following packages: airGR [59,93], data.