Explanation and Probabilistic Prediction of Hydrological Signatures with Statistical Boosting Algorithms

: Hydrological signatures, i.e., statistical features of streamﬂow time series, are used to characterize the hydrology of a region. A relevant problem is the prediction of hydrological signatures in ungauged regions using the attributes obtained from remote sensing measurements at ungauged and gauged regions together with estimated hydrological signatures from gauged regions. The relevant framework is formulated as a regression problem, where the attributes are the predictor variables and the hydrological signatures are the dependent variables. Here we aim to provide probabilistic predictions of hydrological signatures using statistical boosting in a regression setting. We predict 12 hydrological signatures using 28 attributes in 667 basins in the contiguous US. We provide formal assessment of probabilistic predictions using quantile scores. We also exploit the statistical boosting properties with respect to the interpretability of derived models. It is shown that probabilistic predictions at quantile levels 2.5% and 97.5% using linear models as base learners exhibit better performance compared to more ﬂexible boosting models that use both linear models and stumps (i.e., one-level decision trees). On the contrary, boosting models that use both linear models and stumps perform better than boosting with linear models when used for point predictions. Moreover, it is shown that climatic indices and topographic characteristics are the most important attributes for predicting hydrological signatures.


Introduction
Hydrological signatures are estimates of statistics that are used to characterize streamflow time series [1,2]. The concept of hydrological signatures was first introduced and explicitly described in [3]. Relevant signature-based characterizations may be related, for example, to the average or the extreme streamflow behavior, while some guidelines for selecting hydrological signatures can be found in [2]. Examples of hydrological signatures include the mean flow, the total runoff ratio, the baseflow index, the number of flow peaks over a threshold, the time lag between rainfall and flow series and more [1].
Hydrological signatures are useful in ecological and hydrological applications, for instance as proxies of hydrological processes [1] or in hydrological simulations [4]. In

Data and Methods
Here we present the dataset, the implemented boosting algorithms, the performance metrics and an overview of the methodology. We address the problem of constructing a model that will take basin attributes as inputs and provide probabilistic predictions of hydrological signatures. To this end, a boosting regression algorithm is fitted to available data of catchments located in CONUS, and the quantile loss function is minimized. Furthermore, the predictive performance of the algorithm is assessed in a 10-fold cross-validation.

Data
We used the Catchment Attributes and MEteorology for Large-sample Studies (CAMELS) dataset [31], which includes 671 basins. This dataset is open and appropriate for benchmarking and investigations in large-sample hydrology studies [32]. Furthermore, it is appropriate for studying catchments with diverse characteristics (e.g., climatic and geological) due to the extended spatial coverage of CONUS. The data can be found online in [33,34]. Documentation of the data is available in [31,35].
We selected 667 basins (the remaining ones had missing values) which cover the entire CONUS, as presented in Appendix B. The basins are characterized by minimal human influence; consequently, the use of regression algorithms is a reasonable option for the analysis. The spatial coverage is representative of the large range of hydroclimatic conditions met in CONUS. An overview of the catchment attributes can be found in Table A1 and their explanation can be found in Appendix A. The catchment attributes include hydrological signatures as described in Table A2 (attributes were computed using data collected by Newman et al. [35]), topographic characteristics as described in Table A3 (attributes were computed using data from Newman et al. [35]), climatic indices as described in Table A4 (attributes were computed using data by Thornton et al. [36]), land cover characteristics as described in Table A5 (attributes were computed using Moderate Resolution Imaging Spectroradiometer (MODIS) data), soil characteristics as described in Table A6 (attributes were computed using data by Miller and White [37] and Pelletier et al. [38]) and geological characteristics as described in Table A7 (attributes were computed using data by Gleeson et al. [39] and Hartmann and Moosdorf [40]) related to the basin of interest.

Boosting Algorithms
We are interested in boosting for statistical modelling [41] and, in particular, in some further developments related to the interpretability of the model. These developments are summarized in [29]. The overall approach is related to a general gradient descent "boosting" paradigm developed for additive expansions and any loss function [42] (see Algorithm 1 for this formulation). Algorithm 1 Formulation of the gradient boosting algorithm, adapted from [29,30,42,43].
Step 1: Initialize f 0 with a constant.
Step 2: For m = 1 to M: a. Compute the negative gradient g m (x i ) of the loss function L at f m-1 (x i ), i = 1, . . . , n.

b. Fit a new base learner function
Step 3: Predict f M (x).
Here M is the number of iterations, h m (x) is the base learner fitted at each iteration and ρ is the step-length factor. Explanation of the role of parameters M and ρ can be found in the following.
An intuitive explanation of the concept of statistical boosting can be found in [44]. Statistical boosting can be seen as an algorithm to fit a regression model. Two properties of statistical boosting are typical. The first one, is that one can model the effect of each predictor variable (i.e., an element of the vector x i ; see Algorithm 1) using simultaneously different base learners, e.g., linear models or decisions trees (commonly base learners are Remote Sens. 2021, 13, 333 4 of 20 weak regression algorithms, i.e., they can predict slightly better than random guessing). The second one is that statistical boosting is in essence a function approximation procedure; therefore, diverse loss functions can be used to fit a model and assess the degree of approximation. Benefits related to the above two properties and the iterative fitting procedure of Algorithm 1 [44,45] are related to our problem.
Regarding the iterative procedure of statistical boosting, the final fitted model is additive with respect to the implemented base learners; therefore, it allows straightforward interpretation [46]. In addition, a base learner can be used multiple times, while a predictor variable can be modelled simultaneously by diverse base learners. The procedure of Algorithm 1 will decide how many times each predictor variable and each base learner will be included in the final additive model (we note that at step m, various base learners h m (x) can be applied, e.g., a linear model or a smooth function, however a single base learner will be selected, i.e., the one that minimizes the fitting error). Thus, statistical boosting performs simultaneously variable and model selection [47]. Variable selection is particularly important in the presence of multiple predictor variables (and more relevant in the context of high-dimensional problems [48][49][50]), while model selection is important when one does not know the type of appropriate model for the problem at hand.
Regarding the flexibility on the choice of loss functions, one may be interested in average properties of the dependent variable, in which case the L 2 (squared error) loss function may be preferable amongst others [51]. In case one is interested in probabilistic predictions, the quantile loss function is appropriate (see Section 2.4).
Another important property of statistical boosting algorithms is that they are robust against multicollinearity issues due to regularizing the estimates of f using shrinkage techniques [44,52]. This property is important in the presence of a high number of predictor variables and in the context of high-dimensional problems.
The most important parameter to be estimated in statistical boosting is the number of boosting iterations M. A low number of iterations may result in underfitting, while a high number of iterations may result in overfitting. The optimal value of M can be estimated with k-fold cross validation in which the empirical risk, i.e., the loss function averaged over the observations, is minimized [52]. In particular, the early stopping optimizes predictive performance by regularizing the estimates of f using shrinkage techniques [52]. The value of ρ is of minor importance, while small values are preferable. Setting a small ρ, effect estimates increase "slowly" in the boosting procedure, and they stop increasing after the optimal stopping iteration. Here we set ρ = 1 [52].
Here, boosting algorithms were applied using the R programming implementations by [52,53]. The mboost R package implements model-based boosting methods, while its modular nature allows it to combine diverse base learners and loss-functions.

Base Learners
Boosting algorithms are designed to improve the predictive performance of weak base learners based on the iterative framework of Algorithm 1, in the sense that weak base learners are boosted to become strong ones [54]. Here we used linear models and stumps (i.e., one-level decision trees) to model basin attributes. Geographical coordinates were not explicitly modelled, because part of this information is included in other basin attributes. The idea of using simple models to model basin attributes is consistent with the basic concept of boosting algorithms.

Metrics
Our boosting algorithms were implemented by minimizing the quantile loss function proposed by [55]. At level a ∈ (0, 1), the quantile loss function imposes a penalty equal to L(r; x) to a prediction quantile r, when x materializes according to Equation

Metrics
Our boosting algorithms were implemented by minim proposed by [55]. At level a ∈ (0, 1), the quantile loss func L(r; x) to a prediction quantile r, when x materializes acco where (•) denotes the indicator function. The quantile loss function is a proper scoring rule [5 one aims to predict conditional quantiles. It is related to l gression (i.e., linear regression with quantile loss) [57,58]. I cases of quantile losses and are particularly useful when o but here we aim to evaluate separately predictive quantil sions in hydrology has been highlighted in [61].

Metrics
Our boosting algorithms were implemented by minimizing the quantile loss function proposed by [55]. At level a ∈ (0, 1), the quantile loss function imposes a penalty equal to L(r; x) to a prediction quantile r, when x materializes according to Equation (1): where (•) denotes the indicator function.
The quantile loss function is a proper scoring rule [56] and is especially useful when one aims to predict conditional quantiles. It is related to linear-in-parameters quantile regression (i.e., linear regression with quantile loss) [57,58]. Interval scores [59,60] are special cases of quantile losses and are particularly useful when one provides prediction intervals, (·) denotes the indicator function. The quantile loss function is a proper scoring rule [56] and is especially useful when one aims to predict conditional quantiles. It is related to linear-in-parameters quantile regression (i.e., linear regression with quantile loss) [57,58]. Interval scores [59,60] are special cases of quantile losses and are particularly useful when one provides prediction intervals, but here we aim to evaluate separately predictive quantiles. The value of quantile regressions in hydrology has been highlighted in [61].

Summary of Methods
Here we summarize the framework of our study. Firstly, selected attributes and signatures are transformed using the logarithm or the square root function before performing any computations. The selection of those attributes and signatures is based on some preliminary exploratory data analysis that seeks for possibly skewed data and aims to approximately normalize the data. The inverse transform is applied to all final predictions, and all results on predictive performance are reported with respect to back-transformed values. We note here that data preprocessing is a purely empirical procedure aiming to improve predictive ability of the algorithm. A summary of variables that are transformed can be found in Section 2.1.
The sample of 667 basins is divided randomly into 10 folds. Then the boosting algorithm is trained in nine folds and tested in the remaining fold. The procedure is repeated 10 times, so that all folds are included in the test set. All predictive performances are reported for the test set. Now assume that one aims to predict a specific hydrological signature at all basins included in a single random fold. The boosting algorithm is trained in the remaining nine folds in a 10-fold cross-validation framework (this 10-fold cross-validation aims to estimate the optimal stopping parameter M and should not be confused with the 10-fold cross-validation mentioned in the previous paragraph). The procedure terminates in 2000 iterations. Predictions of negative sign are transformed to 0, for hydrological signatures that are known to be a priori positive (e.g., frequency of high-flow days).
Two cases are examined, i.e., boosting (a) using linear models to model each predictor variable and (b) using a linear model and a stump to model each predictor variable. The predictive performances of both boosting algorithms on the test set are reported for quantile levels equal to 2.5%, 50.0% and 97.5%. Furthermore, the procedure (b) (see paragraph above) is repeated by fitting the boosting algorithm in the full sample in a 10-fold crossvalidation and the frequency of included predictor variables in the final model (i.e., the model obtained by implementing the optimal parameter M) is reported for providing an explanatory model.

Results
Here we present the results of the fitting problems. In particular, we provide some exploratory analysis in Section 3.1. Sections 3.2-3.4 present the probabilistic predictions for three categories of hydrological signatures, each one including four hydrological signatures from Table A1. An overall assessment of the implemented methods is presented in Section 3.5.

Exploratory Analysis
It is important to understand the relationships between hydrological signatures and basin attributes. To this end, a correlogram between all variables is presented in Figure 1. Most features are relatively uncorrelated. Regarding a few correlated variables, we note that statistical boosting can discriminate important variables with little consequence on predictive performance, due to its internal mechanism.

Exploratory Analysis
It is important to understand the relationships between hydrological signatures and basin attributes. To this end, a correlogram between all variables is presented in Figure 1. Most features are relatively uncorrelated. Regarding a few correlated variables, we note that statistical boosting can discriminate important variables with little consequence on predictive performance, due to its internal mechanism.  Table A1 have been used for the attributes. Vertical and horizontal black solid lines classify variables (hydrological signatures and attributes) according to their type; see Table A1.
To better understand what Figure 1 depicts, we here note that variables were sorted according to their type (in both the horizontal and vertical directions); see Table A1. As regards the horizontal direction, hydrological signatures are placed in the bottom-left corner of Figure 1, being followed by climatic indices and the remaining types of variables (as we move from the left to the right). Some hydrological signatures are highly correlated (e.g., mean daily discharge, 5% flow quantile, 95% flow quantile and baseflow index). This does not pose problems for our approach, because hydrological signatures are modelled separately.
Another important note on Figure 1 is that possible low correlations between attributes and hydrological signatures may explain potential low predictability of the signatures. Consequently, estimating the uncertainty of the predictions is important. We also  Table A1 have been used for the attributes. Vertical and horizontal black solid lines classify variables (hydrological signatures and attributes) according to their type; see Table A1.
To better understand what Figure 1 depicts, we here note that variables were sorted according to their type (in both the horizontal and vertical directions); see Table A1. As regards the horizontal direction, hydrological signatures are placed in the bottom-left corner of Figure 1, being followed by climatic indices and the remaining types of variables (as we move from the left to the right). Some hydrological signatures are highly correlated (e.g., mean daily discharge, 5% flow quantile, 95% flow quantile and baseflow index). This does not pose problems for our approach, because hydrological signatures are modelled separately.
Another important note on Figure 1 is that possible low correlations between attributes and hydrological signatures may explain potential low predictability of the signatures. Consequently, estimating the uncertainty of the predictions is important. We also note that Pearson's correlation is a linear metric; therefore, selection frequency of predictor variables given by boosting algorithms (see Section 3.5) may not be identical to a ranking of predictor variables according to the magnitude of Pearson's correlation.

Streamflow Signatures
Here we present probabilistic predictions of signatures related to discharge volumes, i.e., mean daily discharge, 5% flow quantile, 95% flow quantile and baseflow index at quantile levels 2.5%, 50.0% and 95.0% in a 10-fold cross-validation procedure for the full sample of catchments. Similar analysis for the remaining signatures is presented in predictions are presented in Figure 2. Two cases are examined, i.e., boosting with linear models as base learners and boosting with a combination of linear models and stumps as base learners. We note that 5% and 95% flow quantiles should not be confused with their 2.5%, 50.0% and 97.5% prediction quantiles. In particular, the problem is set so as to provide 2.5%, 50.0% and 97.5% prediction quantiles for 5% and 95% flow quantile hydrological signatures. In general, we observe that predictions at quantile level 50% (i.e., predictions that aim to minimize the mean absolute error with respect to the observations) approximate well the observations. Moreover, prediction intervals seem to contain observed values for both implemented algorithms. It seems that both algorithms can provide probabilistic predictions that, in general, agree with the spatial heteroscedastic nature (i.e., the large and diverse variations) of the dependent variables. For instance, in catchments with high mean daily discharge the predictive quantiles at level 97.5% are also high. We note that predictive quantiles at level 2.5% seem not to follow the fluctuations of the observations. This is more evident in predictions of the 5% flow quantile and is due to the truncation at 0. Furthermore, boosting with linear models seems to be smoother with regards to model heterogeneity compared to boosting with linear models and stumps. For instance, when looking at the 97.5% predictive quantiles of the 5% flow quantile, boosting with linear models and stumps seems to provide less efficient variable predictions compared to boosting with linear models. A formal assessment of both algorithms using proper scoring rules will be presented in Section 3.5.

Signatures of Duration and Frequency of Extreme Events
Maps of values of high-flow events, frequency of high flow events, average duration of low-flow events and frequency of low-flow days are presented in Figure A2, while the respective probabilistic predictions are presented in Figure 3. Similar results as those reported in Section 3.2 regarding the comparison between the two implemented algorithms also hold here in the 10-fold cross-validation. In particular, signatures are well predicted by both algorithms, while prediction intervals contain the observed values. Boosting with linear models and stumps as base learners seems to be less variable at the 97.5% quantile level while the truncation effect at 0 is also evident.

Signatures of Duration and Frequency of Extreme Events
Maps of values of high-flow events, frequency of high flow events, average duration of low-flow events and frequency of low-flow days are presented in Figure A2, while the respective probabilistic predictions are presented in Figure 3. Similar results as those reported in Section 3.2 regarding the comparison between the two implemented algorithms also hold here in the 10-fold cross-validation. In particular, signatures are well predicted by both algorithms, while prediction intervals contain the observed values. Boosting with linear models and stumps as base learners seems to be less variable at the 97.5% quantile level while the truncation effect at 0 is also evident.

Remaining Signatures
Remaining signatures include runoff ratio, streamflow precipitation elasticity, slope of the flow duration curve and mean half-flow date as presented in Figure A3, while respective probabilistic predictions are presented in Figure 4. While similar results to Sections 3.2 and 3.3 also hold here, we note that heteroscedasticity of mean half-flow date is not well predicted, since both algorithms seems to give predictions that are relatively constant at quantile levels 2.5% and 97.5%.

Remaining Signatures
Remaining signatures include runoff ratio, streamflow precipitation elasticity, slope of the flow duration curve and mean half-flow date as presented in Figure A3, while respective probabilistic predictions are presented in Figure 4. While similar results to Sections 3.2 and 3.3 also hold here, we note that heteroscedasticity of mean half-flow date is not well predicted, since both algorithms seems to give predictions that are relatively constant at quantile levels 2.5% and 97.5%. Figure 5 presents an assessment of both algorithms in predicting hydrological signatures at 2.5%, 50.0% and 97.5% quantile levels in a 10-fold cross-validation framework. At quantile levels 2.5% and 97.5%, boosting with linear models as base learners seems to perform better than boosting with linear models and stumps as base learners. On the other hand, the combination of linear models and stumps as base learners seems to provide better results when the interest is in point predictions of hydrological signatures.

Assesment and Importance of Predictor Variables
We are interested in obtaining some explainable models when predicting hydrological signatures. To this end, we fitted a statistical boosting model with linear models and stumps as base learners to predict a given hydrological signature at quantile level 50.0%. The final fitted model, i.e., the model with the optimal number of iterations (i.e., the model provided when trained in the full sample in a 10-fold cross-validation) includes predictor variables in frequencies reported in Figure 6. The procedure is repeated for every hydrological signature. Remote Sens. 2021, 13, x FOR PEER REVIEW 11 of 23 Figure 4. Probabilistic predictions of runoff ratio, streamflow precipitation elasticity, slope of the flow duration curve and mean half-flow date (from top to bottom) at quantile levels 2.5%, 50.0% and 97.5% for statistical boosting with linear Figure 4. Probabilistic predictions of runoff ratio, streamflow precipitation elasticity, slope of the flow duration curve and mean half-flow date (from top to bottom) at quantile levels 2.5%, 50.0% and 97.5% for statistical boosting with linear models as base learners (see "linear boosting" in the legend) and combination of linear models and stumps as base learners (see "full boosting" in the legend). Figure 5 presents an assessment of both algorithms in predicting hydrological signatures at 2.5%, 50.0% and 97.5% quantile levels in a 10-fold cross-validation framework. At quantile levels 2.5% and 97.5%, boosting with linear models as base learners seems to perform better than boosting with linear models and stumps as base learners. On the other hand, the combination of linear models and stumps as base learners seems to provide better results when the interest is in point predictions of hydrological signatures. Figure 5. Average quantile losses when predicting hydrological signatures at quantile levels 2.5%, 50.0% and 97.5% for statistical boosting with linear models as base learners (see "linear boosting" in the horizontal axis), and combination of linear models and stumps as base learners (see "full boosting" in the horizontal axis). Rankings of methods per quantile level at each hydrological signature are also reported.

Assesment and Importance of Predictor Variables
We are interested in obtaining some explainable models when predicting hydrological signatures. To this end, we fitted a statistical boosting model with linear models and stumps as base learners to predict a given hydrological signature at quantile level 50.0%. The final fitted model, i.e., the model with the optimal number of iterations (i.e., the model provided when trained in the full sample in a 10-fold cross-validation) includes predictor variables in frequencies reported in Figure 6. The procedure is repeated for every hydrological signature.
To understand Figure 6, we note that attributes in the vertical axis are reported with regards to their type, i.e., topographic characteristics, climatic indices, land cover characteristics, soil characteristics and geological characteristics (from lower to upper). In general, topographic characteristics and climatic indices seem to be most important when predicting hydrological signatures. Further discussion on the results of Figure 6 related to the selection frequency of the predictor variables can be found in Section 4.

Discussion
We note here that interpretation and accurate prediction in algorithmic modelling may be two conflicting objectives [62,63]. Therefore, some concessions should be accepted, since there is no free lunch in modelling [64], so that one obtains a model that provides acceptable predictions and is interpretable simultaneously. For instance, one could com- To understand Figure 6, we note that attributes in the vertical axis are reported with regards to their type, i.e., topographic characteristics, climatic indices, land cover characteristics, soil characteristics and geological characteristics (from lower to upper). In general, topographic characteristics and climatic indices seem to be most important when predicting hydrological signatures. Further discussion on the results of Figure 6 related to the selection frequency of the predictor variables can be found in Section 4.

Discussion
We note here that interpretation and accurate prediction in algorithmic modelling may be two conflicting objectives [62,63]. Therefore, some concessions should be accepted, since there is no free lunch in modelling [64], so that one obtains a model that provides acceptable predictions and is interpretable simultaneously. For instance, one could combine multiple models and obtain more accurate points [65,66] or probabilistic [67,68] predictions. However, in this case, interpretability would be lost for the sake of generalization. Furthermore, one could also use several models and compare them (see a discussion on the value of doing multiple comparisons using big datasets in general [69], and in hydrology in particular [70,71]). An example of comparison of multiple regression algorithms for providing point predictions of hydrological signatures can be found in [18]. Here we focus on exploiting the benefits of statistical boosting and we are less interested in comparing multiple algorithms.
With regards to studies providing probabilistic predictions of hydrological signatures, we note that [16] used some variant of quantile regression forests (which belong to the wider class of random forests algorithms). Compared to random forests, statistical boosting can be more interpretable. In particular, random forests use variable importance metrics to measure the relative importance of predictor variables in explaining the dependent variable [72]. On the other hand, statistical boosting is additive with respect to the predictor variables (which are modelled by linear regression algorithms), while their importance can be related to the frequency of the appearance of the predictor variables in the final model (see Figure 6). The finding that climatic indices are the most important variables for predicting hydrological signatures, is also confirmed by an earlier study [16]. We note that compared to [16], here a formal assessment of probabilistic predictions is provided based on quantile losses.
Uncertainties of hydrological signatures are provided by [13,14] using Monte Carlo sampling; however, the approach is not based on regression algorithms and the focus is to characterize distributional properties of hydrological signatures.
Moreover, regarding the properties of the implemented algorithm related to the problem at hand, we note that boosting performs model selection in the sense that the model is not known a priori and, therefore, several models are tried (e.g., linear models and stumps for each base learner) and boosting selects the most informative ones. This problem is especially relevant when many predictor variables exist. We also note that boosting with linear models is better than boosting with linear models and stumps for probabilistic predictions, perhaps due to the structure of stumps, which does not allow for optimal probabilistic modelling. On the other hand, most flexible models that include both linear models and stumps seem to perform better for point predictions, i.e., prediction at the 50.0% quantile level.
Regarding the importance of attributes for predicting hydrological signatures, snow fraction seems to be very important for predicting hydrological signatures, based on the number of red coloured tiles in Figure 6. The significance of snow fraction for the prediction of hydrological signatures has also been merely discussed by [16]. Catchment mean slope and catchment mean elevation (topographic characteristics) are also generally very important, based on the same criterion. Catchment mean slope and catchment mean elevation are highly correlated and, therefore, if one characteristic is omitted the other one may gain more importance. Mean daily precipitation (i.e., another climatic index) is also important.
Beyond topographic attributes and climatic indices, clay fraction (a soil characteristic), maximum monthly mean of the leaf area index (a land cover characteristic), forest fraction (another land cover characteristic) also seem to be important. Geological characteristics seem to not be particularly useful for predicting hydrological signatures. We note that these findings hold for the examined sample and, therefore, perhaps a different summarization of geological or other type of information may result in completely different conclusions. Among the examined hydrological signatures, the 5% flow quantile, runoff ratio, and the mean half-flow date seem to depend on fewer attributes (see Figure 6) compared to alternative hydrological signatures.

Conclusions
We provided probabilistic predictions at quantile levels 2.5%, 50.0% and 97.5% of 12 hydrological signatures, namely (1) mean daily discharge, (2) 5% flow quantile, (3) 95% flow quantile, (4) baseflow index, (5) average duration of high-flow events, (6) frequency of high-flow days, (7) average duration of low-flow events, (8) frequency of low-flow days, (9) runoff ratio, (10) streamflow precipitation elasticity, (11) slope of the flow duration curve and (12) mean half-flow date using statistical boosting in a regression setting. The sample includes 28 predictor variables, i.e., attributes of 667 basins in the contiguous US. Two boosting models were tested, i.e., (a) a model with linear models as base learners, and (b) a model with both linear models and stumps as base learners. Boosting modes were trained to minimize quantile loss at levels 2.5%, 50.0% and 97.5%.
Regarding prediction performances, model (a) provided better predictions at quantile levels 2.5% and 97.5%, while model (b) showed better performance in providing predictions at quantile level 50.0% (i.e., better point predictions). Boosting models can be more interpretable compared to other machine learning regression algorithms. By exploiting this latter property of theirs, we found that climatic indices and topographic characteristics are better predictors of hydrological signatures than characteristics of other types.
Uncertainty estimation of hydrological signatures is the focus of some published studies; however, a formal assessment of delivered results is missing. In particular, most studies use visual tools (e.g., q-q plots) or estimate reliability and coverages; however, a "proper scoring rule" (which may combine properties of reliability scores and coverages) can be more informative when the interest is in ranking probabilistic predictions. Machine learning algorithms can provide probabilistic predictions if designed to minimize some type of proper score (e.g., quantile scores and intervals scores), while delivered results can be more accurate compared to simple linear models, due to the flexibility of machine learning algorithms. Future work can include a comparison of more probabilistic regression models, as well as their combinations for providing more accurate predictions.  Table A1. Predictor (topographic, climatic, land cover, soil and geology attributes) and dependent (hydrological) variables of the 667 basins obtained from [31], classified according to the transformation applied to them for being imported in the statistical boosting framework. A detailed description of the variables can be found in Tables A2-A7.    Table A5. Land cover characteristics (adapted from [31]).

Attribute Description
Forest fraction Forest fraction Maximum monthly mean of the leaf area index Maximum monthly mean of the leaf area index (based on 12 monthly means) Green vegetation fraction difference Difference between the maximum and minimum monthly mean of the green vegetation fraction (based on 12 monthly means) Dominant land cover fraction Fraction of the catchment area associated with the dominant land cover Table A6. Soil characteristics (adapted from [31]).

Attribute Description
Depth to bedrock Depth to bedrock (maximum 50 m) (m) Soil depth Soil depth (maximum 1.5 m; layers marked as water and bedrock were excluded) (m) Maximum water content Maximum water content (combination of porosity and soil_depth_statsgo; layers marked as water, bedrock, and "other" were excluded) (m) Sand fraction Sand fraction (of the soil material smaller than 2 mm; layers marked as organic material, water, bedrock, and "other" were excluded) (%) Silt fraction Silt fraction (of the soil material smaller than 2 mm; layers marked as organic material, water, bedrock, and "other" were excluded) (%) Clay fraction Clay fraction (of the soil material smaller than 2 mm; layers marked as organic material, water, bedrock, and "other" were excluded) (%) Water fraction Fraction of the top 1.5 m marked as water (class 14) (%) Organic material fraction Fraction of soil_depth_statsgo marked as organic material (class 13) (%) Fraction of soil marked as other Fraction of soil_depth_statsgo marked as "other" (class 16) (%)

Appendix C
We used the R programming language [73] to implement the algorithms of the study, and to report and visualize the results.

Appendix C
We used the R programming language [73] to implement the algorithms of the study, and to report and visualize the results.
The algorithms were implemented by using the contributed R packages caret [79], mboost [80].