Prediction Interval Estimation Methods for Artificial Neural Network (ANN)-Based Modeling of the Hydro-Climatic Processes, a Review

Despite the wide applications of artificial neural networks (ANNs) in modeling hydroclimatic processes, quantification of the ANNs’ performance is a significant matter. Sustainable management of water resources requires information about the amount of uncertainty involved in the modeling results, which is a guide for proper decision making. Therefore, in recent years, uncertainty analysis of ANN modeling has attracted noticeable attention. Prediction intervals (PIs) are one of the prevalent tools for uncertainty quantification. This review paper has focused on the different techniques of PI development in the field of hydrology and climatology modeling. The implementation of each method was discussed, and their pros and cons were investigated. In addition, some suggestions are provided for future studies. This review paper was prepared via PRISMA (preferred reporting items for systematic reviews and meta-analyses) methodology.


Introduction
Sustainable water resources management includes designing and managing various aspects such as ecology, environment and hydrology integrity in the present and future [1]. Sustainable management requires adequate information about the state of water resources. Thus, appropriate modeling to investigate the situation in present and future times is necessary. On the other hand, reliability of the modeling is an important issue that directly influences the management and decision-making of the problems. Therefore, the uncertainty involved in modeling should be carefully considered so as to achieve more realistic decisions.
Recently, the artificial neural network (ANN) as a prevalent modeling method has been used for identification of the complicated non-linear relationship of inputs and output (e.g., see, [2][3][4][5][6][7][8]). The relationship between the hydrological phenomena is a complicated issue and hot topic in hydrological studies, due to spatial and temporal changes of factors that influence the process. Therefore, many hydrological models with various degrees of complexity have been used for the simulation of such a stochastic process [9]. In spite of the numerous applications of the ANN, it has been indicated in many previous studies that ANN models are inherently stochastic, as identical results would be difficult to be reproduced on different occasions [10]. Classic applications of ANN include some imperfections; e.g., the weights of the ANN are randomly assigned, which leads to a long training time; ANN behavior is unexpected and there is not a specified way for determination of the best structure. These features are a deficiency of ANNs, which can have a negative effect on the reliability of the modeling.
The application of PIs to quantify uncertainty has increased and attracted significant attention in the last decade. Therefore, it is worthy to evaluate different PIs development methods in order to identify the best performance of the methods and assess each method suitability.
The lack of review papers investigating PIs development methods in the field of hydrology caused the preparation of the current review paper. The number of published papers regarding PIs construction methods is depicted in Figure 1. The major objective of this review paper is to categorize and enumerate the PIs construction methods and their applications in hydro-climatic studies. Moreover, some suggestions for future works in order to develop and improve the PIs applications are presented. The reviewed sources are mostly included by the Scopus abstract and citation database (www.scopus.com (accessed on 26 January 2021)). Elsevier's Scopus is the most frequently used research engine, and it is updated earlier than the Web of Science on which the papers may be updated lately. In addition, as authors can load any paper onto Google Scholar, some information may not be reliable. The search terms were ("ANN"; "uncertainty") and ("ANN"; "Prediction interval"), respectively, for the uncertainty analysis and PIs. The search operator was "and". Then, the appropriate papers in the fields of hydrology and climatology were selected by abstract reviewing. Moreover, only journal articles published in English were considered, as most of the papers in Scopus research engine are in English. The initial number of obtained papers about the uncertainty assessment of hydrological and hydroclimatological studies was 36 papers; 18 papers from well-known international journals from the year 2002 are tabulated in Table 1. In addition, 69 papers associated with PIs construction of the ANNs were investigated, then 17 papers were selected according to an abstract review and relation to hydrology and climatology. Papers from the years 2002 to 2020 were presented as selected papers. results at the first and last steps. This review paper attempts to follow most of the checklist's items of the PRISMA method, since this method is used as a basis for reporting systematic reviews for most research types. Some examples of systematic literature reviews are [49,50]. The systematic research was conducted on September 1, 2020, and it was updated on December 20, 2020 and also on January, 20, 2021 for preparing a revision. The list of reviewed papers is tabulated in Table 2.     Some methodologies concerning standards of literature reviews and the way of reporting and structuring of them are RAMESES (realist and meta-narrative evidence syntheses: evolving standards), PRISMA (preferred reporting items for systematic reviews and metaanalyses) and PSALSAR (research protocol, appraisal, synthesis and analysis, reporting results). RAMSES could be an appropriate choice for systematic narrative reviews. PRISMA was developed for systematic literature reviews and meta-analyses [44]. PRISMA consists of a 27-item evaluation checklist and a specific flowchart to follow [45]. Moreover, PRISMA protocols (PRISMA-P) checklist and the Explanation and Elaboration [46] document could lead to the improvement of a more complete and reliable review report [47]. The PSALSAR method consists of six basic steps [48], while the common systematic literature review methods include four steps, which are search, appraisal, synthesis and analysis (SALSA). The PSALSAR method contains research protocol and reporting results at the first and last steps. This review paper attempts to follow most of the checklist's items of the PRISMA method, since this method is used as a basis for reporting systematic reviews for most research types. Some examples of systematic literature reviews are [49,50]. The systematic research was conducted on September 1, 2020, and it was updated on December 20, 2020 and also on January, 20, 2021 for preparing a revision. The list of reviewed papers is tabulated in Table 2.
In the following, Section 2 presents the base concepts of PIs and measurement criteria of PIs, Section 3 describes the different PIs construction methods, Section 4 compares different pros and cons of the methods and finally Section 5 recommends some suggestions for future studies.

PIs Concepts
An interval includes the upper and lower limits that capture indeterminate future value with a prescribed probability [68]. This limit and interval respectively are known as prediction limit and PI (see Figure 2). PI and CI are different measures, and it is important to distinguish them. The CI corresponds to the accuracy of the estimation of the true regression, while PI corresponds to the accuracy of the estimation concerning the observed target value. Actually, PI is more applicable than CI, since it is associated with the accuracy of observed target prediction, whereas CI presents the accuracy of true regression estimation. In order to investigate the differences between the two measures, the following description should be considered. For the estimation of the unknown function (x i ;θ) presenting the true underlying model, where θ is actual parameter set, for N data samples {(x i ,t i )} we have: where x i and t i are, respectively, input, observed target and model error. So, the objective is to estimate the true model (x i ;θ). The approximate model ; θ is the mean of the distribution of the targets, where the estimated parameters are determined using machine learning methods. To quantify the uncertainty, two aspects should be considered. One of the aspects is that CI that indicates the accuracy of the estimation of the true model. CI is measured by the distribution of the quantity (x i ;θ) − x i ;θ . The second aspect is that PI indicates the accuracy of the prediction of the target. Therefore, Equation (2) can present the relation between PI and CI as: It can be concluded from Equation (2) that PIs are wider than CIs, where PI contains CI and covers more sources of uncertainty [54].

PIs Assessment Measures
The most commonly used measures for quantifying PIs construction are PI coverage probability (PICP) and mean PI width (MPIW). The coverage measure corresponds to the encompassments of the obtained bounds. The wider PIs increase the PICP. The PICP is calculated as [24]: PI and CI are different measures, and it is important to distinguish them. The CI corresponds to the accuracy of the estimation of the true regression, while PI corresponds to the accuracy of the estimation concerning the observed target value. Actually, PI is more applicable than CI, since it is associated with the accuracy of observed target prediction, whereas CI presents the accuracy of true regression estimation. In order to investigate the differences between the two measures, the following description should be considered. For the estimation of the unknown function f(x i ; θ) presenting the true underlying model, where θ is actual parameter set, for N data samples {(x i , t i )} N i=1 we have: where x i and t i are, respectively, input, observed target and model error. So, the objective is to estimate the true model f(x i ; θ). The approximate model f x i ;θ is the mean of the distribution of the targets, where the estimated parametersθ are determined using machine learning methods. To quantify the uncertainty, two aspects should be considered. One of the aspects is that CI that indicates the accuracy of the estimation of the true model. CI is measured by the distribution of the quantity f(x i ; θ) − f x i ;θ . The second aspect is that PI indicates the accuracy of the prediction of the target. Therefore, Equation (2) can present the relation between PI and CI as: It can be concluded from Equation (2) that PIs are wider than CIs, where PI contains CI and covers more sources of uncertainty [54].

PIs Assessment Measures
The most commonly used measures for quantifying PIs construction are PI coverage probability (PICP) and mean PI width (MPIW). The coverage measure corresponds to the Sustainability 2021, 13, 1633 7 of 18 encompassments of the obtained bounds. The wider PIs increase the PICP. The PICP is calculated as [24]: where N is the number of samples, and L i and U i are the lower and upper bounds of the ith PI, respectively. The second measure is used to evaluate the width of PIs. Normalized MPIW (NMPIW) shows the normalized width as [24]: where R is the range of the observed values. NMPIW, as a dimensionless criterion, indicates the mean width of PIs. Each of these criteria separately cannot lead to a clear judgment due to their inverse relationship. Therefore, the combinational coverage width-based criterion (CWC) containing both criteria can be used to evaluate the estimated PIs as [24]: η and µ are fixed parameters, which determine the PIs with the lower value of the PICP. µ represents the confidence level of the PIs. η magnifies variation of the PICP and µ.
Different η values should be examined to determine the most appropriate η value via a trial-error process. The coverage and width criteria are the most commonly used measures to evaluate the PIs' quality, however some other statistical measures are also applied to evaluate the calculated PIs. For example, in order to evaluate the constructed PIs via the Monte Carlo method, mean and standard deviation of the Nash-Sutcliffe model efficiency for each run are calculated to measure the PIs' coverage [58].

PIs Construction Methods
There are some methods to calculate the PIs. The Bayesian and Monte Carlo are the traditional methods. The other most frequently used method is the bootstrap technique; besides LUBE, it is one of the reliable methods. In addition, there are some other methods, such as mean variance estimation [11] and first order uncertainty analysis (FOUA) [53], but as they are not so prevalent, in the following sub sections the most common methods and their applications in hydro-climatic studies are described.

Bayesian Method
The Bayesian method was introduced by MacKay [21]. Training of the classic ANN is based on minimizing the error function, which leads to obtaining the optimum weights. Whereas, the Bayesian method attempts to train the ANN for the posterior probability distribution of weights from assumed prior probability distribution using Bayes' theorem. In the Bayesian method, ANN training is performed based on the regularized cost function as: where E D is the sum of squared error and E ω is the sum of squares of the network weights. ρ and β are used to determine training goals. The concept of this method is based on consideration of the set of ANN parameters, ω as a random set of variables with presumed distributions. The Bayes' rule is applied to update the density function of the weights as [14]: where M and D are the NN model and the training dataset. P(D|ω, β, M ) and P(ω|ρ, M ) are the likelihood function of data occurrence and the prior density of parameters, respectively. Representing our knowledge, P(D|ρ, β, M ) is a normalization factor enforcing that the total probability is 1. Assuming that noises are normally distributed and P(D|ω, β, M ) and P(ω|ρ, M ) have normal distributions, it can be concluded that [14]: where Z D (β)= ( π β ) n/2 and Z ω (ρ)= ( π ρ ) ρ/2 . n and p are the number of training samples and NN parameters, respectively. By substituting Equations (8) and (9) into (7), Equation (10) is obtained as [14]: The ANN is trained via maximization of the posterior probability P(ω|D, ρ, β, M ), which is based on the minimizing Equation (6). By taking derivatives with respect to the logarithm of (10) and setting it equals to zero, the optimal values for β and ρ are obtained [14]: where γ = p − 2ρ M P tr H MP −1 is the so-called effective number of ANN parameters, and p is the total number of ANN model parameters. ω M P are the most probable values of the ANN parameters. H MP (Equation (13)) is the Hessian matrix of E(ω) as [14]: The approximation of the Hessian matrix is generally performed using the Levenberg-Marquardt optimization algorithm. Application of this technique for the training process results in ANNs having the variance as [14]: The uncertainties corresponding to the data and parameters, respectively, are quantified via the term in the right and left sides of Equation (15). Finally, PI can be calculated considering the total variance of the ith future sample as [14]: where z 1− a 2 is the 1−(α/2) quantile of the normal distribution function with zero mean and unit variance. Additionally, ∇ T ω MPŷ i is the gradient of the ANN output with respect to its parameters' set of ω MP . The Bayesian method for PI construction has a strong mathematical foundation. This method requires calculation of the Hessian matrix, which needs a significant amount of time, but it should be considered that the computational load is lower in the process of constructing PIs because of only calculating the gradient of Neural Network (NN) output. Some studies applied the Bayesian method in order to construct the PIs. Kasiviswanathan and Sudheer [51] used the bootstrap and Bayesian techniques to assess the uncertainty of the flood forecasting models of the ANN. The method application was based on the assumption that the model structure is deterministic, therefore, only the parameter of uncertainty was assessed in this study. It was concluded that model implementation is acceptable when the ensemble mean is considered. It was concluded that the bootstrap method is simple and easy in the case of the implementation as compared to the Bayesian method, but comparison of the obtained results showed that the Bayesian method led to narrower PIs and lower variance in parameter convergence.

Monte Carlo Method
The performance of the Monte Carlo method is based on alteration of the model inputs, parameters or structure of their ensemble. The number of iterations depends on the required level of reliability and is a problem-dependent task. More repetition leads to more reliable results, but higher computational cost should also be examined. If the model structure and the input data are assumed to be certain, Equation (16) can be presented as [58]:ŷ t,i = M(x, θ i ); t = 1, 2, . . . , n; i = 1, 2, . . . , s where θ i is the set of parameters sampled for the ith run of the Monte Carlo simulation,ŷ t,i is the model output of the ith time step for the ith run, n is the number of time steps and s is the number of simulations. The statistical properties (such as moments and quantiles) of the model output for each time step t are estimated from the realizationsŷ t,i . In order to quantile the uncertainty, the following equation can be expressed [58]: where,ŷ t is the model output at time step t,ŷ t,i is the value of model output at time t simulated by the model M(x, θ i ) in ith simulation,Q(p) is pth [0,1] quantile, w i is the weight given to the model output in ith at simulation. Quantiles obtained in this way are conditioned on the inputs to the model, the model structure and the weight vector w i . The computation of the model's PI with confidence level α (0 < α < 1) is achieved though estimation of the 1−α 2 * 100% and 1+α 2 * 100% via theŷ t,i . The lower prediction limit PL L and the upper prediction limit PL U are calculated as [58]: Then, the PI is derived considering the output of the calibrated (optimal) model (y) as: where PI L and PI U are the interval of the obtained results as lower and upper bounds, respectively. It should define the purpose of the work and its significance. Shrestha, Kayastha and Solomatine [58] applied the Monte Carlo method to assess the parametric uncertainty of the analysis of hydrological models of rainfall runoff using ANN. It was shown that the Monte Carlo method could be used to determine the other sources of uncertainty, such as input, structure or their combination. Tapoglou, Varouchakis, Trichakis and Karatzas [60] applied the Monte Carlo technique to investigate the uncertainty associated with modeling of the hydraulic head in an aquifer via the ANN. The model was performed 300 times by various training sets, and initial random values and the training results constituted a sensitivity analysis of the ANN training to the kriging part of the algorithm. This study concluded that error intervals for the train and test data of the ANN and kriging PIs were narrow, considering the complexity of the study area. Application of the Bayesian kriging methodology was assessed, and it was concluded that the difference between the predicted values and the results of simulation of the actual data was low. This method led to consistent and reliable performance in different conditions. It was assessed that this method is appropriate to simulate groundwater-level, chiefly in cases with complicated behavior and unknown geological data.
Seifi, Ehteram, Singh and Mosavi [59] attempted to evaluate the uncertainty of groundwater level modeling via the hybrid ANN modeling and some other black box models and six meta-heuristic optimization methods, such as the grasshopper algorithm, cat swarm, weed algorithm, genetic algorithm, krill algorithm and particle swarm optimization, and it was mentioned that hybrid methods led to better performance and accuracy than sole methods.

Bootstrap Method
In this technique, several ANNs (B ANNs) are trained with randomly selected subsets [22] (see Figure 3). The randomly selected samples from total data are used for training each of networks. This technique is based on ensembling some ANNs, which could lead to lower estimation errors with regard to a single ANN. This method is independent of any complicated calculation using the non-linear operator or function. A model as f ANN (x) is fitted to each of the generated bootstrap sub-sets, and the bootstrapping estimate is calculated as the average and variance of each model as: Figure 3. Schematic of the bootstrap method [14]. P B stands for training data sub-sets.
For constructing the PIs [l.u], values of the observed X = (x 1 , x 2 , . . . , x 3 ) with normal distribution probability of P and according to Equations (21) and (22), P (l < X < u) are as: where Z = X−ŷ boot σ boot , the standard score of X, is distributed as standard normal [69], hence: or l =ŷ boot −zσ boot ; u =ŷ boot +zσ boot . Table 3 presents the corresponding values of Z and PIs. Talebizadeh and Moridnejad [52] applied the bootstrap method to assess the uncertainty arising from measurement error and also the uncertainty of the ANNs' output for forecasting the lake level fluctuations. It was stressed that PIs' estimation provided beneficial information for decision making and designing. Kasiviswanathan and Sudheer [53] combined the bootstrap and the FOUA method to investigate the parametric and predictive uncertainty of the rainfall-runoff ANN-based modeling to forecast river flow. It was concluded that the FOUA method could compute the sensitivity coefficients that are the first order partial derivative of the model output and parameters of modeling. In this method, the computational burden and time of simulation for uncertainty analysis are reduced due to the usage of the statistical parameters such as mean and variance of the ANN weight vectors and biases. The parameter variability was determined via the bootstrap method. The obtained results for uncertainty analysis were quantified via the coverage and width criteria. It was concluded that the results for training and verifying data sets matched each other. Moreover, uncertainty associated with various domains of flow (low, medium and high) was assessed to identify the effect of the magnitude of flow on uncertainty; the results indicated that the uncertainty level changed with different flow regimes, proportionally. The overall results, considering both width and coverage criteria, show that the FOUA method led to a better quantification of the prediction uncertainty compared to the bootstrap method. Wang, Zheng, Zhao, Jiang, Wang, Guo and Wang [55] used the bootstrap method to calculate the PIs of water quality modeling via the wavelet-ANN approach. The uncertainty of the model structure and data noise was investigated, and it was shown that the application of the wavelet data pre-processing could lead to more accurate results. Kumar, Tiwari, Chatterjee and Mishra [39] used the combination of the bootstrap method and wavelet-ANN to quantify the uncertainty associated with the reservoir inflow forecasting. Moreover, multiple linear regression model implementation was compared to the bootstrap method and it was concluded that the performance of the bootstrap method is more reliable. It was also mentioned that PIs' estimation could provide more useful information in operational inflow forecasting. Kasiviswanathan, He, Sudheer and Tay [56] used the bootstrap technique for the quantification of the uncertainty associated with modeling streamflow and flood management. The coverage and width criteria were applied for quantification of the model's performance uncertainty. Results indicated that the bootstrap method is the proper method for streamflow forecasting and flood management. Moreover, it was mentioned that there are some limitations associated with forecasting high flow due to the lower samples and dependence of the ANN on the number of samples. Thus, it was suggested that one use hybrid modeling and integrate the data-driven models with physically-based/conceptual models and/or empirical relationships between high flows and influencing factors for the enhancement of the accuracy of the model for modeling high flow regimes.

LUBE Method
The LUBE method is based on training an ANN with two outputs for developing the PIs in one level (see Figure 4). Two outputs present the upper and lower bounds of the PIs. The proposed ANN is trained based on minimizing the defined cost function. Cost function contains both coverage and width criteria. This method is independent of special information about the PI bound. Previous studies assumed that PIs developed via this method are superior to the other PIs construction methods. In addition, its computational expense is insignificant [24]. Unlike traditional methods, LUBE method performance is independent of point prediction and it is a non-parametric technique. LUBE method application does not depend on parametric distribution of data. It is fast and simple [71].

LUBE Method
The LUBE method is based on training an ANN with two outputs for developing the PIs in one level (see Figure 4). Two outputs present the upper and lower bounds of the PIs. The proposed ANN is trained based on minimizing the defined cost function. Cost function contains both coverage and width criteria. This method is independent of special information about the PI bound. Previous studies assumed that PIs developed via this method are superior to the other PIs construction methods. In addition, its computational expense is insignificant [24]. Unlike traditional methods, LUBE method performance is independent of point prediction and it is a non-parametric technique. LUBE method application does not depend on parametric distribution of data. It is fast and simple [71]. Kasiviswanathan, Cibin, Sudheer and Chaubey [61] attempted to develop the PIs of the ANN-based rainfall-runoff modeling by generation of the ensemble predictions (similar to the LUBE method). PIs were developed at two levels. At the first level, optimum ANN parameters were obtained via the genetic algorithm. In the second step, the ensemble of the models was created by optimization of the verity of the ANN parameters. PIs were calculated by minimizing the residual variance of the ensemble mean and maximization of the covering targets. Moreover, at the same time, the minimization of the PIs' width was taken into account. It was stated that by consideration of the ensemble mean value as the output of the model, the peak flow could be predicted more precisely compared to the classic point prediction of the ANN. Taormina and Chau [62] examined the LUBE method for construction of the PIs at different confidence levels for the 6 hours ahead streamflow discharges forecasting. Particle swarm optimization was used to minimize the CWC cost function. It was shown that the obtained results depend on the used particle swarm optimization paradigm. They claimed that the multi-objective framework led to more appropriate results than single-objective swarm optimization. It was also concluded that the applied algorithm to develop the model could have a remarkable effect on the PIs quality. Zhang, Zhou, Ye, Zeng and Chen [63] applied the LUBE method to construct the PIs of flood forecasting. This study proposed a PI symmetry index and objective function to evaluate the coverage, width and symmetry of PIs. To optimize the proposed objective function, the shuffled complex evolution algorithm was applied. The mean of the bounds was used to present deterministic forecasting. Kasiviswanathan and Sudheer [51] applied the PI method (similar to the LUBE method) for quantification of the ANN modeling uncertainty. The coverage and width criteria were used to quantify the PIs. This paper compared other PI construction methods results, such as the Bayesian and bootstrap methods, and concluded that the PI method was more reliable. In addition, it was presented that the PI method could successfully capture the peak points. Kasiviswanathan, Sudheer and He [64] assessed the uncertainty associated Kasiviswanathan, Cibin, Sudheer and Chaubey [61] attempted to develop the PIs of the ANN-based rainfall-runoff modeling by generation of the ensemble predictions (similar to the LUBE method). PIs were developed at two levels. At the first level, optimum ANN parameters were obtained via the genetic algorithm. In the second step, the ensemble of the models was created by optimization of the verity of the ANN parameters. PIs were calculated by minimizing the residual variance of the ensemble mean and maximization of the covering targets. Moreover, at the same time, the minimization of the PIs' width was taken into account. It was stated that by consideration of the ensemble mean value as the output of the model, the peak flow could be predicted more precisely compared to the classic point prediction of the ANN. Taormina and Chau [62] examined the LUBE method for construction of the PIs at different confidence levels for the 6 hours ahead streamflow discharges forecasting. Particle swarm optimization was used to minimize the CWC cost function. It was shown that the obtained results depend on the used particle swarm optimization paradigm. They claimed that the multi-objective framework led to more appropriate results than single-objective swarm optimization. It was also concluded that the applied algorithm to develop the model could have a remarkable effect on the PIs quality. Zhang, Zhou, Ye, Zeng and Chen [63] applied the LUBE method to construct the PIs of flood forecasting. This study proposed a PI symmetry index and objective function to evaluate the coverage, width and symmetry of PIs. To optimize the proposed objective function, the shuffled complex evolution algorithm was applied. The mean of the bounds was used to present deterministic forecasting. Kasiviswanathan and Sudheer [51] applied the PI method (similar to the LUBE method) for quantification of the ANN modeling uncertainty. The coverage and width criteria were used to quantify the PIs. This paper compared other PI construction methods results, such as the Bayesian and bootstrap methods, and concluded that the PI method was more reliable. In addition, it was presented that the PI method could successfully capture the peak points. Kasiviswanathan, Sudheer and He [64] assessed the uncertainty associated with input and parameters for the ANN-based rainfall-runoff modeling. A two-stage optimization method was applied to estimate the PIs. Chen and Chau [66] used the LUBE method to construct the PIs of sediment load modeling via the hybrid double feedforward NN. It was demonstrated that PIs led to appropriate results in the 90% and 95% CLs. The proposed method could generate reliable PIs. It was discussed that application of the hybrid double feedforward NN could improve performance of the PIs construction in the classification of the low, medium and high sediment loads, and coverage probability about 100% for low and medium sediment loads, but its performance was weak for modeling high sediment loads. Moreover, it was concluded that the LUBE method was efficient in quantifying the uncertainty of data-driven models. Nourani, Paknezhad, Sharghi and Khosravi [57] used the LUBE method to construct the PIs associated with the ANN-based downscaling of the general circulation models. In this study, the LUBE method was applied by generating multiple sets of weights to develop narrow PIs with high coverage probability. It was indicated that the LUBE method could be successfully used to compute the PIs of ANN-based downscaling with reliable performance. Nourani, Sayyah-Fard, Alami and Sharghi [65] quantified the uncertainty of the ANN-based evaporation modeling via the LUBE method. It was claimed that the LUBE method could construct PIs with an appropriate level of reliability; however, data pre-processing methods could affect the uncertainty. This study applied simulated annealing optimization algorithms to construct PIs with higher coverage and lower width. It was mentioned that this method could overcome the problem of trapping in local minima. Kasiviswanathan, Sudheer, Soundharajan and Adeloye [67] applied upper lower bound and mean of forecasting to evaluate uncertainty in inflow modeling via the ANN for optimizing the reservoir operation and decision making. An integrated simulation-optimization was applied, which led to minimizing the error. In Figure 5, the procedure of PIs construction is depicted. with input and parameters for the ANN-based rainfall-runoff modeling. A two-stage optimization method was applied to estimate the PIs. Chen and Chau [66] used the LUBE method to construct the PIs of sediment load modeling via the hybrid double feedforward NN. It was demonstrated that PIs led to appropriate results in the 90% and 95% CLs. The proposed method could generate reliable PIs. It was discussed that application of the hybrid double feedforward NN could improve performance of the PIs construction in the classification of the low, medium and high sediment loads, and coverage probability about 100% for low and medium sediment loads, but its performance was weak for modeling high sediment loads. Moreover, it was concluded that the LUBE method was efficient in quantifying the uncertainty of data-driven models. Nourani, Paknezhad, Sharghi and Khosravi [57] used the LUBE method to construct the PIs associated with the ANN-based downscaling of the general circulation models. In this study, the LUBE method was applied by generating multiple sets of weights to develop narrow PIs with high coverage probability. It was indicated that the LUBE method could be successfully used to compute the PIs of ANN-based downscaling with reliable performance. Nourani, Sayyah-Fard, Alami and Sharghi [65] quantified the uncertainty of the ANN-based evaporation modeling via the LUBE method. It was claimed that the LUBE method could construct PIs with an appropriate level of reliability; however, data pre-processing methods could affect the uncertainty. This study applied simulated annealing optimization algorithms to construct PIs with higher coverage and lower width. It was mentioned that this method could overcome the problem of trapping in local minima. Kasiviswanathan, Sudheer, Soundharajan and Adeloye [67] applied upper lower bound and mean of forecasting to evaluate uncertainty in inflow modeling via the ANN for optimizing the reservoir operation and decision making. An integrated simulation-optimization was applied, which led to minimizing the error. In Figure 5, the procedure of PIs construction is depicted.

Comparison of the PIs Construction Methods
Various methods of PIs construction with different levels of complexity, computational burden, difficulty in implementation, reliability and required times have been developed. Undoubtedly, it is impossible to claim that a particular technique is superior to

Comparison of the PIs Construction Methods
Various methods of PIs construction with different levels of complexity, computational burden, difficulty in implementation, reliability and required times have been developed. Undoubtedly, it is impossible to claim that a particular technique is superior to the others, but each method has its own advantages. Therefore, in this section, the advantage and disadvantages of the prevalent methods are investigated. A brief comparison of each method is tabulated in Table 4. Khosravi, Nahavandi, Creighton and Atiya [14] compared the different methods of PI construction for data from various domains. It was assumed that the delta method obtains the highest quality of the PIs, but its repeatability (performance of the method in worst cases) is not acceptable. Moreover, the constructed PIs via the delta technique consist of fixed PI width. The Bayesian method is the most acceptable method in the case of reproducibility (various iterations of the method lead to similar results). Moreover, the stability of developed PIs is the other advantage of this method. The bootstrap method is favorable in the case of variability (the response of PIs to the level of uncertainty associated with data). However, the obtained PIs via the bootstrap method may lead to low quality in comparison to the delta and Bayesian methods. The variance of the outputs may be overestimated, which may cause wider PIs. In addition, it has been demonstrated that by increasing the number of training iterations, the bootstrap method might not definitely improve the results. It has also been indicated that each method has its pros and cons, and implementation of different methods by consideration of various criteria may lead to different outcomes. Kasiviswanathan and Sudheer [51] compared different PIs construction methods and showed that parameters coverage and peak flow prediction are high for the PI method and low for the bootstrap method. Fulfillment of statistical and probabilistic assumptions is low for the bootstrap method, but it is high for the Bayesian method.
The difficulty of the implementation of the bootstrap, Bayesian and LUBE methods are, respectively, medium, high and low.

Gaps and Suggestions for Future Studies
The following issues are suggested for future studies to fill the existing gaps in already performed studies. i.
Most of the PIs construction methods applied coverage and width criteria and their combination, but there is not any criterion in order to present information about the probability and reliability of the lower and upper bounds of the constructed PIs. Therefore, future studies may develop and present some criteria about this issue, for example, proximity of the target to the upper or lower bounds. ii. The reviewing of multiple studies showed that there is not any study that uses the delta method for PIs construction. Although it may have high level of computational cost, according to its reliable performance in other fields as expressed in Khosravi, Nahavandi, Creighton and Atiya [14], it is proposed that one apply this method in hydrological studies as well. iii. The performance of the LUBE method, as the most robust method of PI construction, can be improved in different aspects in order to obtain more reliable PIs. The implementation could be augmented by combining with the ANN structure selection techniques. iv. Adaptation between point prediction and PIs has not been examined, yet significantly presented. Therefore, it can be recommended that future works analyze the adaptation and correlation between point prediction and PIs construction. Moreover, some criteria can be defined to capture simultaneously. v.
It is recommended that one apply the presented methods to assess the uncertainty associated with the improved version of the ANN, such as emotional ANN [72] and to investigate the effects of hormonal parameters in the reliability of the models. vi. The LUBE method performance is based on the cost functions. Some studies used multi-objective optimization cost function in which the coverage and width criteria were considered as cost function simultaneously. In contrast, some other studies used coverage and width combination criteria as the cost function, in which some parameters should be determined by trial and error. Therefore, future studies can compare the implementation of multi-objective and single-objective optimization methods. Moreover, future studies can propose the appropriate value of CWC parameters or propose a method for better and faster determination of the parameters' values. vii. As there are few studies attempting to investigate other artificial intelligence methods such as ANFIS, it is recommended that one construct the PIs of those methods too.

Conclusions
This study investigated multiple PIs construction methods applied in hydro-climatic studies. It concluded that the bootstrap method has been used in the majority of the studies as it is simple and can be applied easily. Moreover, the LUBE method has gained noticeable attention recently in hydrological studies due to its superiority in implementation and reliability compared to other methods. Nevertheless, there are few applications of the Bayesian or delta method in the development of PIs in the hydrological issues.