Real-Time Probabilistic Flood Forecasting Using Multiple Machine Learning Methods

Abstract: Probabilistic flood forecasting, which provides information on the uncertainty of flood forecasts, is practical and informative for implementing flood-mitigation countermeasures. This study adopted various machine learning methods, including support vector regression (SVR), a fuzzy inference model (FIM), and the k-nearest neighbors (k-NN) method, to establish a probabilistic forecasting model. The probabilistic forecasting method is a combination of a deterministic forecast produced using SVR and a probability distribution of forecast errors determined by the FIM and k-NN method. This study proposed an FIM with a modified defuzzification scheme to transform the FIM's output into a probability distribution, and k-NN was employed to refine the probability distribution. The probabilistic forecasting model was applied to forecast flash floods with lead times of 1–3 hours in Yilan River, Taiwan. Validation results revealed the deterministic forecasting to be accurate, and the probabilistic forecasting was promising in view of a forecasted hydrograph and quantitative assessment concerning the confidence level.


Introduction
A real-time flood forecasting model is an essential nonstructural component in a flood warning system. It can provide timely forecasting information to authorities and the public with sufficient time for preparation and useful information for implementing flood-mitigation countermeasures. Flood forecasting is often performed deterministically [1], which means that a single estimate of flood discharge or stage is predicted. However, a deterministic forecast can leave users with an illusion of certainty, which can lead them to take inadequate measures [2]. Therefore, a probabilistic forecast that specifies a probability distribution pertaining to the predictand is more practical and has thus gained attention in flood forecasting [1].
Many methods are used to perform probabilistic flood forecasting. For example, the Bayesian forecasting system that integrates prior and posterior information using Bayes' theorem has been adopted [3][4][5][6]. Furthermore, multimodel ensemble methods have been used, which produce several forecasts based on different models [7][8][9]. Moreover, researchers have adopted generalized likelihood uncertainty estimation, which is based on the idea that many different parameters may yield equally satisfactory estimates [10][11][12][13][14][15]. The present study adopted a method that combines deterministic forecasts and the probability distribution of forecast errors to produce probabilistic forecasts [1]. Such a method was adopted because forecast-error data can quantify the total uncertainty of forecasting. Montanari and Brath [16], Tamea et al. [17], and Weerts et al. [18] have used similar methods based on processing past forecast-error data to produce probability distributions for future forecasts. On

Probabilistic Forecasting Method
This study applied the probabilistic forecasting method based on the methodology proposed by Chen and Yu [1], but modified it using the k-NN method to refine the probability distribution. A probabilistic forecast was obtained by combining deterministic forecasting with the probability distribution of forecast errors. A concise description of the method is presented as follows, and the detailed methodology can be found in Chen and Yu [1].
The forecast error (E) is defined as the difference between the deterministic forecast (F) and the observation (O):
$$E = F - O \tag{1}$$
Given that O represents an observation without uncertainty, the variance of the deterministic forecast F is the same as that of the forecast error:
$$\mathrm{var}(F) = \mathrm{var}(E) \tag{2}$$
Thus, the uncertainty of the forecast can be deduced from the uncertainty of forecast errors. Then, the probability distribution of the forecast $\Pi_F$ can be derived by adding the probability distribution of forecast errors $\Pi_E$ to the single deterministic forecast F:
$$\Pi_F = F + \Pi_E \tag{3}$$
Consequently, the probabilistic forecasting results can be demonstrated as a confidence interval with a certain confidence level from the probability distribution $\Pi_F$.
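The combination described above can be illustrated with a minimal sketch. It assumes that the error distribution is available as a finite sample of forecast errors and that quantiles are obtained by linear interpolation; the function names and all numbers are illustrative assumptions, not values from the study.

```python
import math

def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted sample (0 <= p <= 1)."""
    pos = p * (len(sorted_values) - 1)
    i = int(math.floor(pos))
    j = min(i + 1, len(sorted_values) - 1)
    frac = pos - i
    return sorted_values[i] * (1.0 - frac) + sorted_values[j] * frac

def forecast_interval(forecast, error_sample, level=0.90):
    """Shift the central `level` interval of the error sample by the forecast."""
    errors = sorted(error_sample)
    tail = (1.0 - level) / 2.0
    lower = forecast + quantile(errors, tail)
    upper = forecast + quantile(errors, 1.0 - tail)
    return lower, upper

# Illustrative numbers only: a 2.50 m stage forecast and made-up errors (m).
errors = [-0.20, -0.10, -0.05, 0.00, 0.00, 0.05, 0.05, 0.10, 0.15, 0.30]
low, high = forecast_interval(2.50, errors, level=0.80)
```

Because the deterministic forecast is a single number, adding it only shifts the error distribution; the width of the interval is determined entirely by the spread of the errors.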

Support Vector Regression
This study used SVR to provide deterministic forecasts. SVR is a regression procedure based on the support vector machine (SVM), which utilizes the structural risk minimization induction principle to minimize the expected risk based on limited data [45]. Detailed descriptions of SVM theory can be found in the literature [26,46]. A brief methodology of SVR is described as follows.
SVR finds the optimal nonlinear regression function $f(x)$ according to r data sets $(x_i, y_i)$, where $x_i$ are input vectors and $y_i$ are the corresponding output variables (i = 1, ..., r). The SVR function can be written as
$$f(x) = w^{T}\Phi(x) + b \tag{4}$$
where w is the weight vector, b is the bias, and $\Phi$ is a nonlinear function that maps the original data onto a higher dimensional space, in which the input-output data can exhibit linearity. The calibration of the regression function involves an error tolerance $\varepsilon$ when calculating the loss L, which is defined using Vapnik's $\varepsilon$-insensitive loss function:
$$L_{\varepsilon}(y_i) = \begin{cases} 0, & |y_i - f(x_i)| \le \varepsilon \\ |y_i - f(x_i)| - \varepsilon, & \text{otherwise} \end{cases} \tag{5}$$
This regression problem can be formulated as a convex optimization problem, in which a dual set of Lagrange multipliers, $\alpha_i$ and $\alpha_i^{*}$, are introduced to solve the problem by applying a standard quadratic programming algorithm. As a result, the SVR function can be written as
$$f(x) = \sum_{i=1}^{r} (\alpha_i - \alpha_i^{*})\,\Phi(x_i)^{T}\Phi(x) + b \tag{6}$$
Then, a kernel function $K(x_i, x_j) = \Phi(x_i)^{T}\cdot\Phi(x_j)$ is used to yield the inner products in the higher dimensional space to ease the calculation. This study used the radial basis function kernel with a parameter $\gamma$ as the kernel function:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \tag{7}$$
From Equation (6), only data with nonzero Lagrange multipliers $(\alpha_i - \alpha_i^{*})$ are used in the final regression function, and these data are termed support vectors. Finally, the regression function can be formulated as
$$f(x) = \sum_{k=1}^{m} (\alpha_k - \alpha_k^{*})\,K(x_k, x) + b \tag{8}$$
where $x_k$ denotes the support vector and m represents the number of support vectors.
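The kernel form of the regression function described above can be sketched in a few lines. This is an illustration only: the support vectors, dual coefficients, bias, and γ below are hand-picked hypothetical values rather than a calibrated model, and the function names are assumptions introduced for the example.

```python
import math

def rbf_kernel(x, z, gamma):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def eps_insensitive_loss(y_true, y_pred, eps):
    """Vapnik's loss: errors inside the eps-tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)

def svr_predict(x, support_vectors, dual_coefs, bias, gamma):
    """Evaluate f(x) = sum_k (alpha_k - alpha_k*) K(x_k, x) + b."""
    return bias + sum(c * rbf_kernel(sv, x, gamma)
                      for sv, c in zip(support_vectors, dual_coefs))

# Hypothetical values; each dual coefficient stands for alpha_k - alpha_k*.
support_vectors = [(0.2, 0.1), (0.6, 0.4)]
dual_coefs = [0.5, -0.3]
prediction = svr_predict((0.2, 0.1), support_vectors, dual_coefs,
                         bias=0.1, gamma=2.0)
```

Only the support vectors enter the sum, which is why data with zero multipliers can be discarded after calibration.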

Fuzzy Inference Model
This study adopted an FIM with a defuzzification method to infer the probability distribution of forecast errors (the difference between the observation and forecast). The FIM included four steps (Figure 1), which are described as follows.
[Figure 1. The four steps of the FIM: fuzzification, fuzzy IF-THEN rules, fuzzy inference, and defuzzification.]
(1) Fuzzification
Fuzzification is a process that converts a crisp value (numerical value) to a fuzzy variable through a fuzzy membership function. Some membership functions are widely used, such as the triangular, trapezoidal, and Gaussian functions. In this study, the Gaussian membership function (Equation (9)) was used because of its easier differentiability compared with the triangular membership function [1]. The Gaussian function also exhibits superior performance compared with the trapezoidal function, and also demonstrates a smoother transition in its intervals [47].
$$\mu(x) = \exp\left[-\frac{(x - m)^{2}}{2\sigma^{2}}\right] \tag{9}$$
where $\mu$ is the membership grade of the crisp input x; $\sigma$ is the dispersion of the function; and m is the center of the function.
(2) Fuzzy IF-THEN rules
The IF-THEN rule is used to formulate the conditional statement in the FIM. The fuzzy IF-THEN rule can be expressed as follows:
$$\text{IF } (x_1 \text{ is } A_1) \text{ and } (x_2 \text{ is } A_2) \text{ and } \cdots \text{ and } (x_p \text{ is } A_p) \text{ THEN } (y_1 \text{ is } B_1) \text{ and } (y_2 \text{ is } B_2) \text{ and } \cdots \text{ and } (y_q \text{ is } B_q) \tag{10}$$
where $x_i$ (i = 1, 2, ..., p) and $y_j$ (j = 1, 2, ..., q) indicate the crisp input and output variables, respectively; and $A_i$ (i = 1, 2, ..., p) and $B_j$ (j = 1, 2, ..., q) are fuzzy sets. The IF part of the rule is called the antecedent or premise, and the THEN part is the consequence or conclusion.
(3) Fuzzy inference
Fuzzy inference calculates the similarity between crisp inputs and fuzzy sets in fuzzy rules. It involves two procedures, namely implication and aggregation. Implication generates a fuzzy set in the consequence part for each fuzzy rule, whereas aggregation integrates the output fuzzy sets in all fuzzy rules into an aggregated fuzzy set. More details can be found in Yu and Chen [48] and Chen [49].
(4) Defuzzification
The fuzzy output obtained in the previous step is an aggregated fuzzy set that should be defuzzified to a crisp value $y^{*}$. The centroid method, which directly computes the crisp output as a weighted average of membership grades, was used in the present study.
$$y^{*} = \frac{\sum_{i=1}^{n} \mu_i\, y_i}{\sum_{i=1}^{n} \mu_i} \tag{11}$$
where $\mu_i$ is the fuzzy membership grade of the output variable $y_i$ of the i-th rule and n is the number of rules.
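Fuzzification and centroid defuzzification, the first and last steps listed above, can be sketched as follows. The Gaussian form with $2\sigma^2$ in the denominator is assumed here, and the rule outputs and membership grades are made-up numbers used only to show the arithmetic.

```python
import math

def gaussian_membership(x, center, sigma):
    """Membership grade of crisp input x in a Gaussian fuzzy set."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def centroid_defuzzify(grades, outputs):
    """Crisp output y* as the membership-weighted average of rule outputs."""
    total = sum(grades)
    if total == 0.0:
        raise ValueError("all membership grades are zero")
    return sum(mu * y for mu, y in zip(grades, outputs)) / total

# Illustrative values: three rules with outputs y_i and grades mu_i.
rule_outputs = [-0.1, 0.0, 0.2]
rule_grades = [0.2, 0.6, 0.2]
y_star = centroid_defuzzify(rule_grades, rule_outputs)

# A crisp input at the center of the fuzzy set has full membership.
grade_at_center = gaussian_membership(0.5, center=0.5, sigma=0.18)
```

Rules with larger membership grades pull the crisp output toward their own output value, which is the property the probability interpretation in the next section builds on.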

Defuzzification Into a Probability Distribution
The higher value of µ i in Equation (11) indicates that the y i of the i-th rule imposes a higher weight of influence on the output y * . Chen and Yu [1] proposed a probability interpretation of defuzzification on the basis of basic defuzzification distribution transformation [50]. Thus, the defuzzification process in the FIM can be converted to produce a probability distribution. For the detailed theory and process of defuzzification into a probability distribution, please refer to Chen and Yu [1].
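One simple way to read the centroid formula probabilistically is to normalize the membership grades so that they sum to one and to treat them as probabilities attached to the rule outputs. The sketch below assumes this plain normalization; the scheme in Chen and Yu [1] may differ in detail, and the function names and numbers are hypothetical.

```python
def grades_to_probabilities(grades):
    """Normalize membership grades so that they sum to one."""
    total = sum(grades)
    if total <= 0.0:
        raise ValueError("membership grades must have a positive sum")
    return [mu / total for mu in grades]

def expected_value(probabilities, outputs):
    """Mean of the discrete distribution over the rule outputs."""
    return sum(p * y for p, y in zip(probabilities, outputs))

# Illustrative values: forecast-error outputs (normalized scale) and grades.
rule_outputs = [-0.1, 0.0, 0.2]
rule_grades = [0.1, 0.4, 0.3]
probabilities = grades_to_probabilities(rule_grades)
mean_error = expected_value(probabilities, rule_outputs)
```

Under this assumption the mean of the resulting distribution coincides with the centroid output, so the crisp value is retained while the spread of the rule outputs is kept as uncertainty information.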

k-Nearest Neighbors Method
The k-NN method, an effective nonparametric technique, was utilized in this study to smooth the probability distribution derived in the previous step, because that distribution may have a rough shape. The k-NN algorithm picks a specified number of data items that are closest to an object. Given an object data vector $z = (z_1, z_2, \cdots, z_m)$, the k-NN algorithm selects the k data items that are nearest to the object according to the similarity or distance D from a candidate data vector $z' = (z'_1, z'_2, \cdots, z'_m)$ to the object data z, for which the Euclidean distance is the usual choice:
$$D = \sqrt{\sum_{i=1}^{m} (z_i - z'_i)^{2}} \tag{12}$$
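The distance-based selection described above can be sketched as follows, assuming the Euclidean distance as the measure D. The candidate vectors are arbitrary illustrative values.

```python
import math

def euclidean_distance(z, candidate):
    """Distance D between the object vector z and a candidate vector."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z, candidate)))

def k_nearest(obj, candidates, k):
    """Return the k candidates closest to obj, nearest first."""
    return sorted(candidates, key=lambda c: euclidean_distance(obj, c))[:k]

# Illustrative data: pick the two candidates nearest to the origin.
candidates = [(1.0, 1.0), (0.1, 0.0), (0.0, -0.3), (2.0, 0.0)]
neighbors = k_nearest((0.0, 0.0), candidates, k=2)
```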

Study Area and Data
In this study, the Yilan River (Figure 2) in Taiwan was selected as the study area. Several river level stations are located in the Yilan River basin. The Liwu station, which has more complete records than the other stations, was used in this study; therefore, the flood stage at Liwu station was the target for real-time forecasting. The Liwu basin has an area of 108.1 km². Hourly rainfall data from six gauges and hourly river stage data at Liwu station from 2012 to 2018 were collected, and 15 flood events with complete rainfall and stage data were obtained. The collected 15 flood events were divided into a calibration set with 10 events and a validation set with 5 events. Table 1 lists the flood events with information on the source event (typhoon or storm), date of occurrence, rainfall duration, peak flood stage, and total rainfall amount. Spatially averaged rainfall was calculated from six gauges using the Thiessen polygon method, and was used as the rainfall variable in the study.

Deterministic Model Development and Forecasting
This study used SVR to perform deterministic forecasting of the flood stage at Liwu station. The observed rainfall and river stages at Liwu station during flood events were selected as input variables because of the strong relationship between these data and future stages. As the cross-sections of a river change markedly during a flood, the absolute river stage may not provide appropriate information for discriminating floods. Instead, the river stage increment, which is the river stage relative to the initial stage at the beginning of a flood event, is a more relevant variable for determining flood magnitude. Thus, the river stage increment relative to the initial stage was chosen as the stage variable for this study. Specifically, the initial stage was subtracted from the river stage data, and the obtained residual (stage increment) was the stage variable used in this study. The forecasted absolute river stage can be simply obtained by adding the forecasted stage increment to the initial stage.
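The stage-increment transformation can be sketched as below, together with a min-max scaling to [0, 1] of the kind applied to the model inputs. The scaling bounds are assumed to come from the calibration data, and the stage values are invented for illustration.

```python
def stage_increment(stages):
    """Stage relative to the initial stage at the start of the flood event."""
    initial = stages[0]
    return [s - initial for s in stages]

def min_max_normalize(values, lower, upper):
    """Scale values to [0, 1] using bounds taken from calibration data."""
    span = upper - lower
    return [(v - lower) / span for v in values]

def denormalize(value, lower, upper):
    """Map a normalized value back to its original scale."""
    return lower + value * (upper - lower)

# Illustrative event: hourly absolute stages in meters.
stages = [1.2, 1.5, 2.4, 1.8]
increments = stage_increment(stages)
scaled = min_max_normalize(increments, lower=0.0, upper=1.2)

# A forecasted normalized increment is converted back to an absolute stage
# by undoing the scaling and adding the initial stage.
absolute_stage = stages[0] + denormalize(0.5, lower=0.0, upper=1.2)
```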
Furthermore, the data of spatially averaged rainfall (R) and stage increment (S) were normalized to a range of [0, 1]. The benefit of using normalized data is avoiding one variable dominating others when the differences in their values are notable. Bray and Han [25] demonstrated that the SVR model with normalized data outperformed that without normalized data. The time lags between rainfall and stage had to be identified to construct the forecasting model; therefore, the correlation coefficients among lagged variables were calculated to identify relevant input variables for the forecasted variables [26,51]. The correlated time lags between stage and rainfall were 1-5 hours and the correlated time lags for the stage itself were 1-3 hours. Therefore, the SVR stage forecasting model in this study had eight inputs, and the model structure can be expressed as follows:
$$\hat{S}_{t+1} = f_{SVR}\left(S_t, S_{t-1}, S_{t-2}, R_t, R_{t-1}, R_{t-2}, R_{t-3}, R_{t-4}\right) \tag{13}$$
where $\hat{S}$ is the forecasted flood stage; S is the observed flood stage; R is the observed average rainfall; $f_{SVR}$ indicates the SVR model; and the subscript t is the time index. The SVR model was calibrated using normalized data, and the original model outputs were the flood stages in the normalized scale. The output flood stages were transformed to their actual scale to match the observations.
Water 2020, 12, 787
The SVR model, Equation (13), was established using data from 10 calibration flood events. This study used the root mean square error (RMSE) as the objective function to optimize the SVR parameters.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{S}_t - S_t\right)^{2}} \tag{14}$$
where n is the number of data. During the calibration phase, the SVR model simulated flood events with an RMSE value of 0.07 m, indicating that the model was well calibrated. The SVR model, Equation (13), was used to perform real-time deterministic forecasting with a lead time of 1 hour regarding the five validation events. To perform multiple-hour-ahead forecasting, Equation (13) could be used in a recursive form as in Equations (15) and (16),
$$\hat{S}_{t+2} = f_{SVR}\left(\hat{S}_{t+1}, S_t, S_{t-1}, R_{t+1}, R_t, R_{t-1}, R_{t-2}, R_{t-3}\right) \tag{15}$$
$$\hat{S}_{t+3} = f_{SVR}\left(\hat{S}_{t+2}, \hat{S}_{t+1}, S_t, R_{t+2}, R_{t+1}, R_t, R_{t-1}, R_{t-2}\right) \tag{16}$$
where the future stage inputs $\hat{S}_{t+2}$ and $\hat{S}_{t+1}$ are available from the forecasted data and the future rainfall is obtained from naïve forecasts; that is,
$$R_{t+2} = R_{t+1} = R_t \tag{17}$$
Figures 3 and 4 present the deterministic forecasting results of the flood hydrographs for Events 11 and 15, respectively. Basin average rainfall and forecast error are also presented in the figures. The forecasted hydrographs were close to the observed ones and the forecast errors were small, indicating that the deterministic forecasting model could effectively perform real-time forecasting with lead times of 1-3 hours. To evaluate the forecasting performance in an objective manner, two statistical indices, the RMSE and the coefficient of efficiency (CE), were calculated with respect to validation events.
$$\mathrm{CE} = 1 - \frac{\sum_{t=1}^{n}\left(S_t - \hat{S}_t\right)^{2}}{\sum_{t=1}^{n}\left(S_t - \bar{S}\right)^{2}} \tag{18}$$
where $\bar{S}$ is the average of the observed stages. A CE value closer to unity indicates better model performance. Table 2 lists the statistical indices of the RMSE and CE with respect to multiple-hour-ahead forecasting for validation events. The low RMSE and high CE values confirmed that the SVR model could effectively perform deterministic forecasting.
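The recursive use of the one-step model with naïve rainfall forecasts, and the two performance indices, can be sketched as follows. The `toy_model` function is a hypothetical stand-in for the calibrated SVR model and has no relation to the parameters found in the study; the input ordering (three stages, then five rainfall values, newest first) is likewise an assumption of this example.

```python
import math

def recursive_forecast(model, stages, rains, horizon=3):
    """Forecast 1..horizon hours ahead by feeding forecasts back as inputs.

    stages: [S_t, S_{t-1}, S_{t-2}]; rains: [R_t, ..., R_{t-4}] (newest first).
    """
    stages, rains = list(stages[:3]), list(rains[:5])
    forecasts = []
    for _ in range(horizon):
        s_next = model(stages + rains)
        forecasts.append(s_next)
        stages = [s_next] + stages[:2]      # forecast replaces the newest stage
        rains = [rains[0]] + rains[:4]      # naive forecast: future rain = R_t
    return forecasts

def rmse(observed, forecast):
    """Root mean square error between observations and forecasts."""
    n = len(observed)
    return math.sqrt(sum((f - o) ** 2 for o, f in zip(observed, forecast)) / n)

def coefficient_of_efficiency(observed, forecast):
    """CE = 1 - SSE / variation of the observations about their mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

def toy_model(inputs):
    """Hypothetical one-step model: weighted newest stage and newest rain."""
    return 0.8 * inputs[0] + 0.2 * inputs[3]

forecasts = recursive_forecast(toy_model, [0.5, 0.4, 0.3],
                               [0.2, 0.1, 0.0, 0.0, 0.0])
```

Because each step consumes the previous forecast, errors can accumulate with lead time, which is consistent with the wider uncertainty reported for the 2- and 3-hour forecasts.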

Probabilistic Model Development
This section describes the use of the FIM to obtain error probability distributions. The deterministic forecasting model was originally used with normalized data. Thus, fuzzy inference was also performed with data in a normalized scale. First, the data regarding the input variables were transformed into fuzzy sets by applying a fuzzy membership function, namely the Gaussian membership function in Equation (9). The parameter of the membership function is the dispersion parameter σ. As parameter σ indicates the dispersion of the data, adopting the standard deviation of the data as the parameter is logical. The parameters of the fuzzy membership function for rainfall (R) and stage (S) were 0.18 and 0.14, respectively, at their normalized scale.
Subsequently, the fuzzy rules were used to formulate a conditional statement to infer the forecast errors. The deterministic model used eight inputs (see those in Equation (13)) to produce the forecasts. Therefore, the forecast errors were dependent on the used inputs. Using these inputs in the premise of the fuzzy rule is rational; however, too many variables in the premise may lead to an inadequate implication. Moreover, lagged input variables can contain the same information when naïve forecasting is applied for forecasting with longer lead times. Therefore, the premise part was simplified using the most relevant variables of rainfall and stage at time t. That is, the variables $R_t$ and $S_t$ were used in the premise to infer the forecast errors with lead times of 1-3 hours. Accordingly, the fuzzy rule was formulated as
$$\text{IF } (S_t \text{ is } F_S) \text{ and } (R_t \text{ is } F_R) \text{ THEN } (E_{t+1} \text{ is } F_{E1}) \text{ and } (E_{t+2} \text{ is } F_{E2}) \text{ and } (E_{t+3} \text{ is } F_{E3}) \tag{19}$$
where $F_S$ and $F_R$ are fuzzy sets defined by the Gaussian fuzzy membership function pertaining to variables $S_t$ and $R_t$, respectively; $E_{t+1}$, $E_{t+2}$, and $E_{t+3}$ are inferred forecast errors with lead times of 1-3 hours; and $F_{E1}$, $F_{E2}$, and $F_{E3}$ are the respective fuzzy sets of forecast errors.
When stage and rainfall data $S_t$ and $R_t$ at present time t were available, fuzzy inference could be conducted to derive the aggregated output fuzzy set. The proposed defuzzification approach was then employed to obtain the probability distributions of forecast errors. After defuzzification is performed, the derived probability distribution may demonstrate a rough-shaped curve. To solve this problem, this study applied the k-NN method to smooth the rough probability curve. The smoothing process generates numerous data to form a smooth curve to replace the original probability distribution's rough curve. A resampling technique was adopted to implement the smoothing. At each sampling time, an object data item was randomly selected from the original probability distribution. Then, the k-NN method was applied to select three data items nearest to the object from the original probability distribution (k was set to 3 herein). The mean and standard deviation of the three data items were calculated to construct an interval, of which the upper and lower boundaries were the mean plus and minus one standard deviation, respectively. Next, a number within the interval was randomly picked to become an adjusted value to refine the probability distribution. The resampling process was repeated 10,000 times to produce a smooth probability curve with 10,000 values.
With the derived smooth probability distribution, different confidence levels could be used to form the predictive confidence interval (CI). Figure 5 presents the forecast errors and the predicted 90% CI pertaining to five validation events with continuous data sequences. The confidence region could include most of the forecast errors, indicating that the CI practically covers the uncertainty of the forecast errors. The confidence region is extended with the forecast lead time, which is also rational in light of the uncertainty in forecasting.
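The resampling procedure described above can be sketched as follows. Several details are assumptions made for this example: the original distribution is represented by discrete error values with probabilities, the object itself counts among its nearest neighbors, the population standard deviation is used, the adjusted value is drawn uniformly from the interval, and the CI bounds are read from order statistics. The error values and probabilities are invented.

```python
import random
import statistics

def knn_smooth(values, weights, n_samples=10_000, k=3, seed=0):
    """Resample a rough discrete distribution into a smoother sample."""
    rng = random.Random(seed)
    smoothed = []
    for _ in range(n_samples):
        # Draw an object from the original distribution.
        obj = rng.choices(values, weights=weights, k=1)[0]
        # Select the k data items nearest to the object.
        neighbors = sorted(values, key=lambda v: abs(v - obj))[:k]
        mean = statistics.mean(neighbors)
        std = statistics.pstdev(neighbors)
        # Pick an adjusted value inside [mean - std, mean + std].
        smoothed.append(rng.uniform(mean - std, mean + std))
    return smoothed

def central_interval(sample, level=0.90):
    """Approximate central interval read from the sorted sample."""
    ordered = sorted(sample)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[int(tail * last)], ordered[int((1.0 - tail) * last)]

# Illustrative rough distribution of forecast errors (normalized scale).
error_values = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.4]
error_probs = [0.05, 0.20, 0.40, 0.20, 0.10, 0.05]
smooth_sample = knn_smooth(error_values, error_probs)
ci_low, ci_high = central_interval(smooth_sample, level=0.90)
```

Replacing each drawn value with a random point near the mean of its neighbors spreads the probability mass between the original discrete values, which is what removes the rough, stepped shape.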

Probabilistic Forecasting Results
The probabilistic flood-stage forecasting results were obtained by adding the probability distribution of forecast errors to the deterministic stage forecasts. Figure 6 illustrates the probabilistic flood-stage forecasting results with 90% CIs for validation events. For 1-hour forecasting, the confidence regions covered most of the observed data well with a narrow span, indicating that the proposed probabilistic forecasting method was both correct and useful. The CI range widened when the lead times increased to 2-3 hours. Moreover, the predictive CIs around the peak stage broadened sharply, and the upper boundary of the 90% CI was large. This meant that the predictive CI was less practical around the peak, because a larger CI indicates less confidence in the object of interest. Nevertheless, the proposed probabilistic forecasting model reflected the existence of greater uncertainty around the peak flood.
If the predictive probability distribution can effectively explain the forecasting uncertainty, then the percentage of data included in the CI will be identical to the confidence level. Therefore, the quantity of data that are correctly enclosed within the confidence region can be used as a guide to assess the probabilistic forecasting results. Figure 7 plots the percentages of observed flood stages that were included in the CI for confidence levels of 10%, 20%, ..., 90%, 95%, and 99%, with respect to validation events. The percentages of included data closely matched the confidence levels, because the points on the graph were close to the 45° line. Scrutinizing the data regarding different lead times revealed that the probabilistic forecasting performance only decreased marginally with an increase in lead times. This suggested that the probabilistic forecasts with longer lead times were not inferior to those with shorter lead times in terms of this assessment measure. The capability of the proposed probabilistic forecasting model that involves using multiple machine learning methods is promising.
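The assessment measure described above amounts to counting how many observations fall inside the predictive CI. A minimal sketch follows; the function name and the stage values and bounds are invented for illustration.

```python
def coverage_percentage(observations, lower_bounds, upper_bounds):
    """Percentage of observations that fall inside their predictive CI."""
    inside = sum(1 for obs, lo, hi in zip(observations, lower_bounds, upper_bounds)
                 if lo <= obs <= hi)
    return 100.0 * inside / len(observations)

# Illustrative values: observed stages (m) and CI bounds at one confidence level.
observed = [1.0, 1.4, 2.1, 1.7]
lower = [0.9, 1.5, 1.8, 1.5]
upper = [1.2, 1.8, 2.3, 1.9]
coverage = coverage_percentage(observed, lower, upper)
```

Repeating this calculation for each confidence level and plotting coverage against the nominal level gives the kind of comparison with the 45° line that the text refers to.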

Conclusions
Probabilistic forecasts are more informative and helpful than deterministic forecasts in practical flood forecasting. This study developed a real-time probabilistic forecasting model for impending flash floods using various machine learning methods. The probabilistic forecasting method combines deterministic forecasting and the probability distribution of forecast errors. SVR was employed to provide deterministic forecasts, and an FIM with a modified scheme for defuzzification was applied to deduce the predictive probability distribution of forecast errors. A resampling scheme with the k-NN method was used to refine the predictive probability distribution. The probabilistic forecasting results could thus be presented using a CI.
The proposed methodology was applied to perform probabilistic flood-stage forecasting with lead times of 1-3 hours in Taiwan's Yilan River. Correlation analysis was performed to determine the lagged inputs, and a recursive form of the model was established to perform multiple-hour-ahead forecasts. The SVR performed deterministic forecasting well, as was indicated by the low RMSE and high CE values. The probabilistic forecasting results were satisfactory because the 90% CI could cover most of the observations within a narrow band. To objectively assess the probabilistic forecasting performance, this study adopted a quantitative measure that calculated the percentage of observations included in the predictive CI. The percentages of included data closely matched the confidence levels, which supports the capability of the proposed probabilistic forecasting model that involves using multiple machine learning methods.