Prediction of River Stage Using Multistep-Ahead Machine Learning Techniques for a Tidal River of Taiwan

Time-series prediction of a river stage during typhoons or storms is essential for flood control or flood disaster prevention. Data-driven models using machine learning (ML) techniques have become an attractive and effective approach to modeling and analyzing river stage dynamics. However, relatively new ML techniques, such as the light gradient boosting machine regression (LGBMR), have rarely been applied to predict the river stage in a tidal river. In this study, data-driven ML models were developed under a multistep-ahead prediction framework and evaluated for river stage modeling. Four ML techniques, namely support vector regression (SVR), random forest regression (RFR), multilayer perceptron regression (MLPR), and LGBMR, were employed to establish data-driven ML models with Bayesian optimization. The models were applied to simulate river stage hydrographs of the tidal reach of the Lan-Yang River Basin in Northeastern Taiwan. Historical measurements of rainfall, river stages, and tidal levels were collected from 2004 to 2017 and used for training and validation of the four models. Four scenarios were used to investigate the effect of the combinations of input variables on river stage predictions. The results indicated that (1) the tidal level at a previous stage significantly affected the prediction results; (2) the LGBMR model achieved more favorable prediction performance than the SVR, RFR, and MLPR models; and (3) the LGBMR model could efficiently and accurately predict the 1–6-h river stage in the tidal river. This study provides an extensive and insightful comparison of four data-driven ML models for river stage forecasting that can be helpful for model selection and flood mitigation.


Introduction
Accurate river stage forecasting is a crucial component in the flood early warning system and plays a key role in flood disaster mitigation. Taiwan had an average of four to five typhoons per year over the past 10 years [1]. Typhoon-induced floods can frequently cause considerable social and economic losses. For example, Typhoon Morakot hit Taiwan in 2009, resulting in a torrential rainfall of 2748 mm in only 72 h [2]. Such an extreme rainfall caused compound hazards, such as floods, river overflows, landslides, river embankment failures, and driftwood accumulation. Typhoon Morakot caused approximately 680 casualties and approximately NT$90 billion in expenses for direct damage [3]. Since Typhoon Morakot, more intensive investigations, analyses, and developments for disaster prevention have been conducted to better understand disaster risk assessment. The flood warning system is a vital mitigation technique during natural disasters that can be used by river managers to make decisions before the arrival of a typhoon. Therefore, studies on accurate and reliable river stage forecasting are required to reduce the impact of flood disasters.
Two main approaches are used to establish flood prediction models. The first approach involves the use of a flood dynamic process to perform mathematical modeling.

1. This study contributes to improving forecasting performance by revealing the optimal combinations of input variables, such as rainfall, water level, and tidal stage.
2. This is the first study to propose a direct multistep forecasting model based on LGBMR with Bayesian optimization for flood forecasting with a lead time of 1–6 h.
3. The present study comprehensively assessed and compared the performance of four models (SVR, RFR, MLPR, and LGBMR) for forecasting the water level in a tidal river.

Data-Driven Model for River Stage Forecasting
The main process in data-driven modeling is called "the learning stage," in which the relationship between a system's input and output variables is constructed [9]:

$y = f(\mathbf{x})$ (1)

with available data

$\{(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_n, y_n)\} = \{\mathbf{x}_i, y_i\}_{i=1}^{n}$ (2)

in which x is the input vector, y is the desired output, n is the number of data points, and f is the regression function. The general representative approaches for time-series forecasting include direct and recursive multistep forecasting [44]. Compared with the recursive approach, the direct approach is simpler and easier to employ. In addition, the direct approach does not accumulate prediction errors during the forecasting process. Therefore, the direct approach is employed in Equation (1) for river stage modeling, yielding the equation on which the data-driven model is based:

$\hat{H}_{t+\Delta t} = f(R_t, \ldots, R_{t-L}, H_t, \ldots, H_{t-L}, S_t, \ldots, S_{t-L})$ (3)

where t is the current time, ∆t is the lead time, $\hat{H}_{t+\Delta t}$ is the forecasted river stage at time t + ∆t, L denotes the lag length of the input variables, $R_{t-L}$ is the antecedent rainfall at time t − L, $H_{t-L}$ is the antecedent river stage at time t − L, and $S_{t-L}$ is the antecedent tidal level at time t − L. Following the approach adopted by Wang et al. [45], the lag length was set as 6 h in the present study; this lag length takes into consideration the concentration time of the watershed. Lead times of 1–6 h, which are commonly applied in hydrological modeling, were used in this study [45][46][47]. In Equation (3), four ML techniques were applied to construct the data-driven models. ML techniques can be used to solve classification or regression problems. This study focused on forecasting water levels, which is a nonlinear regression problem. The regression algorithms of the four ML techniques are presented in Sections 2.2–2.5. Figure 1 displays a conceptual diagram of the four ML techniques.
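As a concrete illustration of the direct multistep framework of Equation (3), the sketch below assembles lagged input vectors from rainfall, river stage, and tidal level series, pairing each with the target stage at t + ∆t. This is a minimal Python sketch on synthetic data; the function name and array layout are illustrative, not taken from the paper.

```python
import numpy as np

def build_lagged_features(rain, stage, tide, lag=6, lead=1):
    """Build the input matrix of Equation (3): for each time t, stack the
    antecedent rainfall R, river stage H, and tidal level S from t-lag to t,
    and pair them with the target river stage at t+lead (direct forecasting)."""
    X, y = [], []
    for t in range(lag, len(stage) - lead):
        features = np.concatenate([rain[t - lag:t + 1],
                                   stage[t - lag:t + 1],
                                   tide[t - lag:t + 1]])
        X.append(features)
        y.append(stage[t + lead])
    return np.array(X), np.array(y)

# toy hourly series standing in for measured data
rain = np.random.rand(100)
stage = np.random.rand(100)
tide = np.random.rand(100)
X, y = build_lagged_features(rain, stage, tide, lag=6, lead=1)
print(X.shape)  # (93, 21): 3 variables x 7 time steps (t-6 ... t)
```

Under the direct approach, a separate model of this form is trained for each lead time (1–6 h) rather than feeding predictions back recursively.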


SVR
As indicated in Equation (3), a suitable ML technique for constructing the regression function f is required. The SVR approach proposed by Drucker et al. [48] was employed herein for nonlinear regression. The regression function of SVR can be expressed as follows [46,49]:

$f(\mathbf{x}) = \mathbf{w} \cdot \varphi(\mathbf{x}) + b$ (4)

where w is the weight vector, φ is the nonlinear mapping function, and b is the bias term. According to the fundamental concept of structural risk minimization to prevent overfitting, Equation (4) can be further expressed as follows:

$\min_{\mathbf{w},\, b,\, \xi,\, \xi^*} \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$ (5)

subject to

$y_i - \mathbf{w} \cdot \varphi(\mathbf{x}_i) - b \leq \varepsilon + \xi_i, \quad \mathbf{w} \cdot \varphi(\mathbf{x}_i) + b - y_i \leq \varepsilon + \xi_i^*, \quad \xi_i \geq 0,\ \xi_i^* \geq 0,\ i = 1, 2, \ldots, n$ (6)

where C denotes the cost parameter or penalty parameter, ξ and ξ* are nonnegative slack variables, and ε is the parameter of the ε-insensitive loss function. On the basis of Lagrange multipliers, the optimization problem of SVR can be written in a dual form [50]:

$f(\mathbf{x}) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(\mathbf{x}_i, \mathbf{x}) + b$ (7)

in which α and α* are Lagrange multipliers and K is the kernel function. In this study, the commonly used radial basis function was employed as the kernel function. Detailed descriptions of the SVR methodology can be found in the literature [51,52].
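A minimal scikit-learn sketch of the SVR setup described above; the paper does not state its implementation, and the synthetic data, C, and ε values here are illustrative only.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 21))          # 21 lagged inputs, as in Equation (3)
y = X[:, :7].sum(axis=1) + 0.1 * rng.normal(size=200)

# scale inputs to (-1, 1), matching the preprocessing step of this study
scaler = MinMaxScaler(feature_range=(-1, 1))
X_scaled = scaler.fit_transform(X)

# C is the penalty parameter and epsilon the insensitive-loss parameter of
# Equations (5)-(6); kernel="rbf" selects the radial basis function of Eq. (7)
model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X_scaled[:150], y[:150])
pred = model.predict(X_scaled[150:])
print(pred.shape)  # (50,)
```

In the study itself, C and the kernel parameter were tuned by Bayesian optimization rather than fixed by hand.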

RFR
The RFR approach proposed by Breiman [53] is a tree-based ensemble ML technique based on the combination of bagging (bootstrap aggregation) and the random subspace method. In the training process, binary recursive partitioning of classification and regression trees is used to build each decision tree. Once a forest has been constructed, predictions from each tree in the forest are combined as the final result. The advantages of RFR are its simplicity and the low number of required parameters. The RFR algorithm, as shown in Figure 1b, is summarized as follows [20,26,27,54]:

1. On the basis of the bootstrap method, a subset of samples is randomly produced with replacement from the original dataset.

2. These bootstrap samples are employed to construct regression trees. The optimal split criterion is used to split each node of the regression trees into two descendant nodes. The process on each descendant node is continued recursively until a termination criterion is fulfilled.
3. Each regression tree provides a predicted result. Once all of the regression trees have reached their maximum size, the final prediction is determined as the average of the results from all of the regression trees:

$\hat{h}(\mathbf{x}) = \frac{1}{N_{tree}} \sum_{tr=1}^{N_{tree}} \hat{h}_{tr}(\mathbf{x})$ (8)

in which tr is the tree index, $N_{tree}$ is the number of trees, and $\hat{h}_{tr}$ denotes the prediction of each regression tree. Detailed descriptions of RFR have been provided in previous studies [55,56].

MLPR
MLPR, which belongs to the feed-forward neural network family, includes three layers: input, hidden, and output layers (Figure 1c). The neural network in MLPR consists of neurons, biases assigned to neurons, connections among neurons, and weights connecting neurons. Mathematically, the regression function of MLPR can be expressed as follows [57,58]:

$f_r(\mathbf{x}) = c_r + \sum_{q} u_{qr}\, a_q(\mathbf{x})$ (9)

where $c_r$ denotes the bias of the r-th output neuron, $u_{qr}$ is the weight connecting the q-th neuron in the hidden layer to the r-th neuron in the output layer, and $a_q(\mathbf{x})$ represents the activation function of the hidden neuron, which can be expressed in terms of F:

$a_q(\mathbf{x}) = F\left(d_q + \sum_{p} v_{pq}\, x_p\right)$ (10)

in which $d_q$ is the bias of the q-th hidden neuron, $x_p$ is the input variable, and $v_{pq}$ is the weight connecting the p-th neuron in the input layer to the q-th neuron in the hidden layer. Several types of activation functions can be employed, including linear, sigmoid, hyperbolic tangent (tanh), and rectified linear unit (ReLU) functions. In the training process of MLPR, the back-propagation algorithm is used to adjust the weights connecting neurons to minimize errors [19,59]. Details regarding the theory of MLPR have been provided in previous studies [60][61][62].
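A single-hidden-layer MLPR with a tanh activation, as used in this study for the Lanyan and Simon stations, can be sketched with scikit-learn as a stand-in; the hidden-layer size and synthetic data are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(300, 21))
y = np.tanh(X.sum(axis=1))

# one hidden layer with N_neu neurons; activation="tanh" is the function F of
# Equation (10), and the weights of Equations (9)-(10) are fitted by
# back-propagation (here via the Adam gradient-descent variant)
model = MLPRegressor(hidden_layer_sizes=(20,), activation="tanh",
                     solver="adam", max_iter=2000, random_state=2)
model.fit(X[:250], y[:250])
pred = model.predict(X[250:])
print(pred.shape)  # (50,)
```

Switching `activation` to `"relu"` reproduces the choice reported for the Kavalan station.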

LGBMR
LGBMR uses four main algorithms to improve computational efficiency and prevent overfitting: gradient-based one-side sampling (GOSS), exclusive feature bundling (EFB), a histogram-based algorithm, and a leaf-wise growth algorithm [41,63]. As shown in Figure 1d, the leaf-wise growth algorithm identifies the leaf node with the largest split gain, enabling the management of big data while preventing overfitting. In addition, LGBMR adopts the histogram-based decision tree algorithm to divide continuous floating-point features into discrete intervals, reducing the computational power required for prediction. Moreover, GOSS and EFB are used to reduce the number of samples and features, respectively, to accelerate the training process of LGBMR.
The objective function of LGBMR can be written as follows:

$Obj^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{i=1}^{t} \Omega(f_i)$ (11)

where l is the loss function, Ω is the regularization term of a decision tree $f_i$ at the t-th iteration, $y_i$ is the true (objective) value, and $\hat{y}_i$ is the predicted value. On the basis of the boosting algorithm, Equation (11) can be further expressed as follows:

$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(\mathbf{x}_i)\right) + \Omega(f_t)$ (12)

where $\hat{y}_i^{(t-1)}$ is the predicted value of the model at step t − 1 and $f_t(\mathbf{x}_i)$ denotes the new predicted value at the t-th step. To solve the objective function, the Newton method is employed to simplify Equation (12) into the following equation:

$Obj^{(t)} \approx \sum_{i=1}^{n} \left[g_i f_t(\mathbf{x}_i) + \frac{1}{2} h_i f_t^2(\mathbf{x}_i)\right] + \Omega(f_t)$ (13)

where $g_i$ and $h_i$ are, respectively, the first and second derivatives of the loss function:

$g_i = \frac{\partial l(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}}, \quad h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(t-1)})}{\partial (\hat{y}_i^{(t-1)})^2}$ (14)

Samples in the regression trees are related to leaf nodes, and the final loss can be determined from the accumulation of the loss values of the leaf nodes. Thus, with $I_j$ representing the set of samples in leaf j, Equation (13) can be rewritten as

$Obj^{(t)} = \sum_{j=1}^{T} \left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2\right]$ (15)

where T is the total number of leaf nodes, $w_j$ is the weight of leaf node j, and λ is the regularization parameter. The optimal objective function can then be solved through minimization of this quadratic function. Detailed descriptions of LGBMR have been provided in previous studies [39,41,63].

Bayesian Optimization and Cross-Validation
The prediction performance of data-driven models can be considerably affected by the selection of parameters. To use the training dataset for construction of the models, training parameters (hyperparameters) must be selected. In general, several methods can be adopted for the tuning of hyperparameters, including grid search, random search, and the use of genetic algorithms [64]. Grid and random search are traditional parameter optimization methods in ML; however, they require brute-force search or certain experience [65]. In addition, genetic algorithms can be applied to search for optimal parameters; however, they require several control variables and have a poor local search capacity [66].
Another favorable choice is Bayesian optimization, which has been demonstrated to be more efficient than genetic algorithms in practice. Bayesian optimization is primarily based on the concept of employing a Gaussian process to establish and optimize a surrogate function with consideration of the previous evaluation results of the objective function. In this study, Bayesian optimization was combined with 10-fold cross-validation to enhance the prediction accuracy. Bayesian optimization was implemented for model training according to the following steps [39,40]:

1. The objective function was defined, and the interval range of the parameters was set.

2. During the training process of the four models, the mean square error indicator was used to evaluate the result of each parameter combination.
3. The optimal combination of parameters was determined through 10-fold cross-validation.
4. With these optimal parameters, the model was finally constructed, and the test dataset was used for river stage prediction.
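The four tuning steps can be sketched as follows. Because Bayesian optimization itself requires an extra library (e.g., scikit-optimize's `BayesSearchCV`, which shares the interface below), this sketch substitutes scikit-learn's RandomizedSearchCV to illustrate steps 1–4 with 10-fold cross-validation and mean-squared-error scoring; the search space and data are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(4)
X = rng.uniform(size=(200, 21))
y = X.mean(axis=1) + 0.05 * rng.normal(size=200)

# step 1: define the search interval for each hyperparameter
param_space = {"n_estimators": [20, 50, 100],
               "min_samples_split": [2, 4, 8]}

# steps 2-3: evaluate each candidate combination with mean-squared-error
# scoring under 10-fold cross-validation
search = RandomizedSearchCV(RandomForestRegressor(random_state=4),
                            param_space, n_iter=5, cv=10,
                            scoring="neg_mean_squared_error", random_state=4)
search.fit(X, y)

# step 4: the model is refit with the optimal parameters and can then be
# applied to the test dataset
print(search.best_params_)
pred = search.best_estimator_.predict(X[:5])
```

A Bayesian search differs only in *how* candidates are proposed: instead of sampling at random, it fits a Gaussian-process surrogate to past scores and proposes the next candidate where expected improvement is highest.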

Performance Evaluation Criteria
To quantitatively evaluate the performance of the four models, the following four evaluation criteria were employed [67-71]:

1. Mean absolute error (MAE)

$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|H_i^{mea} - H_i^{pre}\right|$

2. Nash–Sutcliffe efficiency coefficient (NSE)

$NSE = 1 - \frac{\sum_{i=1}^{n}\left(H_i^{mea} - H_i^{pre}\right)^2}{\sum_{i=1}^{n}\left(H_i^{mea} - \overline{H}^{mea}\right)^2}$

3. Coefficient of determination (R²)

$R^2 = \left[\frac{\sum_{i=1}^{n}\left(H_i^{mea} - \overline{H}^{mea}\right)\left(H_i^{pre} - \overline{H}^{pre}\right)}{\sqrt{\sum_{i=1}^{n}\left(H_i^{mea} - \overline{H}^{mea}\right)^2}\sqrt{\sum_{i=1}^{n}\left(H_i^{pre} - \overline{H}^{pre}\right)^2}}\right]^2$

4. Root mean square error (RMSE)

$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(H_i^{mea} - H_i^{pre}\right)^2}$

where $H_i^{mea}$ and $H_i^{pre}$ are, respectively, the measured and predicted river stages, and $\overline{H}^{mea}$ and $\overline{H}^{pre}$ are, respectively, the means of the measured and predicted river stages.
The NSE value may range from −∞ to 1. The closer the NSE value is to 1, the better the prediction ability during modeling; a negative NSE value indicates considerably poor prediction performance. The R² value, which ranges from 0 to 1, represents the relationship between measured and predicted river stages: a value of 0 indicates no relation, whereas a value of 1 indicates that the predicted values equal the measured values. MAE and RMSE indicate the magnitude of the errors produced by a model, and the optimal value of both is zero.
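The four criteria can be computed directly from their definitions; below is a minimal NumPy sketch with illustrative variable names and toy data.

```python
import numpy as np

def mae(obs, sim):
    """Mean absolute error between measured and predicted stages."""
    return np.mean(np.abs(obs - sim))

def rmse(obs, sim):
    """Root mean square error; like MAE, its optimal value is zero."""
    return np.sqrt(np.mean((obs - sim) ** 2))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus the sum of squared errors over the
    variance of the observations about their mean; 1 is a perfect fit."""
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2)

def r2(obs, sim):
    """Squared Pearson correlation between measured and predicted stages."""
    return np.corrcoef(obs, sim)[0, 1] ** 2

obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy measured stages (m)
sim = np.array([1.1, 1.9, 3.2, 3.8, 5.1])   # toy predicted stages (m)
print(mae(obs, sim), rmse(obs, sim))
print(nse(obs, sim), r2(obs, sim))
```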
To further evaluate the capability of the four models for river stage modeling, two additional error indexes were employed:

1. Peak water-level error (PWE)

$PWE = H_p^{mea} - H_p^{pre}$

2. Error of time-to-peak water level (ETP)

$ETP = T_p^{mea} - T_p^{pre}$

where $H_p^{mea}$ and $H_p^{pre}$ are, respectively, the measured and predicted peak river stages, and $T_p^{mea}$ and $T_p^{pre}$ denote the measured and predicted time to the peak river stage, respectively. The closer the values of PWE and ETP are to 0, the higher the accuracy of the model.
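The two error indexes can be sketched as follows, assuming the measured-minus-predicted sign convention; the helper name and the hourly sampling interval are illustrative.

```python
import numpy as np

def peak_errors(obs, sim, dt=1.0):
    """PWE: measured peak stage minus predicted peak stage.
    ETP: measured time-to-peak minus predicted time-to-peak, in hours
    for a sampling interval dt of one hour."""
    pwe = obs.max() - sim.max()
    etp = (np.argmax(obs) - np.argmax(sim)) * dt
    return pwe, etp

obs = np.array([0.5, 1.2, 2.8, 3.1, 2.4, 1.5])   # toy event: peak 3.1 m at hour 3
sim = np.array([0.6, 1.0, 2.9, 2.9, 2.6, 1.4])   # prediction: peak 2.9 m at hour 2
pwe, etp = peak_errors(obs, sim)
print(pwe, etp)
```

In this toy event the model under-predicts the peak by 0.2 m and anticipates it by 1 h.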

Flowchart for Training and Validation
Figure 2 displays a flowchart of the process of establishing and evaluating the four models, which involved three main steps: preprocessing, ML, and validation. In the preprocessing step, time-series data were collected, including rainfall, water level, and tidal stage data. Subsequently, the collected data were divided into training and test datasets. The training datasets were employed to establish the data-driven models using ML, and the test datasets were then utilized to evaluate the performance of the constructed (i.e., trained) models for validation. To prevent inaccurate prediction results, this study employed the commonly used min-max normalization method to normalize the collected data into the range of (−1, 1). After preprocessing, the optimal combinations of input vectors were investigated, and Bayesian optimization with 10-fold cross-validation was employed to enhance the training results. After the establishment of the four models, the test datasets were used as inputs, and the river stage was accordingly forecasted. To assess whether the four models produced accurate and reliable results, several indicators were used to evaluate the performance of the proposed models.
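The min-max normalization into (−1, 1) used in the preprocessing step can be sketched as follows (minimal NumPy sketch; the function name is illustrative).

```python
import numpy as np

def minmax_scale(x, lo=-1.0, hi=1.0):
    """Min-max normalization of a series into the range (lo, hi):
    lo + (hi - lo) * (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

stage = np.array([1.8, 3.4, 7.4, 2.6])   # toy river stages (m)
scaled = minmax_scale(stage)
print(scaled.min(), scaled.max())  # -1.0 1.0
```

In practice the scaling parameters (min, max) are fitted on the training data only and reused on the test data, so the test set does not leak into training.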


Study Area and Data
The Lan-Yang River basin, which has a watershed area of 978 km², is located in the northeastern part of Taiwan (Figure 3). The main reach of the Lan-Yang River has a length of 73 km and an average slope of 1/55. The Yi-Lan River, with a length of 17.25 km, is a tributary of the Lan-Yang River. Figure 3 shows the locations of the hydrological stations, namely four rainfall gauging stations, three river stage gauging stations, and one tidal gauging station. The four models were constructed to forecast river stages at the Lanyan, Simon, and Kavalan stations with a lead time of 1–6 h. The Kavalan station is located on the Yi-Lan River near the estuary. The river stage at the Kavalan station may be affected by the upstream discharge and tidal level. Therefore, this study considered the tidal effect while predicting the river stage at the Kavalan station.
In this study, data regarding hourly rainfall, river stage, and tidal level at each station were collected for two types of events, namely typhoons and storms. Table 1 lists data collected from June 2004 to October 2017 as well as the maximum water level at the three stations. The collected data of 37 events were considered for both the Lanyan and Simon stations. Because the Kavalan station had limited available data and some missing data, the collected data of 20 events were used. As shown in Table 1, the maximum values of recorded river stages at the Lanyan, Simon, and Kavalan stations were 8.06, 7.54, and 3.11 m, respectively. To further examine the hydrology statistics, Table 2 lists the characteristics based on collected data from 2004 to 2017.
For the training datasets of the Lanyan station, the water level ranged from 1.80 m to 7.40 m, with a mean of 3.42 m.
To construct and assess the four models, the total collected data were separated into training and test datasets. Figure 4 shows the measured river stage data used for the Lanyan, Simon, and Kavalan stations. Significant increases and decreases in the river stage usually occur from May to October. In addition, the river stages at the Lanyan and Simon stations were unaffected by the tidal current, whereas the river stage at the Kavalan station was significantly affected by it. Figure 4 displays the employed datasets, in which 70% (Event Nos. 1–22) was used for training and the remaining 30% (Event Nos. 23–37) was utilized for testing.

Results and Discussion
As indicated in the flowchart illustrated in Figure 2, four ML techniques were applied to construct the data-driven model for river stage prediction. The models were trained and validated using datasets of collected measurements. The forecasting results of model training and validation are presented and discussed in this section. Six evaluation criteria were used to examine the model performance during the training and testing processes. Furthermore, the performance of the four models was compared to examine their applicability for river stage forecasting. All models were developed in a Python 3.7 environment running on an Intel Core i5 3.0-GHz CPU with 8.0 GB of RAM.



Analysis for Combinations of Input Variables
Before model training and validation, the commonly used SVR was employed to determine the optimal combination of input variables. The target output was the prediction of the river stage at the Kavalan station with a lead time of 1 h (i.e., t + 1). As shown in Table 3, four combination scenarios were evaluated. For the first combination of input variables, namely C1, only the antecedent tidal stage from t to t − 6 was adopted as the input. For the second combination, namely C2, the antecedent hourly rainfall and tidal stage were utilized. The combination of antecedent rainfall, river stage, and tidal level data from t to t − 6 was used as the third combination, namely C3. In the final combination, namely C4, only antecedent rainfall data at each station were considered.
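The four input scenarios C1–C4 can be sketched as follows (a minimal NumPy sketch; the helper name and toy series are illustrative, not taken from the paper).

```python
import numpy as np

def make_inputs(rain, stage, tide, t, lag=6, combo="C3"):
    """Assemble one input vector for the four scenarios of Table 3:
    C1 uses only the antecedent tidal level, C2 rainfall + tide,
    C3 rainfall + river stage + tide, and C4 rainfall only (t-6 ... t)."""
    window = slice(t - lag, t + 1)
    parts = {"C1": [tide[window]],
             "C2": [rain[window], tide[window]],
             "C3": [rain[window], stage[window], tide[window]],
             "C4": [rain[window]]}
    return np.concatenate(parts[combo])

rain, stage, tide = (np.arange(20.0) for _ in range(3))   # toy hourly series
sizes = [make_inputs(rain, stage, tide, t=10, combo=c).size
         for c in ("C1", "C2", "C3", "C4")]
print(sizes)  # [7, 14, 21, 7]
```

Each scenario thus changes only the width of the input vector fed to the SVR model; the target output (the stage at t + 1) is the same in all four cases.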
The four input combinations were subsequently used to construct the data-driven SVR model for river stage prediction. Figure 5 displays the model training results, presented as a scatter plot of the measured and forecasted river stages obtained using the four combinations of input variables. The scatter points of SVR with C3 were closer to the 45° line (y = x, black dotted line) than those of SVR with C1, C2, and C4. In addition, Figure 5 shows that the scatter points of SVR with C4 were dispersed from the 45° line for low river stages (approximately lower than 1.0 m). Furthermore, to assess the performance of the four combinations of input variables, four evaluation criteria, namely R², MAE, RMSE, and NSE, were adopted. Table 4 shows the performance of river stage forecasting at a lead time of 1 h when SVR was used with the four input combinations. As shown in Figure 5 and Table 4, C3 achieved higher accuracy during model training than did C1, C2, and C4 for both high and low river stages.

After the quantitative analysis of model training, model testing was performed using the four combinations of input variables. Figure 6 shows simulated results with four close-up views displaying the four flood events.
These close-up views of the simulated water levels reveal the differences in river stage prediction results among the four combinations. The simulated river stage hydrographs obtained using SVR with C3 agree very closely with the measured hydrographs. However, a large discrepancy between the simulated and measured river stages was observed when SVR was used with C1, indicating that considering only the tidal input does not correctly reproduce the peak water level and time-to-peak water level. To further investigate the model validation, four flood events at the Kavalan station were selected to evaluate the river stage prediction capability of SVR with different input combinations. Table 5 lists the simulated results, including those for ETP and PWE. SVR with C3 demonstrated the most favorable performance, with the lowest average absolute PWE of 0.17 m, whereas SVR with C1 and C4 achieved almost the same largest average absolute PWE of 0.37 m. Furthermore, SVR with C3 had the lowest ETP among all of the combinations. Thus, SVR with C1 and C4 demonstrated poor performance, whereas SVR with C3 exhibited the most favorable performance. In summary, these results indicate that considering the three input variables (i.e., rainfall, river stage, and tidal level) over the previous 6 h can significantly improve the accuracy of river stage prediction in a tidal river.

Analysis of Model Training Results
After the optimal combination of input variables was determined, the four models were trained. The optimal combination, namely C3, was used as the input to establish the river stage prediction model for each station. In ML, hyperparameters play a vital role in model performance. During model training, Bayesian optimization together with 10-fold cross-validation was employed to determine the optimal parameters. For SVR, the penalty and kernel parameters were selected as the hyperparameters [46]. According to Choi et al. [54], the two main parameters to be determined in an RFR model are N_split and N_tree, where N_split is the minimum number of samples required to split a node. For MLPR, a single hidden layer was used because of its relatively wide application; hence, the number of neurons in the hidden layer, N_neu, must be determined. As hyperparameters in the LGBMR model, the parameters N_dep and N_leaves respectively represent the maximum tree depth and the maximum number of tree leaves. Table 6 summarizes the optimal parameters for the three stations obtained using the proposed models. The results show different optimal parameters for different stations and lead times. In the MLPR model, the tanh activation function was preferred for the Lanyan and Simon stations, whereas for the Kavalan station the optimal activation function was ReLU. The four models were trained using the optimal parameters listed in Table 6. Figure 7 presents the training results obtained using the four models at each station for lead times of 1, 3, and 6 h. For the 1-h lead time, the forecasted points nearly reached the 45° reference line for all stations. When the lead time increased to 3 and 6 h, the forecasted points moved away from the reference line. The results indicated that the forecasted points at the Simon station under high river stage conditions (>7 m), as determined through SVR, RFR, and LGBMR, were lower than the 45° line.
Furthermore, SVR, RFR, and LGBMR exhibited a similar underestimation trend at the Kavalan station under a high river stage condition (>2.5 m). Overall, the points forecasted using LGBMR were closer to the 45° reference line than those forecasted using SVR, RFR, and MLPR, indicating that LGBMR demonstrated the most favorable training performance among the four models except for a few peak values. Table 7 lists the training results of the four models for 1–6-h lead times at the three stations obtained using the four evaluation indexes. All models demonstrated favorable performance in terms of R², which was over 0.72 for 1–6-h lead times at the three stations. Both the RMSE and MAE values for the RFR and LGBMR models were slightly lower than those for the SVR and MLPR models. In addition, both the RFR and LGBMR models yielded higher NSE values than did the SVR and MLPR models.
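The hyperparameter search described above (Bayesian optimization with 10-fold cross-validation) can be illustrated in miniature. The sketch below substitutes an exhaustive search over a tiny candidate grid and a ridge-regression stand-in for the four ML models, so only the cross-validated selection logic mirrors the paper's procedure; all names and values are illustrative assumptions.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle sample indices and split them into k folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cv_score(X, y, alpha, folds):
    """Mean RMSE of a ridge regression (closed form) over the k folds
    for one candidate regularization strength alpha."""
    errs = []
    for i, val in enumerate(folds):
        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
        A = X[tr].T @ X[tr] + alpha * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[tr].T @ y[tr])
        pred = X[val] @ w
        errs.append(np.sqrt(np.mean((pred - y[val]) ** 2)))
    return float(np.mean(errs))

# Toy data: the target depends linearly on the inputs plus small noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)

folds = kfold_indices(len(y), k=10)
candidates = [0.01, 0.1, 1.0, 10.0]
# Pick the candidate minimizing the 10-fold CV error, as in the paper's
# tuning loop (where Bayesian optimization proposes the candidates instead).
best_alpha = min(candidates, key=lambda a: cv_score(X, y, a, folds))
```

A Bayesian optimizer would replace the fixed candidate list with sequentially proposed points, but the fold construction and scoring are the same.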

Results of Model Validation
On the basis of the four models trained in Section 4.2, the collected test dataset was employed to drive the four models for river stage forecasting with lead times of 1–6 h. Figure 8 compares the measured river stage with the forecasted river stage for Typhoon Megi with 1–6-h lead times at the Lanyan station. As the lead time increased, the difference in the forecasted results among the four models became more significant. The results (Figure 8) revealed that MLPR overestimated the peak river stages except for the 3-h lead time. A comparison between the measured and forecasted river stages for Typhoon Saola at the Simon station is presented in Figure 9. The results reveal that the four models could efficiently forecast the river stage at a 1-h lead time. A higher ETP (i.e., phase lag) between the measured and forecasted results was observed for 3–6-h lead times. Figure 10 shows the results of a comparison of the four models for Typhoon Soulik with 1–6-h lead times at the Kavalan station. The river stage hydrographs forecasted using the four models agreed with the measured river stage hydrographs for a 1-h lead time. The difference among the model validation results increased considerably for lead times of 2 to 6 h. The four models also overestimated the peak river stages.
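The 1–6-h forecasts above come from a multistep-ahead framework. One common realization, assumed here purely for illustration, is the direct strategy, in which a separate model is fitted for each lead time; a least-squares regressor stands in for the four ML techniques.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit standing in for any of the four ML regressors."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def train_direct_models(stage, lags=6, max_lead=6):
    """Direct multistep strategy: one model per lead time (1..max_lead h),
    each trained on the previous `lags` hourly stages."""
    models = {}
    n = len(stage)
    for lead in range(1, max_lead + 1):
        X = np.array([stage[t - lags:t] for t in range(lags, n - lead + 1)])
        y = np.array([stage[t + lead - 1] for t in range(lags, n - lead + 1)])
        models[lead] = fit_linear(X, y)
    return models

stage = np.linspace(0.0, 5.0, 60)   # toy linearly rising stage (m)
models = train_direct_models(stage, lags=6, max_lead=6)
latest = stage[-6:]                  # most recent 6 h of observations
# Each lead time gets its own model, so no forecast is fed back as input.
forecast = {lead: float(latest @ w) for lead, w in models.items()}
```

An alternative would be a recursive strategy that feeds 1-h forecasts back as inputs, but the direct form matches a setup in which every lead time is trained and tuned independently.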


Performance Evaluation of River Stage Forecasting
This section presents the evaluation of river stage forecasting performance using two evaluation metrics, namely ETP and PWE. Table 8 summarizes the results of 1-, 3-, and 6-h lead times using ETP and PWE for the three stations. As listed in Table 8, the maximum absolute values of PWE determined using SVR, RFR, MLPR, and LGBMR were respectively 0.89, 1.42, 1.33, and 0.89 m at the Lanyan station and 0.63, 0.77, 0.50, and 0.43 m at the Simon station. Figure 11 presents boxplots of the absolute values of PWE from the four selected events for 1-, 3-, and 6-h lead times. As displayed in Figure 11, the boxes obtained using the RFR and MLPR models had broader distributions for 3- and 6-h lead times than those obtained using the SVR and LGBMR models. Moreover, the median of the absolute values of PWE obtained using the LGBMR model was slightly closer to zero than the medians obtained using the SVR, RFR, and MLPR models. Although RFR exhibited satisfactory training performance (see Section 4.2), it performed poorly in the model validation. According to the range and length of the boxes and whiskers, the LGBMR model demonstrated more favorable performance, with an average PWE value of 0.22 m. Furthermore, to compare the performance of the models, the ETP results listed in Table 8 were averaged over all events and stations; the results are displayed in Figure 12. The average value of ETP obtained using LGBMR was close to 2 h for 1–6-h lead times. The results of the comprehensive assessment revealed that the LGBMR model exhibited more favorable performance in forecasting both the peak river stage and the time-to-peak river stage.

A quantitative comparison of the CPU time, including for the training and validation stages, is presented in Table 9. The LGBMR model required the shortest CPU time, whereas the MLPR model required the longest CPU time in the training stage. In the model validation process, the predictions made by each model for 1–6-h lead times were completed within 0.5 s. This finding implies that all four models could satisfy the computational time requirements of operational flood forecasting. Nevertheless, LGBMR would be a more favorable choice for practical river stage simulation because of its high efficiency and accuracy.
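The two event-based metrics used here can be computed directly from a pair of hydrographs. The definitions below are assumptions inferred from the text (PWE as the forecast peak level minus the observed peak level, ETP as the time offset between the two peaks); the paper's sign conventions may differ slightly.

```python
import numpy as np

def peak_metrics(observed, forecast, dt_hours=1.0):
    """Event-based error metrics assumed from the text:
    PWE - peak water-level error (forecast peak minus observed peak, m)
    ETP - error of time to peak (hours between forecast and observed peaks)"""
    obs = np.asarray(observed, dtype=float)
    fct = np.asarray(forecast, dtype=float)
    pwe = float(fct.max() - obs.max())
    etp = float((int(np.argmax(fct)) - int(np.argmax(obs))) * dt_hours)
    return pwe, etp

# Toy hydrographs: the forecast peak is 0.1 m too low and 2 h too late.
t = np.arange(24)
observed = np.exp(-0.5 * ((t - 10) / 3.0) ** 2)        # peak 1.0 m at hour 10
forecast = 0.9 * np.exp(-0.5 * ((t - 12) / 3.0) ** 2)  # peak 0.9 m at hour 12
pwe, etp = peak_metrics(observed, forecast)
```

Averaging the absolute PWE and ETP over the selected flood events yields the per-model summaries reported in Table 8 and Figures 11 and 12.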

Conclusions
This study presents a multistep-ahead framework involving Bayesian optimization to construct data-driven prediction models based on four ML techniques, namely SVR, RFR, MLPR, and LGBMR, for river stage forecasting. The models were successfully applied to predict the evolution of river stages in a tidal reach of the Lan-Yang River Basin in Taiwan. The application of LGBMR to river stage simulation is a relatively new approach. Nearly 14 years of hydrological data were collected and employed to train the data-driven models through Bayesian optimization with 10-fold cross-validation. The constructed models were then applied to forecast the hourly river stage up to 6 h ahead at three stations. The performance of the four models was also compared on the basis of six evaluation criteria. The results revealed that the LGBMR model produced the most accurate river stage hydrographs among the four models.
The primary findings and contributions of this study are as follows:
1. The optimal combination of input variables was determined using the SVR model with four designed combinations. The results indicate that the combination of rainfall, river stage, and tidal level as the input variables can improve the river stage prediction accuracy in a tidal reach.
2. The results of the quantitative analysis of model training and validation were used to compare the forecasting performance of the four models. The results demonstrated that the LGBMR model produced more satisfactory river stage forecasts at lead times of up to 6 h. The average PWE and ETP values obtained using LGBMR were 0.22 m and 2 h, respectively, indicating acceptable accuracy in river stage forecasting.
The high efficiency and accuracy of the LGBMR model make it a robust approach for river stage prediction. To develop a real-time flood forecasting system, the previous values of the input variables, such as rainfall and tidal level, can be extended to forecasted values obtained using physics-based models. Therefore, future studies should focus on integrating a physics-based model with a data-driven model through error correction to improve flood forecasting accuracy. Furthermore, more data-driven techniques, such as deep learning, should be adopted to extend this comparative research.