Water Level Simulation in River Network by Data Assimilation Using Ensemble Kalman Filter

Featured Application: The new data assimilation method based on the ensemble Kalman ﬁlter with the optimized parameters produces accurate water level predictions in river networks; it is thus a potential model for the real-time prediction of water levels in river networks in practice. Abstract: Water level simulation for complex water river networks is complex, and existing forecasting models are mainly used for single-channel rivers. In this paper, we present a new data assimilation model based on the ensemble Kalman ﬁlter (EnKF) for accurate water level simulation in complex river networks. The EnKF-based data model was tested on simulated water level data from a river network hydrodynamic model and optimized through parameter analysis. It was then applied to a real mountainous single-channel river and plain river network and compared with a data assimilation model based on the extended Kalman ﬁlter (EKF). The results showed that the EnKF-based model, with a medium ensemble sample size of 100–150, normal observation noise of 0.0001–0.01 m, and a high standard deviation of 0.01–0.1 m, outperformed the EKF-based model, with a 49% reduction in simulation errors, a 45% reduction in calculation cost, and a 43% reduction in ﬁltering time. Furthermore, the EnKF-based data assimilation model predicted the water level in the plain river network better than the mountainous single-channel river. Around 5 to 8 h were required for data assimilation; afterwards, the model could make accurate predictions covering 20 to 30 h. The EnKF-based data assimilation model offers a potential solution for water level predictions in river networks.


Introduction
River networks play an important role in watershed landscapes due to their contributions to the watershed's hydrological processes. Compared to single open-channel rivers, they are more complex and more sophisticated hydraulic simulation models are required in order to solve unsteady flow problems in river networks. In addition, due to human activity, the river network structure of basins has changed significantly, causing spatial and temporal differences in hydrological connectivity [1]. The complex nature of river networks produces challenges in flow modeling, particularly in flood forecasting for a river network. As a result, the theories and corresponding algorithms for simulating and forecasting hydraulic flow in river networks have not been well established. Although flood forecasting with machine learning or deep learning models has been attempted in relation to flood simulations [2][3][4][5], a large amount of data are required as inputs for the sufficient training of model parameters. The large number of parameters and the overfitting of parameters are also great concerns when attempting to realize accurate real-time predictions [6]. challenges in such areas compared to single open-channel rivers in terms of calculation efficiency, the data treatment (e.g., data assimilation) process and degree, and forecasting periods (a few hours to days ahead). To date, very few studies have been conducted to investigate the performance of KF or its variants on the water flow simulation of river networks and, where available, they have simplified the river networks with the use of limited river branches and nodes [21], or short-term flow forecasting was carried out [9]. For example, Munier et al. [9] developed a lumped hydrological model to propagate the discharge through the Seine river basin in France and KM was used to assimilate the discharge data to improve the simulation accuracy of flow forecasting within 72 h. However, the simulation results could be significantly influenced by the degree of river network spatialization and the spatial distribution.
In this study, an EnKF algorithm was developed for performing data assimilation on simulated water level data from a river network hydraulic model for real-time water level prediction. A new data assimilation scheme was proposed for applying EnKF to simulated water level data to improve the simulation capacity, followed by a parameter analysis conducted on a simulated river network in order to understand and optimize the key parameter values. Based on the optimized parameter data, an experimental evaluation was conducted on a real single-channel river and a real plain river network with an area of 325 km 2 . In addition, the data assimilation results obtained with EnKF were compared to those obtained with the classic Kalman filter for hydraulic simulation, i.e., the extended Kalman filter (EKF), with regard to their simulation accuracy, calculation cost, and data assimilation efficiency. In this study, we aimed to provide a data assimilation scheme using EnKF to improve the real-time simulation capacity of a river network hydraulic model.

One-Dimensional Hydrodynamic Model of a River
Saint-Venant equations are often used to describe one-dimensional flow, as follows. The continuous equation: ∂A ∂t The momentum equation:

∂Q ∂t
where A is the cross-section area; t is time; Q is the average flow of the cross-section; x is displacement along the water flow; q L is the lateral streamflow per unit length; a is the momentum correction coefficient; g is the acceleration of gravity; Z is the water level of the section; K is the flow modulus, K = AR 2/3 /n, where R is the hydraulic radius of the river and n is the Manning coefficient; and v x is the velocity component of the lateral stream flow along the water flow, usually taken as v x = 0. Saint-Venant equations are hyperbolic partial differential equations for which analytical solutions are not yet available. The finite difference method is an effective numerical discretization method for solving the Saint-Venant equations and is divided into two schemes: the explicit difference scheme and the implicit difference scheme. Although the explicit difference scheme does not require the joint solution of equations, the time step and space step are limited by strict adherence to the "Courant condition" in order to maintain stability, whereas the implicit difference scheme is more flexible in the selection of the time and space step and is therefore popularly used. In this study, the Preissmann four-point implicit difference scheme were adopted to discretize the equations, and the space-time discretization schemes of the continuous and momentum equations were obtained as follows: Difference form for the continuous equations: Difference form for the momentum equations: where θ is the Preissmann weighting factor; superscripts t and t + 1 and subscripts i and i + 1 represent the time and one-dimensional spatial layers, respectively; and B is the river width. Thus, the following linear equations, expressed in increments, are obtained as follows: where A 1i~E2i are the expressions for each of the known quantities in time layer t. After the elimination of the elements, we can obtain where The corresponding section recurrence equations are obtained by sequentially working backwards from the first section of the river to any section: where, when i = 0, and when i > 0, For a river with m reaches, the number of cross sections is m + 1. Based on Equation (7), we can obtain where According to the above equations, we can see that the flow increment at the first and last cross-sections of the river can be expressed in terms of the increment of the water level. Therefore, the corresponding equations using the non-increment method can be obtained by expanding Equation (12). The equations are: In addition, in this study we adopted a three-stage joint solution method based on a large sparse matrix solver to solve the river network. The method divides the solution of the equations into three levels-the micro-segments, river segments, and nodes-and calculates and back-substitutes them step by step, in order to compress the size of the matrix and improve the calculation speed.

General Form of a River Network Hydrodynamic System
The general form of a one-dimensional hydrodynamic equation formed using the model described above can be expressed as where F is defined as the hydrodynamic equation. X is the hydraulic state variable, including the discharge, water depth, or water level, and t is time.
Given the existence of white noise in relation to hydraulic state variables, the following discretized nonlinear hydraulic state equation can be obtained: where Φ k refers to the nonlinear hydraulic state functional from the state at time k − 1 to the state at k, including the hydrodynamic model functions written in Section 2, and Γ k is the weighting matrix of state noise, which is normally taken as a unit matrix. W k is a dynamic noise with a mean value equal to zero. In addition, the equation for the measured river network state variables Y k can be expressed as: where H k is the linear measurement function at time k; V k is the measurement noise with a mean value equal to zero. By combining Equations (2) and (3), we obtain the equations describing the hydrodynamic nonlinear dynamic system of a river network:

Application of Ensemble Kalman Filter
In this study we established a new scheme based on EnKF and applied it to a river network hydrodynamic system. A three-stage joint solution method was used to solve the hydrodynamic model of the river network, and the water level was selected as the state variable of the nonlinear dynamic system of the river network. Before the start of the filtering treatment, the Box-Muller transform method was used to generate an initial sample ensemble of nodal water level randomly. The data followed the normal distribution N µ, σ 2 . To ensure the diversity of the sampling data, with reasonable dispersion, the average value µ was determined based on the simulated water level at the start of filtering, and the standard deviation was also set as a certain proportion of the nodal water depth at the start of filtering.
The initial sample ensemble of the nodal water level was generated by means of the Box-Muller transform method. Its average value X is shown as: where m represents the number of samples, i.e., the size of the ensemble. The sample values obey a normal distribution and have a standard deviation of σ. The disturbance term ∆X i for the ith ensemble member is and ∆X is defined as the ensemble of disturbance terms: Then, the error variance matrix can be expressed as follows: The data assimilation process for river networks using the EnKF algorithm is shown in Figure 1. The details of the algorithm's development are shown below.
When filtering, the first step is to calculate the state simulation values of each sample set, that is, the prediction valuesX k/k−1 : data assimilation results for each calculation moment. In addition, the filtering process can be ended when the correction error variance matrix is less than a certain value in order to improve the efficiency of the river network hydrodynamic model calculation; similarly, the filtering process can be started when the error value between the simulated and observed values exceeds a certain set value in order to improve the simulation accuracy of the river network hydrodynamic model.

Parameter Analysis
In this section, parameters, including the standard deviation of the initial sample ensemble, the observation error/noise level δ, and the size of the ensemble m, are analyzed in terms of their influence on the data assimilation performance for a simulated river Then, the prediction error variance matrix and gain matrix are calculated as in Equations (24) and (25), respectively. Calculating the gain matrix requires the introduction of the observation noise level, that is, the observation error. The Box-Muller method is also used to discretize the observation data to obtain the observation set, which is fed into the ensemble Kalman filtering algorithm, together with the forecast water level set for assimilation, to obtain a set of assimilated water levels, whose mean value is the optimal solution for data assimilation.
where R k is the observation error covariance; P k/k−1 H T k and H k P k/k−1 H T k are calculated according to the following equations: whereX Finally, update each ensemble member of the water level variable based on the gain matrix: where , and R is the observation error covariance. Note that the corrected water level is used as the current moment value, which is then used as the next moment value of the water level prediction. The mean value of the corrected water level is set as the best estimate of the current moment of data assimilation (optimal solution).
The correction error variance matrix is also calculated as an approximation of the error covariance matrix of the corrected values: With recurrence in time, the above process can be cycled continuously to obtain the data assimilation results for each calculation moment. In addition, the filtering process can be ended when the correction error variance matrix is less than a certain value in order to improve the efficiency of the river network hydrodynamic model calculation; similarly, the filtering process can be started when the error value between the simulated and observed values exceeds a certain set value in order to improve the simulation accuracy of the river network hydrodynamic model.

Parameter Analysis
In this section, parameters, including the standard deviation σ of the initial sample ensemble, the observation error/noise level δ, and the size of the ensemble m, are analyzed in terms of their influence on the data assimilation performance for a simulated river network with unsteady-state flows. Note that this river network was adopted from the literature [22]. Figure 2 illustrates a looped canal river network with 14 branches connected by nodes depicted by number. This network was simulated as a convergent type, representing drainage systems in practice. Water level measurement stations were set at nodes 9 and 13, named Station 1 and Station 2. The detailed characteristics of this looped canal network can be found in the literature [22]. Appl network with unsteady-state flows. Note that this river network was adopted from the literature [22]. Figure 2 illustrates a looped canal river network with 14 branches connected by nodes depicted by number. This network was simulated as a convergent type, representing drainage systems in practice. Water level measurement stations were set at nodes 9 and 13, named Station 1 and Station 2. The detailed characteristics of this looped canal network can be found in the literature [22]. The analysis was conducted on a computer with an Intel Core i7-8700K CPU, 3.2 GHz CPU clock speed and 16.0 GB DDR4 RAM. The system was run on Windows 10 Enterprise, version 1909.  Table 1 shows the test scheme. The results calculated based on the true value of the roughness were regarded as the precise water level values, and the observation noise (i.e., error) was added to the precise water level values to produce the observation values. The observation noise levels δ (0.0, 0.0001, 0.0005, 0.001, 0.005, 0.01, and 0.05 m) were analyzed. The standard deviation of the initial ensemble was 0.0, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, and 0.1 m. The average water depth value was 2.5 m; correspondingly, the relative observation noise level values were 0%, 0.004%, 0.02%, 0.04%, 0.2%,0.4%, and 2.0%. The relative standard deviation of the initial ensemble was 0%, 0.004%, 0.02%, 0.04%, 0.2%,0.4%, 2.0%, and 4.0%. A constant ensemble size of 100 was tested.

Observation Noise Level and Standard Deviation of the Initial Ensemble
The performance of the EnKF algorithm with different parameters was assessed on the basis of the root-mean-square-error (RMSE) of the water level, which was calculated based on the simulated and precise values. When the observation noise level δ equaled The analysis was conducted on a computer with an Intel Core i7-8700K CPU, 3.2 GHz CPU clock speed and 16.0 GB DDR4 RAM. The system was run on Windows 10 Enterprise, version 1909. Table 1 shows the test scheme. The results calculated based on the true value of the roughness were regarded as the precise water level values, and the observation noise (i.e., error) was added to the precise water level values to produce the observation values. The observation noise levels δ (0.0, 0.0001, 0.0005, 0.001, 0.005, 0.01, and 0.05 m) were analyzed. The standard deviation of the initial ensemble was 0.0, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, and 0.1 m. The average water depth value was 2.5 m; correspondingly, the relative observation noise level values were 0%, 0.004%, 0.02%, 0.04%, 0.2%,0.4%, and 2.0%. The relative standard deviation of the initial ensemble was 0%, 0.004%, 0.02%, 0.04%, 0.2%,0.4%, 2.0%, and 4.0%. A constant ensemble size of 100 was tested.

Observation Noise Level and Standard Deviation of the Initial Ensemble
Note: √ means the calculation was successful; × means the calculation was unsuccessful due to the failure of convergence and data assimilation.
The performance of the EnKF algorithm with different parameters was assessed on the basis of the root-mean-square-error (RMSE) of the water level, which was calculated based on the simulated and precise values. When the observation noise level δ equaled 0.0, except for standard deviation = 0.0 (i.e., equivalent to no data assimilation), the data assimilation failed. This is because the gain matrix K decreased continuously during the data assimilation process, implying that the weight of using new information to correct the predicted values decreased, i.e., the tracking performance of the system decreased. It was thus necessary to use an observation noise level of more than zero to ensure a certain disturbance level and prevent data assimilation failure. Figure 3 shows that the RMSE values changed greatly with different settings for the parameters of standard deviation, σ, and observation noise level, δ, at Stations 1 and 2. When σ = 0.0, the sample data in the initial ensemble were constant numbers, which were the initial conditions of the water level at each node. In this case, the simulation results were not affected by the observation noise or the ensemble size, and the results were equivalent to the results without data assimilation. The RMSE values decreased significantly with increasing standard deviation, particularly at the low level of observation noise. This indicates that a relatively high standard deviation and low observation noise level could lead to a good data assimilation result. For example, at Station 1, a negligible RMSE value of 0.0011 m was produced when σ = 0.1, δ = 0.0001, whereas a high RMSE value of 0.1563 m was obtained when σ = 0.0001, δ = 0.05. Appl Note: √ means the calculation was successful; × means the calculation was unsuccessful due to the failure of convergence and data assimilation. Figure 3 shows that the RMSE values changed greatly with different settings for the parameters of standard deviation, σ, and observation noise level, δ, at Stations 1 and 2. When σ = 0.0, the sample data in the initial ensemble were constant numbers, which were the initial conditions of the water level at each node. In this case, the simulation results were not affected by the observation noise or the ensemble size, and the results were equivalent to the results without data assimilation. The RMSE values decreased significantly with increasing standard deviation, particularly at the low level of observation noise. This indicates that a relatively high standard deviation and low observation noise level could lead to a good data assimilation result. For example, at Station 1, a negligible RMSE value of 0.0011 m was produced when σ = 0.1, δ = 0.0001, whereas a high RMSE value of 0.1563 m was obtained when σ = 0.0001, δ = 0.05.

Ensemble Size
As the computational resources are limited for realistic and large-scale hydraulic applications, the ensemble size should be small. We thus tested four ensemble sizes, n = 50, 100, 150, and 200. The observation noise level δ was taken to be 0.0001 m and 0.05 m, and

Ensemble Size
As the computational resources are limited for realistic and large-scale hydraulic applications, the ensemble size should be small. We thus tested four ensemble sizes, n = 50, 100, 150, and 200. The observation noise level δ was taken to be 0.0001 m and 0.05 m, and the standard deviation σ was taken to be 0.0001 m and 0.1 m. Table 2 presents the parameters and their values.
Note: √ means the calculation was successful; × means the calculation was unsuccessful due to the failure of convergence and data assimilation. Tables 3 and 4 show the RMSE values with different ensemble sizes at certain observation noise levels and standard deviation values. Generally, the RMSE values decreased with increasing ensemble size for the low observation noise level (i.e., δ = 0.001, σ = 0.0001 or 0.1), whereas for large observation noise level of 0.05, either the RMSE values were constant or the calculation failed, indicating that the data assimilation process was not improved by increasing the ensemble size. As shown in Table 5, computational cost increased significantly with ensemble size, and a large ensemble size could result in an extensive cost in terms of computational resources. Thus, we determined that the ensemble sample size could be set at a medium level (i.e., 100 or 150). Based on the sensitivity analysis of the observation noise level and the standard deviation of the initial ensemble, we suggest that a relatively low observation noise level (e.g., from 0.0001 to 0.01 m) and a relatively high standard deviation (e.g., from 0.01 to 0.1 m), along with a medium ensemble sample size (e.g., from 100 to 150), should be used to produce a good data assimilation with low RMSE values.   To demonstrate the effectiveness of the proposed algorithm with the optimized parameter values, obtained as described in the parameter analysis section, we used two sets of data from a real single open-channel river in Canada and a real river network in China.

Case 1: A Real Single Open-Channel River.
The Kootenai River originates in British Columbia, Canada, and flows through Montana, Idaho, USA. A 24-km-long section of the river was used for this study, as shown in Figure 4. There are three hydrological stations on the river: Bonners Ferry, Tribal Hatchery, and Klockmann Ranch, which are located upstream, midstream, and downstream of the river, respectively. The measured water volume flow at Tribal Hatchery was used as the boundary condition for the upstream entrance (as shown in Figure 5a), and the measured water level at Klockmann Ranch (as shown in Figure 5b) was used as the boundary condition for the downstream exit, whereas Bonners Ferry was used as the test site. The cross-sectional information and water level and flow information used in the tests were obtained from the official website of the American Geographic Society. Table 6 shows the basic information of the selected 24-km-long section of the Kootenai river.

Case 2: A Real River Network.
Located in the southeast of Zhejiang Province, China, the Le Liu Hong plain river network covers an area of 325 km 2 . It is divided into three river network systems-Hongqiao, Lecheng, and Liushi-according to its topography and geomorphology, which are controlled by control gates and are independent of each other. Each of the three river network water systems has a water level station, Hongqiao station, Lecheng station, and Liushi station, as shown in Figure 6. The hydrodynamic model of the river network of Le Liu Hong plain includes 398 rivers, 304 nodes, and a total of 1962 cross-sections. In this study, we used the 2019 Lekima typhoon storm flood data for data assimilation testing.

Parameter Settings and Evaluation Metrics
The Box-Muller method was used to generate a random number set for the initial water level values at Bonners Ferry station in Case 1 and Hongqiao, Lecheng, and Liushi stations in Case 2. The set mean value was given as the state value at the beginning moment of data assimilation. Based on the parameter analysis and optimization described in the parameter analysis section, the ensemble size m, the standard deviation σ, and the observation noise level δ were taken to be 100, 0.1 m, and 0.005 m, respectively. As the water depth values were between 6 and 9 m for Case 1 and between 3 and 4.5 m for Case 2, the corresponding relative observation noise level values and the relative standard deviation values were approximately less than 5% of the water depth values.
The RMSE, calculated based on the water lever difference between the simulation and measurements from the start of the data assimilation process to the end of the water level simulation, was used as an evaluation metric. Furthermore, the algorithm calculation cost in seconds and the time for the algorithm to reach stability were used as the other two evaluation metrics. For Case 1, the simulation period covered 27 days, and the data assimilation process started on day 5 (hour 96). For Case 2, the simulation period was from 8:00 on 7 August 2019 to 8:00 on 12 August 2019 (5 days in total), assuming that data assimilation took place from 0:00 on 9 August 2019. The filtering ended when the elements on the diagonal of the water level correction error variance matrix were less than 0.001 m 2 .
Appl. Sci. 2023, 13, x FOR PEER REVIEW 12 of 20 boundary condition for the upstream entrance (as shown in Figure 5a), and the measured water level at Klockmann Ranch (as shown in Figure 5b) was used as the boundary condition for the downstream exit, whereas Bonners Ferry was used as the test site. The crosssectional information and water level and flow information used in the tests were obtained from the official website of the American Geographic Society. Table 6 shows the basic information of the selected 24-km-long section of the Kootenai river.      boundary condition for the upstream entrance (as shown in Figure 5a), and the measured water level at Klockmann Ranch (as shown in Figure 5b) was used as the boundary condition for the downstream exit, whereas Bonners Ferry was used as the test site. The crosssectional information and water level and flow information used in the tests were obtained from the official website of the American Geographic Society. Table 6 shows the basic information of the selected 24-km-long section of the Kootenai river.     network covers an area of 325 km 2 . It is divided into three river network systems-Hongqiao, Lecheng, and Liushi-according to its topography and geomorphology, which are controlled by control gates and are independent of each other. Each of the three river network water systems has a water level station, Hongqiao station, Lecheng station, and Liushi station, as shown in Figure 6. The hydrodynamic model of the river network of Le Liu Hong plain includes 398 rivers, 304 nodes, and a total of 1962 cross-sections. In this study, we used the 2019 Lekima typhoon storm flood data for data assimilation testing.

Parameter Settings and Evaluation Metrics
The Box-Muller method was used to generate a random number set for the initial water level values at Bonners Ferry station in Case 1 and Hongqiao, Lecheng, and Liushi stations in Case 2. The set mean value was given as the state value at the beginning moment of data assimilation. Based on the parameter analysis and optimization described in the parameter analysis section, the ensemble size m, the standard deviation , and the observation noise level δ were taken to be 100, 0.1 m, and 0.005 m, respectively. As the water depth values were between 6 and 9 m for Case 1 and between 3 and 4.5 m for Case 2, the corresponding relative observation noise level values and the relative standard deviation values were approximately less than 5% of the water depth values.
The RMSE, calculated based on the water lever difference between the simulation and measurements from the start of the data assimilation process to the end of the water level simulation, was used as an evaluation metric. Furthermore, the algorithm calculation cost in seconds and the time for the algorithm to reach stability were used as the other two evaluation metrics. For Case 1, the simulation period covered 27 days, and the data assimilation process started on day 5 (hour 96). For Case 2, the simulation period was from 8:00 on 7 August 2019 to 8:00 on 12 August 2019 (5 days in total), assuming that data assimilation took place from 0:00 on 9 August 2019. The filtering ended when the elements on the diagonal of the water level correction error variance matrix were less than 0.001 m 2 .

Algorithms for Comparison
Our data assimilation scheme based on the EnKF algorithm was compared to the EKF method, which is commonly used for hydraulic predictions for river networks. The corresponding observed noise level and filtering time period were consistent with those calculated using the ensemble Kalman filter for both testing cases. As both algorithms contain an observation noise level parameter, this parameter was varied, at 0.0005, 0.005, and 0.05 m, to examine whether it affected the difference in performance between the EnKF algorithm and the EKF algorithm. Note that the observation noise value of 0.05 m was so high that it was impractical in real river networks; thus, this setting was only used for research purposes. Figure 7 shows the water level error values between the simulation and the measurements among different data treatment schemes for Case 1. The evaluation metrics used for evaluating data assimilation performance are tabulated in Table 7. The data assimilation started in hour 96, and after 23 h of filtering, the data assimilation end condition had been reached for both the EnKF and EKF algorithms. The data assimilation process, whether using either EnKF or EKF, produced an instant and efficient reduction in the simulation errors. The time required for both EnKF and EKF algorithms to reach stability was minimal, indicating that both algorithms could correct the prediction results instantly for this single-river case. With the end of the assimilation phase at hour 119, the filtered water level error increased slightly but still fluctuated, mainly between −0.05 m and 0.05 m, which was much lower than the water level error without data assimilation. The RMSE values decreased significantly by around two times after the data assimilation process was carried out using EnKF or the EKF algorithm. Although the EnKF algorithm produced a slightly lower RMSE, the difference was negligible for Case 1. However, the algorithm calculation cost for EKF was 16.7 s, which was 19.2% higher than that of EnKF, indicating that EKF was more time-consuming than EnKF. This result is similar to those found in the literature [23], namely, that the computational time of EnKF was 0.8 of that of EKF. This is because EKF requires the computation of the tangent linear state operator, whereas EnKF avoids this computation step. carried out using EnKF or the EKF algorithm. Although the EnKF algorithm produced a slightly lower RMSE, the difference was negligible for Case 1. However, the algorithm calculation cost for EKF was 16.7 s, which was 19.2% higher than that of EnKF, indicating that EKF was more time-consuming than EnKF. This result is similar to those found in the literature [23], namely, that the computational time of EnKF was 0.8 of that of EKF. This is because EKF requires the computation of the tangent linear state operator, whereas EnKF avoids this computation step. Figure 7. Water level simulation error values for different data treatment schemes. The black line represents the data that were not treated via data assimilation; the green line represents the data treated via EKF; the red line represents the data treated via EnKF.  Note that the river in Case 1 was a mountainous single-channel river with a relatively simple water flow. This enabled a significant reduction in the algorithm calculation time and thus in the difference in calculation cost between EKF and EnKF. Indeed, for Case 2, which was a complex river network, we observed greater differences in the evaluation metrics. Figure 8 shows the simulated water level without data assimilation at the beginning period of the simulation (before hour 40) and with subsequent data assimilation using EKF and EnKF. Generally, the simulation error was substantially reduced and approximated to zero at the end of data assimilation using EKF or EnKF. This simulation error would spread out gradually over the long term after the end of data assimilation, indicating that the data assimilation effect could still be maintained for a relatively long period (up to 20 to 30 h) with negligible errors after the end of the data assimilation process. Appl  Note that the river in Case 1 was a mountainous single-channel river with a relatively simple water flow. This enabled a significant reduction in the algorithm calculation time and thus in the difference in calculation cost between EKF and EnKF. Indeed, for Case 2, which was a complex river network, we observed greater differences in the evaluation metrics. Figure 8 shows the simulated water level without data assimilation at the beginning period of the simulation (before hour 40) and with subsequent data assimilation using EKF and EnKF. Generally, the simulation error was substantially reduced and approximated to zero at the end of data assimilation using EKF or EnKF. This simulation error would spread out gradually over the long term after the end of data assimilation, indicating that the data assimilation effect could still be maintained for a relatively long period (up to 20 to 30 h) with negligible errors after the end of the data assimilation process. Table 8 presents a comparison of the evaluation metric results of EKF and EnKF for different stations. The calculation time required by EnKF for this river network was 4243.6 s, which was only 45% of that required by EKF (9335.4 s). Compared to the mountainous single-channel river, as expected, the calculation time increased hugely; furthermore, EnKF exhibited a much lower calculation cost value relative to EKF, i.e., the ratio of the calculation cost between EnKF and EKF decreased greatly, from 81% to 45%, indicating that the calculation efficiency of EnKF was much higher than that of EKF for a complex river network. We also found that the RMSE values of EnKF were much lower than those of EKF, regardless of the water level measuring stations selected. Particularly, Lecheng station had an RMSE value of only 9% when using EnKF compared to EKF. Interestingly, EnKF was more stable than EKF during the data assimilation period at the three stations. The high fluctuations of EKF may reduce its convergence. The filtering time (i.e., the time taken by the algorithm to reach stability after the beginning of filtering) was also much shorter for EnKF than for EKF. For example, for the Hongqiao station, EnKF took 5 h to reach stability for accurate water level calculations, whereas EKF took 20 h.  To verify the effectiveness of EnKF with different key parameter values, Figure 9 shows the RMSE values of EnKF compared to EKF at different observation noise levels. The EnKF algorithm exhibited much lower simulation errors at observation noise level values of 0.0005 and 0.005, regardless of the observation stations. The EKF algorithm had a slightly lower simulation error at Hongqiao station and Liushi station, which may be due to the impractically high value of observation noise in practice.   The solid black line represents the data that were not treated via data assimilation; the green dashed line represents the data that were treated via EnKF; the red dashed line represents the data that were treated via EKF. Table 8 presents a comparison of the evaluation metric results of EKF and EnKF for different stations. The calculation time required by EnKF for this river network was 4243.6 s, which was only 45% of that required by EKF (9335.4 s). Compared to the mountainous single-channel river, as expected, the calculation time increased hugely; furthermore, EnKF exhibited a much lower calculation cost value relative to EKF, i.e., the ratio of the calculation cost between EnKF and EKF decreased greatly, from 81% to 45%, indicating that the calculation efficiency of EnKF was much higher than that of EKF for a complex river network. We also found that the RMSE values of EnKF were much lower than those of EKF, regardless of the water level measuring stations selected. Particularly, Lecheng station had an RMSE value of only 9% when using EnKF compared to EKF. Interestingly, EnKF was more stable than EKF during the data assimilation period at the three stations. The high fluctuations of EKF may reduce its convergence. The filtering time (i.e., the time taken by the algorithm to reach stability after the beginning of filtering) was also much shorter for EnKF than for EKF. For example, for the Hongqiao station, EnKF took 5 h to reach stability for accurate water level calculations, whereas EKF took 20 h. To verify the effectiveness of EnKF with different key parameter values, Figure 9 shows the RMSE values of EnKF compared to EKF at different observation noise levels. The EnKF algorithm exhibited much lower simulation errors at observation noise level values of 0.0005 and 0.005, regardless of the observation stations. The EKF algorithm had a slightly lower simulation error at Hongqiao station and Liushi station, which may be due to the impractically high value of observation noise in practice.

Comparison of the EnKF and EKF Algorithm
To verify the effectiveness of EnKF with different key parameter values, Figure 9 shows the RMSE values of EnKF compared to EKF at different observation noise levels. The EnKF algorithm exhibited much lower simulation errors at observation noise level values of 0.0005 and 0.005, regardless of the observation stations. The EKF algorithm had a slightly lower simulation error at Hongqiao station and Liushi station, which may be due to the impractically high value of observation noise in practice.   Similarly, EnKF's calculation costs and the time required for EnKF to reach stability after the beginning of filtering at different observation noise levels were also consistently much lower than those of EKF, particularly at practical observation noise levels (0.0005 and 0.005), as shown in Table 9. The time for the EnKF to reach stability after the beginning of filtering was 27% to 81% of that for EKF, and the calculation cost for EnKF was 27% to 81% to that for EKF. The above testing comparison between EnKF and EKF showed the great advantages of the EnKF-based data assimilation model when it was applied to the complex river network. Indeed, both EnKF and EKF have their advantages in predicting water levels. However, EnKF was more effective, especially in terms of computational effort and the time to reach stable prediction.
The computational characteristics of the Kalman filter (e.g., a short online estimation time, small storage size, and convenient error analysis) are assessed to determine its suitability for real-time correction processing of dynamic systems. For a nonlinear dynamic system of a complex river network, the EKF algorithm is often used, but there are still some problems with this approach, mainly: (1) it is difficult to calculate the prediction error variance matrix, and its accuracy cannot be guaranteed, as verified by our experiments in which data assimilation based on EKF produced much higher simulation errors than EnKF-based data assimilation. (2) It is necessary to calculate both the tangent state operator and the covariance of dynamic noise [12], which makes the algorithm structure complex and causes a large amount of calculations. The calculation cost of EKF is almost twice as high as that of EnKF. (3) The hydraulic calculation of river networks is a strongly nonlinear problem. The use of the EKF algorithm requires one to omit the terms above the first-order derivative; however, the omitted terms could precisely reflect the nonlinear effect of the strong nonlinear problem. (4) The EKF algorithm contains an assumption in the application, that is, the error obeys a normal distribution; therefore, it is not suitable to be applied to a system with a non-normal distribution of errors. (5) EKF is not capable of addressing horizontal error correlations in the model or measurement errors in large systems due to computational limitations [24].
For water level prediction in practice, the accuracy and timing of water level predictions are also important. Our experiments on the Le Liu Hong plain river network showed that the EnKF-based assimilation model was able to forecast the water level after 5 to 8 h and make predictions up to 20 to 30 h after the data assimilation process was completed. These timings are similar to those found in other research results. For example, Karri et al. [25] conducted data assimilation research on the regional waters of Singapore, revealing that EnKF is efficient in short-term forecasting of water levels and currents, and the model could make better predictions up to 24 h, with the highest forecast quality within 6 h. Although increasing the initial ensemble size could lead to increased computation costs, as shown in Table 5, the overall computational cost was still relatively low. For example, the computational cost of the EnKF-based algorithm, as tested on the simulated river network, was only 223 s when using a personal computer when the ensemble size was 200, observation noise was 0.0001, and standard deviation was 0.1.
Given the complex nature of a river network, more parameters, such as the water-level measurement station number and the river roughness coefficient, can be simultaneously corrected during the simulation time with hydraulic state variables. This further increases the forecasting accuracy and efficiency of the EnKF-based data assimilation model developed in this paper. Sensitivity to the type of river network and connection complexity could also be essential to testing the model's application capacity in practice. In the present study, we focused on comparing the data assimilation strategy based on the hydraulic model of a river network coupled with EKF and that coupled with EnKF. Our main objective was to understand the advantages of the data assimilation method based on EnKF. Thus, the 1D hydraulic model based on the Saint-Venant equations was used to solve the state variables of river networks. Indeed, 2D models are being developed by researchers [26][27][28], and future work can be carried out to develop a data assimilation strategy based on a 2D hydraulic model.

Conclusions
Given the complex nature of a river network, it is challenging to perform good water level predictions with high accuracy and low computational cost when using the traditional EKF-based data assimilation approach. In this study, we developed a new data assimilation scheme based on the EnKF algorithm to optimize real-time water level simulation in river networks. The parameter analysis results showed that the initial sample ensemble size, its standard deviation, and the observation noise level could significantly impact the data assimilation performance. Optimized parameter settings, with a medium ensemble sample size (e.g., from 100 to 150), a relatively low observation noise level (e.g., from 0.0001 to 0.01 m), and a relatively high standard deviation (e.g., from 0.01 to 0.1 m), resulted in good data assimilation performance.
The experimental testing conducted on a mountainous single-channel river showed that the data assimilation method based on EnKF had a similar capacity to correct simulation errors to that of the EKF-based model, but for a river network with complex branches and connecting nodes, compared to the EKF-based model, the EnKF-based data assimilation method was able to decrease simulation errors greatly by up to 49%, to reduce calculation costs by 45%, and to reduce the filtering time by up to 43%. In addition, this performance was not impacted by the key model parameter, i.e., the observation noise level.
The experiments conducted on a real water river network showed that the EnKFbased data assimilation model forecast water levels with a high accuracy after 5 to 8 h and could make predictions up to 20 to 30 h after the data assimilation process was completed, indicating the potential application capacity of the EnKF-based assimilation model developed in this paper.  Data Availability Statement: All data, models, and code generated or used in this study are available upon request from the corresponding author.