An Optimized and Scalable Algorithm for the Fast Convergence of Steady 1-D Open-Channel Flows

Calculating an open-channel steady flow is of main interest in many situations; this includes defining the initial conditions for the unsteady simulation or the computation of the water level for a given discharge. There are several applications that require a very short computation time in order to envisage a large number of runs, for example, uncertainty analysis or optimization. Here, an optimized algorithm was implemented for the fast and efficient computation of a 1-D steady flow. It merges several techniques: a pseudo-time version of the Saint-Venant equations, an evolutionary domain and the use of a non-linear Krylov accelerator. After validation of this new algorithm, we also showed that it performs well in scalability tests. The computation cost evolves linearly with the number of nodes. This was also corroborated when the execution time was compared to that obtained by the non-linear solver, CasADi. A real-world example using a 9.5 km stretch of river confirmed that the computation times were very short compared to a standard timedependent computation.


Introduction
Channelized flows can be simulated using 1-D, 2-D or 3-D models depending on the level of flow detail that is required [1]. Results from hydrodynamic numerical models are used in multiple domains including flood risk analysis [2] or real-time control of river facilities [3], for instance.
One-dimensional models are used when a dominant direction can be assumed in the velocity field. This may be the case when the flow is restricted to the main riverbed. A 1-D model can still be used in the case of out-of-bank flooding, although they are unable to represent complex 2-D flow patterns in the floodplain [4]. Several practical cases have shown that flood mapping can be performed using 1-D models [5,6] and they can also be used in other fields, such as flood routing for hydropower plant operations [7] and mixed flows in pipes [8].
In fact, 1-D models are still used extensively even though 2-D and 3-D models are currently widely available. There are various reasons for their use, for example, digital elevation models (DEM) and bathymetry data are not available in some regions of the world. When only cross-section profiles are available to represent the geometry of a riverbed, 1-D models can make direct use of such profiles whereas they have to be interpolated to be used in 2-D or 3-D models. Besides, many applications do not require a detailed description of the flow features in the floodplain, and thus, a 1-D modeling approach is sufficient.
Large-scale hydraulic modeling of river networks [9][10][11][12][13] makes heavy use of 1-D models because of their ability to compute long stretches of the river at a reasonable cost. To initiate a computation, one needs boundary conditions as well as the initial condition. This initial condition is often a steady water profile, which can be obtained by performing a time-dependent simulation with steady boundary conditions over a period of time that is long enough to reach a steady solution [14]. This step may consume a considerable amount of time before the main problem can be addressed. The initial condition should be computed with the same numerical scheme as the one used in the unsteady model in order to ensure the steadiness of the first step of the unsteady model. The ability of these models to quickly obtain a steady initial solution is also of great importance.
Optimization is another field where obtaining a steady result as quickly as possible is important. Indeed, most optimization techniques require a large number of runs in order to figure out the optimal solution. In order to ensure that the overall computation time is as short as possible, techniques that are quick should be utilized including parallelization [15] or the use of fast computing models.
Although 1-D simulations are known to provide results in a short period of time, accelerating the computation of the 1-D steady solution is of great interest in the fields mentioned above. This can be achieved by using two main strategies. The first consists of exploiting the resources of modern computers more efficiently. Such techniques are more frequently applied to 2-D cases, which naturally require more computing resources. Common hardware acceleration strategies include parallelizing codes on several CPU cores [16,17] or on a GPU [18][19][20]. The second method consists of designing algorithms in order to converge with less effort toward the solution. To the authors' knowledge, there is no such work available in the literature. Both strategies can be combined in order to obtain the best performance.
The purpose of this paper is to propose algorithmic strategies for the fast computation of 1-D steady solutions. First, the equations are presented. Then, we introduce two original strategies in order to reduce the overall computation time. These strategies can easily be implemented in other hydraulic codes. An alternative non-linear solver is introduced for comparison purposes. Finally, the validation results, the optimal parameters for the algorithm, the performance of the model and its application to a real-world problem are presented.

Equations
One-dimensional water flow is described by the St-Venant equations, which are as follows [21]: where A is the cross-section area (m 2 ), Q is the discharge (m 3 /s), ql is a lateral discharge per unit length (m 2 /s), g is the gravity acceleration (m/s 2 ), Sf is the friction slope (-), ux is the velocity along the x direction of ql (m/s) and zs is the free surface elevation (m). The numerical resolution of this set of nonconservative equations has been shown to provide accurate results in many practical cases, including in the presence of discontinuities [22,23]. Assuming a steady flow, temporal derivatives vanish in Equation (1). Since solving the first equation is straightforward, the discharge distribution is known on the entire domain for a given distribution of ql. We assume that the sign of Q is independent from x. It means that a single equation remains with a single unknown A. In order to keep the same numerical scheme as the one used for the unsteady system (which is important when a steady solution is used as the initial condition of an unsteady problem and to be able to use a similar algorithmic strategy), Kerger et al. [14] added a pseudo-temporal term (pseudo-time is  (s)) to the steady form of Equation (1): where    sign Q . Kerger et al. [14] justify this choice for  by analyzing the characteristic velocity of Equation (2) 1 Fr c cu (3) where c (m/s) is the wave celerity. In subcritical flows (Fr < 1, is the width of the free surface (m)), and the sign of  is the sign of  . If Fr > 1, then . For critical flows (Fr = 1), the characteristic velocity is 0, independently from  .
In order to keep some form of consistency with the general model, if we choose    sign Q , an upstream boundary condition is required when Fr > 1 and a downstream boundary condition is required when Fr < 1. This is equivalent to the position of the water depth boundary condition for the numerical resolution of the full 1-D set of equations. With a single boundary condition and a discharge distributed in the channel, solving Equation (2) determines the cross-section area (and subsequently, the water depth) all along the stretch. Equation (2) is discretized according to the finite volume method. It is solved according to the same numerical scheme as the one used for the full unsteady model [14,24].
For a node i, Equation (2) is discretized in finite volumes as: (4) where x (m) is the spatial discretization step and subscripts refer to the position of variables values. For the sake of clarity, we consider  0 Q and a constant reconstruction of the flux at finite volume boundaries to explicate the numerical scheme. When applying the considered upwinding directions [14] for a node i not located next to a boundary, Equation (4) is equivalent to: (5) This flux vector splitting method has been shown to be unconditionally stable [14]. For the node located at the downstream boundary (  1 iN ), if a weak water level boundary condition zs,BC is imposed at the border, Equation (4) becomes: Without a boundary condition imposed on the value of zs at the external border, a nil zs gradient is imposed and Equation (4) becomes: At the upstream node (  0 i ), if a weak boundary condition of the water level is imposed at the border, Equation (4)

Original Solving Strategy
Solving Equation (2) instead of Equation (1) decreases the computation time since the number of equations and unknowns is reduced. In order to save even more time, two additional strategies were implemented: (a) a non-linear Krylov accelerator was used to promote fast convergence and (b) the computation was only performed on a sliding part of the full domain.

Non-Linear Krylov Acceleration
Numerically solving a non-linear system can be performed by different means, including Newton's method and Broyden's method [25]. More sophisticated methods exist in order to solve non-linear systems faster, such as the Jacobian-free Newton-Krylov method [26] and Anderson acceleration [27]. The Anderson acceleration method uses the results from successive iterations in order to adapt the new approximation. Walker and Ni [28] showed that this method can be considered equivalent to the well-known GMRES method [29] when applied to linear systems. The nonlinear Krylov acceleration (NKA) [30,31], which is similar to Anderson acceleration, has been found to be more efficient in some applications than more recent methods such as the Jacobian-free Newton-Krylov method [32]. NKA was used for a faster convergence of our pseudo-time model.
Since NKA only relies on the results directly produced by the hydraulic model, it can be easily applied to other algorithms or other domains. Indeed, no gradient evaluation (nor other prerequisite) is required before calling on the NKA algorithm.
The NKA algorithm records N ( 0 N   ) previous moves of the root finding process. Based on these previous moves, NKA adapts its guess for the new root. One of the main assumptions is that the Jacobian matrix remains constant within the scope of N moves. We briefly explain the method here and extended details can be found in [30][31][32][33].
A Newton-Raphson iteration process computes the n + 1st guess of the root based on the nth guess xn, on the value of the function f at xn and on the invert of the Jacobian matrix J: Instead of evaluating J −1 at each iteration, NKA evaluates it from N previous moves or sets it to the identity matrix, in which case the method degenerates to the fixed-point method.
NKA takes advantage of a history of N corrections of x (denoted v) and N evolutions of   f x (denoted w) at iterate n: The method assumes that J is constant and invertible within the scope of the N previous iterations, which is written as follows: (12) Mathematical developments described in the references cited above lead to the expression: where the coefficients zi are the solution of the projection: Equation (13) shows that the correction of the variable x is decomposed into two components: 1. The first term depicts the correction as a linear combination of previous corrections. 2. The second term is similar to the second term of a fixed-point iteration that takes into account previous evolutions of function f.
Note that the Jacobian matrix is not used in this iterative process.

Evolutionary Domain
The second strategy was designed to reduce computational cost. It consists in reducing the size of the computation domain and sliding it along the river stretch in order to evaluate the cross-section areas from downstream to upstream.
The development of this strategy has arisen from the long history of numerical hydrodynamics in various flow regimes. Indeed, many flows that are solved for rivers are subcritical (Fr < 1) at downstream and upstream boundary conditions. In such a situation, the cross-section area information is propagated from downstream to upstream. For a fixed flow direction, the upwinding direction of the scheme takes all unknowns and properties (except those linked to the discharge) downstream. This means that when a new node is computed, it depends only on the downstream nodes ( Figure 1). It should be noticed that when a node i is added, the upstream border of node i + 1 produces a change in the upwinding direction of property Q and the unknown Avel (cross-section area used to compute the velocity). The node i + 1, which was supposed to be converged, undergoes a new convergence process that indirectly affects nodes i + 2, i + 3, … Theoretically, all the nodes located downstream of node i should be kept in the computational domain.
The boundary condition on the border between the node i and node i − 1 (see Figure 1) is called a "mirror border". On this border, all the unknowns and properties are reconstructed from the computed inner node. This method is equivalent to reproducing the node i in i − 1, like a mirror. For node i in Figure 1, the discretization of Equation (2) is:  In order to assess whether the removed nodes / A   remain lower than f  , an analytical analysis of the discretized model is performed. Let us consider three nodes and their borders ( Figure  2). The discretized formulation of Equation (2) at node i is: The influence of a change in the cross-section area in 1 i  on the value of / A   in i is given by the derivative: If the evolution of the cross-section area in the computation domain ( 1  , then node i should be added again in the computation domain in order to make sure it remains below the threshold until the end of the entire computation. It should be noted that node 1, 2, ii   might also be impacted. This technique guarantees that once the sliding domain has moved along the entire domain, all nodes have reached at least the final precision required. As stated earlier, the method described here was designed within the framework of subcritical flows. In order to be able to deal with a larger range of flow regimes, several adaptations were made. When a supercritical node is detected downstream (let say at position m), it is not computed and the computation domain is extended until a subcritical node is found upstream (say at position n). Then, the domain starting from m to n is computed and converged. This technique avoids boundary condition problems. Indeed, a supercritical flow requires an upstream BC since the characteristic velocity is directed towards the downstream.

Combination of Solving Strategies
The combined use of the sliding domain and NKA involves several specific considerations. The use of NKA is implemented in the code with a safety coefficient that deactivates this optimizing technique in some cases. Indeed, it was experienced that NKA could lead to some instabilities when there was a sudden change in the cross-section area. This behavior is due to the assumption in NKA that the Jacobian matrix is constant and invertible locally [32,33]. In order to avoid such a situation, the accelerator is deactivated when Remove this node from computation list If upstream nodes remain to be added to computation list: Add 1 upstream node to the computation list Finalize

Alternative Non-Linear Solver
Various techniques can be used to solve nonlinear Equation (2). Up to this point, we have chosen to discretize the equation using a finite volume scheme and solve it with an explicit time scheme, which is consistent with the unified strategy of WOLF [24]. Other techniques can be used, including finite difference schemes and/or implicit time schemes. Another possibility is to use an optimization algorithm for nonlinear systems. One of these is the recently developed CasADi software [34,35].
CasADi first started as an algorithmic differentiation tool. During its evolution, developers chose to shift the focus toward optimization. From non-linear expressions, CasADi is able to generate all the information needed by a nonlinear solver in order to return a solution to the problem. CasADi provides interfaces to MATLAB or Python for easy use.
The purpose of using CasADi is to show how our algorithm performs compared to a state-ofthe-art solver.
The implementation in CasADi was done through Opti stack, a collection of helper functions used for nonlinear programming problems. It is possible to define variables to optimize, parameters, an objective function and constraints. The solving of a 1-D steady open channel flow can be done thanks to this framework.
The constraints of the problem are discretized in Equation (2) for each node and a water depth above 0 everywhere. The downstream boundary condition is imposed through Equation (6). If no boundary condition is set, then a flow condition can be imposed through a constraint on the Froude number for the downstream node and the flow head is minimized at the upstream node. If the flow presents a critical section, minimizing the head upstream is equivalent to finding the section with the highest critical head.
Another way to solve a flow with a critical section is to prescribe a Froude number transition from Fr 1  to Fr 1  at that critical section. This is done by setting a constraint on the Froude number on the nodes upstream and downstream of the critical section. The identification of the critical section should be done prior to the computation on the basis of a critical head analysis.

Results and Discussion
This section presents the validation of the results and focuses on the optimal parameters and performance. The geometries of the tests were different in order to examine as many cases as possible.

Validation
The validation of the models was performed on a bump placed in a straight horizontal channel, considering three different flow conditions. The bump and channel geometry have been described previously in [36]. The whole domain ranges from 0 to 20 m with the following bed elevation: The channel is considered to be rectangular. The discretization step was chosen as 0.1 m.
The different flow conditions are described in Table 1. All tests were performed with  The objective was to show that the model is able to deal accurately with various flow regimes and transitions. Test A simulates a fully subcritical flow with no transition. Test B creates a subcritical flow upstream, a subcritical flow downstream and a hydraulic jump in between, downstream of the bump. Finally, the goal of test C is to show the robustness of the method for a downstream supercritical flow and an upstream subcritical flow.
The analytical solutions for tests A, B and C were computed from the Bernoulli principle (head conservation) [22] and the conjugate water depth formula was used for test B. A graphical comparison of the analytical values and numerical results obtained by our algorithm and CasADi is given in Figures 3-5. It appears that the new algorithm and CasADi provide results that fit well with the analytical solution. However, a small difference in energy can be noticed between the analytical solution and the numerical results and also between both numerical methods (see Table 2). This was quantified and explained by Bruwier et al. [22]. Moreover, a more noticeable localized difference appears between CasADi and the new algorithm results at the hydraulic jump. Even if the numerical scheme is the same for the new algorithm and CasADi, each method has its own convergence threshold. Altogether, the analysis validates both models.

Optimal Setting of the New Algorithm
Five test cases were defined in order to specify the optimal values for the parameters for the new algorithm. These five tests were designed in order to induce changes in the flow characteristics due to topographic or cross-section variations. The first three tests (1 to 3) concern a channel with a rectangular cross-section and a bed slope that follows a sine function: (for tests 1 and 2) and 2   for test 3. Two hundred nodes were used to discretize the 20 m long channel, resulting in a 10 cm spatial step. For test 1, the downstream boundary condition is a free surface elevation imposed at 1.2 m. For tests 2 and 3, the same type of boundary condition was imposed with a smaller value of 0.55 m, which results in a higher Froude number downstream. The specific discharge was imposed upstream at 1 m 2 /s for tests 1 to 3. The Manning equation was used for friction in tests 1 to 3, with the Manning coefficient n = 0.04 s/m 1/3 .
The topography and the hydraulic solutions were found thanks to the principle of head conservation (Bernoulli) and are shown in Figures 6-8 for tests 1 to 3. Table 3 summarizes the characteristics of each test. The objective of tests 1 to 3 was to analyze the influence of a variation of the bed topography on the behavior of the sliding domain. For test 1, the irregularity of the bed has only a slight influence on the water level. For the other tests, the higher Froude number and less spaced bed elevation variations were meant to investigate the possible influence of oscillations in the water level on the sliding domain performance.   The topography and final water levels for tests 4 and 5 are depicted in Figures 9 and 10. A summary of these tests is given in Table 3.    Our algorithm includes many parameters that need to be specified. These parameters are the partial residual threshold  p , the final residual threshold  f , the temporal scheme to solve Equation (4) and the coefficient for the deactivation of the nonlinear Krylov accelerator. The final and partial residuals are two closely linked parameters. They also have a direct impact on the computation time. For a given final residual, which has to be parametrized by the user, the partial residual influences the number of iterates required to converge the partial domain and the size of these domains. After investigation, the other two parameters were shown to have almost no influence on the computation time. The following results focus on the best value to use for the partial residual threshold p  for a fixed value of 8 10 f    m 2 /s. CPU times were measured on a desktop computer (Intel i7 3.5 GHz CPU) for the five tests and various partial residual thresholds. The results are reported graphically in Figure 11. It appears that the overall computation time decreases with an increase in the partial residual threshold. Some stagnation appears around 10 −2 m 2 /s for tests 3 and 5. This can be explained by the fact that the residual naturally decreases at each iteration. Keeping some nodes in the computation domain results in a decrease in the residual for each node included in the computation domain. In order to assess the efficiency of the new algorithm, two scalability tests were performed. The first one consisted of extending the domain upstream, with a constant spatial discretization. The second test consisted of keeping the same channel but refining the discretization and then increasing the number of computation nodes.
The first test took place on a frictionless sine bed elevation described by: The downstream boundary condition is a water level imposed at 0.65 m. The discharge is constant along the channel stretch and is equal to 1 m 2 /s. Cross-sections are rectangular and 1 m wide. The computation results (Figure 12a) showed that the computation time per node decreases when the number of nodes increase. This can be explained by the fact that when the domain gets longer, the flow conditions upstream are smoother than downstream. Longer domains undergo fewer changes than shorter domains, leading to shorter computation time per node. This example shows that higher partial residual values provide the best computation times. In order to complete this scalability study, we looked at the behavior of the algorithm when the spatial step decreases for a given domain length. In classical explicit schemes, this case leads to a quadratic increase in the computation time. Indeed, when the spatial step decreases, the number of nodes increases and the time step decreases.
A 100 km-long channel with a constant 0.025% slope was chosen to illustrate the behavior of the new algorithm. The cross-sections are trapezoidal and are described using tabular values (1 m wide at the bottom of the section and 5 m wide at 1 m above the bottom). Friction was generated using a Manning law with n = 0.03 s/m 1/3 . The downstream boundary condition is a water level set at 1 m, and 1 m 3 /s is injected at the upper node and the injection of 4 m 3 /s is shared amongst the other nodes (through the ql term in Equation (1), 0 The computation times are showed in Figure 12b. The behavior is slightly different from that observed in the previous test. Indeed, the evolution of the computation time is linear in regard to the number of nodes (the CPU time per node is globally constant) when the partial residual is set at 10 −6 and 10 −8 m 2 /s. For partial residual values of 10 −2 and 10 −4 m 2 /s, the evolution is linear up to 50,000 nodes; however, for the finest discretization, the computation time increases in a nonlinear way.
This point was investigated further. It appears that at some moment in the computation, the algorithm needs to increase its computation domain size without being able to reduce it quickly (i.e., upstream nodes are added to the computation list while downstream nodes cannot be removed for residual values reasons). This increase in the number of nodes in the computation list was nonlinear compared to the situation with coarser discretization.

Performance of the Models
The test case chosen for the comparison between CasADi and our algorithm is a rectangular channel (1 m wide) with a 100 m long sine wave bottom as shown in the following equation: A Manning friction formula with n = 0.04 s/m 1/3 was used to estimate friction losses. The downstream boundary condition is a water depth equal to 0.6 m. The unit discharge is uniform and is equal to 1 m 2 /s. The precision parameters are as follows: We compared the CPU time spent in the solving stages of our new algorithm and CasADi. The goal was to compare the evolution of the computation time with the number of nodes, rather than comparing absolute values. The results are given in Figure 13. They confirm that the computation time evolves almost linearly with the number of nodes when the new algorithm is used. This is not the case with CasADi: the computation time increases following a power law N  , 1   . This is due to the matrix operations that CasADi has to perform. Increasing the number of nodes leads to a nonproportional increase in computation time. The speed up factor, which is the ratio between the CPU time spent when using our new algorithm and the CPU time spent with CasADi, ranges from in order of 10 0 to in order of 10 2 according to the number of nodes considered.

Real-World Application
The Romanche River in the French Alps is currently facing a number of significant changes. A new hydropower facility is being built in order to replace older power plants. In this context, the dam operator needs a fast computation routine in order to operate its facilities in an optimized and safe way. To do so, an unsteady 1-D model was implemented in Fortran and integrated in a Simulink S-Function in order to be compatible with the operator model [7]. Our new algorithm was used for the fast computation of an initial condition.
The studied part of the Romanche River stretches over 9.5 km (Figure 14), from the new Livet Dam to Gavet. The geometry and the calibration of the friction coefficients are detailed in [7]. The river was discretized with 191 nodes (50 m-long meshes). The downstream boundary condition was set at an elevation of 436.8 m. Two discharges were tested: 10 and 40 m 3 /s. Since the details of the hydraulic results are of limited interest for this paper, we focused on the execution time and showed that the Froude number remains under 1 for both discharges (Figure 14). For the same machine as the one used earlier, the execution times (CPU time) are 0.018 s and 0.022 s, respectively (

Conclusions
Several innovations are introduced in this paper, including the use of the non-linear Krylov accelerator in open-channel flows, an evolutionary domain algorithm and the use of CasADi to solve steady 1-D flows. These improvements lead to an algorithm that is able to quickly solve steady openchannel flows. Therefore, optimization problems and uncertainty analyses that require many evaluations, become more tractable.
An original algorithm was implemented in order to significantly improve the computation time of a steady 1-D open-channel flow problem. It includes two main optimizing strategies: a non-linear Krylov accelerator and an evolutionary domain algorithm. This new algorithm was validated against the academic benchmarks of flows over a bump. The results showed a good agreement between the numerical and analytical values.
The performance of the suggested algorithm was evaluated against the non-linear optimization software CasADi. It showed good scalability properties. Indeed, the execution time of the proposed algorithm evolves linearly with the number of nodes. This is not the case with other techniques when the mesh is refined and/or when the number of nodes increase.
Finally, we demonstrated the capabilities of our algorithm in a real-world case. We used the optimized algorithm in order to compute quickly the initial condition required by the operational model for the Romanche River in France. Our technique was able to provide a steady state solution to the unsteady model in a very short period of time.
Funding: This research received no external funding.

Conflicts of Interest:
The authors declare no conflict of interest.
Computer Code and Software: The following software and codes were used for this paper: (a) WOLF was developed by the HECE research group (http://www.hece.ulg.ac.be/cms/) at the University of Liège since 2000 and is not freely available, (b) CasADi is freely available at https://web.casadi.org/, and (c) the routines used to test CasADi are freely available at https://gitlab.uliege.be/HECE/HydroCasADi.