Abstract
This work aims to supplement the realization and validation of a recently presented higher-order well-balanced unstructured finite volume (FV) scheme for numerically simulating weakly non-linear, weakly dispersive water waves over varying bathymetries. We investigate and develop solution strategies for the sparse linear system that arises from this FV discretisation of a set of extended Boussinesq-type equations on unstructured meshes. The resulting linear system must be solved at each discrete time step to recover the actual velocity field of the flow and advance in time. The system’s coefficient matrix is sparse, unsymmetric and often ill-conditioned. Its characteristics are affected by physical quantities of the problem, such as the undisturbed water depth, and by the mesh topology. To this end, we investigate the application of well-known iterative techniques, with and without preconditioning and reordering, for the solution of this sparse linear system. The iterative methods considered are GMRES and BiCGSTAB. Three preconditioning techniques, comprising different ILU factorizations, and two reordering techniques are implemented and discussed. Finally, an optimal strategy in terms of computational efficiency and robustness is proposed, which combines the BiCGSTAB method with the ILUT preconditioner and Reverse Cuthill–McKee reordering.
1. Introduction
Boussinesq-type (BT) equations have been widely used in the past few decades for the description of water wave propagation and transformation near coastal zones. Starting from the classical Boussinesq equations [1], which are limited to relatively shallow water, and up to the present day, numerous researchers have contributed to the development of analytical theories and their numerical approximation in order to simulate wave phenomena such as shoaling, diffraction, refraction and breaking. Significant attempts have been made to extend the applicability of Boussinesq-type models to deeper water, for example in [2,3,4]. These extended models give a more accurate representation of the wave’s phase and group velocities in intermediate waters, with water depth to wavelength ratios up to 1/2, and are sometimes referred to as low-order enhanced BT equations. Moreover, significant effort has been made in recent years to advance the nonlinear and dispersive properties of BT models by including high-order nonlinear and dispersion terms; we refer, for example, to [5,6,7,8]. An extensive review, which describes the state of the art in water wave modeling by means of BT models, can be found, for example, in [9].
Different methods have been proposed for the numerical solution of BT equations, such as the Finite-Difference (FD) method [3,4,8,10] and the Finite-Element method [11,12,13]. In the last ten years the Finite-Volume (FV) method has become the most widely used one due to its effectiveness in the approximation of hyperbolic conservation laws. Applications and advances along this line of research can be found, for example, in [14,15,16,17,18,19,20,21,22]. The numerical solution of BT equations usually requires the inversion of one or more matrices, i.e., the solution of one or more linear systems, in order to recover the actual velocity field of the water flow from the solution variables. This is an essential procedure (especially in two-dimensional simulations) if the original dispersive BT equations are rewritten in a conservative-like form and the vector of the unknown variables includes part of the dispersive terms [9]. In the simplest cases, e.g., when structured computational meshes are used, the linear system’s matrices can be tridiagonal, but when unstructured meshes are used the matrices that occur are more complicated, unsymmetric and have variable sparsity patterns [10,12,19]. For this reason, further investigation of the efficient and accurate solution of the resulting linear systems in two-dimensional computations on unstructured meshes is needed.
This work is complementary to [19,20,23] where, for the first time, a high-order well-balanced unstructured finite volume (FV) scheme on triangular meshes was presented for modeling weakly nonlinear and weakly dispersive water waves over slowly varying bathymetries, as described by the 2D depth-integrated extended Boussinesq equations of Nwogu [3] rewritten in conservation law form. Formulating the model equations in conservative form is imperative when one wants to exploit the advantages modern FV schemes have to offer, such as conservation and shock capturing. In this work, the FV scheme numerically solves the conservative form of the equations, following the median-dual node-centered approach, for both the advective and the dispersive part of the equations. For the advective fluxes, the scheme utilizes an approximate Riemann solver along with a well-balanced upwinding of the topography source term. Higher-order accuracy in space and time is achieved through a MUSCL-type reconstruction technique and a strong stability preserving explicit Runge–Kutta time stepping. After each step of the time marching scheme a non-trivial sparse linear system must be solved in order to recover the velocity field of the flow. Consequently, the accuracy and efficiency of the overall numerical solver depend on the solution of this system.
Following from the above, we examine here, in some depth, the solution process for this large sparse linear system. The complexity of the system is such that iterative methods are necessary to obtain efficient solutions even for moderately sized problems (depending on the number of degrees of freedom). As such, two classical Krylov subspace iterative methods are utilized in this work [24,25]: the Generalized Minimal Residual (GMRES) and the Biconjugate Gradient Stabilized (BiCGStab) algorithms. Both methods are implemented using the SPARSKIT package [26]. The optimal solution strategy depends on factors such as the problem size, the sparsity pattern and the eigenspectrum of the system’s matrix. The applicability of these two well-known iterative methods, along with preconditioning and reordering options, is examined in order to find an efficient solution procedure. The effect of the mesh topology and of certain physical conditions, e.g., the flow field’s reference water depth, on the solution procedure is also investigated. We expect that this investigation will provide useful guidelines for cases where other such model equations are to be approximated on unstructured meshes by any FV numerical approach. This we consider to be the major novel aspect of the present work.
The outline of this paper is as follows: The BT model equations are described in Section 2 while the numerical model used is briefly presented in Section 3 along with the derivation of the sparse matrix and its properties. The iterative methods utilized for the solution of the resulting linear system, the preconditioning methods tested and a detailed comparison of their performance are presented in Section 4, while Section 5 describes the reordering methods. Conclusions are drawn in Section 6.
2. Governing Equations
A number of BT models have been developed to describe the transformation and propagation of waves in coastal regions. In this work, the model equations solved are the extended BT equations of Nwogu [3], which are based on the assumption that the wave height is much smaller than the water depth h. The equations were derived by Nwogu [3] using the velocity vector at an arbitrary distance, , from a still water level, h, as the velocity variable, instead of the commonly used depth-averaged velocity. The elevation of the velocity variable becomes a free parameter used to optimize the equations, making them applicable to a wider range of water depths compared to the classical Boussinesq equations. The equations of Nwogu describe weakly non-linear and weakly dispersive water waves in variable water depth. The parameter , which measures the weight of nonlinear effects, and the squared water depth to wave length (L) ratio , which represents the magnitude of the dispersive effects, are of the same order, i.e., the Stokes number . The equations provide accurate linear dispersion and shoaling characteristics for values of (intermediate water depths), where k is the wave number and is essentially a scale of the value of , providing a correction of to the shallow water theory. The equations are presented here in a conservative-like form and as such are numerically approximated by an unstructured FV scheme. Following [19], the vector conservative form of the equations reads as:
where is the space-time Cartesian domain over which solutions are sought, are the physically conservative variables, is the vector of the actual solution variables, with H being the total water depth and are the nonlinear flux vectors given as
where
The source term vector, , includes the bed topography’s (b) slope , the bed friction effects , given in this work in terms of the Manning coefficient , and the dispersive terms . These terms read as
with
where
Equations (1) have flux terms identical to those of the non-linear shallow water (or Saint-Venant) equations, and the variables contain all time derivatives in the momentum equations, including part of the dispersion terms. The dispersion vector contains only spatial derivatives since is explicitly defined by the mass equation.
3. The Numerical Model
To numerically solve Equation (1) we use the Finite Volume (FV) scheme proposed in [19,20]. This FV approach is of the node-centered median-dual type, where the control volumes (see Figure 1, left panel) are elements dual to the primal triangular mesh. The discrete solutions are located at the data points, whose number N equals the number of vertices of the mesh. Referring to Figure 1, the boundary of a control volume (cell), , around an internal node P, is defined by connecting the barycenters of the surrounding triangles (having P as a common vertex) with the mid-points of the corresponding edges that meet at node P.
Figure 1.
Median-dual computational cell implemented in the FV scheme (left) and the computational cell used for the gradient of the divergence in (2) (right).
3.1. The FV Approximation
After integration of (1) over each computational cell and application of the Gauss divergence theorem the semi-discrete form of the FV scheme reads as:
where is the volume-averaged value of the conservative-like quantities at a given time, is the set of the neighboring nodes of P, i.e., , where is the boundary of the computational domain and are the numerical flux vectors across each internal face and each boundary face, respectively.
The numerical fluxes are evaluated by solving a Riemann problem at the cell interfaces using the approximate Riemann solver of Roe [27]. To reach higher-order spatial accuracy, an extension of the MUSCL methodology of Van Leer [28] is used. This extension relies on the evaluation of the fluxes with extrapolated physical variables at the midpoint M of an edge . Each component of the physical variables and the bed topography b is extrapolated using extrapolation gradients obtained from a combination of centered and upwind gradients [19,29,30,31], which increases the accuracy of the basic MUSCL reconstruction [30]. In this way third-order spatial accuracy is obtained [19]. To reduce oscillations in cases where non-linearity prevails (e.g., when the dispersive terms become negligible), a slope limiting procedure is necessary, and the edge-based non-linear slope limiter of Van Albada–Van Leer is used. Details of the numerical model, such as the wet/dry front treatment, the boundary conditions and the discretization of the dispersive terms, can be found in [19,20].
3.2. The Resulting Linear System for the Velocity Field Recovery
Concerning the time discretization, an optimal third-order explicit Strong Stability Preserving Runge–Kutta (SSP-RK) method was adopted [19,32] under the usual CFL stability restriction. In each step of the RK scheme the values of the velocities must be extracted from the new solution variable following from (2). The FV discretization of results in a sparse matrix. The linear system with and , has to be solved in each step of the RK time marching scheme.
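To make the structure of the time marching concrete, the following is a minimal sketch of one step of the classical three-stage, third-order SSP Runge–Kutta scheme in Shu–Osher form. It assumes that this is the "optimal third order" method meant above; `rhs` and `recover` are hypothetical callables introduced only for illustration, with `recover` standing in for the velocity-recovery linear solve (applied here after every stage, which is also an assumption).

```python
def ssp_rk3_step(u, dt, rhs, recover=lambda w: w):
    """One step of the three-stage, third-order SSP-RK scheme (Shu-Osher form)
    for du/dt = rhs(u), where u is a list of floats.

    `recover` is a hypothetical hook for the velocity-recovery linear solve;
    the default leaves the stage values unchanged.
    """
    def combine(a, x, b, y):
        # Convex combination a*x + b*y, component by component.
        return [a * xi + b * yi for xi, yi in zip(x, y)]

    def euler(v):
        # Forward Euler building block: v + dt * rhs(v).
        return [vi + dt * ri for vi, ri in zip(v, rhs(v))]

    u1 = recover(euler(u))
    u2 = recover(combine(0.75, u, 0.25, euler(u1)))
    return recover(combine(1.0 / 3.0, u, 2.0 / 3.0, euler(u2)))


# Usage on the scalar decay problem du/dt = -u, integrated from t = 0 to t = 1.
u = [1.0]
for _ in range(10):
    u = ssp_rk3_step(u, 0.1, lambda w: [-wi for wi in w])
```

Each stage is a convex combination of forward Euler steps, which is what gives the scheme its strong stability preserving property under the CFL restriction.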
Keeping in mind that is our unknown velocity vector at each mesh node, each two rows of the matrix correspond to a node on the grid and for each such node we have,
Now the gradients and must be computed. For that reason, we use the average of the gradient in a cell [33,34,35]:
which is computed at each mesh node P by applying the Green–Gauss theorem in the region , i.e., the union of all triangles which share the vertex P, and is the value of w at the midpoint of the edge . The outward normal vector to is denoted as , while is the corresponding unit vector. Using (11), Equation (10) reads as:
and it is now clear that we have to compute and at M. Referring again to Figure 1 (right panel), a new computational cell is defined, constructed by the union of the two triangles which share the edge . Hence, the discrete averages of the divergence can be computed as follows for ,
where A similar computation is performed for the approximation of . We refer to [19] for more details on the discretization. By performing the above approximations we restrict the unknown information used in (10), i.e., values of , only to that coming from the nodes that are neighbors of node P, i.e., nodes .
Substituting the above equation into (12), for each node , gives:
which can be further rewritten as:
After some calculations the resulting sparse linear system to be solved can be presented in a more compact form, as:
where the sub-matrices and now depend only on geometric quantities and the area , and are given as
The number of geometrical entries in each summation is always two, while the number of entries in the summation in (16) is equal to the number of neighbors of P. This means that the maximum number of non-zero elements of the matrix in each row P in (16) is two times the number of neighbors of node P plus one.
It is important to state here that the coefficient matrix of the linear system (16) is constant in time. It is therefore constructed and stored, in compressed sparse row (CSR) format [24], at a pre-processing stage at the beginning of each simulation. The present work concerns the solution process after the sparse matrix has been stored.
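As an illustration of the storage scheme, the sketch below converts a small dense matrix to the three CSR arrays and performs a matrix-vector product with them. The function names and the 4 x 4 matrix are hypothetical and chosen only for the example; indices are 0-based here, whereas Fortran libraries such as SPARSKIT use 1-based indexing.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of lists) to 0-based CSR arrays.

    Returns (values, col_idx, row_ptr): the non-zero values stored row by row,
    their column indices, and the offset at which each row starts in `values`.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, a in enumerate(row):
            if a != 0.0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x using only the stored non-zero entries."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


# Hypothetical 4 x 4 example matrix.
A = [[4.0, 0.0, 1.0, 0.0],
     [0.0, 3.0, 0.0, 0.0],
     [2.0, 0.0, 5.0, 1.0],
     [0.0, 0.0, 0.0, 6.0]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1.0, 1.0, 1.0, 1.0])
```

Since the matrix is constant in time, the conversion cost is paid once, while the matrix-vector product, which dominates the cost of every Krylov iteration, touches only the stored non-zeros.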
3.3. System’s Matrix Properties
The resulting system’s coefficient matrix is unsymmetric in its values but structurally symmetric, i.e., its pattern of non-zero entries is symmetric. It is often ill-conditioned and also mesh dependent: the sparsity pattern of the matrix depends on the ordering of the nodes on each grid, so different grids lead to different matrix structures. In this work four types of grids are used, see Figure 2.
Figure 2.
Representative grid types: Equilateral, Orthogonal I, Orthogonal II, Distorted (left to right).
For a given computational domain, and without loss of generality, with dimensions in the x- and y-direction respectively, we define a subdivision of by line segments, namely and depending on the grid type the corresponding subdivision of can be easily determined. As such, we define the characteristic length (effective mesh size) for each grid as .

Figure 3.
Matrix sparsity patterns for the four different mesh types shown in Figure 2 for with the number of non-zero elements.
Each grid used has 15 nodes along the x-axis () and the resulting matrix has the non-zero elements shown. The structure resulting from the Orthogonal I type of grid is quite different from that of the other types, which have a much smaller bandwidth. Further, for the Orthogonal I grid type the number of non-zero elements, , in the system’s matrix is almost double that obtained from the other grids. The matrix structure remains the same as the meshes are refined. Table 1 shows the characteristics of indicative matrices produced using for and different numbers of nodes along the x-axis (), while Table 2 shows the values for each matrix reported in Table 1.
Table 1.
Total () and non-zero elements () for the matrices produced for m with different .
Table 2.
The values for the matrices produced for m with different .
The properties of the matrix are also affected by the physical setting of the problem examined. The most important parameter is the still water level, h (see Equation (15)), in relation to the number of nodes used on the grid, i.e., the resulting value of . To illustrate the dependence on the ratio , the eigenvalue spectra of six matrices are shown in Figure 4 and Figure 5. The six matrices examined have been produced using two types of grids, Equilateral and Orthogonal I. Figure 4 shows the eigenvalues of three matrices produced using the equilateral type of grid. The first matrix (on the left) has m and the second (center) uses m and the third one (right) uses m and . All matrices have minimum eigenvalues near zero, and as the ratio grows the spectrum of the matrix spreads much more widely over the right half of the complex plane. The same conclusions can be drawn from Figure 5, which depicts the eigenvalue spectra of three matrices obtained using the Orthogonal I type of grid. The parameters used are the same as in Figure 4.
Figure 4.
Eigenvalues of three matrices using the equilateral type of grid with (top left), (top right) and (bottom).

Figure 5.
Eigenvalues of three matrices using the Orthogonal I type of grid with (top left), (top right) and (bottom).
The above behavior gives evidence that preconditioning is crucial for this type of problem. These conclusions are further reflected in the respective condition numbers of the three matrices, which are between –. For example, the matrix produced by the distorted grid and m (see Figure 2) has a condition number equal to , while the one produced from the equilateral grid has condition number . The matrices become even more ill-conditioned as the depth is further increased or the grid is refined. Similar behavior has been reported in [10], where the finite difference method is used to solve a Boussinesq model in two dimensions.
4. Iterative Methods, Preconditioning and Reordering
4.1. Application of Iterative Methods
In this presentation, we want to develop an optimal strategy for the solution of system (16), abbreviated to Ax = b, and we use a “toy” problem in which the right hand side vector is computed by adding the columns of the matrix . The initial guess used for the iterative methods discussed next is the zero vector. From now on, all test cases presented solve this “toy” problem unless otherwise stated. We mention again that the system’s matrix depends on the numbering and the geometrical quantities of the mesh nodes. Since it does not depend on the unknown quantities of the BT model, we can construct it and store it only once, at the beginning of the simulation.
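Adding the columns of the matrix is the same as multiplying it by the vector of ones, so the exact solution of the toy problem is known in advance to be x = (1, ..., 1). A minimal sketch of this construction, with hypothetical helper names and a small dense example matrix, is:

```python
import math


def toy_rhs(A):
    """Right hand side obtained by adding the columns of A, i.e. b = A * ones."""
    n = len(A)
    b = [0.0] * n
    for j in range(n):          # loop over columns
        for i in range(n):      # accumulate column j into b
            b[i] += A[i][j]
    return b


def relative_residual(A, x, b):
    """Return ||b - A x||_2 / ||b||_2 for a dense matrix A."""
    r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
    return math.sqrt(sum(ri * ri for ri in r)) / math.sqrt(sum(bi * bi for bi in b))


# Hypothetical 4 x 4 example matrix.
A = [[4.0, 0.0, 1.0, 0.0],
     [0.0, 3.0, 0.0, 0.0],
     [2.0, 0.0, 5.0, 1.0],
     [0.0, 0.0, 0.0, 6.0]]
b = toy_rhs(A)
res_exact = relative_residual(A, [1.0] * 4, b)   # residual of the known solution
res_zero = relative_residual(A, [0.0] * 4, b)    # residual of the zero initial guess
```

Knowing the exact solution makes it possible to monitor the true error of an iterative solver, not only its residual, which matters for ill-conditioned matrices where a small residual does not guarantee a small error.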
One common solution method for sparse systems is a sparse direct algorithm based on a complete factorization of the matrix. However, for large-scale problems leading to large sparse linear systems, the computational time required for the factorization, along with the storage requirements, can be an insurmountable problem [36]. An alternative to direct solution methods is the use of iterative ones, which have significantly lower storage demands. Two classical Krylov subspace iterative methods are used in this work: the Generalized Minimal Residual (GMRES) and the Biconjugate Gradient Stabilized (BiCGStab) algorithms. Both methods are implemented using the SPARSKIT package [26]. The optimal solution strategy depends on factors such as the problem size, the sparsity pattern and the matrix’s eigenspectrum. We note here that, during the course of this work, several other iterative methods were considered, such as the flexible version of the Generalized Minimal Residual method (FGMRES), the Quasi-Minimal Residual method (QMR) and the Transpose-Free QMR (TFQMR). However, the GMRES and BiCGStab methods proved more robust in solving the problem at hand, and for brevity we present the application and the results obtained for these two only.
Figure 6 depicts the performance (in terms of CPU time) of each method for two different types of grids, the Equilateral type (left) and the Orthogonal I type (right). Each sub-figure depicts the computational time needed to solve the sparse system as the nodes are refined and the depth is increased.
Figure 6.
CPU time versus variable still water level to ratio for GMRES (solid line) and BiCGStab (dashed line).
In both cases the computational time grows dramatically not only as the mesh is refined (which is expected) but also as the still water level h (with respect to ) is increased. Furthermore, comparing the CPU times needed by each iterative method, we can see that they start to differ as the water depth to ratio is increased. The BiCGStab method solves the linear systems produced (for both types of grid) in less time than GMRES. In some cases both methods failed to converge for higher values; this was more prominent for GMRES. For both iterative methods a relative residual error tolerance was used as the convergence criterion. It follows from the above that the use of a preconditioning method is mandatory in order to reduce the number of iterations needed and, consequently, the total computational time.
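For reference, the structure of the unpreconditioned BiCGStab iteration with a zero initial guess and a relative residual stopping criterion can be sketched as follows. This is a plain textbook version written for illustration, not the SPARSKIT implementation used in this work; the function name, the tolerance value and the 3 x 3 test matrix are assumptions made for the example.

```python
import math


def bicgstab(matvec, b, tol=1e-10, max_iter=200):
    """Unpreconditioned BiCGStab for A x = b with zero initial guess.

    `matvec(v)` must return A v. Stops when ||r||_2 / ||b||_2 < tol.
    Returns (x, iterations, converged).
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    n = len(b)
    x = [0.0] * n
    r = b[:]                      # r0 = b - A*0 = b
    r_hat = r[:]                  # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        if rho_new == 0.0:        # breakdown of the bi-orthogonalisation
            return x, it, False
        beta = (rho_new / rho) * (alpha / omega)
        p = [r[i] + beta * (p[i] - omega * v[i]) for i in range(n)]
        v = matvec(p)
        denom = dot(r_hat, v)
        if denom == 0.0:
            return x, it, False
        alpha = rho_new / denom
        s = [r[i] - alpha * v[i] for i in range(n)]
        if math.sqrt(dot(s, s)) / b_norm < tol:
            return [x[i] + alpha * p[i] for i in range(n)], it, True
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)     # local residual minimisation
        x = [x[i] + alpha * p[i] + omega * s[i] for i in range(n)]
        r = [s[i] - omega * t[i] for i in range(n)]
        if math.sqrt(dot(r, r)) / b_norm < tol:
            return x, it, True
        rho = rho_new
    return x, max_iter, False


# Small unsymmetric test matrix; b is the sum of its columns, so x = (1, 1, 1).
A = [[4.0, 1.0, 0.0], [-1.0, 4.0, 1.0], [0.0, -1.0, 4.0]]
b = [sum(row) for row in A]
x, iterations, converged = bicgstab(
    lambda vec: [sum(a * c for a, c in zip(row, vec)) for row in A], b)
```

Each iteration needs two matrix-vector products and a fixed, small number of work vectors, whereas GMRES must store a growing Krylov basis between restarts.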
4.2. Application of Preconditioning Methods
Although Krylov subspace iterative methods are well suited to solving sparse systems, they can exhibit slow convergence. Thus, it is essential to use a good preconditioning strategy to enhance their convergence properties and reduce the computational cost. Surveys of preconditioning techniques for large linear systems can be found in [25,37]. This section introduces the different preconditioning techniques tested.
A good preconditioner must approximate the matrix well, while at the same time being easy to invert. Let be a new non-singular matrix which is a good approximation of . The preconditioned system will have the same solution as system , and it can be solved more easily. The non-singular matrix is called the preconditioner. In the present work all preconditioning is done from the left, even though preconditioning from the right has been found to be equally effective. Three preconditioning techniques, which are freely available as part of the SPARSKIT package [26], were implemented and tested: the ILU(0), ILU(k) and ILUT preconditioners.
Since the convergence of an iterative method such as GMRES or BiCGStab depends on the eigenvalues of the matrix, preconditioning, generally speaking, attempts to improve the spectral properties of [25]. A “good” preconditioner transforms the matrix so that the original sparse linear system can be solved easily, with low storage demands and at low computational expense. The preconditioned matrix should have eigenvalues away from zero. Even for non-symmetric matrices, an eigenvalue spectrum clustered away from zero can still lead to rapid convergence.
One way to approach this is to use a direct method such as the LU-decomposition, i.e., A = LU. The system can then be solved in two steps using the factors L and U. The drawback of this method is that, during the factorization process, fill-in makes the matrices L and U much denser than A, so the computer storage demands for large problems may become huge. One of the simplest ways of defining a preconditioner is to perform an incomplete factorization of the original matrix A. This entails a decomposition of the form where L and U have the same nonzero structure as the lower and the upper part of A, respectively, and R is the residual matrix of the factorization. This incomplete factorization is known as ILU(0) and often leads to a crude approximation, which in turn may result in many iterations being needed to reach convergence. To remedy this, several alternative incomplete factorizations have been developed that allow more fill-in in L and U. In practice, L and U are found by the Gaussian elimination process, in which a level of fill is attributed to each element and the element is dropped or kept according to this value [24]. In general, a more accurate ILU factorization requires fewer iterations to converge, but the preprocessing cost of computing the factors is higher. Incomplete factorizations that rely on the level of fill are blind to numerical values, because the elements that are dropped depend only on the structure of the matrix. Other methods drop elements in the Gaussian elimination process according to their magnitude rather than their location.
Different factorization algorithms have different rules that govern dropping or fill-in in the incomplete factors. In this work two rules were used: the level of fill approach and a drop tolerance parameter. From now on, applying a dropping rule to an element will simply mean replacing the element with zero if it satisfies a set of criteria. The drop tolerance is a positive number used in such a dropping criterion. A suitable dropping strategy depends on the matrix, its number of non-zeros and its condition number. The idea is to accept among the new fill-ins only those values that are greater (in absolute value) than the tolerance. In general, it is difficult to choose the right value for the drop tolerance, because one cannot predict the amount of storage that will be needed.
4.2.1. The ILU(0) Preconditioner
When the level of fill-in is equal to zero, the ILU(0) preconditioner is recovered. Thus, ILU(0) refers to an incomplete factorization that allows no fill-in beyond the sparsity pattern of the original matrix, and it is also called the no-fill ILU preconditioner. We tested this preconditioner on systems produced for two types of grids, namely the Equilateral and the Orthogonal I type, using the two iterative methods (GMRES and BiCGStab). The computational values in the toy problem were m and . For the Equilateral type of grid neither of the two iterative methods was able to converge, while for the Orthogonal I type of grid only the BiCGStab method for converged, after 34,957 iterations. Even though the ILU(0) preconditioner is easy to implement and inexpensive to compute, it is effective mainly for simpler problems. By simpler problems we mean, for example, low-order discretizations of scalar elliptic PDEs leading to non-singular, diagonally dominant matrices. These are the type of problems for which these preconditioners were originally proposed [25]. For this case study the no-fill factorization resulted in too crude an approximation of the matrix , so more sophisticated preconditioners had to be used.
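The no-fill rule can be illustrated on a small example. The sketch below performs row-wise (IKJ) Gaussian elimination but updates only positions that are non-zero in the original matrix; it works on dense storage for readability, whereas a practical code such as SPARSKIT operates directly on the CSR arrays. The function names and the 3 x 3 "arrow" matrix are hypothetical.

```python
def ilu0(A):
    """No-fill incomplete LU of a dense-stored matrix A.

    Returns one matrix holding the strictly lower part of L (unit diagonal
    implied) and the upper triangular factor U. Entries outside the
    non-zero pattern of A are never created.
    """
    n = len(A)
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    LU = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue
            LU[i][k] /= LU[k][k]                 # multiplier l_ik
            for j in range(k + 1, n):
                if pattern[i][j]:                # update only existing entries
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def ilu_solve(LU, r):
    """Apply the preconditioner: solve (L U) z = r by two triangular sweeps."""
    n = len(LU)
    y = r[:]
    for i in range(n):                           # forward sweep, unit diagonal
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    z = y[:]
    for i in reversed(range(n)):                 # backward sweep
        for j in range(i + 1, n):
            z[i] -= LU[i][j] * z[j]
        z[i] /= LU[i][i]
    return z


# "Arrow" matrix: exact LU would create fill-in at positions (1, 2) and (2, 1).
A = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 0.0],
     [1.0, 0.0, 4.0]]
LU = ilu0(A)
z = ilu_solve(LU, [4.0, 1.0, 1.0])
```

For this matrix the product of the two factors reproduces A on its non-zero pattern, while the discarded fill-in at positions (1, 2) and (2, 1) ends up in the residual matrix R; when the discarded entries are not small, this is what makes ILU(0) a crude approximation.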
4.2.2. The ILU(k) Preconditioner
The computation of the ILU(k) preconditioner requires only the fill-in criterion: if k is a non-negative integer, all the fill-ins whose level is greater than k are dropped. The limitation of this process is that, for matrices that are not diagonally dominant, ILU(k) may have to store many fill-ins that are small in absolute value, leading to expensive computations and to a preconditioner of lower quality. In this work, we used the value elements per row (in the factors), which has been found to be a good universal parameter for the problems presented here.
Figure 7 demonstrates how the performance of the iterative methods is affected by the application of the ILU(k) strategy, in terms of the increasing still water level to ratio, for system matrices produced by Equilateral type grids. The iterations needed for convergence vary between 2 and 4 for both methods. Comparing with Figure 6, a substantial improvement in computational time can be observed. The results show that the total time needed for convergence is not affected by the ratio but, as expected, increases strongly with the number of unknowns of the problem (for larger values i.e., smaller values). The two iterative methods needed the same computational time.
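The level-of-fill rule is purely structural and can be evaluated without any floating point arithmetic. The following sketch computes which positions an ILU(k) factorization keeps, using the standard level update lev(i,j) = min(lev(i,j), lev(i,m) + lev(m,j) + 1) described in [24]; the function name and the example matrix are hypothetical, and dense storage is used only for readability.

```python
import math


def iluk_pattern(A, k):
    """Return {(i, j): level} for the positions kept by ILU(k).

    Original non-zeros and the diagonal have level 0. A fill-in created
    while eliminating with pivot row m gets level lev(i,m) + lev(m,j) + 1,
    and every position whose level exceeds k is dropped.
    """
    n = len(A)
    lev = [[0 if (i == j or A[i][j] != 0.0) else math.inf for j in range(n)]
           for i in range(n)]
    for i in range(1, n):
        for m in range(i):
            if lev[i][m] > k:            # entry (i, m) is not kept: no elimination
                continue
            for j in range(m + 1, n):
                lev[i][j] = min(lev[i][j], lev[i][m] + lev[m][j] + 1)
        for j in range(n):               # drop what exceeds the allowed level
            if lev[i][j] > k:
                lev[i][j] = math.inf
    return {(i, j): lev[i][j]
            for i in range(n) for j in range(n) if lev[i][j] != math.inf}


# "Arrow" matrix: elimination creates fill-in at positions (1, 2) and (2, 1).
A = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 0.0],
     [1.0, 0.0, 4.0]]
kept0 = iluk_pattern(A, 0)   # same pattern as A, i.e. ILU(0)
kept1 = iluk_pattern(A, 1)   # additionally keeps the level-1 fill-ins
```

Because only the structure enters, a kept fill-in may be numerically negligible and a dropped one may be large, which is the limitation noted above for matrices that are not diagonally dominant.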
Figure 7.
CPU time versus for GMRES (solid line) and BiCGStab (dashed line) using the ILU(k) preconditioner with .
Table 3 presents the actual times and iterations needed for the solution of the linear systems produced using the Orthogonal I type of mesh. Whenever results are omitted or a dash is shown, the iterative method used failed to converge. It can be concluded from Table 3 that, as the ratio increases due to grid refinement, i.e., as lower values of are produced, the computational time increases dramatically, and in some cases convergence cannot be obtained for systems resulting from this type of grid. The main reasons for this are the large number of nodes in the mesh compared to the other types of meshes (see Table 2) and the different coefficient matrix structure, which has a larger bandwidth (see Figure 3). For this grid type, although it is a structured one, the ILU(k) preconditioner fails to improve the conditioning of the matrix for finer meshes, even though the iterations needed for convergence are significantly decreased on coarser ones.
Table 3.
CPU time for the solution of linear systems resulting from Orthogonal I type of grids using the ILU(k) preconditioner, with .
It should be noted here that the total time needed for the computation of the preconditioner is independent of the still water depth h. The actual CPU times needed for matrices resulting from refined meshes of the two grid types are presented in Table 4. These times are very small compared to the total time needed for the iterative methods to reach convergence.
Table 4.
Times needed for the ILU(k) preconditioner’s computation.
4.2.3. The ILUT() Preconditioner
The general algorithm of the ILU preconditioner with threshold, ILUT(), includes a set of rules for dropping small elements. Thus, the use of ILUT produces a factorization with lower storage. The main idea is to replace an element with zero if its magnitude is smaller than a threshold value , set by the user. The dropping rule is applied to a row by checking all its elements and keeping or ignoring them depending on their magnitude [24]. By increasing the drop tolerance (threshold), the sparsity of the preconditioner increases and the amount of work needed for applying the preconditioner decreases. The limitation here is that dropping reduces the accuracy of the incomplete factorization; thus, the iterative methods may require more iterations to converge. An additional dropping rule which can be applied is to keep only the p largest elements in the L part of the row and the p largest elements in the U part of the row, in addition to the diagonal element, which is always kept. In this work we always use . The goal of this dropping step is to control the number of elements per row.
Figure 8 presents, as before, how the performance of the ILUT strategy is affected by the variable water depth to ratio, for the matrices produced by the equilateral type of grid. The threshold value used here was set to . The iterations needed for convergence varied from 3 to 13, but the CPU time needed is slightly less compared to the one obtained using the ILU(k) preconditioner (as presented in Figure 7) for higher values. For smaller ratios some time improvements can be noticed. We must acknowledge here that for the systems produced by the Orthogonal I type of grid both iterative methods failed to converge. Lowering the threshold value led to no substantial improvement in computational efficiency or in the convergence of the iterative methods.
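The two dropping rules can be sketched for a single working row as follows. The sketch assumes, as in the ILUT algorithm described in [24], that the threshold is applied relative to the norm of the corresponding row of the original matrix; the function name, the argument `row_norm` and the sample row are hypothetical and serve only as an illustration.

```python
def apply_dropping(row, i, tau, p, row_norm):
    """Apply the two ILUT dropping rules to one working row.

    row      : dict {column index: value} of the working row
    i        : index of the row (and of its diagonal entry)
    tau      : drop tolerance, applied relative to `row_norm` (an assumption)
    p        : number of largest entries kept in the L part and in the U part
    row_norm : norm of row i of the original matrix
    """
    threshold = tau * row_norm
    # Rule 1: discard small entries; the diagonal is always kept.
    kept = {j: v for j, v in row.items() if j == i or abs(v) >= threshold}

    def largest(cols):
        # Column indices of the p entries of largest magnitude.
        return sorted(cols, key=lambda j: abs(kept[j]), reverse=True)[:p]

    # Rule 2: keep at most p entries left and p entries right of the diagonal.
    lower = largest([j for j in kept if j < i])
    upper = largest([j for j in kept if j > i])
    keep = set(lower) | set(upper) | ({i} if i in kept else set())
    return {j: kept[j] for j in sorted(keep)}


# Hypothetical working row 2 of a 6 x 6 matrix.
row = {0: 0.001, 1: -2.0, 2: 5.0, 3: 0.5, 4: -0.0001, 5: 3.0}
row_p1 = apply_dropping(row, 2, 0.01, 1, 6.0)
row_p2 = apply_dropping(row, 2, 0.01, 2, 6.0)
```

The first rule adapts the factorization to the numerical values, while the second one bounds the memory per row, which addresses the difficulty of predicting storage from the drop tolerance alone.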
Figure 8.
CPU time versus for GMRES (solid line) and BiCGStab (dashed line) applying the ILUT preconditioner with .
5. Reordering
None of the preconditioners applied in the previous section substantially improved the conditioning of the produced matrices, in the sense of decreasing the iterations needed for convergence and/or the computational time, and in some cases no convergence could be obtained. We therefore use a reordering technique to improve the stability of the incomplete factorizations. It is well known that incomplete factorization preconditioners are sensitive to the ordering of unknowns and equations [25]. Reordering strategies can be used with the dual purpose of limiting the bandwidth of the discrete operator A and reducing excessive fill-in in the factorization of the involved operators [38]. In most cases, reordering techniques also affect the rate of convergence of preconditioned Krylov subspace methods [25]. The algorithms considered here aim to reduce the bandwidth of the matrix A. Two different approaches were used in this work, namely the Cuthill–McKee (CMK) and the Reverse Cuthill–McKee (RCM) permutations.
Figure 9 shows the CPU time versus the relative water depth for two types of grids, again using the GMRES and BiCGStab algorithms with the ILU(k) preconditioner and CMK reordering. As can be seen in Figure 9, adding CMK reordering to the ILU(k) preconditioner slightly improves the convergence process for the equilateral type of grids, compared with using the preconditioner alone (Figure 7). It can also be observed that, on grids with the same degrees of freedom, increasing the water depth does not affect the computational time needed.
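To make the two permutations concrete, the following is a minimal sketch of a Cuthill–McKee ordering on the adjacency graph of a symmetric sparsity pattern, with RCM obtained by reversing it. The function names, the minimum-degree choice of starting node, and the small example graph are assumptions for illustration and are not taken from the implementation used in this work.

```python
from collections import deque


def cuthill_mckee(adj):
    """Return a Cuthill-McKee ordering of an undirected graph.

    adj : list of neighbour lists (symmetric pattern, no self loops).
    """
    n = len(adj)
    degree = [len(nb) for nb in adj]
    visited = [False] * n
    order = []
    # Treat each connected component, starting from a minimum-degree node
    for start in sorted(range(n), key=lambda i: degree[i]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            # Breadth-first search, neighbours taken by increasing degree
            for nb in sorted(adj[node], key=lambda i: degree[i]):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
    return order


def bandwidth(adj, order):
    """Largest |i - j| over the nonzero pattern after renumbering by `order`."""
    pos = {node: k for k, node in enumerate(order)}
    return max((abs(pos[i] - pos[j])
                for i in range(len(adj)) for j in adj[i]), default=0)


# Made-up example: a chain of 6 unknowns whose numbering is scrambled
chain = [3, 0, 5, 1, 4, 2]
adj = [[] for _ in range(6)]
for a, b in zip(chain, chain[1:]):
    adj[a].append(b)
    adj[b].append(a)

cmk = cuthill_mckee(adj)
rcm = cmk[::-1]  # Reverse Cuthill-McKee is the reversed CMK ordering
print(bandwidth(adj, list(range(6))), bandwidth(adj, cmk), bandwidth(adj, rcm))
```

Reversing the ordering leaves the bandwidth unchanged; the practical difference between CMK and RCM typically shows up in the fill-in generated during factorization rather than in the bandwidth itself.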
Figure 9.
CPU time versus relative water depth for GMRES (solid line) and BiCGStab (dashed line) using the ILU(k) preconditioner and CMK reordering.
Using the ILUT preconditioner with CMK reordering (see Figure 10), the computational results vary. We tested different values of the drop tolerance for this preconditioner. Above a certain drop tolerance, neither iterative method was able to converge. Figure 10 (left column) shows the CPU time versus the relative water depth using CMK and ILUT with the larger drop tolerance. Even though we observe convergence to the solution for most of the test cases, the computational time increased as the relative water depth was increased. This makes this choice of drop tolerance unsuitable for physical problems with relatively large water depth. Using a smaller drop tolerance (see Figure 10, right column), the behavior of the linear systems, especially when using GMRES, closely approximates the behavior shown in Figure 9, since the CPU time is not affected by the variation of the ratio (for the same degrees of freedom). The results of this work also confirm those found in [10], where non-linear and highly dispersive Boussinesq equations are solved using a finite difference scheme: the ILUT preconditioner works quite well in shallow to intermediate water depths, but may rapidly lose effectiveness as the depth is further increased. A major improvement, however, is that for systems produced from Orthogonal I type grids both iterative methods now converged with satisfactory performance.
Figure 10.
CPU time versus variable water depth for GMRES (solid line) and BiCGStab (dashed line) using the ILUT preconditioner with the larger threshold (left) and the smaller threshold (right) and CMK reordering, for the Equilateral type of grid (top) and for Orthogonal I (bottom).
One of the most common reordering techniques, used mostly with the finite element method, is the Reverse Cuthill–McKee (RCM) ordering [39]. The work of Benzi et al. [40] on orderings for incomplete factorization revealed that the use of the RCM method was advantageous for non-symmetric problems.
Figure 11 depicts the CPU time versus the relative water depth using the ILU(k) preconditioner with RCM reordering, while Figure 12 shows the same for the ILUT preconditioner with RCM reordering. Again, two different drop tolerances were used. The same behavior as in Figure 9 and Figure 10 can be observed.
Figure 11.
CPU time versus variable water depth for GMRES (solid line) and BiCGStab (dashed line) using the ILU(k) preconditioner and RCM reordering, for the Equilateral type of grid (left) and for Orthogonal I (right).
Figure 12.
CPU time versus variable water depth for GMRES (solid line) and BiCGStab (dashed line) using the ILUT preconditioner with two different thresholds (left and right) and RCM reordering, for the Equilateral type of grid (top) and for Orthogonal I (bottom).
Spatial Accuracy and Efficiency
We studied the accuracy and efficiency of the FV numerical scheme from [19] by considering the propagation of a solitary wave over an undisturbed depth using distorted grids. The numerical test case consists of a solitary wave of amplitude m which propagates over a flat topography of depth m. The computational domain was [ m] × [ m]. To support the validity of the current study, we examined the total and per time-step CPU time needed to advance the model and, additionally, the total and per time-step CPU times for the solution of the sparse linear system using the BiCGStab method.
As shown in Figure 13 the CPU time grows like (linearly) while the time needed by the BiCGStab (to solve the linear system) like , for the finer grids, where is the error of the free surface elevation, , measured in norm. However, due to the increase in the number of time steps needed on finer grids, the total CPU time grows approximately like while the total time needed by the BiCGStab like and starts to dominate the overall time as the grids become refined.
Figure 13.
CPU times as a function of the free surface error measured in the norm.
To assess the effect that increasing the number of grid points N has on the storage requirements for the non-zero elements () of the sparse linear system and on the computational efficiency, we present the relevant comparison in Figure 14. As expected, the entries grow almost linearly with respect to N. The BiCGStab CPU time per time step scales like ; however, the total CPU time per time step grows like , close to linear.
Figure 14.
Non-zero elements as a function of N (left) and CPU times as a function of N (right).
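Growth rates such as those discussed above are typically estimated as the slope of a straight-line fit in log-log space. The following is a minimal sketch of such a fit; the function name `fit_power_law` and the timings are made up for illustration (generated from an assumed law, not measured in this work).

```python
import math


def fit_power_law(n_values, t_values):
    """Least-squares fit of t = c * n**alpha, done on log(t) versus log(n)."""
    xs = [math.log(n) for n in n_values]
    ys = [math.log(t) for t in t_values]
    m = len(xs)
    x_mean = sum(xs) / m
    y_mean = sum(ys) / m
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    alpha = sxy / sxx                        # slope = scaling exponent
    c = math.exp(y_mean - alpha * x_mean)    # intercept = prefactor
    return alpha, c


# Synthetic timings generated from t = 2e-6 * N**1.25 (NOT measured data)
n_values = [1000, 4000, 16000, 64000]
t_values = [2e-6 * n ** 1.25 for n in n_values]

alpha, c = fit_power_law(n_values, t_values)
print(f"fitted exponent: {alpha:.3f}, prefactor: {c:.2e}")
```

An exponent close to 1 indicates near-linear cost in N; values noticeably above 1 indicate that the corresponding component will eventually dominate the run time as the grid is refined.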
All computations were performed on a shared memory machine, an HP DL180G6 (HP, Palo Alto, CA, USA), consisting of two 6-core Xeon X5660 @ 2.8 GHz processors with 12 MB of Level 3 cache memory. The total memory is 64 GB and the operating system is Oracle Linux version 6.1. The application is developed in double precision Fortran code using Oracle Solaris Studio compilers version 12.2.
6. Conclusions
In this work, we have attempted to highlight the importance of combining two well-known iterative methods with preconditioning and reordering techniques for the ill-conditioned sparse linear systems that arise from the numerical integration of the extended Boussinesq-type equations that model dispersive wave propagation. To this end, a numerical study is presented for solving the sparse linear systems that result from a finite volume discretization of the extended Boussinesq-type equations of Nwogu on unstructured triangular 2D meshes. The linear system arises from the reformulation of the model’s equations in conservative-like form, and its solution is essential to recover the actual velocity field of the flow. The resulting coefficient matrices are structurally symmetric and sparse but in most cases ill-conditioned. The numerical solution of the system consumes much of the computation time needed for the numerical scheme to advance one discrete time step, especially as the meshes become finer. The effect of the grid resolution, along with the physical value of the still water reference in the problem, was investigated. Different iterative methods, preconditioning and reordering techniques were investigated in order to arrive at an optimal and robust strategy for the system’s solution. The coefficient matrix of the resulting system is constant in time, so it is constructed and stored in compressed sparse row (CSR) format at a pre-processing stage at the beginning of each simulation. The present work concerns the solution process after the sparse matrix has been stored: the matrix is reordered and the preconditioner of the reordered matrix is computed at this pre-processing stage, and both are subsequently used to solve the linear system at each time step. The following major conclusions can be drawn from this work:
- The BiCGSTAB and GMRES iterative methods give very similar results for the resulting systems, with BiCGSTAB proving more robust in some cases; it is the method of choice following from this work.
- The use of preconditioning and/or reordering is mandatory in order to achieve convergence for the different mesh types used.
- Using preconditioning and reordering, we obtained convergence for all systems at every water depth. Using only preconditioning, we were able to efficiently solve systems that have a small condition number (usually derived from equilateral grids).
- Regarding the ILU(k) and ILUT preconditioners: the CPU time using ILUT is lower than that using ILU(k) in average water depths. ILU(k) may be more expensive in time, but it results in roughly the same CPU time for convergence at any water depth for the same grid resolution.
- To overcome this limitation of ILUT, we decreased the drop tolerance and observed that both iterative methods then converge for larger water depths, albeit at an additional time cost. As before, the CPU time is then independent of the relative water depth for each matrix.
- The Reverse Cuthill–McKee (RCM) ordering proved more efficient than the Cuthill–McKee (CMK) ordering. This ordering is found to greatly improve the efficiency of the ILUT preconditioner, since it constrains the factorized matrix to lie within a much narrower bandwidth, and hence the incomplete factorization is generally more accurate for a prespecified amount of storage.
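The solution workflow summarized above (a constant matrix stored once in CSR format and then reused to solve a new right-hand side at every time step) can be sketched as follows. This is a minimal, unpreconditioned BiCGSTAB written for illustration only; it is not the Fortran implementation used in this work, it omits reordering, preconditioning and breakdown safeguards, and the matrix, right-hand sides and function names are made up.

```python
def csr_matvec(indptr, indices, data, x):
    """Return y = A x for A stored in compressed sparse row (CSR) format."""
    return [sum(data[k] * x[indices[k]]
                for k in range(indptr[i], indptr[i + 1]))
            for i in range(len(indptr) - 1)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab(matvec, b, tol=1e-10, max_iter=200):
    """Unpreconditioned BiCGSTAB with zero initial guess; returns (x, iterations)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)        # residual b - A x for x = 0
    r_hat = list(r)    # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = dot(b, b) ** 0.5 or 1.0
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(p)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 <= tol * b_norm:   # converged after the half step
            return [xi + alpha * pi for xi, pi in zip(x, p)], it
        t = matvec(s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
    raise RuntimeError("BiCGSTAB did not converge")


# Made-up 4x4 nonsymmetric matrix, stored once in CSR at "pre-processing"
indptr = [0, 2, 5, 8, 10]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]
data = [4.0, -1.0, -2.0, 4.0, -1.0, -2.0, 4.0, -1.0, -2.0, 4.0]


def apply_A(vec):
    return csr_matvec(indptr, indices, data, vec)


# Each "time step" changes only the right-hand side; the matrix is reused
for step in range(1, 4):
    rhs = [step * val for val in (2.0, 3.0, 4.0, 10.0)]
    x, iters = bicgstab(apply_A, rhs)
    print(step, iters, [round(val, 6) for val in x])
```

In a production setting the reordered matrix and its incomplete factorization would also be computed once and passed to a preconditioned variant of the solver, so that only the Krylov iterations are repeated at each time step.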
Author Contributions
Conceptualization, A.I.D. and M.K.; methodology, M.K. and M.G.; software M.K. and M.G.; validation, M.G., M.K. and A.I.D.; investigation, M.G., M.K. and A.I.D.; writing—original draft preparation, A.I.D. and M.K.; writing—review and editing, A.I.D. and M.K.; visualization, M.G.; supervision, A.I.D. and M.K. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Peregrine, D.H. Long waves on a beach. J. Fluid Mech. 1967, 27, 815–882.
- Madsen, P.; Sørensen, O.R.; Schäffer, H.A. Surf zone dynamics simulated by a Boussinesq-type model: Part I. Model description and cross-shore motion of regular waves. Coast. Eng. 1997, 32, 255–288.
- Nwogu, O. An alternative form of the Boussinesq equations for nearshore wave propagation. J. Waterw. Port Coastal Ocean Eng. 1994, 119, 618–638.
- Madsen, P.A.; Sørensen, O.R. A new form of the Boussinesq equations with improved linear dispersion characteristics. Part 2: A slowly-varying bathymetry. Coast. Eng. 1992, 18, 183–204.
- Gobbi, M.F.; Kirby, J.T.; Wei, G. A fully non-linear Boussinesq model for surface waves. Part 2. Extension to O(kh)^4. J. Fluid Mech. 2000, 405, 181.
- Lynett, P.; Liu, P.L.F. A two-layer approach to wave modeling. Proc. R. Soc. Lond. Ser. A 2004, 460, 2637–2669.
- Tissier, M.; Bonneton, P.; Marche, F.; Chazel, F.; Lannes, D. A new approach to handle wave breaking in fully non-linear Boussinesq models. Coast. Eng. 2012, 67, 54–66.
- Wei, G.; Kirby, J.T. A time-dependent numerical code for extended Boussinesq equations. J. Waterw. Port Coastal Ocean Eng. 1995, 120, 251–261.
- Brocchini, M. A reasoned overview on Boussinesq-type models: The interplay between physics, mathematics and numerics. Proc. R. Soc. A 2013, 469.
- Fuhrman, D.R.; Madsen, P.A. Simulation of nonlinear wave run-up with a high-order Boussinesq model. Coast. Eng. 2008, 55, 139–154.
- Eskilsson, C.; Sherwin, S.J. Spectral/hp discontinuous Galerkin methods for modelling 2D Boussinesq equations. J. Comp. Phys. 2006, 210, 566.
- Walkley, M.; Berzins, M. A finite element method for the two-dimensional extended Boussinesq equations. Int. J. Numer. Meth. Fluids 2002, 39, 865.
- Ricchiuto, M.; Filippini, A.G. Upwind residual discretization of enhanced Boussinesq equations. J. Comp. Phys. 2014, 271, 306–341.
- Erduran, K.S. Further application of hybrid solution to another form of Boussinesq equations and comparisons. Int. J. Numer. Methods Fluids 2007, 53, 827–849.
- El Asmar, W.; Nwogu, O. Finite volume solution of Boussinesq-type equations on an unstructured grid. In Proceedings of the 30th International Conference on Coastal Engineering, San Diego, CA, USA, 3–8 September 2006; McKee Smith, J., Ed.; World Scientific: Singapore, 2006; pp. 73–85.
- Shi, F.; Kirby, J.T.; Harris, J.C.; Geiman, J.D.; Grilli, S.T. A high-order adaptive time-stepping TVD solver for Boussinesq modeling of breaking waves and coastal inundation. Ocean Model. 2012, 43–44, 36–51.
- Roeber, V.; Cheung, K.F.; Kobayashi, M.H. Shock-capturing Boussinesq-type model for nearshore wave processes. Coast. Eng. 2010, 57, 407–423.
- Tonelli, M.; Petti, M. Hybrid finite-volume finite-difference scheme for 2DH improved Boussinesq equations. Coast. Eng. 2009, 56, 609–620.
- Kazolea, M.; Delis, A.I.; Nikolos, I.A.; Synolakis, C.E. An unstructured finite volume numerical scheme for extended 2D Boussinesq-type equations. Coast. Eng. 2012, 69, 42–66.
- Kazolea, M.; Delis, A.I.; Synolakis, C.E. Numerical treatment of wave breaking on unstructured finite volume approximations for extended Boussinesq-type equations. J. Comp. Phys. 2014, 271, 281–305.
- Zhang, S.; Zhu, L.; Li, J. Numerical Simulation of Wave Propagation, Breaking, and Setup on Steep Fringing Reefs. Water 2018, 10, 1147.
- Liu, W.; Ning, Y.; Zhang, Y.; Zhang, J. Shock-Capturing Boussinesq Modelling of Broken Wave Characteristics Near a Vertical Seawall. Water 2018, 10, 1876.
- Kazolea, M.; Delis, A. Irregular wave propagation with a 2DH Boussinesq-type model and an unstructured finite volume scheme. Eur. J. Mech. B/Fluids 2018, 72, 432–448.
- Saad, Y. Iterative Methods for Sparse Linear Systems; SIAM: Philadelphia, PA, USA, 2003.
- Benzi, M. Preconditioning Techniques for Large Linear Systems: A Survey. J. Comp. Phys. 2002, 182, 418–477.
- Saad, Y. SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations, Version 2. Available online: https://people.sc.fsu.edu/~jburkardt/f77_src/sparsekit/sparsekit.html (accessed on 20 January 2021).
- Roe, P.L. Approximate Riemann Solvers, Parameter Vectors, and Difference Schemes. J. Comp. Phys. 1981, 43, 357–372.
- van Leer, B. Towards the ultimate conservative difference scheme V. A second order sequel to Godunov’s method. J. Comp. Phys. 1979, 32, 101.
- Barth, T.J. A 3-D Upwind Euler Solver for Unstructured Meshes; AIAA Paper 91-1548CP; AIAA: Reston, VA, USA, 1991.
- Delis, A.I.; Nikolos, I.K.; Kazolea, M. Performance and comparison of cell-centered and node-centered unstructured finite volume discretizations for shallow water free surface flows. Arch. Comput. Methods Eng. 2011, 18, 57–118.
- Smith, T.M.; Barone, M.F.; Bond, R.B. Comparison of reconstruction techniques for unstructured mesh vertex centered finite volume schemes. In Proceedings of the 18th AIAA Computational Fluid Dynamics Conference, Miami, FL, USA, 25–28 June 2007; pp. 1–22.
- Spiteri, R.J.; Ruuth, S.J. A new class of optimal high-order strong-stability-preserving time discretization methods. SIAM J. Numer. Anal. 2002, 40, 469.
- Barth, T.J. Aspects of Unstructured grids and finite volume solvers for the Euler and Navier-Stokes equations. In Special Course on Unstructured Grid Methods for Advection Dominated Flows; AGARD Report; NATO: Paris, France, 1992; Volume 787.
- Barth, T.J. Numerical Methods and Error Estimation for Conservation laws on Structured and Unstructured Meshes; VKI Computational Fluid Dynamics Lecture Series; VKI: Waterloosesteenweg, Belgium, 2003.
- Barth, T.J.; Ohlberger, M. Finite volume methods: Foundation and analysis. In Encyclopedia of Computational Mechanics; Stein, E., de Borst, R., Hughes, T., Eds.; John Wiley and Sons Ltd.: Hoboken, NJ, USA, 2004.
- Olshanskii, M.; Tyrtyshnikov, E. Iterative Methods for Linear Systems; SIAM: Philadelphia, PA, USA, 2015.
- Wathen, A. Preconditioning. Acta Numer. 2015, 24, 329–376.
- Engsig-Karup, A.P. Unstructured Nodal DG-FEM Solution of High-Order Boussinesq-Type Equations. Ph.D. Thesis, Technical University of Denmark, Kongens Lyngby, Denmark, 2006.
- George, A.; Liu, J.W.H. Computer Solution of Large Sparse Positive Definite Systems; Prentice Hall: Englewood Cliffs, NJ, USA, 1981.
- Benzi, M.; Szyld, D.B.; Duin, A.V. Orderings for incomplete factorization preconditioning of nonsymmetric problems. SIAM J. Sci. Comput. 1999, 20, 1652–1670.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).