1. Introduction
In engineering practice, process equipment is often designed according to various rules of thumb. Optimization is generally not performed and, at best, a single computational fluid dynamics (CFD) simulation is carried out to verify that the design meets the key requirements of the future operator of the apparatus. As a result, suboptimal designs or solutions, potentially leading to operating problems, are not uncommon.
One way to remedy the situation is to use simplified CFD models. Although they are not as accurate as standard CFD models, it has been shown [1] that they can provide useful quantitative information. Moreover, these models feature significantly shorter computational times, and their application in optimization algorithms is therefore much less cumbersome. To obtain solutions even faster, however, the numerical methods used to solve the underlying linear systems of equations can also be preconditioned. This means that instead of solving the original linear system

$$A x = b, \tag{1}$$

where A is the coefficient matrix, x the solution vector, and b the right-hand side vector, one considers the system

$$M^{-1} A x = M^{-1} b. \tag{2}$$

Here, M denotes the preconditioning matrix such that M⁻¹A has a smaller condition number than A and, therefore, linear system (2) features better convergence. It should also be noted that M⁻¹A often is not formed explicitly but, instead, Mu = v is solved for various auxiliary vectors u and v within the numerical solution method itself.
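The remark that the preconditioned matrix is never formed explicitly can be illustrated with a minimal sketch. It assumes a simple fixed-point (Richardson) iteration and a diagonal (Jacobi) choice of M, neither of which is taken from this paper; the function names are hypothetical. The only place M appears is in the auxiliary solve Mu = r.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve_diagonal(m_diag, v):
    """Solve M u = v for a diagonal M given by its diagonal entries."""
    return [vi / mi for vi, mi in zip(v, m_diag)]


def preconditioned_richardson(A, b, tol=1e-12, max_iter=200):
    """Iterate x <- x + u, where M u = b - A x and M = diag(A) (an assumption)."""
    m_diag = [A[i][i] for i in range(len(A))]
    x = [0.0] * len(b)
    for k in range(max_iter):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if max(abs(ri) for ri in r) < tol:
            return x, k
        # M^{-1} is never built; only the system M u = r is solved.
        u = solve_diagonal(m_diag, r)
        x = [xi + ui for xi, ui in zip(x, u)]
    return x, max_iter
```

For a diagonal M the auxiliary solve is trivial; for ILU or SSOR it becomes a pair of triangular solves or sweeps, but the structure of the outer loop stays the same.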
There are many different preconditioning techniques (i.e., ways to choose M) available, and the selection of the best one for a particular purpose depends mainly on the type of equation being solved and the employed ordering of the variables. However, the most commonly used techniques are likely various flavors of incomplete lower–upper factorization (ILU) [2] (these were numerically investigated by Chapman et al. [3]) and symmetric successive overrelaxation (SSOR) [4]. Although SSOR was originally intended for symmetric matrices, it was shown [5] to also work when the matrices are not symmetric. Assuming the splitting A = D + L + U, where D is the diagonal of A and L and U its strictly lower and upper triangular parts, respectively, the SSOR preconditioning technique is applied within the numerical solution method as two SOR sweeps using different values of ω. The iteration process for auxiliary vectors u, v (which depend on the actual solution method used) can then be written as

$$\begin{aligned} (D + \omega_F L)\, u^{(k+1/2)} &= \omega_F v - \left[\omega_F U + (\omega_F - 1) D\right] u^{(k)}, \\ (D + \omega_R U)\, u^{(k+1)} &= \omega_R v - \left[\omega_R L + (\omega_R - 1) D\right] u^{(k+1/2)}, \end{aligned} \tag{3}$$

where ωF and ωR denote the corresponding forward and backward relaxation factors. In other words, direct application of the inverse of the preconditioning matrix, M⁻¹, is replaced by preconditioned fixed point iteration.
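The two SOR sweeps described above can be sketched as follows. This is a minimal dense-matrix illustration under the stated splitting A = D + L + U, not the implementation used in the benchmarking software; the function name `ssor_apply`, the zero initial guess, and the `sweeps` parameter are assumptions made for the example.

```python
def ssor_apply(A, v, omega_f, omega_r, sweeps=1):
    """Approximately solve A u = v by forward/backward SOR sweep pairs.

    omega_f is used in the forward sweep, omega_r in the backward sweep.
    A is a dense list-of-lists matrix with a nonzero diagonal.
    """
    n = len(v)
    u = [0.0] * n  # zero initial guess (an assumption of this sketch)
    for _ in range(sweeps):
        # Forward sweep: rows in natural order, using already updated
        # entries below the diagonal and old entries above it.
        for i in range(n):
            sigma = sum(A[i][j] * u[j] for j in range(n) if j != i)
            u[i] = (1.0 - omega_f) * u[i] + omega_f * (v[i] - sigma) / A[i][i]
        # Backward sweep: rows in reverse order, using already updated
        # entries above the diagonal and forward-sweep entries below it.
        for i in reversed(range(n)):
            sigma = sum(A[i][j] * u[j] for j in range(n) if j != i)
            u[i] = (1.0 - omega_r) * u[i] + omega_r * (v[i] - sigma) / A[i][i]
    return u
```

Inside a Krylov method such as CG or BiCGstab(L), a call like `ssor_apply(A, v, 1.2, 1.1)` would stand in for the product of M⁻¹ with v; a small number of sweeps is typical because the result only needs to approximate the solution.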
The advantages of SSOR are evidenced by the existence of a multitude of papers discussing improved versions of this technique or its extensions to various specific applications. Bai [6] studied SSOR-like preconditioners for non-Hermitian positive definite linear systems, for which the respective matrix was either Hermitian-dominant or skew–Hermitian-dominant. The paper also discussed the results of numerical implementations, showing that Krylov subspace iteration methods, when accelerated using SSOR-like preconditioners, are efficient solvers for classes of non-Hermitian positive definite linear systems. A “shifted” version of SSOR for non-Hermitian positive definite linear systems with a dominant Hermitian part was proposed by Tan [7]. Zhang [8], on the other hand, introduced an SSOR-like preconditioner for saddle point problems with a dominant skew-Hermitian part. A class of hybrid preconditioning methods for accelerated solution of saddle point problems was discussed by Wang [9], while Chen et al. [10] proposed a version of SSOR suitable for preconditioning of large dense complex linear systems arising from three-dimensional electromagnetic scattering. Wu and Li [11] introduced a modified SSOR technique for the solution of Helmholtz equations.
Preconditioning can also be done block-wise. This was discussed, e.g., by Zhang and Cheng [12] in terms of large sparse saddle point problems, and by Huang and Lu [13], who focused on SSOR block preconditioners applied in image restoration. Because preconditioning in fact means obtaining an easily invertible approximation of the original matrix, one can also use SSOR for just this purpose as shown, for example, by Meng et al. [14] in the context of fast recovery of 3D density data from gravity data. Similarly, a massively parallel GPU implementation of the conjugate gradient method, which uses the approximate inverse matrix derived from SSOR as the preconditioning matrix, was proposed by Helfenstein and Koko [15].
Performance comparisons of SSOR and other, simpler preconditioning techniques were presented, e.g., by Meyer [16], who focused on genomic evaluation, and by Sanjuan et al. [17], who used SSOR to accelerate parallel wind field calculations. In the latter paper, the authors also evaluated a new, reordered sparse matrix storage format and showed that this format can markedly shorten computational time. This confirms the earlier findings of Duff and Meurant [18], who investigated the effect of ordering on the convergence of the conjugate gradient method preconditioned, among others, using SSOR, and of DeLong and Ortega [19], who focused on parallel implementations of SOR in terms of natural and multicolor orderings. Chen et al. [20], on the other hand, proposed a novel reordering technique for the SSOR approximate inverse preconditioner, used together with a GPU-accelerated conjugate gradient solver, which should maximize the coalescing of global memory accesses.
Many SSOR-like, a priori preconditioned numerical solution methods using different splittings of the coefficient matrix have also been proposed for various types of problems. A three-parameter extension of the SSOR method intended for singular saddle point problems, which commonly arise, e.g., in fluid dynamics, was proposed by Li and Zhang [21]. A differently accelerated generalized three-parameter method for both singular and non-singular problems was introduced by Pan [22]. Similarly, a three-parameter unsymmetric SOR method for such saddle point problems was proposed, for example, by Liang and Zhang [23], who also discussed the choice of optimal values of the parameters. Many different SSOR-like methods are available for augmented systems, as well. Wang and Huang [24] introduced a four-parameter method, Louka and Missirlis [25] introduced a five-parameter extrapolated form of SSOR, and Najafi and Edalatpanah [26] introduced an improved version of the modified SSOR method for large sparse augmented systems, proposed earlier by Darvishi and Hessari [27]. Another improved SSOR method intended for the solution of augmented systems was proposed by Salkuyeh et al. [28]. In the case of complex systems, one can use, for instance, the accelerated method by Huang et al. [29], that is, an accelerated version of the method by Edalatpour et al. [30], in which the solution vector is split into two subvectors and different relaxation factors are used when solving for each of them. Likewise, one can employ the preconditioned variant of the generalized SSOR method by Hezari et al. [31] or the method by Salkuyeh et al. [32], which solves a real system obtained from the original, complex one. Block linear systems can be solved, e.g., using the block-preconditioned SSOR method by Pu and Wang [33].
The majority of the papers mentioned above discuss convergence (or at least semi-convergence) of the proposed methods, and many also include some information on the optimal selection of the relaxation factors. Kushida [34] focused on the estimation of convergence of the original SSOR preconditioner via a condition number, while general discussion related to SSOR-like methods for non-Hermitian positive definite linear systems was published in [35]. Augmented systems were addressed, for example, by Wang and Huang [36]. Similarly, there are papers focusing on convergence and optimal selection of the relaxation factors in the case of methods for block 2 × 2 linear systems [37], saddle point problems [38], parallel SSOR implementations [39], the Poisson equation [40], etc. In all these cases, however, convergence was investigated via spectral analysis, which is often prohibitively expensive [41]. The present paper, focusing on fast estimation of suitable SSOR relaxation factors in engineering practice, therefore, investigates the convergence experimentally using several different simplified 3D CFD flow models. The suitability of specific combinations of relaxation factors is assessed on the basis of mean computational times needed to reach converged steady-state solutions. The best-performing combinations of relaxation factors are then given together with the obtained relaxation factor trends.
2. Materials and Methods
Three different flow systems, with both the “U” and the “Z” flow arrangements (“U”: outlet on the same side of the flow system as the inlet, “Z”: outlet on the opposite side), were used to generate test cases. Moreover, in two of these three flow systems, the mesh fineness was also varied (coarser and finer mesh). This yielded ten flow system configurations in total, with simplified, cuboid cell-only meshes of different sizes ranging from ~6000 cells to ~41,000 cells (see Table 1). The meshes were generated automatically by the employed benchmarking software (described below) using the key parameters listed in Table 1. Due to the cuboid nature of the meshes, cell sizes were, in all computational domains, governed primarily by how many cell faces comprised a tube cross section (coarser mesh: 1 face only, finer mesh: 2 × 2 faces) and by the utilized cell growth factor. Sample meshes are shown in Figure 1.
As for boundary conditions, 0.5 kg/s of water at 300 K was fed into the inlet while pressure at the outlet was set to 101,325 Pa. All walls were adiabatic except for tube walls, where a specific heat flux of 15 kW/m² was set when the energy equation was enabled. Steady-state simulations were carried out using the same CFD setup as in [42], that is, the SIMPLEC pressure-velocity coupling [43] was employed together with the Power Law discretization scheme [44]. Standard scaled residual limits, i.e., 10⁻³ for continuity and momentum and 10⁻⁶ for energy, were used. Only the natural ordering of the variables was considered in this study.
The SSOR preconditioning technique was paired with two widely used numerical solution methods, which were shown by an earlier study [42] to perform very well in simplified 3D CFD models. The conjugate gradient (CG) numerical solution method [45] was employed in the pressure correction equation. The momentum and energy equations, on the other hand, were solved using the bi-conjugate gradient stabilized numerical method with the minimization of residuals over L-dimensional subspaces (BiCGstab(L)) [46], with L = 1, 2, or 3. Performance of the ILU preconditioner (which is efficient, but the construction of the respective preconditioning matrix may not always be possible) was taken as the baseline.
The SSOR-preconditioned numerical solution methods were tested with various tuples of ωF, ωR = 0.1, 0.2, …, 1.8, 1.9 in successive steps, depending on the obtained results. Promising combinations of ωF and ωR were then taken as pivots, and their square neighborhoods were tested further; that is, all combinations of ωF, ωR with the respective values being ω − 0.04, ω − 0.02, ω, ω + 0.02, and ω + 0.04 were evaluated except for the original pivot point (ωF, ωR). To allow a comparison of SSOR with the baseline (ILU), all ten meshes were also evaluated using the ILU-preconditioned combinations of solution methods. In total, 50,620 CFD model setups were tested.
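The candidate relaxation factor pairs described above can be enumerated with a short sketch. The function names are hypothetical, and the full 19 × 19 coarse grid is only an upper bound: the study tested tuples in successive steps depending on the results, so not every coarse pair was necessarily evaluated.

```python
def coarse_grid():
    """All (omega_f, omega_r) pairs with values 0.1, 0.2, ..., 1.9."""
    values = [round(0.1 * k, 1) for k in range(1, 20)]
    return [(f, r) for f in values for r in values]


def square_neighborhood(pivot_f, pivot_r, step=0.02, radius=2):
    """Pairs within +/- radius*step of the pivot, excluding the pivot itself."""
    points = []
    for kf in range(-radius, radius + 1):
        for kr in range(-radius, radius + 1):
            if kf == 0 and kr == 0:
                continue  # the pivot was already evaluated on the coarse grid
            points.append((round(pivot_f + kf * step, 2),
                           round(pivot_r + kr * step, 2)))
    return points
```

With the default step and radius, each pivot contributes 5 × 5 − 1 = 24 additional pairs to evaluate.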
The same benchmarking simplified 3D CFD Java software application was used as in [42], and, therefore, the reader is kindly referred to this paper for details (please note that the software is not publicly available). The benchmarking procedure itself was almost the same, as well, with the only difference being that the numbers of warm-up and test runs were smaller for the larger meshes (see Table 2). Such a measure was necessary to keep the times required to complete the respective benchmarks within reasonable bounds. This did not introduce any problems, because with larger cell counts all Java initializations and compilations had finished within far fewer warm-up runs, and, therefore, it was not necessary to carry out many of them before the timing phase. Mean test-run computational time was then taken as the final performance metric. Unlike in [42], however, only one machine (Intel Xeon E5 2698 v4 CPU, 128 GB RAM) was used instead of two largely disparate ones. The reason for this simplification was that, as shown in the respective paper, single-core computational times proved to be virtually identical, irrespective of whether the machine was a high-performance server or a regular laptop with an ultra-low voltage CPU. Please see the paragraph titled Supplementary Materials on how to obtain the data set containing all the mean computational times together with other relevant information.
Because this study targeted fast computation, two kinds of limits were set in the solution process as detailed in Table 2. The first one concerned the number of CFD solver iterations, while the second one applied to the actual computational time. Any combination of numerical solution methods and preconditioning techniques which exceeded at least one of these two limits was marked as failing to reach a solution. Additionally, since robustness is one of the factors that must be considered when evaluating the suitability of numerical solution methods, no user interventions (e.g., changes to the internal residual limits of the numerical methods, CFD relaxation factors, etc.) were allowed during a solution process.
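The scoring rule described above (discard warm-up runs, average the test runs, and mark a setup as failed when a limit is exceeded) might be sketched as follows. The function name, the data layout, and the choice to treat a single offending test run as a failure of the whole setup are assumptions; the actual limit values are those in Table 2 and are not reproduced here.

```python
from statistics import mean


def mean_test_time(runs, warmups, max_iterations, max_seconds):
    """Return the mean test-run time in seconds, or None for a failed setup.

    runs is a list of (solver_iterations, seconds) tuples in execution
    order; the first `warmups` entries are discarded before timing.
    """
    timed = runs[warmups:]
    if not timed:
        raise ValueError("no test runs left after discarding warm-up runs")
    for iterations, seconds in timed:
        # Exceeding either limit marks the setup as failing to reach a solution.
        if iterations > max_iterations or seconds > max_seconds:
            return None
    return mean([seconds for _, seconds in timed])
```

Returning `None` instead of a large penalty time keeps failed setups out of any later averaging and makes the failure count easy to report separately.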
4. Discussion
The aim of this study was to establish whether, in the case of simplified CFD models, the SSOR preconditioning technique can be a viable replacement for ILU. From the obtained data, it follows that SSOR should not be used in conjunction with CG to solve the pressure correction equation. When SSOR is applied to the momentum and/or energy equations, computational times tend to be significantly longer than with ILU even when the relaxation factors are chosen favorably (for the cases evaluated in this study, the increase was at least ~23% on average). However, because the SSOR preconditioning matrix can always be constructed, the respective techniques could be used in conjunction with ILU as a fallback option. From an engineering point of view, this would mean that ILU would be employed by default, and only in case of numerical issues would the CFD solver try to reach a converged solution using SSOR. Such an approach would capitalize on the efficiency of ILU while maintaining reasonable numerical robustness due to the possibility of falling back to a technique with guaranteed existence of the preconditioning matrix. The resulting models would, ultimately, be much more suitable for implementation in optimization algorithms or for other use cases where large batches of simulations must be carried out without user intervention.
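The fallback strategy suggested above could be organized roughly as in the following sketch. Everything here is hypothetical: `run_solver` stands for whatever routine runs the CFD solve with a named preconditioner, and the convention that it raises `ArithmeticError` when the preconditioner cannot be constructed and returns `None` when limits are exceeded is an assumption of the example, not a description of the benchmarking software.

```python
def solve_with_fallback(run_solver, preconditioners=("ILU", "SSOR")):
    """Try each preconditioner in order; return (name, result) of the first success.

    run_solver(name) is a hypothetical callable that returns a result on
    convergence, returns None when the solution limits are exceeded, and
    raises ArithmeticError when the preconditioner cannot be constructed.
    """
    for name in preconditioners:
        try:
            result = run_solver(name)
        except ArithmeticError:
            continue  # e.g., the factorization broke down; try the next option
        if result is not None:
            return name, result
    raise RuntimeError("no preconditioner produced a converged solution")
```

Because no user intervention is needed between attempts, such a wrapper fits batch use, for example inside an optimization loop that launches many simulations.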
The best-performing combinations of numerical solution methods and SSOR forward and backward relaxation factors differ according to whether energy transport is included in the model or not. In the flow-only scenario, the momentum equations should preferably be solved using BiCGstab(3), with ωF ≈ 0.9 and ωR slightly less than ωF, that is, both SSOR sweeps should be a little underrelaxed. Computational times obtained using other variants of BiCGstab(L) proved to be at least 80% longer. If energy transport is also included and only the energy equation is preconditioned using SSOR, it is best to solve the momentum equations using BiCGstab(3) and the energy equation using BiCGstab(1). Here, both SSOR sweeps should be slightly overrelaxed, with ωF ≈ 1.2 and ωR ≈ 1.1. Similar performance can in some cases be obtained by employing BiCGstab(2) for both types of equations with the energy SSOR sweeps being overrelaxed using ωF ≈ ωR ≈ 1.1; however, a much greater solution failure probability can then be expected. If SSOR is utilized for both the momentum and the energy equations, then it is, again, preferable to use the combination of BiCGstab(3) and BiCGstab(1). The respective forward sweeps should be a little overrelaxed (ωF between ca. 1.1 and 1.3 for the momentum equations, and up to ca. 1.2 for the energy equation), while the backward sweeps should feature ωR slightly lower than ωF. The best-case computational times obtained with BiCGstab(2) proved to be up to ~67% longer and, therefore, the use of this numerical solution method is discouraged in this scenario.