An Efficient and Robust ILU(k) Preconditioner for Steady-State Neutron Diffusion Problem Based on MOOSE

: Jacobian-free Newton Krylov (JFNK) is an attractive method to solve nonlinear equations in the nuclear engineering community, and has been successfully applied to steady-state neutron diffusion k-eigenvalue problems and multi-physics coupling problems


Introduction
Due to the inherently featured multi-scale and multi-physics effects in nuclear reactor systems [1,2], a reactor simulation is a coupled problem including several phenomena.Nonlinear partial differential equations (PDEs) are usually used to describe this coupled system behavior.The steady-state neutron diffusion k-eigenvalue problem is the fundamental component of the multi-physics system.After the discretization of these nonlinear PDEs, it usually leads to a large-scale nonlinear algebraic equation problem [3].Compared with the traditional methods, such as the operator splitting method and Picard iteration method [4][5][6], the Jacobian-free Newton-Krylov (JFNK) algorithm [7] is a powerful solver for the nonlinear equations due to its high-order convergence rate.The JFNK algorithm has been widely used in the newly developed nuclear reactor simulator [8][9][10][11], such as the MOOSE platform [12] and LIME platform [13,14].In detail, several nuclear engineering programs have been made by Idaho National Laboratory based on the MOOSE platform, such as the reactor physics code MAMOTH [15], the advanced thermal-hydraulic code Pronghorn [16], and the two-phase flow code RELAP-7 [17].A multi-scale multi-physics where the items ϕ, D, Σ a , Σ s , Σ f , ν, k represent the neutron flux, diffusion coefficient, absorption cross-section, scattering cross-section, fission cross-section, average number of neutrons emitted per fission, and multiplication factor, respectively.The numerical subscript and g ′ represent the neutron energy group index.The first energy group is the fast group and the second one is the thermal group.
The 2D-LRA (Laboratorium für Reaktorregelung und Anlagensicherung) benchmark problem [22] is a well-known simplified neutron k-eigenvalue case.The reactor is square with a width of 330 cm, and a quarter has been modeled because of the symmetry as shown in Figure 1.Specifically, there are 5 materials in this reactor model, and the corresponding neutron cross-sections of these 5 materials are summarized in Table 1.In the central part is the reactor core (Region 1, 2, 3 and 4), which is surrounded by the reflector material (Region 5).
Energies 2024, 17, x FOR PEER REVIEW 4 of 16 developed the "NonlinearEigen" executioner to handle eigenvalue problems.The detailed implementation can be found in Ref [23].In MOOSE, the nonlinear PDE equation Equation ( 5) is solved by the preconditioned JFNK algorithm, as shown in Equation (6).
⃑ =  ⃑ −  ⃑  ⃑ = 0 (5) The JFNK method is a powerful nonlinear solver featured with two iteration layers, including the Newton iteration as the outer layer and the Krylov subspace iteration as the inner layer.The basic concepts of the JFNK method can be found in the literature [7].Like most iterative methods, the performance of the Krylov subspace method highly depends on the eigenvalue distribution of the system.The preconditioning process can improve the convergence behavior by clustering the eigenvalue distribution of the coefficient matrix.The preconditioning process is an equivalent transform of the original problem by multiplying it with the preconditioning matrix.
⃑ = −  ⃑ (6) where  ⃑ is the update step of the neutron flux vector in the Newton iteration, and  is the Jacobian matrix which can be represented as a block matrix form:  =   +  ,→ −  , −  , − ,→   (7) It should be noted that Equation ( 7) is only a non-zero element structural representation of the Jacobian matrix. and  are the discrete forms of the sum of diffusion operator and absorption term.The preconditioner  is an approximation of the Jacobian matrix , and can be expressed as:

Preconditioning Techniques in MOOSE
The original intention of the preconditioner is to improve the performance and reliability of Krylov subspace methods.Preconditioning attempts to improve the spectral Table 1.Two group constants for the 2D-LRA benchmark [22].

Region
Material Group g D g (cm) ∑ a,g cm −1 ν∑ f,g cm −1 ∑ s,1→2 cm −1 To simplify the following discussion, the k-eigenvalue problem (Equations ( 1) and ( 2)) is rewritten in matrix form: In Equation (3), ⇀ ϕ represents the two-group neutron flux after numerical discretization.The term M is the discrete form of the sum of diffusion operator, absorption term, and scattering term, while F represents the fission matrix and k is the eigenvalue of the system.Please note that, both neutron flux and eigenvalue k are unknowns here, as shown in Equation (3); therefore, the neutron diffusion k-eigenvalue problem is a nonlinear issue.This nonlinear equation system is solved using the JFNK method in this work.According to the Ref. [23], the constraint equation of eigenvalue k can be defined as: This work uses a technique to treat k as an intermediate variable, as shown in Equation ( 4), so that Equation ( 3) is eliminated in partial differential equations (PDEs).The nonlinear PDE equation is derived from Equation (3) and Equation ( 4), as shown in Equation ( 5).This treatment is inherited from the nonlinear elimination method, which is used to eliminate certain nonlinear variables to overcome the ill-posed issue in the original nonlinear equation system [24].MOOSE has integrated this special numerical technique and developed the "NonlinearEigen" executioner to handle eigenvalue problems.The detailed implementation can be found in Ref. [23].In MOOSE, the nonlinear PDE equation Equation ( 5) is solved by the preconditioned JFNK algorithm, as shown in Equation (6).
The JFNK method is a powerful nonlinear solver featured with two iteration layers, including the Newton iteration as the outer layer and the Krylov subspace iteration as the inner layer.The basic concepts of the JFNK method can be found in the literature [7].Like most iterative methods, the performance of the Krylov subspace method highly depends on the eigenvalue distribution of the system.The preconditioning process can improve the convergence behavior by clustering the eigenvalue distribution of the coefficient matrix.The preconditioning process is an equivalent transform of the original problem by multiplying it with the preconditioning matrix.
where δ ⇀ ϕ is the update step of the neutron flux vector in the Newton iteration, and J is the Jacobian matrix which can be represented as a block matrix form: It should be noted that Equation ( 7) is only a non-zero element structural representation of the Jacobian matrix.A 1 and A 2 are the discrete forms of the sum of diffusion operator and absorption term.The preconditioner P is an approximation of the Jacobian matrix J, and can be expressed as:

Preconditioning Techniques in MOOSE
The original intention of the preconditioner is to improve the performance and reliability of Krylov subspace methods.Preconditioning attempts to improve the spectral properties of the coefficient matrix.It can cluster the spectrum which results in rapid convergence, particularly when the preconditioned matrix is close to normal.Here are some basic preconditioning concepts.
If P is a nonsingular matrix that approximates A (in some sense), the equation can be written as follows: Energies 2024, 17, 1499 5 of 16 Equation ( 9) is the left preconditioning and has the same solution as original equations, but it is easier to solve.The right preconditioning can be performed as: It is not necessary to compute P −1 A or AP −1 explicitly during Krylov subspace iterations.Instead, matrix-vector products and linear systems of the form P ⇀ z = ⇀ r are performed, which is utilized in the JFNK method.As for residual minimizing methods, like GMRES [25], the right preconditioning [26] is often used.In addition, the residuals for the right-preconditioned system are identical to the true residuals The preconditioning process can be divided into the matrix construction phase and matrix factorization phase, as depicted in Figure 2. The MOOSE platform provides a variety of preconditioning matrix construction methods, such as the physics-based preconditioner (PBP), single matrix preconditioner (SMP), and field split preconditioner (FSP).They can be used to describe the multi-physics coupling problems, and can also be used to depict the scattering and fission effects between neutron energy groups.These preconditioners need to provide analytical expressions to construct the coefficient matrix, that is, the coefficient matrix is constructed by grid material composition and cross-section.Especially for practical problems with complex physical properties, it is difficult to derive the analytical expressions of the coefficient matrix elements.The finite difference preconditioner (FDP) can construct a matrix element automatically for a general preconditioner.However, it is extremely slow because of the external computational cost in the implementation of FDP.The coloring technique can significantly reduce the number of nonlinear function evaluations.This method can improve the computational efficiency of FDP based on the topological relationships of the coupling terms, and the details will be discussed in Section 3.1.
convergence, particularly when the preconditioned matrix is close to normal.Here are some basic preconditioning concepts.
If  is a nonsingular matrix that approximates  (in some sense), the equation can be written as follows: Equation ( 9) is the left preconditioning and has the same solution as original equations, but it is easier to solve.The right preconditioning can be performed as: It is not necessary to compute   or  explicitly during Krylov subspace iterations.Instead, matrix-vector products and linear systems of the form  ⃑ =  ⃑ are performed, which is utilized in the JFNK method.As for residual minimizing methods, like GMRES [25], the right preconditioning [26] is often used.In addition, the residuals for the right-preconditioned system are identical to the true residuals  ⃑ =  ⃑ −  ⃑ .
The preconditioning process can be divided into the matrix construction phase and matrix factorization phase, as depicted in Figure 2. The MOOSE platform provides a variety of preconditioning matrix construction methods, such as the physics-based preconditioner (PBP), single matrix preconditioner (SMP), and field split preconditioner (FSP).They can be used to describe the multi-physics coupling problems, and can also be used to depict the scattering and fission effects between neutron energy groups.These preconditioners need to provide analytical expressions to construct the coefficient matrix, that is, the coefficient matrix is constructed by grid material composition and cross-section.Especially for practical problems with complex physical properties, it is difficult to derive the analytical expressions of the coefficient matrix elements.The finite difference preconditioner (FDP) can construct a matrix element automatically for a general preconditioner.However, it is extremely slow because of the external computational cost in the implementation of FDP.The coloring technique can significantly reduce the number of nonlinear function evaluations.This method can improve the computational efficiency of FDP based on the topological relationships of the coupling terms, and the details will be discussed in Section 3.1.As depicted in Figure 2, the matrix factorization method is required after building the preconditioning matrix.In MOOSE, the main factorization methods for the preconditioning matrix include the incomplete factorization method [27] and iterative method [3].Each of these methods has its advantages and applicability.The incomplete LU As depicted in Figure 2, the matrix factorization method is required after building the preconditioning matrix.In MOOSE, the main factorization methods for the preconditioning matrix include the incomplete factorization method [27] and iterative method [3].Each of these methods has its advantages and applicability.The incomplete LU factorization method (ILU) is often used in preconditioning for the non-symmetrical matrix issue, as shown in Equation (7).Particular attention needs to be paid to the location of the fill-in elements during the factorization process, which is discussed in Section 3.2.In addition, the ILU factorization can be optimized according to the sparsity of the matrix; this can be referred to in the Section 3.3 reordering method.

Numerical Techniques in Preconditioning
This section mainly discusses the numerical algorithms used in preconditioning.The coloring algorithm can significantly reduce the huge computational cost in FDP, which is discussed in Section 3.1.The computational cost and its robustness of preconditioning factorization phase are also analyzed.ILU with the reordering algorithm could enhance the computational behavior and its robustness, as provided in Sections 3.2 and 3.3.

Coloring
The preconditioning matrix is a partial Jacobian matrix, which can be calculated automatically by the finite difference of the nonlinear function.In this work, the coloring algorithms [28] are utilized for the preconditioning matrix to reduce the number of nonlinear function evaluations and enhance the computational performance.Here the principle of coloring algorithms is introduced.
The nonlinear problem can be described by the residual function f : R n −→ R n , such as Equation ( 5), which describes the steady-state neutron diffusion problem.The number of elements in solution vectors is n = 50 for illustration, so the dimension of the preconditioning matrix P is n × n (50 × 50).The non-zero pattern of the preconditioning matrix is shown in Figure 3a, whose elements could be automatically calculated by the finite difference column by column.For the kth column of the preconditioning matrix P, it could be calculated by [F(x + ϵe k ) − F(x)]/ϵ, where e k is a unit vector with 1 in the kth row and 0 in all other rows and ϵ is a small step size.Please note that, there is an additional function evaluation, F(x + ϵe k ), when calculating the kth column of the preconditioning matrix.Hence, if sparsity pattern is not considered, it requires n additional function evaluations in total for a preconditioning matrix with n columns.
factorization method (ILU) is often used in preconditioning for the non-symmetrical matrix issue, as shown in Equation ( 7).Particular attention needs to be paid to the location of the fill-in elements during the factorization process, which is discussed in Section 3.2.In addition, the ILU factorization can be optimized according to the sparsity of the matrix; this can be referred to in the Section 3.3 reordering method.

Numerical Techniques in Preconditioning
This section mainly discusses the numerical algorithms used in preconditioning.The coloring algorithm can significantly reduce the huge computational cost in FDP, which is discussed in Section 3.1.The computational cost and its robustness of preconditioning factorization phase are also analyzed.ILU with the reordering algorithm could enhance the computational behavior and its robustness, as provided in Sections 3.2 and 3.3.

Coloring
The preconditioning matrix is a partial Jacobian matrix, which can be calculated automatically by the finite difference of the nonlinear function.In this work, the coloring algorithms [28] are utilized for the preconditioning matrix to reduce the number of nonlinear function evaluations and enhance the computational performance.Here the principle of coloring algorithms is introduced.
The nonlinear problem can be described by the residual function :  ⟶  , such as Equation ( 5), which describes the steady-state neutron diffusion problem.The number of elements in solution vectors is  = 50 for illustration, so the dimension of the preconditioning matrix  is  ×  (50 × 50).The non-zero pattern of the preconditioning matrix is shown in Figure 3a, whose elements could be automatically calculated by the finite difference column by column.For the kth column of the preconditioning matrix  , it could be calculated by [( +  ) − ()]/, where  is a unit vector with 1 in the kth row and 0 in all other rows and  is a small step size.Please note that, there is an additional function evaluation, ( +  ), when calculating the kth column of the preconditioning matrix.Hence, if sparsity pattern is not considered, it requires  additional function evaluations in total for a preconditioning matrix with  columns.By considering the sparse pattern, the columns of the preconditioning matrix could be divided into several groups, where any two columns in the same group are both not non-zeros in a common row.It means the columns in the same group are structurally orthogonal [29].For example, the first column and the sixth column are structurally orthogonal in Figure 3a, therefore, the first and sixth columns are both in the same color (green).Now define an additional vector  to represent a structurally orthogonal vector By considering the sparse pattern, the columns of the preconditioning matrix could be divided into several groups, where any two columns in the same group are both not non-zeros in a common row.It means the columns in the same group are structurally orthogonal [29].For example, the first column and the sixth column are structurally orthogonal in Figure 3a, therefore, the first and sixth columns are both in the same color (green).Now define an additional vector d to represent a structurally orthogonal vector group.For the vector columns in the same structurally orthogonal group (the same color), the corresponding position in d is represented by 1; otherwise, the other elements of d are 0. In this case, for the green group, it is d = [1, 0, 0, 0, 0, 1, 0, . . . , 0].The elements in structurally orthogonal columns can be easily acquired by only one additional function evaluation, F(x + ϵd), along the vector d.In this way, by partitioning the columns of P into the fewest groups, the required number of additional function evaluations is minimized.Figure 3b shows the compressed representation of the preconditioning matrix.By reasonably di-viding structurally orthogonal columns, the compressed matrix has only eight columns.This means that only an additional eight function evaluations, F(x + ϵd), are required to complete the construction of the preconditioning matrix.
The graph theory is used in this work to partition the columns of P into the fewest groups.Here are some basic graph theory definitions about the matrix and its coloring algorithms.A graph G is an ordered pair (V, E), containing vertex set V{V : v 1 , v 2 , v 3 , . . . . . . ,v n } and edge set E. The degree of a vertex v i in a graph G is the number of edges having v i as an endpoint, and can be represented by deg(v i ).The matrix could be equivalently represented by a graph, where the columns in the matrix are the vertices in the graph.If there is an edge link between two vertices, it means that these two corresponding columns in the matrix are non-orthogonal.The graph coloring issue is to find the coloring partition where any adjacent vertices have different colors.Therefore, the matrix coloring problem is equivalent to the minimum graph coloring issue.The vertex will be gradually colored according to the order, and assign the smallest color.This method is also called the greedy coloring algorithm [30], outlined in Algorithm 1.

Algorithm 1 Sequential (greedy) coloring algorithm
Formulate a vertex ordering {v 1 , v 2 , v 3 , . . . . . . ,v n } Assign v 1 as color 1 For i = 2 to n do Assign v i the smallest color not used by any of its neighbors End for End procedure The coloring produced by the sequential algorithm is dependent on the ordering of columns [31].Taking a matrix in Figure 4a as an example, there are seven columns where gray squares are the non-zero elements, and its corresponding graph is shown as Figure 4b.The vertices {v 1 , v 2 , v 3 , v 4 , v 5 , v 6 , v 7 } in Figure 4b denote the columns in the matrix in Figure 4a.The degrees of these vertices are {1, 2, 5, 1, 3, 2, 2}, respectively.As shown in Figure 4b, the index of vertex is represented in red numbers inside the circle, and dev(v i ) is represented in black numbers above the circle.equivalently expressed by the graph, where the ordered vertices need to be colored one by one, as presented in Figure 5c.In this case, the matrix columns could be divided into three groups (colors) since there are three columns in the compressed matrix.Similar operations could be performed with the SL algorithm.In this case, both LF and SL coloring algorithms have the same groups.In addition to these two methods, this work also evaluates the performance of the incidence degree (ID) coloring algorithm, and its specific principles can be referred to in the literature [34][35][36][37].
(a) (b) (a) According to the truncated-max degree bound theorem in graph theory [32], it is evident from the sequential coloring procedure that coloring the columns of large degree first gives the upper bound of the coloring.The determination of a sequential color corresponding to such an ordering is termed the largest-first algorithm (LF).The maxsubgraph min-degree bound is always sharper than the truncated-max degree bound in graph theory [33].Inspired by this theorem, the small-last algorithm (SL) ranks the column of the smallest degree last and continues to search for such column in the remaining columns.The process of obtaining the ordering by LF algorithms can be referred to in Figure 5.In detail, from the large vertex degree to small vertex degree, the vertices could be rearranged as {3, 5, 6, 7, 2, 4, 1}.For the LF algorithms, from the perspective of matrix form, columns will be removed from the original matrix column by column based on LF ordering, as shown in Figure 5a, to generate a compressed matrix, as shown in Figure 5b.The compressed matrix could be colored based on Algorithm 1, as shown in Figure 5b.This process can be equivalently expressed by the graph, where the ordered vertices need to be colored one by one, as presented in Figure 5c.In this case, the matrix columns could be divided into three groups (colors) since there are three columns in the compressed matrix.Similar operations could be performed with the SL algorithm.In this case, both LF and SL coloring algorithms have the same groups.In addition to these two methods, this work also evaluates the performance of the incidence degree (ID) coloring algorithm, and its specific principles can be referred to in the literature [33][34][35][36].

Incomplete LU Factorization Method
The Jacobian matrix of the neutron diffusion k-eigenvalue problem is a non-symmetrical matrix, as shown in Equation (7), and the ILU factorization, rather than the incomplete Cholesky factorization is utilized in this work.In the preconditioning, the computational performance highly depends on the process of matrix inversion, which is mainly realized by matrix factorization.Even though the matrix is sparse, extra fill-in non-zero elements usually take place after factorization.This means the triangular factor L and U are considerably less sparse than the original one.A new form of preconditioner, P = LU, is obtained by discarding part (or all) of the fill-in non-zero elements during the factorization process [37].This factorization can form a powerful preconditioner, also known as incomplete LU factorization.
To illustrate this method, we formally define a subset of matrix element locations S, in which the main diagonal and all (i, j) that {a ij ⊂ P | a i,j ̸ = 0} are usually included.In addition, S also contains other fill-ins, which are allowed to be non-zeros during the factorization process.Consequently, an incomplete factorization step can be described as: where k is recursive for k < i, j.If S is the same as the non-zero positions in P, the no-fill ILU factorization, or ILU(0), is obtained.Subset S governs the dropping of fill-in in the incomplete factors and becomes the criteria in different ILU variants [38].However, no-fill ILU factorization can only provide a relatively low-quality preconditioner.In order to obtain better preconditioning quality, more fill-ins need to be considered in the incomplete factorization process.
A hierarchical ILU preconditioner based on the "level of fill-in" concept has been proposed [39].The method defines a rule that governs the dropping of fill-in in the incomplete factors.The definition of "level of fill-in" is as follows, and the initial level of fill of a matrix entry a i,j is: After an ILU process, the level of fill must be updated: Let k be a nonnegative integer.In an ILU(k) preconditioner, all fill-ins whose level is greater than k are dropped.The user-defined parameter k is a key issue for ILU(k).On one hand, with the increase in fill-in level k, the preconditioning quality is better and the Krylov iteration number is decreased.On the other hand, with the increase in k, the fill-ins and computational cost per iteration rise rapidly for the natural ordering ILU(k).Therefore, the computational performance of the original natural ordering ILU(k) preconditioner is sensitive to the choice of fill-in level k, and it is not an easy task for natural ordering ILU(k) to choose the suitable fill-in level k.

Reordering
In many practical problems, the preconditioning matrix is a highly sparse matrix, such as the neutron eigenvalue problem.Exploring mathematical algorithms based on sparsity to decrease the additional fill-ins of ILU(k) is essential.The reordering algorithms are chosen so that pivoting down the diagonal in order of the resulting permuted preconditioning matrix RPR T = LU produces much less fill-in.In addition, the order and permutation matrix R can save the cost when calculating the factors in LU [40].Therefore, compared with the natural ordering ILU(k), the computational performance of a reordering-based ILU(k) preconditioner is not sensitive to the fill-in level k.
The principle of the permutation is briefly described here.Suppose an arbitrary n × n sparse matrix P = a i,j , and determine a permutation matrix R such that RPR T has a small bandwidth and a small profile.The bandwidth of P is defined by the maximum of the set |i − j| : a i,j ̸ = 0 .To acquire the profile of P, set f i = min j : a i,j ̸ = 0 with all a i,i ̸ = 0 and let d i = i − f i .The profile is defined by ∑ N 1 d i .The permutation matrix can reduce the storage and computational cost when solving linear equations.One of the main Energies 2024, 17, 1499 10 of 16 objectives is to cluster non-zeros as much as possible in the main diagonal.The generation of permutation matrices can be attributed to the reordering method, which focuses mainly on the bandwidth and the profile of matrices.Moreover, the current mainstream reordering rule is based on graph theory.
According to the optimization objectives, reordering methods can be divided into two categories: reduced fill-in elements algorithm, and reduced bandwidths and profiles algorithm.The algorithms in the first category include the quotient minimum degree (QMD), the one-way dissection (1WD), and the nested dissection (ND) method [41].While the algorithm belonging to the second category is mainly the reverse Cuthill-McKee (RCM) method [21].To illustrate the differences between several reordering methods, the steady state neutron diffusion model in Section 3.1 is taken as an example to show the ILU factorization effect after different reordering algorithms.Because the matrix dimension is small, the ILU factorization with fill-in level 20 is used to show the different reordering algorithm performances.The structures of the LU matrix after ILU(20) factorization by different reordering algorithms are shown in Figure 6.The general conclusion was that the reverse Cuthill-McKee (RCM) algorithms usually produced the smallest bandwidths.The QMD and ND methods do not guarantee the optimal bandwidth, but reduce the additional fill-in elements and computational complexity during the matrix factorization.In addition, the ND algorithm usually guarantees the minimum number of additional fill-in elements.The effect of reordering depends on the specific sparse structure of the matrix, and there is no optimal algorithm at present.
The principle of the permutation is briefly described here.Suppose an arbitrary  ×  sparse matrix  = { , }, and determine a permutation matrix  such that  has a small bandwidth and a small profile.The bandwidth of  is defined by the maximum of the set {| − |:  , ≠ 0} .To acquire the profile of  , set  = min {:  , ≠ 0} with all  , ≠ 0 and let  =  −  .The profile is defined by ∑  .The permutation matrix can reduce the storage and computational cost when solving linear equations.One of the main objectives is to cluster non-zeros as much as possible in the main diagonal.The generation of permutation matrices can be attributed to the reordering method, which focuses mainly on the bandwidth and the profile of matrices.Moreover, the current mainstream reordering rule is based on graph theory.
According to the optimization objectives, reordering methods can be divided into two categories: reduced fill-in elements algorithm, and reduced bandwidths and profiles algorithm.The algorithms in the first category include the quotient minimum degree (QMD), the one-way dissection (1WD), and the nested dissection (ND) method [42].While the algorithm belonging to the second category is mainly the reverse Cuthill-McKee (RCM) method [21].To illustrate the differences between several reordering methods, the steady state neutron diffusion model in Section 3.1 is taken as an example to show the ILU factorization effect after different reordering algorithms.Because the matrix dimension is small, the ILU factorization with fill-in level 20 is used to show the different reordering algorithm performances.The structures of the   matrix after ILU (20) factorization by different reordering algorithms are shown in Figure 6.The general conclusion was that the reverse Cuthill-McKee (RCM) algorithms usually produced the smallest bandwidths.The QMD and ND methods do not guarantee the optimal bandwidth, but reduce the additional fill-in elements and computational complexity during the matrix factorization.In addition, the ND algorithm usually guarantees the minimum number of additional fill-in elements.The effect of reordering depends on the specific sparse structure of the matrix, and there is no optimal algorithm at present.

Results and Discussions
The 2D-LRA benchmark is utilized to evaluate the performance of proposed preconditioners.The mesh of the benchmark is generated by the mesh generation software, using a triangle adaptive mesh leading to 7696 meshes.The total degree of freedoms (DOFs) is 15,392, including two groups of neutron flux variables.The MOOSE is used in this work and all programs are executed serially.As mentioned before, we mainly focus on the construction and factorization processes to pursue an efficiency and robustness scheme.

Results and Discussions
The 2D-LRA benchmark is utilized to evaluate the performance of proposed preconditioners.The mesh of the benchmark is generated by the mesh generation software, using a triangle adaptive mesh leading to 7696 meshes.The total degree of freedoms (DOFs) is 15,392, including two groups of neutron flux variables.The MOOSE is used in this work and all programs are executed serially.As mentioned before, we mainly focus on the construction and factorization processes to pursue an efficiency and robustness scheme.

Preconditioning Matrix Construction Techniques
In order to meet the requirement of automatically building the preconditioning matrix, FDP is used here.In addition, FDP is a powerful preconditioner, especially when the preconditioning matrix elements of the problem cannot be explicitly given.However, the direct use of FDP will bring a huge computational cost, which will seriously increase the whole solution time.The coloring method can utilize the sparse structure of the preconditioning matrix and partition the columns into the fewest groups of structurally orthogonal columns.Therefore, it can significantly reduce the evaluation times in FDP, and reduce the cost of constructing the preconditioning matrix.As shown in Table 2, the computational efficiency of the FDP with coloring could be about 60 times higher than that of the preconditioner-without-coloring algorithm.x is defined by Equation ( 5).Three coloring algorithms exhibited only a slight variation in these evaluation parameters.In addition, the number of colors needed by the SL algorithm tended to be slightly less than the LF and ID algorithm.The results also show that the SL algorithm's min-degree bound is stricter than that of the LF algorithm, which can provide fewer colors.It is necessary to illustrate the enormous preconditioning construction cost when not using the coloring algorithm.When the Jacobian matrix is used as a preconditioner's coefficient matrix, the number of residual function calls is equal to that of unknowns, which will bring a huge computational cost, especially for a large-scale problem.By comparing the residual history in Figure 7, it can be inferred that the preconditioning matrices constructed by the three algorithms are identical.The "normalized nonlinear residual" here represents the initial residual norm (L2 norm) as one, and the dotted line in Figure 7 also shows the convergence criterion of the algorithm.

Preconditioning Matrix Construction Techniques
In order to meet the requirement of automatically building the preconditioning matrix, FDP is used here.In addition, FDP is a powerful preconditioner, especially when the preconditioning matrix elements of the problem cannot be explicitly given.However, the direct use of FDP will bring a huge computational cost, which will seriously increase the whole solution time.The coloring method can utilize the sparse structure of the preconditioning matrix and partition the columns into the fewest groups of structurally orthogonal columns.Therefore, it can significantly reduce the evaluation times in FDP, and reduce the cost of constructing the preconditioning matrix.As shown in Table 2, the computational efficiency of the FDP with coloring could be about 60 times higher than that of the preconditioner-without-coloring algorithm.For each coloring algorithm we cite five statistics, including total computational time, number of residual evaluations, number of colorings, nonlinear steps, and total linear steps.The convergence criterion is set as ‖( ⃗)‖ <  ‖( ⃗)‖, where  = 10 .The nonlinear residual ( ⃗) is defined by Equation ( 5).Three coloring algorithms exhibited only a slight variation in these evaluation parameters.In addition, the number of colors needed by the SL algorithm tended to be slightly less than the LF and ID algorithm.The results also show that the SL algorithm's min-degree bound is stricter than that of the LF algorithm, which can provide fewer colors.It is necessary to illustrate the enormous preconditioning construction cost when not using the coloring algorithm.When the Jacobian matrix is used as a preconditioner's coefficient matrix, the number of residual function calls is equal to that of unknowns, which will bring a huge computational cost, especially for a large-scale problem.By comparing the residual history in Figure 7, it can be inferred that the preconditioning matrices constructed by the three algorithms are identical.The "normalized nonlinear residual" here represents the initial residual norm (L2 norm) as one, and the dotted line in Figure 7 also shows the convergence criterion of the algorithm.

Preconditioning Matrix Factorization Techniques
After constructing the preconditioning matrix, it is also necessary to factorize it and ensure each iteration is affordable.The ILU(k) algorithm has been widely used as a generalpurpose technique.It should be noted that the numerical examples in this subsection are carried out using FDP with the SL coloring algorithm.All the comparisons here are to expose the performance of the factorization techniques.
The total computational time and ILU factorization time with natural ordering for LRA problems under different fill-in levels are shown in Figure 8. Figure 9 shows total linear steps and non-zero elements under different fill-in levels.With an increase in the fill-in level, the number of linear steps required for calculation will decrease due to the faster convergence rate.Thus, the number of linear iteration steps is reduced.At the same time, as the fill-in level increases, the number of non-zero elements in the factorized matrix also increases, which means more computational complexity.This ultimately leads to a rise in the factorization time of the ILU(k) method, as shown in Figure 8.The factorization time of ILU (20) with original ordering is seven times that of ILU(0), indicating that the computational performance of the original natural ordering ILU(k) preconditioner is sensitive to the choice of fill-in level k.The ILU factorization process dominates the entire computational cost at high fill-in levels.Please note that, in practice, the user-defined fill-in level k is usually problem-dependent, which is not easy to determine.

Preconditioning Matrix Factorization Techniques
After constructing the preconditioning matrix, it is also necessary to factorize it an sure each iteration is affordable.The ILU(k) algorithm has been widely used as a genera pose technique.It should be noted that the numerical examples in this subsection are ca out using FDP with the SL coloring algorithm.All the comparisons here are to expos performance of the factorization techniques.
The total computational time and ILU factorization time with natural ordering for problems under different fill-in levels are shown in Figure 8. Figure 9 shows total linear and non-zero elements under different fill-in levels.With an increase in the fill-in leve number of linear steps required for calculation will decrease due to the faster conver rate.Thus, the number of linear iteration steps is reduced.At the same time, as the fill-in increases, the number of non-zero elements in the factorized matrix also increases, w means more computational complexity.This ultimately leads to a rise in the factorization of the ILU(k) method, as shown in Figure 8.The factorization time of ILU (20) with or ordering is seven times that of ILU(0), indicating that the computational performance original natural ordering ILU(k) preconditioner is sensitive to the choice of fill-in level k ILU factorization process dominates the entire computational cost at high fill-in levels.P note that, in practice, the user-defined fill-in level k is usually problem-dependent, wh not easy to determine.

Preconditioning Matrix Factorization Techniques
After constructing the preconditioning matrix, it is also necessary to factorize it an sure each iteration is affordable.The ILU(k) algorithm has been widely used as a genera pose technique.It should be noted that the numerical examples in this subsection are c out using FDP with the SL coloring algorithm.All the comparisons here are to expo performance of the factorization techniques.
The total computational time and ILU factorization time with natural ordering fo problems under different fill-in levels are shown in Figure 8. Figure 9 shows total linear and non-zero elements under different fill-in levels.With an increase in the fill-in lev number of linear steps required for calculation will decrease due to the faster conver rate.Thus, the number of linear iteration steps is reduced.At the same time, as the fill-in increases, the number of non-zero elements in the factorized matrix also increases, means more computational complexity.This ultimately leads to a rise in the factorizatio of the ILU(k) method, as shown in Figure 8.The factorization time of ILU (20) with or ordering is seven times that of ILU(0), indicating that the computational performance original natural ordering ILU(k) preconditioner is sensitive to the choice of fill-in level ILU factorization process dominates the entire computational cost at high fill-in levels.note that, in practice, the user-defined fill-in level k is usually problem-dependent, wh not easy to determine.In order to find a robust preconditioner, the key point is to make the preconditioner not sensitive to the user-defined fill-in level k.Therefore, it should reduce the computational cost of ILU(k) at the high fill-in levels.Here, the reordering algorithms are used for ILU(k) factorization, and the fill-in levels of k = 10 are considered in this work.Compared with the ILU (10) under the natural ordering, the number of non-zeros after factorizations under reordering algorithms is reduced, as shown in Table 3.Therefore, compared with the natural ordering, the total computational time of ILU (10) under reordering algorithms is relatively robust.In order to further analyze the performance of the reordering-based ILU(k) preconditioner, a steady-state neutron diffusion k-eigenvalue problem with thermal-hydraulic feedback is also utilized as a supplement.It is a simplified 2D-PWR (Pressurized Water Reactor) reactor model, which includes the steady-state neutron diffusion k-eigenvalue equation as well as three other physical fields to consider the thermal-hydraulic feedback effect: coolant temperature, pressure, and velocity.As with most reactor systems, the neutronics and thermal hydraulics are tightly two-way coupled.The governing equations, coefficients, dimensions, and boundary conditions can be found in the reference paper [42,43].As a simplified model, a 40 × 40 cylindrical grid is used here.Here, the convergence criterion consistent with the previous example is used, and the SL coloring algorithm is used to generate the coefficient matrix automatically.
Figure 10 shows the results of the total computational time and the number of non-zero elements after factorization under different fill-in levels and reordering algorithms.The results of fill-in levels from 0 to 12 are provided, and the higher fill-in levels are redundant in practice for this multi-physics coupling problem.With the increase in fill-in level k, the factorization time of ILU(k) with natural ordering grows up significantly, as shown in Figure 10a, indicating it is sensitive to the choice of fill-in level k.Compared with the ILU factorization under natural ordering, the total computational time of ILU with reordering algorithms is not sensitive to the fill-in levels.Additionally, the computational performances are close to the optimal computational cost with natural ordering, as shown in Figure 10.
In order to find a robust preconditioner, the key point is to make the precondition sensitive to the user-defined fill-in level .Therefore, it should reduce the computation of ILU(k) at the high fill-in levels.Here, the reordering algorithms are used for ILU(k) fa zation, and the fill-in levels of  = 10 are considered in this work.Compared wi ILU (10) under the natural ordering, the number of non-zeros after factorizations unde dering algorithms is reduced, as shown in Table 3.Therefore, compared with the natu dering, the total computational time of ILU (10) under reordering algorithms is relative bust.In order to further analyze the performance of the reordering-based ILU(k) pre tioner, a steady-state neutron diffusion k-eigenvalue problem with thermal-hydraulic back is also utilized as a supplement.It is a simplified 2D-PWR (Pressurized Water Re reactor model, which includes the steady-state neutron diffusion k-eigenvalue equat well as three other physical fields to consider the thermal-hydraulic feedback effect: c temperature, pressure, and velocity.As with most reactor systems, the neutronics and th hydraulics are tightly two-way coupled.The governing equations, coefficients, dimen and boundary conditions can be found in the reference paper [43,44].As a simplified m a 40 × 40 cylindrical grid is used here.Here, the convergence criterion consistent wi previous example is used, and the SL coloring algorithm is used to generate the coef matrix automatically. Figure 10 shows the results of the total computational time and the number of non elements after factorization under different fill-in levels and reordering algorithms.The r of fill-in levels from 0 to 12 are provided, and the higher fill-in levels are redundant in pr for this multi-physics coupling problem.With the increase in fill-in level k, the factori time of ILU(k) with natural ordering grows up significantly, as shown in Figure 10a, indi it is sensitive to the choice of fill-in level k.Compared with the ILU factorization under n ordering, the total computational time of ILU with reordering algorithms is not sensi the fill-in levels.Additionally, the computational performances are close to the optima putational cost with natural ordering, as shown in Figure 10.In detail, Table 4 summarizes the results of different reordering algorithms under the optimal fill-in levels.The factorization times of low fill-in levels are relatively small, but more nonlinear/linear steps are required to achieve convergence.The reordering algorithms can reduce the cost of factorization at high fill-in levels.Although it is still time-consuming, it can improve efficiency by reducing the number of iteration steps.As a result, The performance of ILU (11) with the ND algorithm is slightly superior to ILU(3) with natural ordering.So, although natural ordering only could achieve good computational efficiency at low fill-in levels, such as k = 3, the computational time will increase sharply as the fill-in level increases.The superiority of reordering algorithms emerges in this situation.The reordering-based ILU preconditioner can adapt to a wide range of fill-in levels without empirical selection.

Conclusions
Efficient and robust preconditioners are a key issue to solve nonlinear problems in the nuclear engineering community, especially within the framework of the JFNK method.There are two important steps for the preconditioning matrix, constructing the preconditioner and its matrix factorization.For the first step, an efficient preconditioningbased coloring algorithm is developed in this work, which significantly reduces the cost of finite difference computation by partitioning the columns of the coefficient matrix.For the second step, the robust reordering-based ILU(k) preconditioner is developed, which could achieve a high computational performance for a wide range of fill-in levels.As a preliminary work, the 2D-LRA neutron eigenvalue problem and a simplified PWR model are provided to demonstrate the performance of the preconditioner.The results show that the proposed preconditioner can automatically generate the preconditioning matrix and has strong robustness.The main conclusions are: 1.
The finite difference technique combined with the coloring algorithm is utilized to automatically construct a preconditioning matrix, whose computational efficiency could be about 60 times higher than that without the coloring algorithm.

2.
With the increase in fill-in level k, the factorization time of ILU(k) with natural ordering grows up significantly, indicating its computational performance is relatively sensitive to the choice of fill-in level k, since the additional non-zero fill-in element increases rapidly.

3.
The reordering algorithms are utilized to reduce the non-zero fill-in element in ILU(k).The reordering-based ILU(k) algorithm is a robust preconditioning matrix factorization method.Moreover, its performance under different fill-in levels are comparable to the optimal computational cost with natural ordering.
The reordering-based ILU(k) factorization presented in this work exhibited good computational performance and robustness.In this work, the steady-state neutron diffusion k-eigenvalue problem was considered, which is the key part of the physics design of the nuclear reactor.Future work will focus on the performance of the proposed preconditioner for practical transient neutronic/thermal-hydraulic coupling problems, and to perform the nuclear reactor safety analysis.

Figure 3 .
Figure 3. Preconditioning matrix and its compressed representation in steady-state neutron diffusion problem.(The color dots represent non-zero matrix elements, and each color represents a structurally orthogonal group) (a) Preconditioning matrix; (b) Its compressed representation.

Figure 3 .
Figure 3. Preconditioning matrix and its compressed representation in steady-state neutron diffusion problem.(The color dots represent non-zero matrix elements, and each color represents a structurally orthogonal group) (a) Preconditioning matrix; (b) Its compressed representation.

Figure 4 .
Figure 4. Sparse matrix and its corresponding graph.(a) Sparse matrix; (b) Corresponding graph with degrees.

Figure 4 .
Figure 4. Sparse matrix and its corresponding graph.(a) Sparse matrix; (b) Corresponding graph with degrees.

Figure 5 .
Figure 5. Matrix columns partitioned based on largest-first ordering.(a) Columns rearrangement in matrix form based on LF ordering; (b) Compressed matrix generation based on LF ordering; (c) Columns rearrangement in graph form based on LF ordering.

Figure 7 .
Figure 7. Residual history in three coloring methods using FDP.Figure 7. Residual history in three coloring methods using FDP.

Figure 7 .
Figure 7. Residual history in three coloring methods using FDP.Figure 7. Residual history in three coloring methods using FDP.

Figure 8 .Figure 9 .
Figure 8.The total computational time and factorization time in ILU(k) algorithm.(a) Compu tional time; (b) Factorization time.

Figure 8 .
Figure 8.The total computational time and factorization time in ILU(k) algorithm.(a) Computational time; (b) Factorization time.

Figure 8 .Figure 9 .
Figure 8.The total computational time and factorization time in ILU(k) algorithm.(a) Comp tional time; (b) Factorization time.

Figure 9 .
Figure 9.The linear steps and non-zeros after ILU factorization.(a) Total linear steps; (b) Nu of non-zeros.Figure 9.The linear steps and non-zeros after ILU factorization.(a) Total linear steps; (b) Number of non-zeros.

Figure 10 .
Figure 10.The total computational time and non-zeros of ILU factorization for simplified PWR model.(a) Total computational time; (b) Number of non-zeros.

Table 2 .
Computing performance of different coloring methods.For each coloring algorithm we cite five statistics, including total computational time, number of residual evaluations, number of colorings, nonlinear steps, and total linear steps.The convergence criterion is set as f →x 0 , where e r = 10 −8 .The nonlinear residual f →

Table 2 .
Computing performance of different coloring methods.

Table 4 .
The optimal fill-in level and computational performance in different reordering algorithms.