Block Preconditioning Matrices for the Newton Method to Compute the Dominant λ-Modes Associated with the Neutron Diffusion Equation

In nuclear engineering, the λ-modes associated with the neutron diffusion equation are applied to study the criticality of reactors and to develop modal methods for transient analysis. The differential eigenvalue problem that needs to be solved is discretized using a finite element method, obtaining a generalized algebraic eigenvalue problem whose associated matrices are large and sparse. Therefore, efficient methods are needed to solve this problem. In this work, we used a block generalized Newton method implemented with a matrix-free technique that does not store all matrices explicitly. This technique mainly reduces the computational memory and, in some cases, when the assembly of the matrices is an expensive task, the computational time. The main problem is that the block Newton method requires solving linear systems, which need to be preconditioned. The construction of preconditioners such as ILU or ICC based on a fully assembled matrix is not efficient in terms of memory with the matrix-free implementation. As an alternative, several block preconditioners are studied that only store a few block matrices in comparison with the full problem. To test the performance of these methodologies, different reactor problems are studied.


Introduction
The neutron transport equation is a balance equation that describes the behavior of the neutrons inside the reactor core. For three-dimensional problems, this equation is defined in a phase space of dimension seven, which makes the problem very difficult to solve. Thus, some approximations are considered, such as the multigroup neutron diffusion equation, which relies on the assumption that the neutron current is proportional to the gradient of the neutron flux by means of a diffusion coefficient.
Given a configuration of a nuclear reactor core, its criticality can be forced by dividing the production operator in the neutron diffusion equation by a positive number, λ, obtaining a neutron balance equation: the λ-modes problem. For the two energy group approximation and without considering up-scattering, this equation can be written as [1]:

$$
\begin{aligned}
-\nabla\cdot(D_1\nabla\phi_1) + (\Sigma_{a1}+\Sigma_{12})\,\phi_1 &= \frac{1}{\lambda}\bigl(\nu\Sigma_{f1}\,\phi_1 + \nu\Sigma_{f2}\,\phi_2\bigr),\\
-\nabla\cdot(D_2\nabla\phi_2) + \Sigma_{a2}\,\phi_2 &= \Sigma_{12}\,\phi_1,
\end{aligned}
\qquad (1)
$$

where φ1 and φ2 denote the fast and thermal flux, respectively. The diffusion coefficients D_g and the macroscopic cross-sections Σ_ag and νΣ_fg, with g = 1, 2, and Σ_12, are values that depend on the position. The largest eigenvalue in magnitude, called the k-effective (or multiplication factor), gives a measure of the criticality of the reactor, and its corresponding eigenfunction describes the steady-state neutron distribution in the reactor core. Moreover, the dominant eigenvalues and their corresponding eigenfunctions are useful for developing modal methods for the transient analysis.
To make a spatial discretization of Problem (1), a high-order continuous Galerkin finite element method is used, leading to a generalized algebraic eigenvalue problem of the form:

$$ M \tilde{\phi} = \lambda\, L \tilde{\phi}, \qquad (2) $$

where the matrix elements are given by weak-form integrals over the cells; a representative pair is:

$$
(L_{11})_{ij} = \sum_{e=1}^{N_t} \int_{\Omega_e} \bigl( D_1\, \nabla N_i \cdot \nabla N_j + (\Sigma_{a1} + \Sigma_{12})\, N_i N_j \bigr)\, dV, \qquad
(M_{11})_{ij} = \sum_{e=1}^{N_t} \int_{\Omega_e} \nu\Sigma_{f1}\, N_i N_j \, dV, \qquad (3)
$$

plus the terms on the surfaces Γ_e arising from the boundary conditions, where N_i is the prescribed shape function for the ith node. The vector φ̃ = (φ̃_1, φ̃_2)^T is the algebraic vector of the weights corresponding to the neutron flux in terms of the shape functions. The shape functions used in this work are Lagrange polynomials. The subdomains Ω_e (e = 1, . . ., N_t) denote the cells into which the reactor domain is divided and where the cross-sections are assumed to be constant. Similarly, Γ_e is the corresponding subdomain surface, which is part of the reactor boundary. More details on the finite element discretization can be found in [2]. For the implementation of the finite element method, the open source finite element library Deal.II [3] has been used. In this work, a matrix-free strategy for the blocks of the matrix M and for the non-diagonal blocks of L is developed. In this way, matrix-vector products are computed on the fly in a cell-based interface. For instance, we can consider that a finite element Galerkin approximation that leads to the matrix M_{1,1} takes a vector u as input and computes the integrals of the operator multiplied by trial functions, and the output vector is v. The operation can be expressed as a sum of N_t cell-based operations,

$$ v = \sum_{e=1}^{N_t} P_e^T\, M_{1,1}^{e}\, P_e\, u, $$

where P_e denotes the matrix that defines the location of the cell-related degrees of freedom in the global vector and M^e_{1,1} denotes the submatrix of M_{1,1} on finite element e. This sum is optimized through sum-factorization. Details about the implementation are explained in [4]. This strategy greatly reduces the memory used by the matrix elements.
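The cell-based product above can be sketched in a few lines (a minimal numpy sketch with hypothetical names, not the Deal.II implementation of [4]; the index array `cell_dofs[e]` plays the role of the matrix P_e, and `cell_matrices[e]` that of M^e_{1,1}):

```python
import numpy as np

def matrix_free_matvec(u, cell_dofs, cell_matrices):
    """v = sum_e P_e^T M_e (P_e u), computed cell by cell.

    cell_dofs[e]     : global indices of the degrees of freedom of cell e
    cell_matrices[e] : local dense matrix M_e of cell e
    """
    v = np.zeros_like(u)
    for dofs, Me in zip(cell_dofs, cell_matrices):
        v[dofs] += Me @ u[dofs]   # gather, apply the local operator, scatter
    return v
```

On two one-dimensional cells sharing a node, this reproduces the product with the assembled matrix while never forming it, which is the source of the memory savings discussed below.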
Calculation of the dominant λ-mode has traditionally relied on the classical power iteration method, which, although robust, converges slowly for dominance ratios near one, as occurs in some practical problems. Thus, acceleration techniques are needed to improve the convergence of the power iteration method. Some approaches in diffusion theory are, for instance, Chebyshev iteration [5] and the Wielandt shift [6]. Alternative approaches to the power iteration method have been studied in an attempt to improve upon the performance of accelerated power iteration methods [7,8]. The subspace iteration method [9], the Implicitly Restarted Arnoldi Method (IRAM) [10], the Jacobi-Davidson method [11], and the Krylov-Schur method [2] implemented in the SLEPc library [12] have been used to compute the largest or several dominant eigenvalues for the neutron diffusion equation and their corresponding eigenfunctions. More recently, other Krylov methods have been used to compute these modes for other approximations of the neutron transport equation [7,13]. Usually, applying these kinds of methods requires either transforming the generalized problem (2) into an ordinary eigenvalue problem or applying a shift-and-invert technique. In both cases, the solution process requires solving numerous linear systems. These systems are not well-conditioned, and they need to be preconditioned. Thus, the time and computational memory needed to compute several eigenvalues become very high.
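For reference, the classical power iteration for the generalized problem can be sketched as follows (a minimal dense numpy sketch; in practice the solve with L is a large, sparse, preconditioned iterative solve, and the rate of convergence is governed by the dominance ratio |λ2/λ1| mentioned above):

```python
import numpy as np

def power_iteration(M, L, x0, tol=1e-10, maxit=1000):
    """Power iteration for the dominant eigenpair of M x = lam * L x.

    Each step applies L^{-1} M; convergence slows as |lam_2/lam_1| -> 1.
    """
    x = x0 / np.linalg.norm(x0)
    lam = 0.0
    for _ in range(maxit):
        y = np.linalg.solve(L, M @ x)               # y = L^{-1} M x
        x = y / np.linalg.norm(y)
        lam_new = (x @ (M @ x)) / (x @ (L @ x))     # generalized Rayleigh quotient
        if abs(lam_new - lam) < tol:
            return lam_new, x
        lam = lam_new
    return lam, x
```

This is the baseline that the accelerated and Newton-type methods of this section aim to improve.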
One alternative is to use a method that does not require solving any linear system, such as the generalized Davidson method, used for neutron transport calculations in [14]. Other methods are the block Newton methods, which have been shown to be very efficient in the computation of several eigenvalues in neutron diffusion theory. These methods either do not need to solve as many linear systems as the Krylov methods or avoid solving any linear system with some hybridization. One of these Newton methods is the modified block Newton method, which has been considered for the ordinary eigenvalue problem associated with Problem (2) [15] or directly for the generalized eigenproblem (2) [16]. One advantage of these block methods is that several eigenvectors can be approximated simultaneously, and as a consequence, the convergence behavior improves. The convergence of eigensolvers usually depends on the eigenvalue separation, and if there are clustered or multiple eigenvalues, the methods may have problems finding all the eigenvalues. In practical situations of reactor analysis, the dominance ratio corresponding to the dominant eigenvalues is often near unity, resulting in slow convergence. In the block methods, this convergence only depends on the separation of the group of target eigenvalues from the rest of the spectrum. However, the linear systems that do arise still need to be preconditioned. Another method of this kind is the Jacobian-free Newton-Krylov method, which has been studied with traditional methods such as the power iteration used as the preconditioner [17,18] or with a more sophisticated Schwarz preconditioner [19]. In this work, we use the Modified Generalized Block Newton Method (MGBNM) presented in [16], and we propose several ways to precondition the linear systems that need to be solved in this method in an efficient way.
The structure of the rest of the paper is as follows. In Section 2, the modified generalized block Newton method is described. In Section 3, the different preconditioners for the MGBNM are presented. The performance of the preconditioners is presented in Section 4 for two different benchmark problems. Finally, Section 5 synthesizes the main conclusions of this work.

The Modified Generalized Block Newton Method
This method was presented by Lösche in 1998 [20] for ordinary eigenvalue problems, and an extension to generalized eigenvalue problems was studied in [16]. Given the partial generalized eigenvalue problem (2) written as:

$$ M X = L X \Lambda, \qquad (4) $$

where X ∈ R^{n×q} is a matrix with q eigenvectors and Λ ∈ R^{q×q} is a diagonal matrix with the q associated eigenvalues, we suppose that the eigenvectors can be factorized as X = ZS, where Z is an orthogonal matrix. Moreover, the biorthogonality condition W^T Z = I is introduced, where W is a fixed matrix. We construct this projection to ensure that the method converges to independent eigenvectors. Thus, if we denote K = SΛS^{−1}, the solution of Problem (4) is obtained by solving the non-linear problem:

$$ M Z = L Z K, \qquad W^T Z = I. \qquad (5) $$

By applying Newton's method, a new iterated solution arises as:

$$ Z^{(k+1)} = Z^{(k)} + \Delta Z^{(k)}, \qquad K^{(k+1)} = K^{(k)} + \Delta K^{(k)}, \qquad (6) $$

where ΔZ^{(k)} and ΔK^{(k)} are the solutions of the system obtained when the updates (6) are substituted into Equation (5) and truncated at the first-order terms.
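For clarity, substituting the updates (6) into (5) and dropping the second-order term in ΔZ^{(k)}ΔK^{(k)} yields the following linear system for the corrections (a reconstruction in the notation of this section):

```latex
\begin{aligned}
M\,\Delta Z^{(k)} - L\,\Delta Z^{(k)} K^{(k)} - L\,Z^{(k)}\,\Delta K^{(k)}
  &= -\bigl( M Z^{(k)} - L Z^{(k)} K^{(k)} \bigr),\\
W^{T}\,\Delta Z^{(k)} &= 0 .
\end{aligned}
```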
The matrix K^{(k)} is not necessarily a diagonal matrix, and as a result, the system is coupled. To avoid this problem, the modified generalized block Newton method (MGBNM) applies two previous steps. Firstly, the modified Gram-Schmidt process is used to orthonormalize the matrix Z^{(k)}. Then, the Rayleigh-Ritz projection method for the generalized eigenvalue problem [21] is applied, which makes K^{(k)} diagonal and decouples the system. Thus, the corrections Δz_i and the associated eigenvalue updates are obtained from the solutions of the linear systems, one per eigenpair. The solution of these systems is computed by using the Generalized Minimal Residual method (GMRES), computing the matrix-vector products with block matrix multiplications. However, these systems need to be preconditioned (in each iteration and for each eigenvalue) to reduce the condition number of the matrix.
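A minimal numpy sketch of one Newton correction for a single eigenpair of M x = λ L x may clarify the structure of these linear systems (an illustrative simplification: the constraint row uses the current z in place of the fixed matrix W, and the bordered system is solved directly instead of with preconditioned GMRES):

```python
import numpy as np

def newton_eigen_step(M, L, z, lam):
    """One Newton correction for an eigenpair of M z = lam * L z.

    Solves the bordered linear system
        [ M - lam*L   -L z ] [ dz   ]   [ -(M z - lam*L z) ]
        [    z^T        0  ] [ dlam ] = [         0        ]
    which is the kind of system that is solved (with preconditioning)
    inside the MGBNM.
    """
    n = len(z)
    r = M @ z - lam * (L @ z)           # eigenpair residual
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = M - lam * L
    A[:n, n] = -(L @ z)
    A[n, :n] = z                        # normalization constraint z^T dz = 0
    rhs = np.concatenate([-r, [0.0]])
    sol = np.linalg.solve(A, rhs)       # here: direct solve; in the paper: GMRES
    return z + sol[:n], lam + sol[n]
```

In this small dense setting a few corrections suffice; in the paper the systems are large and sparse, hence the need for the preconditioners of the next section.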

Preconditioning
The first choice for a preconditioner is assembling the matrix A of the linear systems of the previous section and constructing the full preconditioner associated with it. We use the ILUT(0) preconditioner since A is a non-symmetric matrix. There are no significant differences if the preconditioner obtained for the matrix associated with the first eigenvalue is used for all eigenvalues in the same iteration, because only the value of λ_i changes in the matrix A, and the eigenvalues in reactor problems are usually clustered. This preconditioner is denoted by P.
To devise an alternative preconditioner without the necessity of assembling the matrix A, we write the explicit inverse of A by using its block structure, where J = M − λ_1 L denotes its leading block. We obtain a preconditioner for A by suitably approximating A^{−1}. Let us call P_J a preconditioner for J; for instance, P_J = (LU)^{−1}, where L and U are the incomplete LU factors of J. Thus, after setting C_2^T = Z^T P_J, we can define the preconditioner of A by substituting P_J for J^{−1} in the block inverse. This preconditioner does not need the entire matrix A to be assembled, but it needs to assemble the matrix J to build its ILU preconditioner. Therefore, the next alternative that we propose is using a preconditioner of −L instead of J = M − λ_1 L. This preconditioner works well because, in the discretization process, the matrix L comes from the discretization of the differential operator that contains the gradient (diffusion) terms, and, in nuclear calculations, λ_1 is near 1.0. Thus, we can build a preconditioner of −L instead of the matrix J. We denote by P_L the preconditioner P_J in which the preconditioner of −L is used to precondition the block J.
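The block application of such a preconditioner can be sketched as follows, assuming a bordered matrix of the generic form A = [[J, B], [C^T, 0]] (the names `PJ`, `B`, and `C` are illustrative placeholders for the corresponding blocks; `PJ` stands for an approximate solve with J, e.g. an ILU solve). The sketch uses block elimination with the small, dense, approximate Schur complement:

```python
import numpy as np

def apply_block_prec(PJ, B, C, r1, r2):
    """Apply an approximate inverse of A = [[J, B], [C^T, 0]] to (r1, r2),
    with PJ(v) ~ J^{-1} v (e.g. an incomplete LU solve).

    Block elimination: S y = r2 - C^T PJ(r1), then x = PJ(r1 - B y),
    where S = -C^T PJ(B) is the q x q approximate Schur complement.
    """
    PJB = np.column_stack([PJ(b) for b in B.T])   # PJ applied column by column
    S = -C.T @ PJB                                # small dense Schur complement
    y = np.linalg.solve(S, r2 - C.T @ PJ(r1))
    x = PJ(r1 - B @ y)
    return x, y
```

When `PJ` is an exact solve with J, this reproduces the exact A^{−1}; with an incomplete factorization it only approximates it, which is what a preconditioner requires.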
Finally, the last alternative avoids assembling the matrix L by taking advantage of its block structure. For that purpose, we carry out a process similar to the one used for the matrix A. Since L is block lower triangular, we write the explicit form of its inverse as:

$$
L^{-1} = \begin{pmatrix} L_{11}^{-1} & 0 \\ -L_{22}^{-1} L_{21} L_{11}^{-1} & L_{22}^{-1} \end{pmatrix},
$$

and substitute the inverses of the blocks by preconditioners. Thus, the preconditioner Q_L of L has the following structure:

$$
Q_L = \begin{pmatrix} P_{11} & 0 \\ -P_{22} L_{21} P_{11} & P_{22} \end{pmatrix},
$$

where P_11 and P_22 denote a preconditioner of L_11 and L_22, respectively. The block matrices L_11 and L_22 are symmetric and positive definite. Then, we can use as preconditioners the Incomplete Cholesky decomposition (IC(0)). The main advantage of this preconditioner is that it permits using a matrix-free implementation that does not require allocating all the matrices: we only need to assemble the blocks L_11 and L_22 to construct the associated IC(0) preconditioners. The application of P_J with −Q_L to precondition J is called P_Q.
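The application of a block preconditioner with this lower-triangular structure reduces to two block solves and one matrix-vector product, sketched here with hypothetical callables `P11` and `P22` standing for the IC(0) solves:

```python
import numpy as np

def apply_QL(P11, P22, L21, r1, r2):
    """Apply the block lower-triangular preconditioner Q_L for
        L = [[L11, 0], [L21, L22]],
    where P11(v) ~ L11^{-1} v and P22(v) ~ L22^{-1} v
    (e.g. incomplete Cholesky solves of the two group blocks).
    """
    x1 = P11(r1)                   # fast-group block solve
    x2 = P22(r2 - L21 @ x1)        # thermal-group block, forward substitution
    return x1, x2
```

Only L_11, L_22, and the product with the off-diagonal block L_21 are needed, so the off-diagonal block can stay matrix-free.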

Numerical Results
In this section, the performance of the proposed preconditioners has been tested on two different problems: a version of the 3D NEACRP reactor [22] and a configuration of the Ringhals reactor [23]. The neutron diffusion equation in both problems has been discretized using the finite element method presented in Section 1, with Lagrange polynomials of degree three, because previous works have shown that this degree is necessary to obtain accurate results in similar reactor problems [2]. Four eigenvalues were computed for each reactor.
The incomplete lower-upper preconditioner with level 0 of fill (ILU) has been provided by the PETSc package [24].
As the modified generalized block Newton method needs an initial approximation of a set of eigenvectors, a multilevel initialization with two meshes was used to obtain this approximation (for more details, see [25]).
The stopping criterion for all solvers has been set equal to 10^{−6} in the global residual error, ‖M x_i − λ_i L x_i‖ ≤ 10^{−6}, where λ_i is the ith eigenvalue and x_i its associated eigenvector, normalized such that ‖x_i‖ = 1.
The modified block Newton method has been implemented using a dynamic tolerance for the residual error of the linear system solutions. The tolerance values have been set to {10^{−2}, 10^{−3}, 10^{−5}, 10^{−8}, 10^{−8}, . . .}.
The methods have been implemented in C++ based on the data structures provided by the libraries Deal.II [3] and PETSc [24]. The computer used for the computations was an Intel Core i7-4790 @ 3.60 GHz with 32 GB of RAM running Ubuntu GNU/Linux 16.04 LTS.

NEACRP Reactor
The NEACRP benchmark in a near-critical configuration [22] is chosen to compare the proposed methodology. The reactor core has a radial dimension of 21.606 cm × 21.606 cm per assembly. Axially, the reactor, with a total height of 427.3 cm, is divided into 18 layers with heights (from bottom to top): 30.0 cm, 7.7 cm, 11.0 cm, 15.0 cm, 30.0 cm (10 layers), 12.8 cm (two layers), 8.0 cm, and 30.0 cm. Figure 1 shows the reactor geometry and the distribution of the different materials. The cross-sections of the materials are displayed in Table 1. The total number of cells of the reactor domain is 3978. Zero flux boundary conditions were set. The spatial discretization of the neutron diffusion equation, using polynomials of degree three, gave 230,120 degrees of freedom. The mesh built to obtain an initial guess had 1308 cells, and the computational time needed to obtain this approximation was 24 s. The four dominant eigenvalues computed are collected in Table 2. This table shows that the spectrum associated with the problem is clustered, with two degenerate eigenvalues. A representation of the fast flux distribution for each mode is displayed in Figure 2. First, we show the convergence history of the MGBNM to obtain the solution of the eigenvalue problem.
Figure 3 shows the number of iterations against the residual error for the NEACRP reactor. It is observed that the MGBNM only needed four iterations to reach a residual error equal to 10^{−6}. Table 3 collects the average number of iterations obtained by directly applying the ILU preconditioner of A and the total time that the GMRES method needs to reach the residual error in the linear systems, given in Tol(‖b − Ax‖). The time spent to assemble the matrices and to build the preconditioner (setup time (s)) is also displayed. These data are presented for each iteration and as a total. This table shows that the number of iterations is not very high, but the time spent to assemble the matrix and to construct the preconditioner increases the total CPU time considerably. It is necessary to build a new preconditioner for A in each iteration because the columns related to the block Z change considerably at each update. Table 4 displays the data related to the proposed block preconditioner P_J, which uses the ILU preconditioner to approximate the inverse of M − λ_1 L. It is observed that we only needed to assemble the matrix M − λ_1 L once, in the first iteration, to build the preconditioner. This is because we only needed a preconditioner of M − λ_1 L, and the value of λ_1 was very similar in all iterations. The mean number of iterations of the GMRES preconditioned with P_J was larger than in the previous case, but the total CPU time of this block preconditioner was reduced by 26 s with respect to the full preconditioner. Table 5 shows the data related to the block preconditioner P_J when the Geometric Multigrid (GMG) preconditioner is used to approximate the inverse of M − λ_1 L.
The results show, in comparison with the results of Table 4, that in spite of the total number of iterations and the setup time being much lower for the GMG, the total computational time is much higher. This is because the application of the GMG preconditioner is more expensive than the application of the ILU preconditioner. The next results were obtained by using the block preconditioner, but approximating (M − λ_1 L)^{−1} by the ILU preconditioner of −L (P_L) and by a block preconditioner of −L (Q_L). The most relevant data to compare the preconditioners considered in this work are summarized in Table 6: the total iterations of the GMRES, the total setup time, the total time to compute the solution, and the maximum computational memory spent by the matrices. We observe that the number of iterations increased when worse approximations of the inverse of A were considered, but the setup time that each preconditioner needs became smaller. Moreover, the maximum CPU memory was also reduced significantly. In the total CPU times, we observed that the block preconditioner, in all of its versions (P_J, P_L, P_Q), improved the times obtained by applying the ILU preconditioner of A directly. Among the possibilities for obtaining a preconditioner of M − λ_1 L, there were no big differences in the computational times, but there was an important saving of computational memory. The best results were obtained by P_L if the computational memory consumption is taken into account. Table 7 shows the timings and the memory spent in the matrix allocation by using the matrix-free technique or without using this strategy. The results show that not only were the matrix memory consumption and the assembly time reduced, but also the time spent to compute the matrix-vector products. This implies that the matrix-free strategy reduced the total CPU time by about 30%. Finally, we compared the MGBNM with this methodology against other eigenvalue solvers commonly used in neutron diffusion computations (Table 8). We show the results for different numbers of computed eigenvalues (No. eigs). In particular, we have chosen for this comparison the generalized Davidson method preconditioned with the block Gauss-Seidel preconditioner and the Krylov-Schur method, previously reducing the generalized eigenvalue problem to an ordinary eigenvalue problem as in [2]. For both methods, the SLEPc library has been used [12]. From the computational times, we can deduce that the MGBNM was twice as fast as the rest of the solvers for the computation of one and two eigenvalues, and it was very competitive at computing four eigenvalues.

Ringhals Reactor
For a practical application of the preconditioners in a real reactor, we have chosen a configuration of the Ringhals reactor. In particular, we have chosen the C9 point of the BWR Ringhals I stability benchmark, which corresponds to a point of operation that degenerated into an out-of-phase oscillation [23]. It is composed of 27 planes with 728 cells in each plane. A more detailed representation of its geometry can be observed in Figure 4. The spatial discretization using finite elements of degree three gave 1,106,180 degrees of freedom. The coarse mesh considered to obtain an initial guess for the MGBNM had 6709 cells, and the problem associated with this mesh had a size of 386,768 degrees of freedom. The computed dominant eigenvalues were 1.00191, 0.995034, 0.992827, and 0.991401. The corresponding fast fluxes are represented in Figure 5. The convergence history of the MGBNM associated with the Ringhals reactor is represented in Figure 6. For this reactor, the number of iterations needed to reach the tolerance (10^{−6}) was also equal to four.

Figure 2. Fast fluxes' distribution of the NEACRP reactor corresponding to the first four modes: (a) 1st mode; (b) 2nd mode; (c) 3rd mode; (d) 4th mode.

Figure 3. Convergence history of the Modified Generalized Block Newton Method (MGBNM) for the NEACRP reactor.

Figure 5. Fast fluxes' distribution of the Ringhals reactor corresponding to the first four modes.

Table 2. Eigenvalues for the NEACRP reactor.

Table 3. Data for the preconditioner P for the NEACRP reactor. GMRES, Generalized Minimal Residual.

Table 4. Data for the preconditioner P_J with ILU for the NEACRP reactor.

Table 5. Data for the preconditioner P_J with the Geometric Multigrid (GMG) for the NEACRP reactor.

Table 6. Data obtained by using different preconditioners for the NEACRP reactor.

Table 7. Data obtained using different matrix implementations for the NEACRP reactor.

Table 8. Computational times for the MGBNM with P_Q, the generalized Davidson method, and the Krylov-Schur method for the NEACRP reactor. eigs, eigenvalues.