Acceleration and Parallelization of a Linear Equation Solver for Crack Growth Simulation Based on the Phase Field Model

: We aim to accelerate the linear equation solver for crack growth simulation based on the phase ﬁeld model. As a ﬁrst step, we analyze the properties of the coefﬁcient matrices and prove that they are symmetric positive deﬁnite. This justiﬁes the use of the conjugate gradient method with the efﬁcient incomplete Cholesky preconditioner. We then parallelize this preconditioner using so-called block multi-color ordering and evaluate its performance on multicore processors. The experimental results show that our solver scales well and achieves an acceleration of several times over the original solver based on the diagonally scaled CG method.


Introduction
Crack growth is a ubiquitous phenomenon that affects the strength and functions of materials and structures. Since cracks can grow very quickly, simulation is a useful tool to study their growth process in detail. Simulation can also be used to predict the generation of cracks under given stresses or other conditions. In the conventional method of crack growth simulation, the finite element method (FEM) is used and the mesh is regenerated at every time step so that the mesh boundary conforms to the crack boundary. However, this incurs huge computational cost. Moreover, to determine the direction of crack growth, it is usually necessary to evaluate the total energy for various possible scenarios. This also adds to the computational cost.
To resolve these problems, a crack growth simulation method based on the phase field model has been proposed [1][2][3][4]. In this model, a new continuous dependent variable z(x, t) called the phase field [5,6] is introduced in addition to the displacement u(x, t). This variable expresses the degree of the crack at each point: z 0 if there is no crack and z 1 if there is a crack. Moreover, a partial differential equation (PDE) describing the time evolution of z(x, t) is also derived along with that of u(x, t). This makes it possible to determine the direction of crack growth without the total energy evaluations. Hence, the method is a promising candidate for real-time three-dimensional crack growth simulation. Takaishi et al. implemented a crack growth simulation program based on this method and showed that it works well in various examples. For evaluation purposes, a two-dimensional FreeFEM code based on the method is also available.
When implementing this method, the time-discretization of the PDEs both for u(x, t) and z(x, t) is usually done with the semi-implicit method to ensure numerical stability. In that case, the time taken to solve the resulting linear simultaneous equations is often dominant in the total computation time. Takaishi et al.'s simulation program uses the conjugate gradient (CG) method with a diagonal scaling preconditioner for both of the equations for u(x, t) and z(r, t), which is a simple preconditioner applicable to a wide class of matrices but is not very efficient. The reason for using this preconditioner is that the properties of the coefficient matrices have not been fully understood yet.
In this paper, we aim to accelerate this linear equation solver by employing a more powerful preconditioner. This will help to speed up the linear equation solution that accounts for a large part of the computing time and open a way to solve larger scale and more realistic problems. To this end, we make the following contributions. First, we analyze the properties of the coefficient matrices obtained by applying semi-implicit time discretization and space discretization by FEM to the PDEs for u(x, t) and z(x, t). We show that, under appropriate boundary conditions, both of these coefficient matrices become symmetric positive definite (SPD). This justifies the use of the incomplete Cholesky (IC) preconditioner, which is more powerful than diagonal scaling. Second, we show that the IC preconditioner can be parallelized efficiently using the block multi-color ordering proposed by Iwashita et al. [7,8]. In fact, our numerical experiments suggest that the number of CG iterations increases only slightly, if at all, by this parallelization method compared to the sequential case. Finally, we optimize several performance parameters such as the block division scheme and show that the resulting parallel solver is several times faster than the diagonally scaled CG solver on multicore processors. Our results will be applicable to crack growth simulation in a variety of fields in science and engineering, such as the prediction of cracking in buildings and bridges and the analysis of solder cracking in circuit boards [2].
The rest of this paper is structured as follows. In Section 2, we briefly describe the crack simulation method based on the phase field model and its space and time discretization. In Section 3, the properties of the coefficient matrices arising from the discretization are analyzed, and their symmetric positive definiteness is proved under certain conditions. Section 4 describes the parallelization of the IC preconditioner using block multi-color ordering. Numerical results are given in Section 5. Finally, Section 6 concludes the paper.

Crack Growth Simulation Based on the Phase Field Model
In this section, we briefly explain the crack growth simulation method based on the phase field model, which was proposed by Takaishi and Kimura [3]. We begin with the two-dimensional case and then proceed to the three-dimensional case.

The Two-Dimensional Case
Let us consider crack growth in a thin panel as shown in Figure 1. Here, we focus on the so-called mode 3 crack, in which the displacement of the panel is in the direction perpendicular to the panel. Thus, we treat the problem as two-dimensional and denote the displacement at a point x = (x 1 , x 2 ) by a scalar variable u(x, t). We denote the region by Ω, its boundary by Γ, and the crack, which is modeled as a curve on Ω, by Σ. Hence, u(x) is discontinuous across Σ. In the example shown in Figure 1, the Dirichlet boundary condition is imposed on Γ D ⊂ Γ, while the Neumann boundary condition is imposed on The basic idea of the crack growth simulation method to be described below goes back to Griffith [9]. He proposed the expression of the total energy of the system as a sum of the elasticity energy E 1 and the surface energy E 2 due to the existence of a crack. Both of these energies depend on the crack Σ, as well as on u(x, t), so we denote them as E 1 [u, Σ] and E 2 [u, Σ]. Griffith assumed that crack growth occurs if the total energy decreases due to that factor. In the two-dimensional case, E 1 and E 2 can be written as follows [1]: where γ(x) > 0 is fracture toughness at x and µ is one of Lamé's constants. Equation (1) means that the elasticity energy is expressed as an integral of µ 2 |∇u| 2 over the entire region Ω, excluding the crack Σ. This is because the difference of u across Σ does not contribute to the elasticity energy. On the other hand, the surface energy (2) is expressed as a line integral along the crack.
While in principle (1) and (2) can be used to study the development of crack Σ, they are not convenient for numerical computation because the regions of the integral depend on Σ and change from step to step. To resolve this problem, Bourdin et al. [10] introduced a phase field variable z(x, t) that expresses the degree of crack at (x, t): z 0 if there is no crack and z 1 if there is a crack. z(x, t) is assumed to be a smooth function of x, and the transition between z = 0 and z = 1 is assumed to occur across a narrow region of width , where > 0 is a regularization parameter [11]. Under these assumptions, Bourdin et al. propose the use of the following regularized total energy functional instead of E[u, Σ]: In this formulation, the region of the integral is the entire region Ω for both the elasticity and surface energies, which greatly simplifies the numerical procedure.
For the efficient computation of u(x, t) and z(x, t) based on (3), Takaishi and Kimura [3] proposed the use of the gradient flow where τ is a virtual time parameter and α 1 and α 2 are time constants. It can easily be shown that if u(x, t) and z(x, t) evolve according to (4), the total energy functional (3) decreases monotonically. Thus we can expect that the (local) minimum of E [u, z; ] is reached as τ → ∞. Furthermore, if α 1 and α 2 are sufficiently large, u(x, t) and z(x, t) are expected to reach the minimizer of E [u, z; ] for given external conditions such as the boundary conditions and external forces (if any) very quickly. Thus, we can regard u(x, t) and z(x, t) determined by (4) as instantaneous reactions to the external conditions and (4) as approximately describing the time evolution of u(x, t) and z(x, t). By computing δE δu and δE δz explicitly, we obtain the following set of PDEs: where we changed τ to t and set µ = 1 for simplicity. g(r, t) denotes the Dirichlet boundary condition that causes the development of the crack. The symbol (·) + in the second equation means max(·, 0), which expresses the fact that a crack does not vanish once it is created [12]. See [13] for the treatment of partial differential equations with such terms. ∂ ∂n denotes the partial derivative in the direction of the outgoing normal vector.
Crack growth simulation based on (5) has the following advantages: • The direction of crack growth is automatically determined by the PDEs. Hence, total energy evaluation under multiple possible scenarios, which is needed in simulation methods based directly on (3), is not necessary; • By introducing the phase field variable z(x, t) and the regularization parameter , the divergence of the stress at the tip of the crack is kept to a level manageable by numerical methods; • It is not necessary to regenerate the mesh at every time step to conform to the crack boundary.
Our theoretical analysis and numerical experiments in the two-dimensional case are based on these PDEs.

The Three-Dimensional Case
In the three-dimensional case, the displacement becomes a vector field variable u(x, t) = (u 1 (x, t), u 2 (x, t), u 3 (x, t)) , where x = (x 1 , x 2 , x 3 ). The phase field variable z(x, t) is still a scalar field variable. Using the same idea as in the previous subsection, we can derive the set of PDEs corresponding to (5). In the following, we assume isotropic materials for simplicity. First, let us define the strain tensor ij = 1 and the stress tensor σ ij . These tensors are connected by Hooke's law, which has the following form in the case of isotropic materials: where λ and µ are Lamé's constants and δ ij is Kronecker's delta. We also write the stress tensor as σ = (σ ij ) = (s 1 , s 2 , s 3 ).
Using these tensors, the elasticity energy density e(u), which corresponds to µ 2 |∇u| 2 in the two-dimensional case, can be defined as follows: Using this, the regularized total energy functional is defined as which has the same form as (3) except that µ 2 |∇u| 2 is replaced with e(u). Note that now Ω is a three-dimensional region and Ω · dx denotes a volume integral. By considering the gradient flow as in (4), we obtain the following set of PDEs after some calculations [2]: Here, where n = (n 1 , n 2 , n 3 ) denotes the unit normal vector. Note that the first equation of (10) can also be written as This can be verified directly using the definitions of ij , σ ij and s j . The fourth equation of (10) is often expressed as σ · n = 0. (14) and is known as the stress-free boundary condition.

Temporal Discretization
For the temporal discretization of (5) and (10), we use a semi-implicit method. More specifically, to compute the solution (u t+∆t , z t+∆ t )at time t + ∆t from (u t , z t ) at time t, we replace u and z in the right-hand side of the equation for ∂u/∂t by u t+∆t and z t , respectively. Similarly, in the right-hand side of the equation for ∂z/∂t, we replace u and z by u t and z t+∆t , respectively. Thus, in the two-dimensional case, we obtain the following set of equations, which are linear both in u t+∆t and z t+∆t :.
Here, (17) corresponds to taking (·) + In the numerical simulation; we set α 1 = 0, which corresponds to assuming that the displacement u(x, t) responds to the changes of z(x, t) and the boundary conditions instantaneously.
The equations for the three-dimensional case are as follows: We set α 1 = 0 also in this case. Using expression (13), Equation (18) can also be written as where s t+∆t,j is the jth column of the stress tensor σ at time t + ∆t.

Properties of the Coefficient Matrices Arising from Phase Field-Based Crack Growth Simulation
In this section, we study the properties of the coefficient matrices arising from the finite element discretization of the basic equations for the phase field-based crack growth simulation. In particular, we prove that these matrices become symmetric positive definite under certain assumptions. This is important to be able to apply the efficient incomplete Cholesky preconditioner to these matrices.

The Two-Dimensional Case
The weak forms We start with the time-discretized Equation (15) with α 1 = 0. This is a Poisson equation for u t+∆t with variable coefficient (1 − z t ) 2 , and it is well known that its weak form can be written as follows: where v(x) is a test function with v(x) = 0 on Γ D . Finding u t+∆t satisfying (15) with α 1 = 0 is equivalent to finding u t+∆t satisfying (22) for arbitrary test functions v(x). Next, we derive the weak form for (16). By multiplying (16) by a test function w(x), integrating over Ω, using a slight extension of Green's identity (see Lemma 1), with φ = w, ψ =z t+∆t and f = γ(x), and noting that ∇z r+∆t · n = 0 on Γ N = Γ, we obtain Equations (17), (22) and (24) constitute the weak forms for the two-dimensional case.
Properties of the coefficient matrix for u t+∆t Now, we consider the properties of the matrix obtained by applying the finite element discretization to the weak form (22). Let φ 0 (x) be a function satisfying the boundary condition φ 0 (x, t) = g(x, t) (x ∈ Γ D ) and {φ j } m j=1 be basis functions that are zero on Γ D . We approximate u t+∆t byû t+∆t (x), which is a linear combination of these functions: Inserting this into (22) and choosing the test function as By defining the matrix C = (c ij ) ∈ R m×m and the vector and letting a = (a 1 , a 2 , . . . , a m ) , (26) can be written as follows.
Since c ij = c ji from (27), it is clear that C is a symmetric matrix. Moreover, for an arbitrary nonzero vector p = (p 1 , p 2 , . . . , p m ) , we have Noting that ∇ ∑ m i=1 p i φ i is not identically zero from the linear independence of {φ i }, we know that the integral in the right-hand side is positive as long as ∀x, 0 ≤ z t (x) < 1. Hence, C is positive definite, and we obtain the following theorem. Theorem 1. If ∀x, 0 ≤ z t (x) < 1, then the coefficient matrix C of the equation for u t+∆t is symmetric positive definite.

Properties of the coefficient matrix forz t+∆t
For the phase field variablez t+dt , the boundary condition consists of only the Neumann boundary condition. We therefore choose the basis functions {ψ j } m j=1 and approximatez t+dt by the following function:ẑ Inserting this into (24) and choosing the test function as By defining the matrix E = (e ij ) ∈ R m×m and the vector f = ( f i ) ∈ R m by and letting b = (b 1 , b 2 , . . . , b m ) , we have the following linear simultaneous equation.
Since e ij = e ji from (33), E is symmetric. Furthermore, the first term of e ij , which is Ω α 2 ψ i ψ j dx, is a Gram matrix and is therefore positive definite if {ψ i } is linearly independent. It is also clear that the remaining parts of e ij are also symmetric positive semidefinite. Thus, we arrive at the following theorem. Theorems 1 and 2 ensure that the CG method preconditioned by the incomplete Cholesky decomposition can be used to solve (29) and (35).

The Three-Dimensional Case
We next consider the three-dimensional case described by (18) through (21). First, we prepare the following lemma.

Lemma 1.
Let Ω be a bounded region in the three-dimensional space, Γ its boundary, and n the outward normal vector on Γ. Additionally, let φ(x), ψ(x) and f (x) be scalar fields and w(x) be a vector field defined in a region containing Ω. Then, the following equations hold: where Γ · dS denotes the surface integral on Γ.
Proof. First, we integrate both sides of the identity over Ω and apply Gauss's theorem to the left-hand side to transform it to Γ φ f w · n dS. By moving the terms, we have (36). Then, we obtain (37) by letting w = ∇ψ.

The weak forms
We derive the weak form for u t+∆t by considering a vector test function v(x) that becomes 0 on Γ D , computing its inner product with both sides of (21), assuming α 1 = 0, and integrating the results over Ω (this is equivalent to deriving a weak form for each component of (18) by multiplying it by a scalar test function and integrating the result over Ω. This is because if we choose a special vector test function with its y and z components identical to zero, we obtain the same result as if we multiply the x component of (18) by a scalar test function). By letting φ = v q , f = (1 − z t ) 2 and w = s t+∆t,q (q = 1, 2, 3) in (36) and summing both sides over q, we have where, in the second equality, we use the fact that v q = 0 on Γ D and s t+∆t,q · n = 0 on Γ N and therefore the surface integral on Γ vanishes. By equating the right-hand side of (39) to zero, we obtain the weak form for u t+∆t . The weak form forz t+∆t is exactly the same as (24) for the two-dimensional case, except that |∇u t | 2 is replaced with 2e(u t ).

Properties of the coefficient matrix for u t+∆t
For finite element discretization of the weak form for u t+∆t , we approximate each component u t+dt,j (j = 1, 2, 3) of u t+dt as a linear combination of a function φ j,0 satisfying the boundary condition on Γ D and basis functions {φ j, } m =1 that become zero on Γ D : Now, we choose as the test function v a function whose ith element is φ i,k and whose other elements are 0. Thus, v q = δ qi φ i,k . Inserting this along with (40) and the definition of ij into the weak form for u t+∆t gives By moving the terms containing the unknowns {a j, } to the left-hand side and the other terms to the right-hand side, we have Equations (39) for i = 1, 2, 3 and k = 1, 2, . . . , m constitute linear simultaneous equations of order 3m in {a j, }. Let us write this equation as where C is a 3m × 3m coefficient matrix, a is a 3m-dimensional unknown vector and d is a 3m-dimensional right-hand side vector. To investigate the positive definiteness of C, we compute p Cp, where p is a nonzero 3m-dimensional vector. To this end, we replace a j, with p j, in the left-hand side of (41), multiply the result with p i,k and sum over i and k. Then, after some calculations, we obtain It is clear from the expression in the middle that C is symmetric, since interchanging (i, k) and (j, ) leaves it invariant. Furthermore, if ∀x, 0 ≤ z t (x) < 1, we have p Cp ≥ 0 from the last expression, so C is also positive semidefinite. Finally, we investigate whether p Cp = 0 can occur for some p = 0. By writing w i = ∑ m k=1 p i,k φ i,k and w = (w 1 , w 2 , w 3 ) , we can rewrite (43) as For the right-hand side to be zero, under the assumption that ∀x, 0 ≤ z t (x) < 1, ∂w i ∂x j + ∂w j ∂x i must be identically zero for i, j = 1, 2, 3. This means that the strain tensor computed from w can have only a rotational component. However, if the target solid is fixed at three or more points, no such rotation is possible, and therefore the vector field w must be identical to zero, implying that p = 0. Thus, we arrive at the following theorem.
Theorem 3. If ∀x, 0 ≤ z t (x) < 1, then the coefficient matrix C of the equation for u t+∆t is symmetric positive definite.

Properties of the coefficient matrix forz t+∆t
The weak form forz t+∆t in the three-dimensional case is identical to that of the twodimensional case except that |∇u t | 2 is replaced with 2e(u t ). Since e(u t ) is nonnegative as well as |∇u t | 2 , by repeating the same argument that led to Theorem 2, we obtain the following theorem: So far, we have assumed that both λ and µ are constants over Ω. However, a close examination of the derivation of Theorems 3 and 4 reveals that they are valid even when λ and µ are continuous functions of x. In some applications, λ(x) and µ(x) might have discontinuities. In such a case, we can approximate them with smooth functions of x, by using sufficiently fine mesh around the discontinuities.

Application of the Incomplete Cholesky Preconditioner and Its Parallelization
Now that we have established that the coefficient matrices are symmetric positive definite, we can apply the IC preconditioner, which is more effective than diagonal scaling. In this section, we first describe the IC conditioner briefly and then explain how to parallelize it using the block multicolor ordering proposed by Iwashita et al.

The Incomplete Cholesky Preconditioner
Let A = (a ij ) ∈ R n×n be a sparse symmetric positive definite matrix and consider solving a linear simultaneous equation Ax = b by the conjugate gradient (CG) method. In the preconditioned conjugate gradient method, one applies the CG method to the modified Here, K ∈ R n×n is a nonsingular matrix designed so that K −1 AK − has a smaller condition number than A. Thus, the convergence of the CG method is accelerated. The diagonal scaling preconditioner uses diag( √ a 11 , √ a 22 , . . . , √ a nn ) as K. This preconditioner is simple and applicable to a wide class of matrices but not as effective in reducing the number of iterations of the CG method.
In this study, we use a more powerful preconditioner based on the incomplete Cholesky decomposition without fill-ins (the IC(0) decomposition). In this decomposition, we compute the Cholesky decomposition of A approximately by allowing the elementl ij of the approximate Cholesky factorL to be nonzero only when a ij = 0. Thus, fill-ins in the Cholesky decomposition are suppressed, and the computational cost and the memory requirement are reduced. The algorithm of the IC(0) decomposition is shown as Algorithm 1. Here, in the sums ∑  In the incomplete Cholesky preconditioner,L computed by Algorithm 1 is used as K. WhileL satisfies only an approximate relation A LL , it is often a sufficiently good approximation to the true Cholesky factor to make K −1 AK − much better conditioned than A.
For a sparse matrix A arising in the finite element method, a simplified IC(0) decomposition is sometimes used instead of Algorithm 1. In this variant, the off-diagonal elements are not updated and the fourth line of Algorithm 1 is replaced withl ij = a ij /l jj . We use this variant in this study.

Parallelization by the Block Multi-Color Ordering
The IC(0) decomposition algorithm inherits the sequential nature of the original Cholesky decomposition. Suppose that i < j and a ij = 0. Then, sincel ii depends onl ij by line 2 of Algorithm 1 andl ij depends onl jj by line 4,l ii depends onl jj . In the finite element method using triangular (or tetrahedral) elements and piecewise linear basis functions, the matrix element a ij is nonzero if and only if the nodes i and j belongs to the same element. The dependency thus caused gives rise to a difficulty in parallelizing the IC(0) decomposition.
One of the techniques to resolve this problem is multi-color ordering [14]. In this ordering strategy, we assign colors 1, 2, . . . , c to the nodes in such a way that nodes belonging to the same element have different colors and the number of required colors is minimal. Then, we renumber the nodes so that the nodes with color 1 are numbered first, those with color 2 are numbered next, and so on. Then, if nodes i and j have the same color, they do not belong to the same element, and therefore a ij = 0. Thus, the computation ofl ii andl jj can be done in parallel.
However, it has been pointed out that this reordering can degrade the quality of the IC(0) preconditioner, thereby increasing the number of CG iterations. Consult [15] for more details about this phenomenon. As a remedy, Iwashita et al. proposed block multi-color ordering [7,8], which partitions the set of nodes into blocks and applies the multi-color ordering to the blocks rather than to the individual nodes. It is known that this modification is effective in retaining the quality of the IC(0) preconditioner, since the ordering within each block can be made the same as the natural ordering. The block multi-color ordering applied to a two-dimensional triangular mesh and the nonzero pattern of the resulting matrix are depicted in Figures 2 and 3, respectively. Here, c = 4, and thus the matrix has a 4 × 4 block structure. Each of the four diagonal blocks has a 2 × 2 block diagonal structure, reflecting the fact that there are two blocks of each color. Thus, the IC(0) decomposition of these two (small) diagonal blocks can be performed in parallel. We use this ordering strategy in our numerical experiments.

Numerical Results
In this section, we apply the conjugate gradient method with the IC(0) preconditioner to the linear simultaneous equations to compute the time evolution of u(x, t) and z(x, t) in the phase field-based crack growth simulation. We parallelize the IC(0) preconditioner using the block red-black ordering and evaluate its parallel performance, as well as the convergence acceleration effect compared with the diagonal scaling preconditioner. Both two and three-dimensional problems are used as test problems.

The Two-Dimensional Case
We used the 2-D phase field-based crack growth simulation code developed by Takaishi et al. as a basis and replaced its linear equation solver, which uses the CG method with diagonal scaling, with our solver. Our solver is based on the CG method with IC(0) preconditioning and is thread-parallelized using multi-color ordering. We used four colors, as in Figure 2, and set the number of blocks equal to four times the number of threads. The program was written in C and OpenMP, and all computations were performed in double precision arithmetic. In the numerical experiments in this subsection, we used an Intel Xeon processor E5-2660 v2, which has 10 cores, and Intel C Compiler Ver. 16.0.0.109 with the -O3 option.
The computational region Ω used in the numerical experiments is as shown in Figure 1. It is a square region (panel) with the initial crack Σ running from the left edge to the center. The Dirichlet boundary conditions, which represent the forces to widen the crack, are applied to the upper and lower edges. More specifically, the problem is defined as follows.

•
Computational region: Convergence criterion of the CG method: relative residual ≤ 10 −10 . We first checked the positive definiteness of the coefficient matrix C for u(x, t) (see (27)) using a small problem with 11 × 11 to 50 × 50 mesh. Note that the matrix E for z(x, t) (see (33)) is obviously positive definite because it is an O(∆t) perturbation of the positive definite Gram matrix. It was confirmed by the numerical experiment that the smallest eigenvalue of C is always positive, and thus the matrix C is positive definite. The smallest eigenvalue λ min and the largest eigenvalue λ max of C for each mesh before, during and after crack growth are shown in Table 1. We also show the change of the condition number of C as the crack grows in Figure 4. It can be seen that the condition number leaps up suddenly around t = 35, where the crack grows rapidly. It seems that this is because the area of z(x, t) 1 is widened, causing the near-singularity of C, as can be inferred from (30).  The time evolution of z(x, t) for this problem is shown in Figure 5. Here, the mesh size is 101 × 101 and the period of simulation is from t = 0 to t = 34.5. Until t = 28.5, z(x, t) does not change significantly and z(x, t) 1 only along the line connecting (−1, 0) and (0, 0), showing that the crack exists only in this region. As time passes, the region of z(x, t) 1 extends to the right edge of the region, meaning that the panel has broken into two pieces. The evolution of z(x, t) for other initial conditions, which was computed using a FreeFEM code, is shown in Appendix A. We next evaluated the parallel acceleration of our linear equation solver by varying the number of threads from 1 to 10. The results for 101 × 101 and 201 × 201 meshes are shown in Figures 6 and 7, respectively. The number of blocks in the x 1 and x 2 directions, which we denote by nbx and nby, are shown in Table 2. These numbers were determined experimentally to achieve the best performance in each case, under the condition that nbx × nby equals four times the number of threads. It can be seen that the solution of the linear simultaneous equations for u and z achieves an acceleration of up to 5 and 4 times, respectively, using 10 threads.

The Three-Dimensional Case
For the three-dimensional case, we again used Takaishi et al.'s code and replaced its linear equation solver with our parallel CG solver with IC(0) preconditioning and multi-color ordering. We used a semi-structured mesh as shown in Figure 10, which is unstructured in the (x 1 , x 2 ) plane and is structured in the x 3 direction, sliced it horizontally into 2× (number of threads) panels, and colored them in an alternating manner using two colors. Then, each pair of panels was allocated to one thread for parallel execution. In the numerical experiments in this subsection, we used an Intel Xeon Gold 6148 Processor with 20 cores (1 node of the Grand Chariot supercomputer at the Hokkaido University Information Initiative Center) and Intel C Compiler Ver. 18.0.3 with the -O3 option. The test problem is crack growth in a rectangular parallelepiped region, defined as follows.
The parallel acceleration of our solver for the 51 × 51 × 52 and 101 × 101 × 102 meshes is shown in Figures 11 and 12, respectively. For the latter mesh, the solution of the linear system for u and z was accelerated by up to 6.7 and 4.4 times, respectively. We also show the acceleration of each component of our solver and the breakdown of the execution time in Figures 13-16. Our solver consists of a matrix-vector product, forward and backward substitutions corresponding to the application ofL −1 andL − , respectively, and other parts such as vector additions, dot products and computation of norms. Figures 13 and 15 show that the matrix-vector product and the forward/backward substitutions are reasonably well accelerated, but Figures 14 and 16 reveal that the other parts are not accelerated at all. The latter are difficult to parallelize efficiently because they are vector operations with small computational work and a relatively large amount of data transfer. If this part could be improved, our solver could achieve further acceleration.    Finally, we compare the execution time of our solver with that of the original solver in Figures 17 and 18. Thanks to the use of the IC(0) preconditioner with multi-color ordering, our solver attains an acceleration of 6.1 and 8.3 for the 51 × 51 × 52 and 101 × 101 × 102 meshes, respectively.

Conclusions
In this paper, we accelerated the linear equation solution part in phase field-based crack growth simulation using the conjugate gradient method with IC(0) preconditioning. To this end, we first analyzed the properties of the coefficient matrices both for the displacement u(x, t) and the phase field z(x, t) and proved that they are symmetric positive definite under certain conditions. Thus, the use of the IC(0) preconditioning is justified. Then, we parallelized the IC(0) preconditioner using the block multi-color ordering and evaluated its performance on multicore processors. The experimental results show that our solver scales well both for the two and three-dimensional problems and achieves an acceleration of several times over the original solver based on the diagonally scaled CG method. Our future work will include the distributed-memory parallelization of our solver and its application to real-world crack growth problems.
Author Contributions: Conceptualization, T.T. and Y.Y.; theoretical analysis, G.I. and Y.Y.; coding, G.I. and T.T.; parallelization and optimization, G.I.; numerical experiments, G.I. All authors have read and agreed to the published version of the manuscript.
Funding: This study is partially supported by JSPS KAKENHI Grant Numbers 17H02828, 17K19966 and 19KK0255.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The parallel ICCG solver developed in this work, as well as the FreeFEM code used in the Appendix A, is available from the authors upon request. Some of these programs are also downloadable from the following URL: https://github.com/yusakuyamamoto/ Crack-growth-simulation, accessed on 20 August 2021 .