A Compact FEM Implementation for Parabolic Integro-Differential Equations in 2D

Although two-dimensional (2D) parabolic integro-differential equations (PIDEs) arise in many physical contexts, there is no generally available software that is able to solve them numerically. To remedy this situation, in this article, we provide a compact implementation for solving 2D PIDEs using the finite element method (FEM) on unstructured grids. Piecewise linear finite element spaces on triangles are used for the space discretization, whereas the time discretization is based on the backward-Euler and the Crank–Nicolson methods. The quadrature rules for discretizing the Volterra integral term are chosen so as to be consistent with the time-stepping schemes; a more efficient version of the implementation that uses a vectorization technique in the assembly process is also presented. The compactness of the approach is demonstrated using the software Matrix Laboratory (MATLAB). The efficiency is demonstrated via a numerical example on an L-shaped domain, for which a comparison is possible against the commercially available finite element software COMSOL Multiphysics. Moreover, further consideration indicates that COMSOL Multiphysics cannot be directly applied to 2D PIDEs containing more complex kernels in the Volterra integral term, whereas our method can. Consequently, the subroutines we present constitute a valuable open and validated resource for solving more general 2D PIDEs.

Against that background, the purpose of this article is to provide a compact formulation for solving PIDEs in two spatial dimensions (2D) using FEM on unstructured grids (we made the resulting software, which we programmed in MATLAB, openly available). This is done by extending the already available FEM formulations for PDEs [40,43,44,48,49] to PIDEs. In particular, we: • Use piecewise linear finite element spaces on triangles for the space discretization; • Vase the time discretizations on the backward-Euler and the Crank-Nicolson methods; • Choose the quadrature rules for discretizing the Volterra integral term so as to be consistent with the time-stepping schemes.
In addition, we demonstrate that for PDEs, vectorization techniques can be used to speed up the sets of code for PIDEs. Importantly, these sets of code can be easily extended to more complex geometries, and to PIDEs with different kernels in the Volterra integral term. A comparison with results obtained using COMSOL Multiphysics for a test example shows that our method is competitive. Moreover, further consideration indicates that COMSOL Multiphysics cannot be directly applied to 2D PIDEs containing more complex kernels in the Volterra integral term, whereas our method can.
The rest of the paper is organized as follows. The model problem and its weak formulation are described in Section 2. The discretization is sketched in Section 3. The data representation of the triangulation is described in Section 4, together with the discrete space. The assembly of the stiffness matrix, mass matrix and load vector are presented in Section 5. Improved pseudo-codes using vectorization is presented in Section 6, and an implementation in MATLAB is discussed in Section 7. In Section 8, a comparison of the results obtained using the different sets of code is presented; results obtained using COMSOL Multiphysics are also compared. Conclusions are drawn in Section 9.

Model Problem and Weak Formulation
Given a Lebesgue measurable set Ω, we denote by L p (Ω), 1 ≤ p ≤ ∞ the Lebesgue spaces. When p = 2, the space L 2 (Ω) is equipped with an inner product ·, · . For an integer m > 0, we use the standard notation for Sobolev spaces W m,p (ω) with 1 ≤ p ≤ ∞. When p = 2, we denote W m,2 (Ω) by H m (Ω). The function space H 1 0 (Ω) consists of elements from H 1 (Ω) that vanish on the boundary of Ω, where the boundary values are to be interpreted in the sense of a trace.
Consider the following initial-boundary value problem for a linear PIDE of the form Here, 0 < T < ∞ and Ω ⊂ R 2 is a bounded Lipschitz domain with boundary ∂Ω. Further, Au(x, t) = −∆u(x, t) and B(t, s)u(x, s) = −∇ · (B(t, s)∇u(x, s)), where ∆ denotes the Laplacian and ∇ denotes the spatial gradient. We assume that the coefficient matrix B(t, s) = {b ij (x; t, s)} is 2 × 2 in L ∞ (Ω) 2×2 . If we assume the initial function u 0 (x) to be in H 2 (Ω) ∩ H 1 0 (Ω), the source term f (x, t) to be in L 2 (0; T; L 2 (Ω)) and then the problem (1) admits a unique solution The existence results are discussed in detail in Chapter 2 of [9]. For regularity and stability results of such problems, please refer to [10,28]. Let a(·, ·) : H 1 0 (Ω) × H 1 0 (Ω) → R be the bilinear form corresponding to the operator A defined by Similarly, let b(t, s; ·, ·) be the bilinear form corresponding to the operator B(t, s) defined on Then, the weak formulation of the problem (1) may be stated as follows: find u : u(·, 0) = u 0 .

Galerkin Discretizations
Let T denote a regular partition of the domain Ω ∈ R 2 into disjoint triangles K of diameter h K such that: where P 1 is the space of polynomials in d variables of degree at most 1. With N = {z 1 , · · · , z M } as the set of nodes of T , we consider the nodal basis B = {φ 1 , · · · , φ M }, where the hat function φ l ∈ P 1 is characterized by φ l (z k ) = δ kl , where δ kl is the Kronecker delta.

Backward-Euler Scheme
Let I h be the Lagrange interpolant corresponding to V. Then, the fully discrete backward-Euler scheme may be stated as follows: given U 0 , where U 0 = I h u 0 , find U n ∈ V, n ∈ [1 : N] such that where if the left rectangular rule is used to discretize the integral term, or if the right rectangular rule is used. In any case, both are consistent with the backward-Euler scheme and the order of accuracy is same. The discrete problem (6) becomes: where the mass matrix M, the stiffness matrix A and the load vector b n are given by respectively.

Crank-Nicolson Scheme
We now state the Crank-Nicolson scheme as follows: given U 0 , where U 0 = I h u 0 , find U n ∈ V, n ∈ [1 : N] such that Ω ∂U n φdx + a(U n−1/2 , φ) = σ n (b(t n−1/2 ; U, φ)) + where Here, the trapezoidal rule is used to discretize the integral term in order to be consistent with the Crank-Nicolson scheme.
Using (13), we obtain the following matrix equation for the Crank-Nicolson scheme: for n > 1 and where M is the mass matrix, A is the stiffness matrix andb n is the load vector:

Remark on More General Boundary Conditions
Problems involving more general boundary conditions can also be addressed in a similar manner. Suppose that the domain boundary is split into two components Γ D and Γ N such that Γ D ∪ Γ N = ∂Ω and Γ D ∩ Γ N = ∅, assuming that Γ D is closed and has a nonzero measure. For sufficiently regular Neumann boundary data g N , consider a problem posed with mixed-type boundary conditions: We assume there is a function u D (x, t) ∈ H 1 (Ω) for t ∈ [0, T] such that u D (x, t) = g(x) for x ∈ ∂Ω, t ∈ [0, T]. Then, the weak form of this problem reads:

Data Representation
To fix ideas ahead of benchmark computations, consider the L-shaped domain Ω = [−1, 1] 2 \[−1, 0] × [0, 1] with a closed polygonal boundary Γ = Γ D , as shown in Figure 1, where nodes are represented as blue squares and the exterior continuous line represents the boundary with Dirichlet boundary condition Γ D ; this domain was chosen as it is often used for benchmarking FEM computations [41,44,50,51].
In order to represent the relationship between node and spatial coordinates, and between elements and nodes, it is necessary to define arrays that store this information. Thus, we define the array c4n, which represents the node-coordinate relation, and the array n4e, which represents the element-node relation to generate the domain; these are given in Table 1. In addition, the Dirichlet boundary conditions are represented by a data structure dir which contains nodal information; this is also given in Table 1. The first columns of the arrays c4n, n4e and dir represent the node number, element number and edge number, respectively.

Assembly
In this section, we assemble the stiffness matrix, the mass matrix, the load vector and the initial and boundary conditions. The assembly is standard and builds on the ideas for PDEs presented in [40,44]; however, we include it here for completeness, ensuring that the standard routines and the non-standard ones related to PIDEs are available in one location. In what follows, we take x = (x, y) and let (x 1 , y 1 ), (x 2 , y 2 ) and (x 3 , y 3 ) be the vertices of an element K and φ 1 , φ 2 and φ 3 the corresponding local basis functions in V. The sets of code which make use of the subroutines presented in this section are given in Appendix A.
For the sets of pseudocode that follow, we first need to establish a common notation; this is given in Table 2. Table 2. Some common functions and operations.

A(:)
All the elements of A, treated as a single column Return entries of A without repeating ZEROS(m, n) m × n matrix with all entries equal to zero [, ] Horizontal concatenation [; ] Vertical concatenation

Assembling the Stiffness Matrix
Since a simple calculation reveals Hence, we obtain Here, the indices are to be understood modulo 3, and K is the area of triangle K. Thus, using (11), the entries of the stiffness matrix can be computed as with indices modulo 3. This can be written as The pseudocode given in Algorithm 1 computes the local stiffness matrix contributions, based on Equations (17) for j ← 1 to r, do 5: A(n4e(j, :), n4e(j, :)) ← A(n4e(j, :), n4e(j, :)) + STIMA(c4n(n4e(j, :), :)) 6: end for 7: return A 8: end procedure

Assembling the Mass Matrix
The entries of the mass matrix M can be computed as For triangular, piecewise affine elements, we obtain the pseudocode for M, based on Equations (22) and (23), is given in Algorithm 3.

Assembling the Load Vector
The volume forces, which are calculated using Algorithm 4, are used for assembling the load vector.
Using the value of f at the center of gravity (x S , y S ) of K, the integral K f φ j dx is approximated by In addition, the load vector contains a contribution from the integral terms containing the kernel B; this is given in Algorithm 5.
Algorithm 4 Source function.

Boundary and Initial Conditions
The sets of pseudocode for the boundary and initial conditions given by Equations (2) and (3), respectively, are given in Algorithms 6 and 7, respectively. u 0 ← sin(π * x) * sin(π * y) 5: return u 0 6: end procedure

Improved Assembly
To improve the implementation, we present in this section an optimization of the algorithm that switches away from a looping approach towards the utilization of vector tools, available in software such as MATLAB. A brief description of these improvements follows. Similar approaches can be found in [40,43], and the sets of code supplied are modified versions of those given in [52]; they are included for completeness. The sets of pseudocode which make use of the subroutines presented in this section are given in Appendix B.

Assembling the Stiffness Matrix
The vectorization of the code for the stiffness matrix is done by using the commands SPARSE and RESHAPE, which allow us to store in the vectors I and J the indices related to the contributions of each node of each element for building the stiffness matrix A. This provides the possibility to evaluate the stiffness matrix A in one line, i.e., without a loop, as shown in Algorithm 8.  Similarly to the previous paragraph, we can build the mass matrix M by using the SPARSE command. We remark that, in this case, the evaluation of the matrix is reduced to the calculation of only two vectors, as shown in Algorithm 9.

Algorithm 9
Assembling the mass matrix with vectorization. A 6 ← 1 6 * area T

Assembling the Load Vector
In order to assemble the right-hand side, we refer to what we have done in Section 5.3 considering the volume forces and the center of gravity. Here, however, the improvement consists of neglecting the loop by inserting the MATLAB command ACCUMARRAY, which creates an array of given size, where the values are collected through the REPMAT command, which in this case returns a 3 × 1 array containing copies of the corresponding evaluation of the element.

Boundary and Initial Conditions
For the vectorized version also, the sets of pseudocode for the boundary and initial conditions are given in Algorithms 6 and 7, respectively.

Overview of an Implementation in Matlab
A minimal MATLAB implementation of the presented algorithm is available from Supplementary Materials. There are six zipfiles which contain the necessary MATLAB files therein:  In particular, the computation of the solution is carried out by SCHEME_NAME_PIDE.m, which sets the initial conditions via U0.m, and assembles the stiffness matrix and the mass matrix via StiffAssemb.m and MassAssemb.m, respectively; thereafter, solve_SCHEME_NAME.m updates the boundary conditions via Ud.m, assembles the load vector via f.m and obtains the solution at the current time step. Furthermore, note that whilst most of the subroutines are the same in both the vectorized and unvectorized versions of the code, those for StiffAssemb.m and MassAssemb.m are necessarily slightly different, in line with the discussion in Sections 5 and 6.

Numerical Results
In this section, we study a numerical example and compare the results obtained using different versions of the MATLAB-based sets of code; in addition, solutions were obtained using COMSOL Multiphysics. For the example, we consider the L-shaped domain and which is a kernel that occurs in different settings in many of the references given earlier, and others [2][3][4]7,14,53,54]; this leads to the following exact solution for u: u(x, y, t) = exp(−π 2 t) sin(πx) sin(πy). Figure 2a shows the exact solution to the problem, whereas Figure 2b corresponds to the backward-Euler approximation, where the left rectangular rule is applied to treat the Volterra integral term. Figure 2c,d show the solutions obtained using the backward-Euler approximation with the right rectangular rule and the Crank-Nicolson approximation with the trapezoidal rule, respectively, to treat the Volterra integral term. In addition, Figure 3 shows a comparison of the analytical solution with the solution computed by COMSOL Multiphysics with 66142 P1 elements for t = T, y = −0.5, where T = 0.1; as expected, the agreement is very good.
To illustrate the advantage of the vectorized code for each numerical scheme over the unvectorized one, which uses for loops for the assembly process, we have presented a comparison of the runtimes with respect to the number of elements, N. This is given in Table 3, which also shows the runtimes for COMSOL Multiphysics. Note that all sets of code were run on an Intel Core i7 Notebook with a 1.7 GHz processor and 8 GB of RAM. We can note from Table 3 that vectorized versions of the MATLAB sets of code presented in this article on the finest mesh get close to COMSOL's performance. Nevertheless, there are several factors behind the results. On the one hand, in the developed MATLAB sets of code, we discretize the Volterra integral term; as a result, one has to store the memory term values at all of the time discretization points. On the other hand, because of the particular form of the kernel in (26), and as explained in Appendix C, COMSOL Multiphysics does not need to evaluate the integral at all; even so, the difference in computational time is not decisive. However, and as also explained in Appendix C, we would not have been able to use COMSOL Multiphysics at all in this way if the kernel had not had a separable form, whereas the sets of code presented in this article do not have this limitation. In addition, we define an efficiency factor, C e f f , given by C e f f = t unvec /t vec , where t unvec is the runtime of the unvectorized code and t vec is the runtime of the vectorized code; the results for this are shown in Table 4. From the latter, we observe that refining the mesh increases the efficiency factor C e f f . This shows that the relative performance of the vectorized code improves significantly as the number of elements is increased. Table 3. Quantitative information on the runtimes of the unvectorized and vectorized MATLAB sets of code with different numerical schemes, and the same for code from COMSOL Multiphysics, when solving the PIDE. Abbreviations: BELR, backward-Euler scheme with left rectangular rule; BERR, backward-Euler scheme with right rectangular rule; CN, Crank-Nicolson scheme with trapezoidal rule; N, number of elements; t SCHEME unvec , runtime of the unvectorized code in seconds involving loops in the assembly process, using a particular method SCHEME (= BELR, BERR or CN); t SCHEME vec , runtime of the vectorized code in seconds without loops in the assembly process, using a particular method SCHEME; t C , runtime for COMSOL Multiphysics.

Conclusions
In this paper, we have presented a compact implementation using the finite-element method for solving linear PIDEs in arbitrary 2D geometries, both in terms of sets of pseudocode and in terms of a MATLAB code, which we have made openly available; we note that such a code is not available in other competing FEM software, such as freefem++ or deal.II. In addition, it was found that a vectorized version of the MATLAB code solves the model problem considered around three times more quickly than the unvectorized code on finer meshes. Furthermore, because of the particular form of the kernel in the Volterra integral term, the model problem could be represented in a form that was amenable to solution using the commercial finite-element software COMSOL Multiphysics. For this particular problem, in which the kernel has a separable form, COMSOL Multiphysics slightly outperforms even our vectorized code when using a mesh of around 66,000 elements.
Future development of this work would involve extending the sets of code to handle non-linear integral terms [55] and to three dimensions.

Acknowledgments:
The sixth author acknowledges the award of a visiting researcher grant from the University of São Paulo.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.