Adaptive Refinement in Advection–Diffusion Problems by Anomaly Detection: A Numerical Study

Abstract: We consider advection–diffusion–reaction problems where the advective or the reactive term dominates the diffusive term. The solutions of these problems are characterized by so-called layers: localized regions where the gradients of the solution are rather large or undergo abrupt changes. In order to improve the accuracy of the computed solution, it is fundamental to locally increase the number of degrees of freedom while limiting the computational costs. Thus, adaptive refinement, driven by a posteriori error estimators, is employed. The error estimators are then processed by an anomaly detection algorithm in order to identify those regions of the computational domain that should be marked and, hence, refined. The anomaly detection task is performed in an unsupervised fashion and the proposed strategy is tested on typical benchmarks. The present work shows a numerical study that highlights promising results obtained by bridging together standard techniques, i.e., a posteriori error estimation and unsupervised anomaly detection.


Introduction
Advection-diffusion problems occur in many applications, such as the simulation of temperature or concentration transport. The numerical solution of this kind of problem has attracted great attention, especially in the case when advection is dominant. Indeed, in this case, standard finite element methods may lead to numerical solutions with nonphysical oscillations, due to a lack of stability. One way to circumvent these difficulties is to add some artificial diffusion to the discretization. Different stabilization methods have been proposed in the literature; among them, the streamline-upwind Petrov-Galerkin (SUPG) method and the streamline-diffusion method (SDM) [1] are probably the most popular ones. In addition, in the numerical solution of stationary linear advection-dominated advection-diffusion problems, two main issues have to be considered. On the one hand, discontinuities in the form of shock-like fronts can occur; hence, we need a discretization able to properly approximate these layers on the boundary or in the interior of the domain. On the other hand, in order to identify regions where the solution is less regular, it is necessary to have reliable estimates of the accuracy of the computed numerical solution. A priori estimates are often insufficient, since they only yield information on the asymptotic behavior of the error and require regularity assumptions about the solution that are not satisfied in the presence of singularities arising from interior or boundary layers. Thus, a posteriori error estimators should be considered. A posteriori error estimators are computable quantities that provide information about the numerical error, so that they may be used for making judicious mesh modifications. Their computation should be less expensive than the computation of the numerical solution itself. Moreover, the error estimator should be local and should yield reliable upper and lower bounds for the true error in a user-specified norm.
Local lower bounds are necessary to ensure that the grid is correctly refined so that one obtains a numerical solution with a prescribed tolerance using a nearly minimal number of grid points. Indeed, a general approach to approximate the layers properly and reduce the number of unknowns is by using highly non-equidistant meshes instead of equidistant (or uniform) meshes. Starting from some uniform mesh, a numerical solution can be computed; then, by using some information from this, the grid can be adapted with some a posteriori knowledge, thereby obtaining a grid more suited to the problem at hand. This technique is referred to as an adaptive method based on a posteriori error estimation. Adaptive local refinement is therefore an important component for obtaining, in an efficient way, an accurate solution to advection/reaction-dominated problems and has received a great deal of attention in the past three decades; see, for instance, [2,3] for unstructured meshes and [4,5] for structured ones.
A key issue for adaptive refinement is good a posteriori error estimation. For advection-diffusion-reaction equations, one of the first comparative studies of different estimators, based on the SUPG solution, was carried out in [6], where it was shown that none of the estimators was robust with respect to the diffusion coefficient.
In [7], a robust estimator is proposed in the same norm in which the a priori analysis is performed for the SUPG method, namely the SUPG norm. Other examples of estimators obtained using different techniques can be found, for instance, in [8][9][10][11][12].
A final component needed to set up an efficient adaptive scheme is the choice of a proper marking strategy, which allows the selection of the regions to be refined. In theory, the marking strategy should ensure a specific error reduction with respect to the number of refined elements; see [13] and references therein for a review. In practice, most of the time, there are one or more hyper-parameters that need to be correctly tuned. In the present work, we rely on an anomaly detection algorithm that is able to cluster the mesh elements into two sets: the normal and the anomalous ones. In particular, the latter are the elements that will be refined. Anomaly detection usually refers to the task of identifying those behaviors that greatly differ from the standard trend; see [14] for a review. Some recent papers, e.g., [15][16][17], have exploited artificial intelligence techniques based on supervised learning to produce adaptive strategies for the solution of PDEs. In the current work, we adopt an unsupervised technique that does not require any training or any a priori knowledge of the correct mesh elements that should be marked. The anomaly detection task is performed in a completely unsupervised fashion by using the error estimators and the isolation forest algorithm.
The paper is organized as follows: in Section 2, we formulate the problem and we recall the main features of the adopted error estimators and the anomaly detection algorithms. In Section 3, we report several examples, which confirm the good performance of the proposed strategy. Finally, in Section 4, we discuss some possible future work.

Problem Formulation
We consider scalar advection-diffusion-reaction problems that, in general, can be modeled as:

$$
\begin{cases}
-\mathrm{div}(\varepsilon \nabla u) + \mathbf{b}\cdot\nabla u + \alpha\, u = f & \text{in } \Omega,\\
u = u_D & \text{on } \Gamma_D,\\
\varepsilon\, \nabla u \cdot \mathbf{n} = g & \text{on } \Gamma_N,
\end{cases}
\qquad (1)
$$

where Ω ⊂ R² is a Lipschitz domain with a polygonal boundary Γ. In particular, Γ consists of two non-overlapping components Γ_D and Γ_N, such that Γ = Γ_D ∪ Γ_N and Γ_D ∩ Γ_N = ∅. In Equation (1), the function u is the unknown, ε is the diffusivity coefficient, b is the advective field with ∇·b = 0, α ≥ 0 a.e. in Ω is the reaction coefficient, f is the source and n is the unit outward normal vector. We will also assume the following: the data are sufficiently regular, in particular f ∈ L²(Ω) and g ∈ L²(Γ_N); moreover, there exist two constants, β ≥ 0 and c_b ≥ 0, independent of ε, ensuring the coercivity of the bilinear form associated with (1). As already mentioned in the Introduction, scalar advection-diffusion equations describe the transport of scalar quantities, such as temperature or concentration. This transport term is composed of a diffusive part and an advective part.
From the mathematical properties of these two components, it is possible to distinguish regular (or exponential) layers, parabolic layers, and internal or boundary layers. The accurate computation of these singularity regions is crucial to obtain a reliable approximation of the numerical solution of advection-diffusion problems. Standard finite element methods may lead to numerical solutions with nonphysical oscillations, due to the lack of stability of the considered method. (Here, stability is not meant in the sense of the continuous dependence of the solution on the initial data, but rather in the sense of reducing the oscillations occurring in the numerical solution.) The weak formulation of problem (1) is: find u ∈ V such that

$$ a(u, v) = F(v) \qquad \forall\, v \in V, $$

where V is a suitable subspace of H¹(Ω) accounting for the Dirichlet boundary condition, and we introduced the bilinear form a : V × V → R and the functional F : V → R:

$$ a(u, v) := \int_\Omega \big( \varepsilon\, \nabla u \cdot \nabla v + (\mathbf{b}\cdot\nabla u)\, v + \alpha\, u\, v \big)\, dx, \qquad F(v) := \int_\Omega f\, v\, dx + \int_{\Gamma_N} g\, v\, ds. $$

Let {T_h}_h be a family of conforming triangulations of Ω̄ into triangles T with diameter h_T ≤ h, with h = max_{T∈T_h} h_T. Let P_m be the space of polynomials of degree at most m, and we set

$$ V_h := \{ v_h \in C^0(\bar{\Omega}) \cap V : v_h|_T \in P_m(T)\ \ \forall\, T \in \mathcal{T}_h \}. $$

By projecting onto the space V_h ⊂ V of continuous piecewise linear finite elements (m = 1) and stabilizing with the SUPG method [1], we can write the generalized formulation: find u_h ∈ V_h such that

$$ a(u_h, v_h) + b_h(u_h, v_h) = F(v_h) + m_h(v_h) \qquad \forall\, v_h \in V_h, $$

where

$$ b_h(u_h, v_h) := \sum_{T \in \mathcal{T}_h} \tau_T \int_T \big( -\mathrm{div}(\varepsilon \nabla u_h) + \mathbf{b}\cdot\nabla u_h + \alpha\, u_h \big)\, (\mathbf{b}\cdot\nabla v_h)\, dx, \qquad m_h(v_h) := \sum_{T \in \mathcal{T}_h} \tau_T \int_T f\, (\mathbf{b}\cdot\nabla v_h)\, dx, $$

with τ_T a suitably chosen stabilization parameter. The terms b_h(u_h, v_h) and m_h(v_h) are the stabilizing terms and they are necessary to reduce the numerical oscillations that can arise with the classical Galerkin formulation.
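For concreteness, one common choice of the stabilization parameter τ_T found in the SUPG literature is reported below; the specific choice adopted in this study is not stated in the text, so the formula is only indicative:

```latex
\tau_T \;=\; \frac{h_T}{2\,\|\mathbf{b}\|_{L^\infty(T)}}
\left( \coth(\mathrm{Pe}_T) \;-\; \frac{1}{\mathrm{Pe}_T} \right),
\qquad
\mathrm{Pe}_T \;=\; \frac{\|\mathbf{b}\|_{L^\infty(T)}\, h_T}{2\,\varepsilon},
```

where Pe_T is the local Péclet number: for Pe_T ≫ 1 (advection-dominated elements) τ_T ≈ h_T / (2‖b‖), while for Pe_T ≪ 1 the stabilization term vanishes and the scheme reduces to the plain Galerkin method.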

Mesh Adaptation
We are interested in those applications where the diffusive term −div(ε∇u) is dominated either by the convective term b · ∇u or by the reaction term αu. In such cases, the solution may exhibit boundary layers that arise from strong gradient variations. Since abrupt changes in the behavior of the solution are usually localized to certain regions, it is meaningful to adopt adaptive strategies in order to increase the number of degrees of freedom in those regions only. This gives more control and hence better final accuracy in the approximation of the solution. In practice, the adaptivity of the mesh is driven by a posteriori error estimators. In the present work, we decided to compare the performance of three error estimators considered in their basic formulation; see [18]. In fact, according to the type of problem at hand, more advanced and specific versions of each of the considered error estimators can be developed. In our case, we aimed to design a strategy that evens out the different behaviors usually observed when different marking strategies are adopted, or when different problem settings are considered. For the sake of clarity, we recall the main features of the adopted error estimators.

Residual Error Estimator
The first error estimator considered is the residual type η_R, defined on each triangle T as:

$$ \eta_{R,T}^2 := h_T^2\, \|\mathrm{Res}_T\|_{L^2(T)}^2 + \sum_{E \subset \partial T} h_E\, \|\mathrm{Res}_E\|_{L^2(E)}^2, $$

where we denote by Res_T and Res_E the residual computed on each triangle T and on each edge E, respectively. Meanwhile, h_T and h_E refer to the diameter of the triangle T and to the length of the edge E in the triangle T. The usual definitions for Res_T and Res_E are adopted:

$$ \mathrm{Res}_T := f + \mathrm{div}(\varepsilon \nabla u_h) - \mathbf{b}\cdot\nabla u_h - \alpha\, u_h, \qquad \mathrm{Res}_E := \big[\varepsilon\, \nabla u_h \cdot \mathbf{n}_E\big]_E, $$

where n_E denotes the unit normal vector to the edge E and the symbol [·]_E denotes the jump of a function across the interface E.
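As a small illustration (not the paper's code), the element contribution of the residual estimator can be approximated with a one-point (barycenter) quadrature; all names and data below are hypothetical, and the edge jump terms are omitted for brevity (for P1 elements, div(ε∇u_h) also vanishes on each triangle):

```python
# Sketch of the interior-residual contribution h_T^2 * ||Res_T||^2_{L2(T)}
# for one triangle, using a single barycenter quadrature point.
import numpy as np

def element_residual_sq(h_T, area_T, f_bar, b, grad_uh, alpha, uh_bar):
    """One-point quadrature of h_T^2 * ||f - b.grad(u_h) - alpha*u_h||^2 on T."""
    r = f_bar - b @ grad_uh - alpha * uh_bar   # residual at the barycenter
    return h_T**2 * area_T * r**2

# Hypothetical data for a single triangle:
val = element_residual_sq(h_T=0.1, area_T=0.005, f_bar=1.0,
                          b=np.array([1.0, 0.0]),
                          grad_uh=np.array([0.5, 0.0]),
                          alpha=0.0, uh_bar=0.2)
```

In a full implementation, the edge residuals Res_E would be added with a loop over the three edges, halving the interior-edge contributions shared between two triangles.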

A Zienkiewicz-Zhu Type Error Estimator
This type of error estimator is based on reconstructing the gradient of the computed solution by a simple averaging technique. In particular, it is defined as:

$$ \eta_{Z,T} := \| G(u_h) - \nabla u_h \|_{L^2(T)}. $$

In our case, the function G(u_h) is constructed by evaluating ∇u_h at the barycenter of each triangle and then averaging across those triangles that share a common vertex.
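The recovery procedure can be sketched in a few lines of NumPy on a toy two-triangle mesh (illustrative only; the paper's computations use FreeFem++, and all names below are ours). For the linear function u = x + y the exact and recovered gradients coincide, so the estimator is zero:

```python
# Averaging-based gradient recovery (Zienkiewicz-Zhu type) on a toy P1 mesh.
import numpy as np

# Two triangles forming the unit square; u holds the nodal values of u_h.
pts = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
tris = np.array([[0, 1, 2], [0, 2, 3]])
u = np.array([0., 1., 2., 1.])           # u_h = x + y at the vertices

def p1_gradient(p, t, u):
    """Constant gradient of a P1 function on each triangle."""
    grads = np.empty((len(t), 2))
    for k, (i, j, l) in enumerate(t):
        A = np.array([p[j] - p[i], p[l] - p[i]])   # two edge vectors
        b = np.array([u[j] - u[i], u[l] - u[i]])   # nodal differences
        grads[k] = np.linalg.solve(A, b)
    return grads

grad = p1_gradient(pts, tris, u)          # per-triangle gradients

# Recovered gradient G at each vertex: average over adjacent triangles.
G = np.zeros((len(pts), 2)); cnt = np.zeros(len(pts))
for k, tri in enumerate(tris):
    for v in tri:
        G[v] += grad[k]; cnt[v] += 1
G /= cnt[:, None]

# Local estimator: difference between recovered and computed gradients,
# evaluated at the barycenter and scaled by sqrt(area) (area = 0.5 here).
eta_Z = np.array([np.sqrt(0.5) * np.linalg.norm(G[tri].mean(axis=0) - grad[k])
                  for k, tri in enumerate(tris)])
```

On a solution with layers, the recovered gradient G(u_h) differs markedly from ∇u_h near the layer, so η_Z concentrates exactly on the triangles to be refined.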

Error Estimator Based on the Solution of Auxiliary Local Problems
As the last error estimator, we consider one constructed by solving auxiliary local Neumann problems. It is defined as:

$$ \eta_{N,T} := |v_T|, $$

where the symbol | · | denotes the energy norm and v_T is a function belonging to the space V_T that approximates the solution of the following auxiliary problem:

$$ a_T(v_T, w) = \int_T \mathrm{Res}_T\, w\, dx + \frac{1}{2} \sum_{E \subset \partial T} \int_E \mathrm{Res}_E\, w\, ds \qquad \forall\, w \in V_T, $$

with a_T(·, ·) the restriction of the bilinear form a(·, ·) to the triangle T and

$$ V_T := \mathrm{span}\{\psi_T\} \oplus \mathrm{span}\{\psi_E : E \subset \partial T\}. \qquad (10) $$

Note that, in definition (10), we use the bubble functions ψ_T and ψ_E related to the triangle T and to the edges E of T; see [19] for additional details.
The mesh refinement and the computation of the finite element approximate solution were performed by using the software FreeFem++, version 3.5 [20]. In particular, the software provides an automatic mesh generator, based on the Delaunay-Voronoi algorithm [21], and a metric-based anisotropic mesh adaptation function [22].

Anomaly Detection
The term anomaly detection is used in the context of time series or Big Data whenever outliers or anomalous points have to be identified. In particular, points that follow a standard trend, or that exhibit the expected behavior, are labeled as normal; otherwise, they are labeled as anomalous. In the present work, we use the Isolation Forest (IF) algorithm [23,24], an unsupervised technique developed to detect anomalies in a given dataset. In order to isolate the points that deviate from the expected trend, the IF constructs a forest of binary trees by randomly selecting a feature and then randomly selecting a split value. The number of splittings required to isolate a point equals the path length from the root node to the terminating node, and anomalies usually produce noticeably shorter paths than normal points. IF has a linear time complexity and does not need any labeled data; hence, it works in a completely unsupervised fashion. In the present work, we use the implementation provided by the Python library scikit-learn [25]. In order to control the randomness, and hence make the experiments reproducible, we set the parameter random_state = 0. All the other parameters are used with their default settings. The proposed algorithm employs one of the error estimators described in the previous subsections to acquire an estimate of the error, localized at every triangle. Then, the IF takes as input an array of values containing the error estimator for each triangle. The anomalies are those values where the estimated error is rather large; therefore, the anomalous values correspond to the triangles that should be refined. In order to improve the performance of the marking, we also ran some benchmarks by setting the contamination parameter c = 0.3. The contamination acts as a cutoff on the returned anomaly scores: c = 0.3 means that only the 30% most negative scores are labeled as anomalies.
The set-up for c was experimentally derived and was in line with the fact that the boundary layers are usually confined to specific regions; hence, usually, there is no need to refine large areas of the computational domain. In certain cases, the use of c = 0.3 helped to achieve a final refinement even more localized to the problematic areas and, thus, the performance of the three error estimators became equivalent in terms of the reached accuracy per number of refinements. An outline of the proposed algorithm is presented in Algorithm 1.
Algorithm 1 (outline): compute the SUPG solution u_h on the current mesh; evaluate the chosen error estimator η_T on every triangle; run the Isolation Forest on the vector of values η_T, which returns A: vector containing 1 for normal η_T and −1 for anomalous η_T; construct the anisotropic mesh adaptation function h from the triangles marked as anomalous and adapt the mesh; repeat until the stopping criterion is met.
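The marking step can be sketched with scikit-learn's IsolationForest as follows; the array eta of per-triangle estimator values is synthetic and only illustrative, while random_state = 0 and contamination = 0.3 are the settings discussed above:

```python
# Illustrative sketch (not the authors' code): flagging triangles whose
# error estimator values are labeled anomalous by the Isolation Forest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical per-triangle estimator values: mostly small (well-resolved
# regions) with a few large values (layers).
eta = np.concatenate([rng.uniform(1e-4, 1e-3, size=95),
                      rng.uniform(5e-2, 1e-1, size=5)])

# contamination=0.3 caps the fraction of triangles labeled anomalous at 30%;
# random_state=0 makes the run reproducible.
clf = IsolationForest(random_state=0, contamination=0.3)
labels = clf.fit_predict(eta.reshape(-1, 1))   # +1 = normal, -1 = anomalous

marked = np.flatnonzero(labels == -1)          # indices of triangles to refine
```

The vector labels plays the role of A in Algorithm 1; the indices in marked are then passed to the mesh adaptation routine.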

Numerical Results
In this section, we show the numerical results on four different benchmarks. Since the exact solution is always unknown, we estimated the L² norm and the H¹ seminorm of the error e := u_ex − u_h by computing a reference solution u_ex on a very fine triangular mesh, obtained by globally and uniformly refining the initial mesh for several levels. Since global uniform refinement cannot provide an arbitrarily accurate representation of the solution, we stopped the adaptive refinement procedure when the error values became stagnant or started oscillating around a certain threshold. In order to increase the stability of the computation of u_ex, we improved the quadrature formula used in the evaluation of the integrals for the Galerkin method by setting the parameter qft = qf7pT or qft = qf9pT, which are Gaussian quadrature rules of order 8 and 10, respectively, available with the FreeFem integration routines; see, e.g., [26]. In general, when the order increases, the Gaussian rule may become unstable; moreover, due to the triangular domains, suitable rules should be constructed; see, e.g., [27][28][29] and references therein. In our tests, we could obtain more stable results for the computation of the L² norm but, for the H¹ seminorm, we could still observe some divergent cases. For every example, we compared the achieved results with the global uniform refinement strategy, in terms of accuracy and in terms of the order of convergence. For the error reduction to be optimal, the produced mesh should satisfy certain optimality requirements; see, for example, [13,30]. Since this aim goes beyond the scope of the current paper, we conducted the following experiments by simply using the anisotropic mesh adaptation function provided by the FreeFem library.
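The observed order of convergence used in these comparisons can be estimated from two successive refinement levels; here is a small sketch with purely hypothetical error values (not taken from the tables), using the 2D relation h ~ NT^(-1/2) between mesh size and number of triangles:

```python
# Estimating the observed convergence order p from err ~ h^p ~ NT^(-p/2).
import math

NT = [200, 800, 3200]             # hypothetical triangle counts per level
errL2 = [4.0e-2, 1.0e-2, 2.5e-3]  # hypothetical L2-norm errors per level

orders = [2 * math.log(errL2[k] / errL2[k + 1]) / math.log(NT[k + 1] / NT[k])
          for k in range(len(NT) - 1)]
```

With the error quartering each time NT quadruples (i.e., each time h halves), this yields p = 2, the optimal L² rate for piecewise linear elements on smooth solutions; layers typically degrade this rate unless the mesh is adapted.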

Example 1: A Reaction-Diffusion Problem
The first example is a reaction-diffusion problem ([31]), where the velocity field b is zero. In this case, the computational domain is the unit square, Ω = [0, 1] × [0, 1], with source f = 0, ε = 10⁻³ and α = 1. The boundary conditions are shown in Figure 1, while an approximation of the solution is shown in Figure 2. Since homogeneous boundary conditions are assigned everywhere except at the four corners, we expect the solution to be zero almost everywhere and to spike to one at the corners. In Figure 3, on the left, we report the mesh adapted by using the residual error estimator. The four corners are highly refined, but we can also observe a rather dense grid in the middle of Ω, which seems unnecessary for this example. Therefore, we applied a threshold to the percentage of values that should be considered anomalous. By trying several different values, we experimentally found that a good setting for the contamination parameter in the IF algorithm was 0.3: only the points with the 30% most negative scores are then considered anomalous. With this strategy, we can localize the refinement even more by neglecting a certain percentage of triangles that would otherwise be marked as anomalous. The result obtained with this setting can be seen in Figure 3, right. In Table 1, we report the estimated error in the L² norm and in the H¹ seminorm, together with the number of elements (NT) per level of refinement (n), in both cases, i.e., c = auto and c = 0.3. The improvement with the second strategy is evident especially at the final stage: with the same order of magnitude of employed elements, one order of accuracy is gained in the L² norm. Finally, in Figure 4, we show the comparison with global refinement. In Figure 5, we show the results obtained by using the gradient recovery-based error estimator η_Z. Also in this case, we performed two tests: one with c = auto and one with c = 0.3. Table 2 collects the L² norm and H¹ seminorm errors.
In this case, the benefit of introducing the cutoff percentile is less evident than in the previous test; however, we notice that, by reducing the number of elements at the first level, more elements were marked at the successive levels compared with the standard case (i.e., c = auto). This behavior did not lead to a significant gain in accuracy, but the resulting mesh appears more suitable for this example, since the refinement is more localized at the four corners. In Figure 6, we show the comparison with global refinement. Finally, in Figure 7, the results obtained with η_N and with c = auto (on the left) and c = 0.3 (on the right) are shown. In Table 3, we report the L² norm and H¹ seminorm estimates. The mesh adapted with c = 0.3 does not improve the reached accuracy compared to the case c = auto. In fact, the refinement, besides not being symmetric, appears to be less localized. This can happen because the contamination parameter only selects the top 30% of anomalous triangles, which are not necessarily the ones that exhibit the largest errors among the anomalies. In Figure 8, we show the comparison with global refinement. In all three cases, the adaptive refinement achieved better accuracy and a better order of convergence per used degrees of freedom compared to global refinement.

Example 2: The Channel Test
We report the second benchmark presented in [32], "the channel test", which is an advection-dominated problem. The domain Ω is chosen to be L-shaped. The data of the problem are ε = 10⁻³, f = 0, α = 0, b = [y, −x], while the boundary conditions are shown in Figure 9. In Figure 10, on the left, we show the final stage of the mesh adapted according to the residual-based error estimator, while, on the right, we show a 3D plot of the computed solution on the final mesh. The error estimates for the L² norm and the H¹ seminorm are reported in Table 4. In this case, η_R combined with the anomaly detection strategy led to the refinement of very few elements at each step. In the end, the two circular layers and the boundary layer near the non-convex corner were correctly identified and refined. The error e steadily decreased in both the L² norm and the H¹ seminorm. In this case, using c = 0.3 appeared meaningless, as few elements were refined at each level. In Figure 11, we show the comparison between the adaptive refinement obtained by using η_R and the global refinement strategy. At certain levels, the adaptive refinement decreases the error with a superlinear order of convergence in the L² norm. Meanwhile, in the H¹ seminorm, the behavior is not monotonic, but we can still observe convergence, with better final accuracy compared to the global refinement approach. The meshes obtained by using the gradient recovery error estimator and the η_N error estimator are shown in Figure 12, and in Tables 5 and 6 we report the error estimates. We note that, for this example, the adaptive refinement was interrupted when the error became stagnant; hence, by looking at the reached accuracy, the three error estimators seem rather equivalent. Looking at the resulting adapted meshes, the one obtained with η_Z is the least refined at the larger circular inner layer.
Regarding the boundary layer on the short edge of the L-shaped domain and the smaller circular layer, η_R, η_Z and η_N all provided error estimates good enough for the IF algorithm to correctly label the anomalous triangles. Finally, in Figures 13 and 14, we show the comparisons with the global refinement. Also in this case, the adapted meshes were better than the uniformly globally refined mesh, as we could observe a greater decrease in the error and hence better final accuracy per used degrees of freedom.

Example 3: The Pinched Domain
In this example, we consider a disk with a hole. The boundary of the domain consists of two disconnected components: Γ_1 := {x = cos(t), y = sin(t), t ∈ [0, 2π]} is the outer boundary, while Γ_2 := {x = 0.3 + 0.3 cos(t), y = 0.3 sin(t), t ∈ [0, 2π]} is the boundary of the inner hole. The input data are: ε = 10⁻¹⁰, f = 0, α = 1, b = [2, 1]. We apply discontinuous Dirichlet boundary conditions: u = 0 on Γ_1 and u = 1 on Γ_2. Therefore, we expect the rise of two inner layers. The results for the meshes adapted with the three error estimators are shown in Figures 15-17. In Tables 7-9, we report the error estimates in the L² norm and the H¹ seminorm. For this example, the poorest performance is obtained by the η_Z error estimator. Indeed, the final mesh displayed in Figure 16, on the left, shows an over-refinement spread along two narrow stripes across the whole domain Ω. This seems unnecessary, as the main change should happen around the boundary Γ_2; hence, we produced another mesh (see Figure 16, right) to limit the number of triangles to be refined. The use of c = 0.3 did not produce any significant change when η_R was employed, while, when η_N was adopted, we could appreciate a drastic reduction in the number of refined elements at every level, which did not affect the reached accuracy.
In Figures 18-20, we report the comparison with the global refinement strategy. Regarding the use of η_R with the contamination parameter c = 0.3, the performance worsens and becomes comparable to global refinement. Concerning the η_Z error estimator, although the error reduction is still better than global refinement in the L² norm, we can observe a behavior rather close to that of uniform refinement when it comes to the H¹ seminorm. Regarding η_N, the achieved results are better than global refinement for the L² norm, while, in the H¹ seminorm, the use of the contamination parameter yields a relevant reduction in the error per used degrees of freedom.

Example 4: Parabolic and Exponential Boundary Layers
As the last example, we consider the problem reported in [6]. The computational domain Ω is the unit square. The input data are ε = 10⁻⁶, b = [1, 0], α = 0, f = 1, Γ = Γ_D and u_D = 0. The solution exhibits two parabolic layers, at y = 0 and y = 1, and one exponential boundary layer on the right edge of Ω. In Figure 21, we show a 3D plot of the computed solution on the final adapted mesh obtained with η_R.
The refined meshes are shown in Figures 22 and 23 for the residual-based, gradient-based and η_N error estimators, respectively. In Table 10, we report the results obtained by using c = auto and c = 0.3 when η_R is employed. By setting a contamination value that differs from the default setting, we can better refine the regions containing the layers of interest. Finally, in Tables 11 and 12, we report the error estimates computed with η_Z and η_N, respectively, and c = auto. This final benchmark is the only one where the use of η_N failed to capture the boundary layers. In fact, the values of η_{N,T} are very small for almost every triangle T, at every level; thus, in the end, the adaptive refinement looks almost like a uniform global refinement. In this case, even the use of a contamination parameter could not significantly improve the final result. The test was also repeated with smaller values of the contamination, but the final mesh was still not satisfactory. In Figure 23, on the right, the final mesh obtained at level n = 5 with c = 0.1 is shown. The error is not significantly different from that obtained with η_R or η_Z, but only because the comparison is made against a reference solution obtained by uniformly and globally refining the initial mesh; only the meshes obtained with η_R and η_Z are correctly refined at the boundary layer location. In Figure 24, we show the comparison of η_R against the global refinement strategy. In this example, the L² norm error of the global refinement is smaller than that of the adaptive mesh obtained either with c = auto or with c = 0.3. The H¹ seminorm of the global refinement is, instead, very unstable.
In Figure 25, the mesh adapted with η_Z is able to produce a smaller error in the H¹ seminorm than the global refinement, while the final reached accuracy is comparable in the case of the L² norm.
In Figure 26, we can observe that the behavior of η_N is very similar to that of the global refinement strategy, at least for this last example. Therefore, in this case, both strategies give comparable results in terms of accuracy and order of convergence.

Discussion
The numerical results obtained with the proposed strategy are interesting and rather promising. The introduction of an anomaly detection technique with default parameters, in place of hand-tuning a marking strategy, improves the robustness of the performance of the three error estimators in terms of the reached accuracy per level of refinement. Further research should be devoted to designing suitable algorithms to choose different contamination settings, either for the IF or for other unsupervised techniques. At the current stage, we do not exploit the contamination parameter c at its best, since only the top fraction c of anomalies determines the triangles that will be refined. To improve the robustness of the choice of which triangles should be neglected, an extra module should be added to the algorithm in order to interact with the connectivity matrix of the produced triangulation inside the FEM procedure. Moreover, other unsupervised learning tools could be added; see, e.g., the recent work [33]. Thus far, using a fully automatic setting, the residual-based error estimator η_R was the only one to correctly detect all the boundary layers analyzed. A deeper statistical analysis could also be conducted by taking into account more examples belonging to the same category, e.g., reaction-dominated problems or convection-dominated problems, or only problems with circular layers or parabolic layers, etc. The creation of such a rich database would allow us to test and objectively validate the obtained results. Having robust strategies that allow for better control, and thus a smaller number of degrees of freedom, is of fundamental importance, not only from the computational point of view but also for better accuracy in reproducing the observed physical phenomenon.