Groundwater Contaminant Transport Solved by Monte Carlo Methods Accelerated by Deep Learning Meta-Model

Abstract: Groundwater contaminant transport modeling is a vitally important topic. Since the modeled processes include uncertainties, Monte Carlo methods are adopted to obtain statistics of the quantities of interest. However, accurate models have a substantial computational cost. This drawback can be overcome by employing the multilevel Monte Carlo method (MLMC) or by approximating the original model with a meta-model. We combine both of these approaches. A stochastic model is substituted with a deep learning meta-model that consists of a graph convolutional neural network and a feed-forward neural network, which allows the approximation of models solved on unstructured meshes. Using the meta-model within the standard Monte Carlo method can bring significant computational cost savings. Nevertheless, the meta-model must be highly accurate to obtain errors similar to those of the original model. The proposed MLMC with a new lowest-accuracy level of meta-models can reduce the total computational cost without requiring such a highly accurate meta-model. The size of the computational cost savings depends on the distribution of the cost across MLMC levels. Our approach is especially efficacious when the dominant computational cost is on the lowest-accuracy MLMC level. Depending on the number of estimated moments, we can reduce computational costs by up to ca. 25% while maintaining the accuracy of estimates.


Introduction
Groundwater quality might be affected by many natural and industrial processes. There are many sources of groundwater contamination, for instance, polluted precipitation, runoff from roadways, leaking barrels of waste chemicals, etc. [1] (p. 500). High concentrations of dissolved substances can result in water being unfit for drinking or irrigation. Our research focuses on observing groundwater processes in the vicinity of a deep geological repository (DGR) of radioactive waste. Despite all the protective layers, radioactive material will be exposed to the geosphere in the distant future. As a result, groundwater may become contaminated. Since it is not feasible to measure these processes directly, we model and simulate them numerically. Taking the uncertainties into account, we consider bedrock properties as random variables, specifically, spatially correlated random fields (SRF). Consequently, a quantity of interest (QoI) is also a random variable. We investigate statistical moments of a QoI in the first place and a probability density function (PDF) afterward.
Models can be process-based or data-driven [2] (p. 4). Our research takes advantage of both approaches. First, we run a stochastic process-based contaminant transport model. A physical process is numerically solved by the finite element method on an unstructured mesh with prescribed SRF. To estimate the expectation of QoI, the model run (=simulation) is performed multiple times in accordance with the Monte Carlo method (MC); for some applications, see [3][4][5]. However, MC suffers from high computational costs. Many highly accurate simulations must be performed to obtain statistical estimates with reasonably low variance, see [6] (p. 2). To save the cost, it is beneficial to construct a hierarchical system of less accurate but cheaper models that approximate the same QoI; for details, see [7] (p. 552). In the field of numerical modeling, the accuracy of a model is given by the number of mesh elements. Thus, we consider the hierarchy of models with gradually coarsened computational meshes, see [7] (p. 555). For this purpose, we adopt the multilevel Monte Carlo method (MLMC) [6] described in more detail in Section 4. For some MLMC applications, see [8][9][10][11].
A meta-modeling approach is another option for decreasing the total computational cost of MC. A function of inputs and parameters of a complex and expensive model is approximated by a simpler and faster meta-model, also called a surrogate model. Meta-modeling techniques are widely used across scientific fields (see [12]), and groundwater modeling is no exception [13][14][15][16]. In terms of the classification of meta-models by Robinson [17] and Asher et al. [13], we concentrate our effort on data-driven meta-models. In our case, the input of the model is a 2D SRF, and the output is some scalar QoI. Asher et al. [13] (p. 5958) also provide a list of popular data-driven meta-modeling techniques, such as radial basis functions, Gaussian processes, support vector regression (SVR), and neural networks (NN). For their brief description, see [18,19]. We have focused on meta-models based on neural networks, the applicability of which to hydrological processes was discussed in [20,21]. The latter article also provides the basics of deep learning and briefly introduces some popular DNN architectures. Neural networks and deep learning have already been utilized in groundwater modeling to predict groundwater level [22], quality [23,24], flow [25,26], and contaminant transport [27]. With regard to the nature of the input data of our model, it would be felicitous to use a convolutional neural network (CNN) as a meta-model. CNNs excel in learning spatial relationships within data on a regular grid such as an image. Since we use unstructured meshes, we cannot adopt a CNN directly (see [28]). We briefly discussed some ways to overcome this difficulty in [29]. Representing the structure of an SRF as a graph (for graph theory, see [30]) and then using a graph convolutional neural network (GCNN) [31] is the option we selected.
The general objective of the presented article is to reduce the computational cost of Monte Carlo methods by incorporating a meta-model. In this regard, we build on the findings of our initial article [29], which addressed the same objective with a GCNN meta-model of a groundwater flow problem. In this article, we specifically focus on improving the accuracy of the meta-model and its applicability to groundwater contaminant transport problems. To the best of our knowledge, no previous research in groundwater modeling has investigated the computational cost reduction of Monte Carlo methods using a meta-model.
The article is organized as follows. First, the contaminant transport model is described. Next, its deep learning meta-model is presented in Section 3. The multilevel Monte Carlo method is briefly introduced, and the proposed incorporation of the meta-model is delineated in Sections 4 and 5. Section 6 is devoted to numerical experiments, meta-models are assessed, and their usage is put into the context of MC and MLMC. The article is concluded with a summary and discussion of the important findings.

Groundwater Contaminant Transport Model
We consider a conceptual 2D model of the transport of contaminants from a DGR, see Figure 1. The Darcy flux of groundwater q [m s^−1] is described by the equations:

∇ · q = 0,    q = −κ ∇(p + z),

where p is the pressure head [m], z is the vertical coordinate, and κ is the hydraulic conductivity, considered to be smaller in a stripe near the surface. The Darcy flow is driven by a seepage face condition on the top boundary, with q_0 denoting a precipitation flux. No flow is prescribed on the remaining boundaries. The surface geometry is intentionally overscaled to pronounce its impact on the groundwater flow. The advection equation is used as the contaminant transport model:

∂(ϑc)/∂t + ∇ · (qc) = 0,

where ϑ is the porosity and c is the concentration [kg m^−3] of a contaminant in terms of mass in the unit volume of water. The equation is solved for the unknown concentration with the initial condition c_0 = 1 in the repository and 0 elsewhere. Zero concentration is prescribed at the inflow part of the boundary. The porosity ϑ is given as a correlated random field. The random field of hydraulic conductivity κ is then determined from ϑ using the Kozeny-Carman equation and slight random scaling given by an independent random field. Figure 1 displays the geometry of the model, the random conductivity field (conductivity in the figure), and the concentration field (X_conc in the figure) at the time t = 3 of the total simulation time T = 20. The simulation is performed using the Flow123d software [32].
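The porosity-to-conductivity step can be sketched as follows; this is a minimal illustration assuming the common reference-conductivity form of the Kozeny-Carman relation, since the article does not state the constants it uses — the prefactor `kappa_0` and the field parameters below are hypothetical.

```python
import numpy as np

def kozeny_carman(porosity, kappa_0=1e-5):
    """Hydraulic conductivity from porosity via the Kozeny-Carman relation.

    kappa_0 is a hypothetical reference conductivity; the article does not
    state the constant it uses.
    """
    return kappa_0 * porosity**3 / (1.0 - porosity)**2

rng = np.random.default_rng(0)
theta = rng.uniform(0.1, 0.3, size=5)         # stand-in for a correlated porosity field
scale = np.exp(0.1 * rng.standard_normal(5))  # slight independent random scaling
kappa = kozeny_carman(theta) * scale          # conductivity field passed to the flow model
```

Higher porosity yields higher conductivity, and the independent log-normal factor provides the "slight random scaling" mentioned above.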

Graph Convolutional Neural Network Meta-Model
Graph neural networks (GNNs) are deep-learning-based models that operate on graph data [31]. They are adopted in many areas, including natural language processing, computer vision, bioinformatics, and physics. Similar to standard neural networks [33], they are categorized into several groups, such as graph convolutional neural networks (GCNN), graph attention networks, graph recurrent networks, etc. We employ GCNNs, which have a variety of different architectures. They can differ in many ways, such as the representation of a convolution operation, the size of a convolutional filter, the number of graph vertices taken into account, and many others. For a comprehensive survey on GCNNs, see [31,34].
In order to use GCNNs, the dual graph G = (V, E) of the computational mesh is considered with the vertices V corresponding to mesh elements and associated with the values of the input random fields. The edges correspond to the adjacent mesh elements. Figure 2 illustrates the graph representation of the SRF prescribed on an unstructured mesh.
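A minimal sketch of the dual-graph construction for a triangular mesh, assuming elements are given as triples of mesh node indices (the article does not specify its mesh data structure): two elements become adjacent graph vertices exactly when they share a mesh edge.

```python
from collections import defaultdict

def dual_graph(triangles):
    """Dual graph of a triangular mesh: one graph vertex per element,
    an edge between elements that share a mesh edge (two mesh nodes)."""
    edge_to_elems = defaultdict(list)
    for elem_id, tri in enumerate(triangles):
        a, b, c = tri
        for edge in ((a, b), (b, c), (a, c)):
            edge_to_elems[tuple(sorted(edge))].append(elem_id)
    graph_edges = set()
    for elems in edge_to_elems.values():
        if len(elems) == 2:  # interior mesh edge shared by two elements
            graph_edges.add(tuple(sorted(elems)))
    return sorted(graph_edges)

# two triangles sharing the mesh edge (1, 2) -> a single dual edge (0, 1)
mesh = [(0, 1, 2), (1, 2, 3)]
print(dual_graph(mesh))  # [(0, 1)]
```

The SRF values sampled on the elements then become the vertex features of this graph.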

ChebNet GCNN
After testing several GCNN architectures, we ascertained that a GCNN with the Chebyshev convolutional layer (ChebNet GCNN) [35] provides the best results in our case. It pertains to spectral GCNNs, which represent the graph convolution in terms of polynomials of a graph Laplacian matrix L.
For a brief general description, let *_G be the graph convolutional operator, x ∈ R^V be a vector of input values associated with the graph vertices, and g_θ ∈ R^(K×Ch) be the convolutional kernel. The variable K stands for the maximal polynomial degree, and Ch represents the number of channels of the kernel [33] (ch. 9). Then, the graph convolution of x and g_θ is:

x *_G g_θ = p_K(L) x,

where p_K is a polynomial of the eigenvalues of L of maximum degree K. When using the ChebNet GCNN, p_K(L) is defined via the Chebyshev polynomials T_k:

p_K(L) = Σ_{k=0}^{K} θ_k T_k(L̃),    L̃ = 2L/λ_max − I,

where θ_k ∈ R^Ch represents the learnable parameters of the kernel that are adjusted during the neural network learning procedure, and λ_max is the largest eigenvalue of L. The kernel is K-localized, i.e., the filtered signal at a particular vertex depends on information from vertices in its K-hop neighborhood. For a comprehensive explanation of GCNNs, see [31] (p. 81).
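The Chebyshev filtering above can be sketched in NumPy via the recurrence T_k(x) = 2x T_{k−1}(x) − T_{k−2}(x), applied to the Laplacian rescaled so its spectrum lies in [−1, 1]; a single input channel is shown, and all names are illustrative.

```python
import numpy as np

def chebyshev_filter(L, x, theta):
    """Apply a K-localized Chebyshev spectral filter (single input channel).

    L     -- graph Laplacian, shape (V, V)
    x     -- signal on the vertices, shape (V,)
    theta -- coefficients theta_0..theta_K (learnable in a GCNN), shape (K + 1,)
    """
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])  # rescale spectrum to [-1, 1]
    T_prev, T_curr = x, L_tilde @ x                   # T_0(L~)x and T_1(L~)x
    out = theta[0] * T_prev + theta[1] * T_curr
    for k in range(2, len(theta)):
        T_prev, T_curr = T_curr, 2.0 * L_tilde @ T_curr - T_prev  # recurrence
        out += theta[k] * T_curr
    return out

# path graph on 3 vertices: Laplacian and a test signal
L = np.array([[1., -1., 0.], [-1., 2., -1.], [0., -1., 1.]])
x = np.array([1., 0., -1.])
y = chebyshev_filter(L, x, theta=np.array([0.5, 0.3, 0.2]))
```

With theta = (1, 0, …, 0) the filter reduces to the identity, which is a convenient sanity check of the recurrence.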

Architecture of Meta-Model
Our deep learning meta-model consists of a ChebNet GCNN followed by a deep feed-forward neural network (FNN) [33] (ch. 6). As depicted in Figure 3, the input graph enters the ChebNet GCNN with K = 4 and Ch = 8. The following global summation pool sums the information over all vertices for each channel. This flattened output of the ChebNet GCNN forms the input of the deep FNN with two hidden layers of 64 and 32 neurons.
To provide complete information about the neural networks employed, it is necessary to specify the hyperparameters (for hyperparameters in general, see [33] (p. 120)). Our settings are as follows: Adam optimizer with a learning rate of 0.001; hidden layer activation function: ReLU; output activation function: identity; loss function: MSE; maximum number of epochs: 2000; patience: 1000.
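A forward pass of the described architecture can be sketched in plain NumPy with random, untrained weights; this only mirrors the layer shapes of the meta-model (K = 4, Ch = 8, global summation pooling, hidden layers of 64 and 32 neurons, identity output), not the deep learning framework or the Adam training loop the article uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def cheb_conv(L, x, theta):
    """Chebyshev graph convolution; theta has shape (K + 1, Ch)."""
    lam_max = np.linalg.eigvalsh(L).max()
    L_t = 2.0 * L / lam_max - np.eye(L.shape[0])
    T = [x, L_t @ x]
    for _ in range(2, theta.shape[0]):
        T.append(2.0 * L_t @ T[-1] - T[-2])
    basis = np.stack(T, axis=1)          # (V, K + 1)
    return basis @ theta                 # (V, Ch)

def meta_model(L, x, params):
    h = relu(cheb_conv(L, x, params["theta"]))     # ChebNet layer, Ch = 8 channels
    h = h.sum(axis=0)                              # global summation pool -> (8,)
    h = relu(h @ params["W1"] + params["b1"])      # hidden layer, 64 neurons
    h = relu(h @ params["W2"] + params["b2"])      # hidden layer, 32 neurons
    return float(h @ params["W3"] + params["b3"])  # identity output: scalar QoI

V, K, Ch = 6, 4, 8
params = {
    "theta": rng.standard_normal((K + 1, Ch)) * 0.1,
    "W1": rng.standard_normal((Ch, 64)) * 0.1, "b1": np.zeros(64),
    "W2": rng.standard_normal((64, 32)) * 0.1, "b2": np.zeros(32),
    "W3": rng.standard_normal(32) * 0.1, "b3": 0.0,
}
A = np.zeros((V, V))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A               # Laplacian of a small path graph
prediction = meta_model(L, rng.standard_normal(V), params)
```

In practice, `params` would be optimized by Adam against the MSE loss with the early-stopping patience quoted above.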

Assessment of Meta-Model
The following procedure was employed to assess different meta-models. Let D = {(x_i, c_i)}, i = 1, ..., N, be a set of samples, where x_i ∈ R^V is an input vector of the hydraulic conductivity values of an SRF and c_i ∈ R is the corresponding simulated concentration on the surface (see Section 2). Since we conduct supervised learning [33], D is divided into learning (=training) samples L and test samples T. Then, L is evenly divided into L_s, s = 1, ..., S, to repeat the whole meta-model learning procedure S times. For each repetition, the data are standardized to facilitate learning:

x̃_i = (x_i − mean(x)) / std(x),    c̃_i = (c_i − mean(c)) / std(c),

where mean and std denote the sample mean and standard deviation over the learning set.
Given L_s, a meta-model learns a function f_s(x): R^V → R. The quality of the approximation is generally evaluated by a loss function, in our case, the mean squared error:

λ(D, f_s) = (1/|D|) Σ_{(x_i, c_i) ∈ D} (f_s(x_i) − c_i)^2.

Namely, the training loss λ(L_s, f_s) and the test loss λ(T, f_s) are observed. For the sake of comparing the accuracy of different meta-models, we use the normalized root mean squared error (NRMSE = √MSE / std).
The training NRMSE J_L = √λ(L_s, f_s) / std(c_L) and the test NRMSE J_T = √λ(T, f_s) / std(c_T) are calculated. This metric allows us to draw a comparison among models on different mesh sizes. The lower the NRMSE, the better. Moreover, if the NRMSE is above 1, a simple random generator predicting around the sample mean would perform no worse than the complex meta-model.
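The NRMSE metric can be computed as follows; note that a predictor that always returns the sample mean scores exactly 1, which is the threshold mentioned above.

```python
import numpy as np

def nrmse(predicted, observed):
    """Normalized root mean squared error: RMSE divided by the std of the targets.

    Values near 0 indicate a good fit; at 1 the meta-model is no better than
    constantly predicting the mean of the observations."""
    mse = np.mean((np.asarray(predicted) - np.asarray(observed)) ** 2)
    return np.sqrt(mse) / np.std(observed)

c_test = np.array([1.0, 2.0, 3.0, 4.0])
print(nrmse(c_test, c_test))                      # 0.0 -- perfect meta-model
print(nrmse(np.full(4, c_test.mean()), c_test))   # 1.0 -- mean predictor
```

Because the error is normalized by the spread of the targets, the metric is comparable across models on different mesh sizes.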

Multilevel Monte Carlo Method
The multilevel Monte Carlo method (MLMC) [6] is based on the variance reduction principle [6] (p. 3). Many simulation samples are collected on low levels, where less accurate but inexpensive approximations of the model are available. Far fewer samples are collected on high levels, where differences between more accurate but computationally expensive approximations of the model are evaluated. If these differences have a significantly smaller variance, we obtain estimates with the same accuracy at a fraction of the cost of the standard Monte Carlo method.
Let P be a random variable and P_1, ..., P_L be a sequence of its approximations, where P_{l−1} ≈ P_l and the approximations become more accurate from P_1 to P_L. Then, the expected value of P_L satisfies the telescoping identity:

E[P_L] = E[P_1] + Σ_{l=2}^{L} E[P_l − P_{l−1}].

The MLMC estimates the individual expectations as follows:

E[P_L] ≈ ⟨P_1⟩_{N_1} + Σ_{l=2}^{L} ⟨P_l − P_{l−1}⟩_{N_l},    (7)

where L is the total number of levels, N_l stands for the number of simulation samples on level l, and ⟨·⟩_{N_l} denotes the average over N_l samples. The pairs of random inputs (x_i^l, x_i^{l−1}) are both discrete evaluations of a single realization of a random field x(ω_{l,i}), while the random states ω_{l,i} are independent and come from a given probability space (Ω, Σ, P).
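The telescoping estimator can be illustrated on a toy problem; here the "model levels" are hypothetical Taylor truncations of exp(ω) rather than mesh refinements, and each level difference is evaluated on the same random state, as required above.

```python
import math
import numpy as np

rng = np.random.default_rng(42)

def P(omega, level):
    """Toy approximation P_l of the QoI P(omega) = exp(omega):
    a Taylor series truncated after 2*level + 1 terms (finer level = more terms)."""
    return sum(omega**k / math.factorial(k) for k in range(2 * level + 1))

def mlmc_mean(n_samples):
    """Telescoping estimator E[P_1] + sum_l E[P_l - P_{l-1}];
    each pair (P_l, P_{l-1}) is evaluated on the SAME random input."""
    estimate = 0.0
    for level, n in enumerate(n_samples, start=1):
        omegas = rng.uniform(0.0, 1.0, size=n)  # shared random states for the pair
        if level == 1:
            estimate += np.mean([P(w, 1) for w in omegas])
        else:
            estimate += np.mean([P(w, level) - P(w, level - 1) for w in omegas])
    return estimate

# many cheap coarse samples, few expensive fine ones;
# the exact value is E[exp(U(0, 1))] = e - 1 ~= 1.718
est = mlmc_mean([4000, 400, 40])
```

The level differences have a much smaller variance than P itself, so only a handful of fine-level samples is needed.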
When using estimator (7), we are particularly interested in its total computational cost C and estimated variance V̂:

C = Σ_{l=1}^{L} N_l C_l,    (8)

V̂ = Σ_{l=1}^{L} V̂_l / N_l,    (9)

where C_1, V̂_1 denote the cost and the estimated variance of P_1, while for l > 1, C_l, V̂_l represent the cost and the estimated variance of the differences P_l − P_{l−1}. For a comprehensive MLMC theory, see [6].
In the presented way, it is feasible to estimate the expectations of quantities derived from the original QoI. In particular, we utilize the so-called generalized statistical moments of QoI. In our study, the moments' functions φ(x) are Legendre polynomials, as they are suitable for the PDF approximation (see [36]).

Optimal Number of Samples
As previously mentioned, the principal motivation of our research is to reduce C. However, it is also essential to keep the variance of the estimator sufficiently low. To achieve this, we prescribe the target variance V_t of the estimator. Then, the optimal N_l is found by minimizing the cost (8) under the constraint V̂ = V_t. After some calculus:

N_l^r = (1/V_t) √(V̂_l^r / C_l) Σ_{k=1}^{L} √(V̂_k^r C_k),

where V̂_l^r is an estimated variance of φ_l^r(x) − φ_{l−1}^r(x) for the r-th moment function on level l, with φ_0^r = 0. Finally, N_l = max_{r=1,...,R} N_l^r. This procedure is crucial in our study and determines the results presented.
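A sketch of this sample allocation, with rounding up and the maximum taken over moments as described; the variance and cost figures below are hypothetical.

```python
import math

def optimal_n_samples(var, cost, target_var):
    """Per-level sample counts minimizing the total cost C = sum_l N_l C_l
    subject to the estimator variance sum_l V_l / N_l = V_t.

    var  -- var[r][l] is the estimated variance of the r-th moment
            difference on level l
    cost -- cost[l] is the per-sample cost C_l on level l
    """
    n_levels = len(cost)
    n_per_level = [0] * n_levels
    for var_r in var:  # one allocation per moment, then take the maximum
        total = sum(math.sqrt(v * c) for v, c in zip(var_r, cost))
        for l in range(n_levels):
            n_rl = math.ceil(math.sqrt(var_r[l] / cost[l]) * total / target_var)
            n_per_level[l] = max(n_per_level[l], n_rl)
    return n_per_level

# dominant variance on the cheap level -> many coarse, few fine samples
n = optimal_n_samples(var=[[1e-2, 1e-4]], cost=[0.1, 10.0], target_var=1e-5)
```

Plugging the returned counts back into Formula (9) confirms that the prescribed target variance is met.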
The Python library MLMC [37], developed by the authors of this article, is used to schedule simulation samples and post-process the results. The software also implements the maximal entropy method (MEM) to determine the PDF from moments estimated by MLMC. The description of MEM is beyond the scope of this article. The interested reader is advised to read [36,38] for a comprehensive study or [29] for a brief introduction.

Monte Carlo Methods with a Meta-Model
In this section, we propose combining Monte Carlo methods with a meta-model. We investigate two scenarios. The first one, MC-M, uses a meta-model within the standard MC. The latter, denoted as MLMC-M, extends the MLMC by a level of meta-model approximations.

MC-M
A meta-model fully replaces the original model within the standard Monte Carlo method. Let P̃_L denote a meta-model approximation of P_L, which is the most accurate approximation of P used in MC. Thus, the MC-M estimator has this form:

E[P_L] ≈ ⟨P̃_L⟩_N.

This approach strictly depends on the approximation quality of the meta-model. The calculation of the variance (Formula (9)) is unchanged, whereas the computational cost (Formula (8)) takes this form:

C = N C̃_1 + |L_s| C_1^sim + C_ml(L_s, ψ),

where |L_s| C_1^sim represents the cost of the P_L runs needed for the learning procedure of the meta-model. The cost of a meta-model sample C̃_1 includes the necessary preprocessing and the cost of prediction; in this case, it comprises the costs of generating a random field and adjusting it to the form of the input of the GCNN. The learning cost of the meta-model C_ml(L_s, ψ) depends not only on the learning samples in L_s but also on the hyperparameters ψ, especially the number of epochs, the batch size, and the number of learnable parameters.
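The cost decomposition can be illustrated with hypothetical numbers; note how the learning-set simulations and the training time dominate unless a large number of cheap meta-model evaluations amortizes them.

```python
def mc_m_cost(n_samples, cost_meta_sample, n_learning, cost_sim, cost_learning):
    """Total MC-M cost: N meta-model evaluations (field generation,
    preprocessing, and prediction), plus the original-model runs needed
    for the learning set, plus the meta-model training itself."""
    return n_samples * cost_meta_sample + n_learning * cost_sim + cost_learning

# all numbers are hypothetical: 50,000 cheap meta-model samples,
# 2000 training simulations, and one GPU training run
total = mc_m_cost(n_samples=50_000, cost_meta_sample=0.25,
                  n_learning=2000, cost_sim=12.0, cost_learning=3000.0)
print(total)  # 39500.0
```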

MLMC-M
The meta-model approximations form the new lowest-accuracy level of MLMC, which we denote as the meta-level. Let P̃_1 be a meta-model approximation of P_1; the difference between P_1 and P̃_1 then forms the subsequent level. Thus, the MLMC estimator is extended by a single level:

E[P_L] ≈ ⟨P̃_1⟩_{N_1} + ⟨P_1 − P̃_1⟩_{N_2} + Σ_{l=3}^{L+1} ⟨P_l − P_{l−1}⟩_{N_l}.

Again, the total computational cost (Formula (8)) is affected:

C_M = N_1 C̃_1 + N_2 (C_2^sim + C_2^meta) + Σ_{l=3}^{L+1} N_l C_l,

where C̃_1 is the cost of a meta-model sample, C_2^sim is the cost of P_1, and C_2^meta represents the cost of preprocessing an SRF, which was already generated for the corresponding simulation. It also includes the cost of the meta-model prediction, which, however, is negligible.
The distribution of the computational cost across levels affects the possible savings in C. Considering the MLMC estimator theorem (M. Giles [6] (Theorem 1)), there are bounds V_l ≤ c_1 · 2^(−βl) and C_l ≤ c_2 · 2^(γl), where β, γ, c_1, c_2 are positive constants. Our approach is especially effective in the case of β > γ, when the dominant computational cost is on the low-accuracy levels and, at the same time, adding another coarse mesh level brings no savings because the computational cost of models on low levels scales unsatisfactorily, often due to the overhead of the numerical solver. If β = γ, the computational cost is spread evenly across levels, and the amount of savings depends on the total number of levels L. If the dominant computational cost is on the high-accuracy levels (β < γ), we cannot expect significant savings at all.
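The effect of β and γ on the attainable savings can be illustrated with the classical result that, for the optimal sample counts, the minimal MLMC cost equals (Σ_l √(V_l C_l))² / V_t; the variance and cost figures assumed for the meta-level below are hypothetical.

```python
def optimal_total_cost(var, cost, target_var):
    """Minimal MLMC cost with optimal sample counts:
    C = (sum_l sqrt(V_l * C_l))**2 / V_t."""
    return sum((v * c) ** 0.5 for v, c in zip(var, cost)) ** 2 / target_var

beta, gamma, V_t = 2.0, 1.0, 1e-5           # beta > gamma: coarse levels dominate the cost
var = [2.0 ** (-beta * l) for l in range(1, 4)]
cost = [2.0 ** (gamma * l) for l in range(1, 4)]

# add a meta-level: almost the level-1 variance at a fraction of its cost
# (0.1 and 0.05 are hypothetical meta-model quality/cost factors)
var_m = [var[0]] + [0.1 * var[0]] + var[1:]   # V of P~_1, then V of P_1 - P~_1
cost_m = [0.05 * cost[0]] + cost
saving = 1.0 - optimal_total_cost(var_m, cost_m, V_t) / optimal_total_cost(var, cost, V_t)
```

With these numbers the meta-level yields a positive saving; repeating the experiment with β < γ (fine levels dominating) shrinks the saving toward zero, in line with the discussion above.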

Analysis of Meta-Models
In order to evaluate the meta-model in accordance with our assessment procedure described in Section 3.4, the contaminant transport model was run on meshes of different sizes, and the obtained data are used as D. From the Monte Carlo method's point of view, we generally set V_t ≤ 10^−5 to obtain estimates of moments accurate enough for a decent approximation of the PDF by MEM. Consequently, we have at least 2000 samples at the lowest Monte Carlo level. Thus, from now on, we undertake all meta-model learnings with |L_s| = 2000, of which 400 samples serve as validation samples; |T| = 50,000 and S = 12. Table 1 shows the accuracy of meta-model approximations of models on different mesh sizes. Two architectures of meta-models are compared: the Deep meta-model is the one proposed in this article, whereas the Shallow meta-model was propounded in our previous study [29]. The accuracy is characterized by the NRMSE (see Section 3.4). Apparently, the Deep meta-model outperforms the Shallow meta-model in all observed cases. Since J_L stays almost steady and J_L/J_T ≈ 1, our approach can be used regardless of the mesh size, at least up to ca. 20,000 elements; larger meshes were not tested. Considering that J_L is just slightly below 0.8, we believe there is still some room for improvement in future research.
All meta-models were trained on a GeForce GTX 1660 Ti Mobile graphics card with 16 GB of RAM. The time of meta-model learning ranges from around 400 s for a model on 53 mesh elements to about 3000 s for a model on 18,397 mesh elements.

Table 1. Comparison of accuracy of meta-models for models on different mesh sizes.

(Table 1 columns: Mesh Size; Accuracy.)

Figure 4 shows the computational cost (Cost (s)) and the moments error J(μ, μ̂). The cost is measured as the CPU time of simulations, including auxiliary processing and the time needed for meta-model learning. The moments error has the form of an MSE: J(μ, μ̂) = (1/S) Σ_{s=1}^{S} ‖μ_s − μ̂_s‖₂², where μ represents the moments estimated by the standard MC, whereas μ̂ are the moments calculated by either MC-M or MLMC-M. As expected, the computational cost increases with the precision of a model, expressed as the number of mesh elements. As the figure shows, the cost can be greatly reduced (CS ≈ 87%) by using MC-M instead of MC. However, the meta-model error causes poor estimates of moments: J(μ, μ̂) ≈ 10^−3. On the contrary, MLMC-M suppresses the effect of the meta-model error, and estimates of moments are obtained with J(μ, μ̂) ≈ 10^−5. The cost savings are not as substantial but still significant: CS ≈ 25%. Generally, using MC-M can be reasonable if we have an exceptionally accurate meta-model or if we intend to obtain just a basic idea of the moments and the PDF in a short time. Otherwise, using MLMC-M is recommended.

Multilevel Case
In this section, we investigate an MLMC-M with more than two levels. In order to demonstrate some properties and limitations of our approach, we introduce three pairs of Monte Carlo methods: 1LMC and 1LMC-M, 2LMC and 2LMC-M, and 3LMC and 3LMC-M, i.e., Monte Carlo methods with one, two, and three levels of model simulations, each without and with the additional meta-level. Figure 5 shows the computational costs of these Monte Carlo methods for different numbers of estimated moments R. We see that 1LMC and 1LMC-M have a much higher cost than the methods with more levels. This is the crucial observation that demonstrates the limitations of the standard MC.
Let us focus on the course of the costs depending on R. In all the cases, the lowest cost was obtained for R = 2. However, as R increases, the behaviors of the methods differ. For 1LMC, 1LMC-M, 2LMC, and 2LMC-M, we observe a similar course that is steady for R ⪆ 15. It means we can add more moments without affecting computational cost. On the other hand, for 3LMC and 3LMC-M, we observe a gradual increase in cost up to R = 35 and R = 75, respectively.
Regarding computational cost savings, in the cases of 1LMC and 2LMC, we obtain CS ≈ 25% by utilizing 1LMC-M and 2LMC-M, whereas 3LMC provides savings from around CS ≈ 25% for R = 2 to just CS ≈ 2% for R = 100. As mentioned in Section 5.2, the computational cost distribution across levels can be described using the β and γ variables. Since β > γ for 1LMC and 2LMC, the dominant computational cost is on the lowest-accuracy levels. Therefore, adding the meta-level results in a much higher CS (for R > 4) than for β < γ, which is the case with 3LMC. Thus, the behavior of our experiments corresponds to the theoretical properties of the MLMC.

A closer look at the variances of moments across levels V̂_l^r (Figure 6) provides a rationale for the claims already presented. For clarity, we display only the first five moments (R = 5), which capture the behavior observed also in cases with more moments. To recall, we employ Legendre polynomials as φ(x); therefore, V̂_l^1 = 0. For 3LMC-M, the meta-level has just a slight impact on the total computational cost C_M (see Formula (8)) due to the minor per-sample cost of the meta-level, C_1 = 0.338 s, compared to C_l for l > 1. A more accurate meta-model could prevent V̂_2^MAX from increasing for 3LMC-M. We faced similar behavior for the groundwater flow problem investigated in our previous research [29]. It is good to be aware of this behavior, although the distribution of variances of moments across MLMC levels deserves a much more detailed explanation, which is beyond the scope of this article.

Approximation of Probability Density Function
Finally, we use the moments estimated by the cheapest Monte Carlo methods presented (3LMC and 3LMC-M) to approximate the PDF of the contaminant concentration c [kg m^−3] on the surface, with V_t = 10^−5 and R = 25. Let ρ_3LMC, ρ_3LMC-M, and ρ_MC be the PDFs approximated from the moments estimated by 3LMC, 3LMC-M, and the reference MC, respectively. We run N = 50,000 model samples on a mesh with 18,397 elements to obtain the reference MC. Figure 7 depicts the approximated PDFs, as well as the Kullback-Leibler (KL) divergence (for details, see [39]) used for the accuracy assessment of the PDFs: D_3LMC = KL(ρ_MC ‖ ρ_3LMC) and D_3LMC-M = KL(ρ_MC ‖ ρ_3LMC-M). Considering the values of V_t and R, we obtain decent PDF approximations. Both 3LMC and 3LMC-M provide almost the same results in terms of the KL divergence. This means the moments μ̂ are sufficiently accurate, in particular J(μ, μ̂) ≈ 8.6 × 10^−6. Moreover, μ̂ are obtained with CS ≈ 8%. If we are interested just in the expected value and the variance, which is very common, we have CS ≈ 25%. Importantly, we can decrease V_t to obtain a better PDF approximation; in this case, the computational cost naturally increases, but CS does not change.
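The KL divergence used for this assessment can be approximated on a grid as follows; this is a generic sketch, not the implementation used in the article.

```python
import numpy as np

def kl_divergence(p, q, dx):
    """Discrete approximation of KL(p || q) for densities sampled on a
    uniform grid with spacing dx."""
    p, q = np.asarray(p), np.asarray(q)
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])) * dx)

# two unit-variance Gaussians on a grid; the analytic KL of
# N(0,1) vs N(0.5,1) is (0.5)**2 / 2 = 0.125
x = np.linspace(-8, 8, 2001)
dx = x[1] - x[0]
p = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
q = np.exp(-0.5 * (x - 0.5)**2) / np.sqrt(2 * np.pi)
print(round(kl_divergence(p, p, dx), 6))  # 0.0
```

A divergence of zero indicates identical densities, so smaller values of D_3LMC and D_3LMC-M mean better agreement with the reference PDF.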

Discussion and Conclusions
This study presented a deep learning meta-model to reduce the computational costs of Monte Carlo methods. We followed up on our previous research [29] and improved the meta-model, which now consists of a graph convolutional neural network connected to a feed-forward neural network. This meta-model can better approximate models such as the tested groundwater contaminant transport problem. We showed that our meta-model can be trained with comparable accuracy for models solved on unstructured meshes from 53 to 18,397 mesh elements. We adopted Monte Carlo methods to obtain generalized statistical moments that are utilized to approximate the probability density function of the contaminant concentration on the surface above a future deep geological repository of radioactive waste. In order to reduce the computational cost of MC, two approaches were propounded. MC-M, which uses a meta-model instead of the model within the standard Monte Carlo method, brings substantial cost savings (CS ≈ 87%). However, the accuracy of moments J was severely affected by the meta-model error and was just around J ≈ 10^−3. Under the described experiment setting, it is hard to obtain small meta-model errors for a model as complex as ours; thus, this procedure has limited usability. The second approach, however, is favorable: a meta-level is employed as the lowest-accuracy level of MLMC, denoted MLMC-M. Generally, this approach reduces the computational cost and keeps the error of moments low, J ≈ 10^−5, which is sufficient for PDF approximations. We presented three pairs of MLMCs and MLMC-Ms to demonstrate the impact of different distributions of computational cost across levels. In accordance with the theory, the most significant cost savings (CS ≈ 25%) were achieved when the dominant computational cost was on the lowest-accuracy level. On the contrary, if the prevailing computational cost is on the highest-accuracy level, we have CS ≈ 2% in the worst case.
Importantly, the computational cost and savings are affected by the number of moments R we estimate. In many applications, we are interested in the first few moments, such as the expected value or the variance; for those, we have CS ≈ 25% in all the cases. Intending to approximate the PDF, we need at least around R = 25; then, CS ≈ 8% in the worst-case scenario.
In our future research, we shall try to further improve the accuracy of the current meta-model and investigate the applicability of standard convolutional neural networks as a meta-model. We shall also rigorously describe the causes of the different courses of variances of moments across MLMC levels. We shall deal with other applications in groundwater modeling, especially in connection with fractured porous media.

Data Availability Statement: All of the data supporting the findings of the presented study are available from the corresponding author on request.