1. Introduction
Tetrachloroethene (PCE), a common chlorinated solvent, has been widely used as a dry-cleaning solvent and a metal degreaser. It is among the most widespread and persistent organic contaminants in groundwater resources [1]. PCE is also known as a dense non-aqueous phase liquid (DNAPL) due to its high density and low solubility in groundwater. As a result of these properties, traditional remediation approaches, such as pump-and-treat and vapor stripping, are very expensive and sometimes inapplicable for PCE contaminants in subsurface environments [2]. Therefore, in-situ bioremediation is considered to be a promising method for site remediation of chlorinated solvents, because it requires less intervention and is more cost-effective [3,4,5,6].
Multicomponent reactive transport modeling is an important tool for predicting the fate and transport of reactive contaminants in groundwater by integrating simulations of water flow, biochemical reaction and solute transport [7,8]. However, the uncertainty in the parameters of the reactive transport model poses a major challenge to reliable simulations [9]. In practice, spatially variable aquifer properties such as hydraulic conductivity and porosity have significant effects on groundwater flow and solute transport modeling [10,11,12]. Additionally, the reactive transport model that characterizes the reaction and migration of contaminants in subsurface aquifers is quite sensitive to biochemical parameters (e.g., rate constant) and contaminant source characteristics (e.g., source strength, source location) [13,14]. Therefore, the simultaneous estimation of hydraulic parameters, biochemical parameters and contaminant source characteristics is of vital importance for achieving satisfactory accuracy of the forward model.
Direct determination of specific model parameters is usually infeasible in heterogeneous subsurface environments. To solve this problem, information obtained from indirect measurements (e.g., contaminant concentration and hydraulic head) is used to estimate uncertain model parameters by solving inverse problems [15]. Many researchers have focused on the calibration of key parameters of reactive transport models using inverse methodologies in the past few years. Bailey and Baù [14] employed an ensemble Kalman filter (EnKF) method to estimate the spatial distribution of the first-order rate constant of a reactive transport model by assimilating concentration data. Bailey et al. [16] used the ensemble smoother (ES) to estimate the spatially variable denitrification rates in an agricultural groundwater system by assimilating the concentration and mass data of nitrate. Dai and Samper [17] investigated the ability of a general methodology for solving inverse problems in a column experiment and found it useful in the estimation of model parameters. Carniato et al. [18] compared the performance of weighted least squares (WLS), weighted least squares with weight estimation (WLS(we)) and a multivariate (MV) approach on parameter estimation of reactive transport models and found that residual correlation did not evidently affect the predictive uncertainty. Nevertheless, the joint estimation of various parameters of reactive transport models is still limited to a few studies [19,20,21].
In recent years, Bayesian inference has become a prevalent method for parametric uncertainty quantification [22,23,24]. According to Bayes’ theorem, the unknown parameters are treated as random variables whose posterior probability distributions can be calculated by updating the prior probability with information provided by the observations [25]. For highly nonlinear systems, the posterior distributions of unknown variables are difficult and in most cases impossible to evaluate analytically [26]. To address this issue, Markov Chain Monte Carlo (MCMC) methods [27,28] have been widely used to approximate the posterior distribution of unknown parameters by generating random samples from the probability distribution. Cao et al. [29] integrated the multi-try differential evolution adaptive Metropolis (MT-DREAMzs) algorithm with the nested sampling estimator (NSE) to improve the performance of marginal likelihood estimation. Shi et al. [30] tested the validity of Gaussian assumptions for the parametric uncertainty quantification in a reactive transport model. Vrugt et al. [31] proposed the Shuffled Complex Evolution Metropolis algorithm (SCEM-UA) to invert hydrologic model parameters and found it more efficient than traditional Metropolis-Hastings samplers. Yan et al. [32] developed a Bayesian-based integrated approach, in which a Kriging surrogate model (KSM) is used to improve the computational efficiency of identifying groundwater contamination sources. Nevertheless, MCMC methods are considered to be computationally expensive because a large number of model evaluations are required for high-dimensional parameter space exploration [33]. Consequently, the applications of MCMC methods to inverse modeling of reactive transport are still limited [34,35].
The ensemble Kalman filter (EnKF), which is a Monte Carlo implementation of the Kalman filter [36], is a computationally efficient alternative to the MCMC methods. The EnKF algorithm proposed by Evensen [37] is able to update model parameters and state variables through sequential data assimilation of measurements. Recently, the EnKF algorithm has been widely used for high-dimensional nonlinear data assimilation in geophysical [38], atmospheric [39] and hydrological [40,41,42,43,44] modeling. However, applying EnKF to numerical models that involve multiple processes is inconvenient because its implementation requires modifying restart files and updating model parameters simultaneously at each assimilation step [45]. The ensemble smoother (ES) proposed by van Leeuwen and Evensen [46] is a more efficient alternative to EnKF when dealing with high-dimensional parameter estimation problems. Instead of assimilating data sequentially as EnKF does, ES computes a global update of parameters by assimilating all observations simultaneously [45]. Skjervheim and Evensen [47] compared the performance of ES and EnKF for solving history matching problems of reservoirs and found that ES could provide similar results to EnKF. The difference is that ES works more efficiently than EnKF because it can be implemented without restarting the simulation model multiple times.
However, neither the standard ES nor the standard EnKF algorithm can provide satisfactory data matches when dealing with highly nonlinear problems [48,49,50]. To further improve the performance and efficiency of ES, iterative forms of ES have been developed recently. Chen and Oliver [51] proposed an iterative ensemble smoother (IES) algorithm based on the Levenberg–Marquardt (LM) method, in which the ensemble randomized maximum likelihood (EnRML) was used as the smoother. Emerick and Reynolds [52] developed an ensemble smoother with multiple data assimilation (ES-MDA) to solve the reservoir history-matching problem. Li and Davis [53] compared the performance of ES and IES in groundwater modeling and their results showed that IES works much better than the standard ES by continuously updating parameters with the available measurements for all time steps. Ma et al. [54] developed an adaptive IES method (ES-LM) based on the Levenberg–Marquardt algorithm which can reduce the computational complexity of ES. Zhang et al. [55] proposed an iterative local updating ensemble smoother (ILUES) for parameter estimation in hydrologic models. In the framework of ILUES, an iterative form of ES is used to update the local ensemble of each sample, which is defined based on a comprehensive measure of distance to the sample and the measurements. The performance of the ILUES algorithm was evaluated with several numerical examples and the results showed that ILUES could provide an accurate estimation of model parameters with multimodal distributions. Thus, the ILUES algorithm has great potential for solving highly nonlinear problems such as the inverse modeling of multicomponent reactive transport.
In this study, the ILUES algorithm is employed to estimate the key parameters of a multicomponent reactive transport model. The numerical model, which characterizes the sequential biodegradation of tetrachloroethene (PCE), describes the coupled simulation of groundwater flow, biochemical reaction and solute transport. As the subsurface heterogeneity has significant effects on the biodegradation process, the key model parameters, including the hydraulic parameters, biochemical parameters and contaminant source characteristics, are jointly estimated using the ILUES algorithm [55]. The major objective of this study is to verify the effectiveness and efficiency of the ILUES algorithm for solving the inverse problem of the multicomponent reactive transport model.
To the best of our knowledge, this is the first study that focuses on the application of the ILUES algorithm to joint estimation of hydraulic parameters, biochemical parameters and contaminant source characteristics in a sequential biodegradation process. Although model parameter estimation with inverse methods has been investigated recently, related studies seldom focus on the reactive solute transport process, which is difficult to analyze due to the high dimensionality. Furthermore, the ILUES algorithm in this study is modified by determining the local ensemble partly with a linear ranking selection scheme [56], which is able to explore the parameter space more extensively. The performance of the ILUES algorithm is then evaluated through three different numerical case studies. A comparison between ILUES and ES-MDA is also made to demonstrate the advantage of ILUES in estimation accuracy.
The organization of the paper is as follows: in Section 2, the basic theory of the sequential biodegradation of PCE is explained. Then the general framework and mathematical formulation of the ILUES algorithm are introduced. In Section 3, three different numerical case studies based on the biodegradation of PCE are utilized to test the performance of the ILUES algorithm. The results and discussion of corresponding case studies are also presented in this section. The conclusions of the current study are drawn in Section 4.
2. Methodology
2.1. Sequential Anaerobic Biodegradation of PCE
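In this study, PCE contaminants undergo a sequential reductive dechlorination reaction whose degradation kinetics are assumed to be first order. In the process of dechlorination, microorganisms can utilize PCE as the electron acceptor, sequentially removing chlorine atoms from PCE to form trichloroethene (TCE). TCE is in turn degraded to dichloroethene (DCE) and then to vinyl chloride (VC). Finally, non-toxic ethylene (ETH) is produced by the degradation of VC. The typical sequential reductive dechlorination framework of PCE under anaerobic conditions can be written as:

$$\mathrm{PCE} \xrightarrow{K_{PCE}} \mathrm{TCE} \xrightarrow{K_{TCE}} \mathrm{DCE} \xrightarrow{K_{DCE}} \mathrm{VC} \xrightarrow{K_{VC}} \mathrm{ETH} \tag{1}$$

where $K_{PCE}$, $K_{TCE}$, $K_{DCE}$ and $K_{VC}$ denote the first-order rate constants of PCE and its daughter products. The governing equations which characterize the transformation and transport of PCE and its daughter products are represented by the following partial differential equations:

$$R \frac{\partial C_{PCE}}{\partial t} = \frac{\partial}{\partial x_i}\left(D_{ij}\frac{\partial C_{PCE}}{\partial x_j}\right) - \frac{\partial (v_i C_{PCE})}{\partial x_i} + \frac{q_s}{\phi} C_{s,PCE} - K_{PCE} C_{PCE} \tag{2}$$

$$R \frac{\partial C_{TCE}}{\partial t} = \frac{\partial}{\partial x_i}\left(D_{ij}\frac{\partial C_{TCE}}{\partial x_j}\right) - \frac{\partial (v_i C_{TCE})}{\partial x_i} + \frac{q_s}{\phi} C_{s,TCE} + Y_{TCE/PCE} K_{PCE} C_{PCE} - K_{TCE} C_{TCE} \tag{3}$$

$$R \frac{\partial C_{DCE}}{\partial t} = \frac{\partial}{\partial x_i}\left(D_{ij}\frac{\partial C_{DCE}}{\partial x_j}\right) - \frac{\partial (v_i C_{DCE})}{\partial x_i} + \frac{q_s}{\phi} C_{s,DCE} + Y_{DCE/TCE} K_{TCE} C_{TCE} - K_{DCE} C_{DCE} \tag{4}$$

$$R \frac{\partial C_{VC}}{\partial t} = \frac{\partial}{\partial x_i}\left(D_{ij}\frac{\partial C_{VC}}{\partial x_j}\right) - \frac{\partial (v_i C_{VC})}{\partial x_i} + \frac{q_s}{\phi} C_{s,VC} + Y_{VC/DCE} K_{DCE} C_{DCE} - K_{VC} C_{VC} \tag{5}$$

where $R$ is the retardation factor; $D_{ij}$ denotes the hydrodynamic dispersion coefficient $[L^2 T^{-1}]$; $C_{PCE}$, $C_{TCE}$, $C_{DCE}$, $C_{VC}$ represent concentrations of the corresponding contaminants in the dechlorination reaction $[M L^{-3}]$; $v_i$ is the pore velocity $[L T^{-1}]$; $q_s$ is the volumetric flux of water per unit volume of aquifer $[T^{-1}]$; $\phi$ is the soil porosity; $C_{s,PCE}$, $C_{s,TCE}$, $C_{s,DCE}$, $C_{s,VC}$ are concentrations of the corresponding contaminants in the dechlorination reaction at the source point $[M L^{-3}]$; $K$ is the first-order anaerobic degradation rate $[T^{-1}]$; and $Y_{TCE/PCE}$, $Y_{DCE/TCE}$, $Y_{VC/DCE}$ are chlorinated compound yield values under anaerobic reductive dechlorination conditions. In this study, groundwater flow and reactive transport models are simulated using MODFLOW-2000 [57] and RT3D [58,59], respectively.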
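The reaction terms of the governing equations can be illustrated in isolation. The following is a minimal sketch, not the RT3D implementation: it drops advection, dispersion and source terms and assumes $R = 1$, so only the first-order decay chain remains. The function names, the rate values used in the usage note, and the choice of yields as molecular-weight ratios are assumptions made for illustration.

```python
import numpy as np

# Molecular weights in g/mol. Taking the mass-based yields as ratios of
# molecular weights is an assumption of this sketch.
MW = {"PCE": 165.83, "TCE": 131.39, "DCE": 96.94, "VC": 62.50}
YIELDS = np.array([
    MW["TCE"] / MW["PCE"],   # Y_TCE/PCE
    MW["DCE"] / MW["TCE"],   # Y_DCE/TCE
    MW["VC"] / MW["DCE"],    # Y_VC/DCE
])


def reaction_rates(c, k, y=YIELDS):
    """Reaction terms for the species vector [PCE, TCE, DCE, VC]."""
    c = np.asarray(c, dtype=float)
    k = np.asarray(k, dtype=float)
    loss = k * c                     # first-order decay of each species
    gain = np.zeros_like(c)
    gain[1:] = y * loss[:-1]         # production from the parent compound
    return gain - loss


def simulate_batch(c0, k, t_end, n_steps=1000):
    """Integrate dc/dt = reaction_rates(c) with the classical RK4 scheme."""
    c = np.asarray(c0, dtype=float)
    dt = t_end / n_steps
    times = np.linspace(0.0, t_end, n_steps + 1)
    conc = np.empty((n_steps + 1, c.size))
    conc[0] = c
    for n in range(n_steps):
        s1 = reaction_rates(c, k)
        s2 = reaction_rates(c + 0.5 * dt * s1, k)
        s3 = reaction_rates(c + 0.5 * dt * s2, k)
        s4 = reaction_rates(c + dt * s3, k)
        c = c + dt / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
        conc[n + 1] = c
    return times, conc
```

Calling `simulate_batch([100.0, 0.0, 0.0, 0.0], [0.05, 0.03, 0.02, 0.01], t_end=50.0)` with these hypothetical rates returns the concentration history of the four species; PCE follows a pure exponential decay while each daughter product first accumulates and then degrades.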
2.2. Parameter Estimation
Parameter estimation of reactive transport models aims to obtain accurate estimates of the key parameters of groundwater flow and solute transport by solving inverse problems. Bayesian inference is an important method in inverse modeling and has been widely used in hydrologic science.
Considering the Bayesian inference for a nonlinear model, the relationship between model parameters and measurements can be expressed in the following form:

$$d = f(m) + \varepsilon \tag{6}$$

where $d$ is the vector of measurements, $m$ is the vector of model parameters, $f(m)$ is the prediction from the forward model, and $\varepsilon$ is the vector of Gaussian-distributed measurement errors.
According to Bayes’ theorem, the posterior distribution is proportional to the likelihood times the prior distribution, which can be written as:

$$p(m \mid d) \propto L(m \mid d)\, p(m) \tag{7}$$

where $p(m \mid d)$ is the posterior probability distribution, $p(m)$ is the prior distribution and $L(m \mid d)$ is the likelihood function, which measures the goodness of fit between the model predictions for the unknown parameters and the observations.
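As an illustration of how the posterior is assembled from the likelihood and the prior, the sketch below evaluates an unnormalized log-posterior. It assumes independent Gaussian measurement errors (consistent with the error model stated above) and independent uniform priors; the uniform prior and all function names are assumptions of this example, not choices documented in the text.

```python
import numpy as np


def gaussian_log_likelihood(d_obs, d_sim, sigma):
    """log L(m|d) for independent Gaussian errors with standard deviation sigma."""
    d_obs = np.asarray(d_obs, dtype=float)
    d_sim = np.asarray(d_sim, dtype=float)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), d_obs.shape)
    resid = (d_obs - d_sim) / sigma
    return float(
        -0.5 * np.sum(resid**2)
        - np.sum(np.log(sigma))
        - 0.5 * d_obs.size * np.log(2.0 * np.pi)
    )


def uniform_log_prior(m, lower, upper):
    """log p(m) for independent uniform priors on [lower, upper] (assumed prior)."""
    m, lower, upper = (np.asarray(a, dtype=float) for a in (m, lower, upper))
    if np.any(m < lower) or np.any(m > upper):
        return -np.inf
    return float(-np.sum(np.log(upper - lower)))


def log_posterior(m, d_obs, forward, sigma, lower, upper):
    """Unnormalized log p(m|d) = log L(m|d) + log p(m)."""
    lp = uniform_log_prior(m, lower, upper)
    if not np.isfinite(lp):
        return -np.inf          # outside the prior support
    return lp + gaussian_log_likelihood(d_obs, forward(m), sigma)
```

Here `forward` stands for any callable that maps a parameter vector to simulated measurements; in the setting of this paper it would wrap a run of the flow and reactive transport models.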
In this study, the analytical evaluation of the posterior distribution of unknown parameters is unavailable due to the high nonlinearity of the model. For parameter estimation problems in nonlinear models, the ensemble smoother (ES) is a prevalent method because of its effectiveness and efficiency. However, when dealing with highly nonlinear problems, the standard ES method is unable to obtain satisfactory estimates. In this paper, the iterative local updating ensemble smoother (ILUES) proposed by Zhang et al. [55] is employed to jointly estimate various parameters of the reactive transport model. In comparison with ES, the ILUES algorithm updates the local ensemble of each sample with the iterative form of ES, instead of globally assimilating parameter realizations in the ensemble. In this way, ILUES performs better on highly nonlinear problems such as the inverse modeling of reactive transport.
2.3. Iterative Local Updating Ensemble Smoother
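The basic updating equation of ES can be written as:

$$m_j^a = m_j^f + C_{MD}^f \left(C_{DD}^f + C_D\right)^{-1} \left[d_j - f(m_j^f)\right] \tag{8}$$

for $j = 1, \ldots, N_e$, where $N_e$ denotes the ensemble size.
According to Equations (6) and (8), $M^f = [m_1^f, \ldots, m_{N_e}^f]$ is an ensemble of samples randomly drawn from the prior distribution, $M^a = [m_1^a, \ldots, m_{N_e}^a]$ is the updated ensemble after taking measurements into account, $D^f = [f(m_1^f), \ldots, f(m_{N_e}^f)]$ is the ensemble of predictions from the forward model, $C_{MD}^f$ is the cross-covariance matrix between $M^f$ and $D^f$, $C_{DD}^f$ is the auto-covariance matrix of $D^f$, $C_D$ is the covariance matrix of the measurement errors, and $d_j = d + \varepsilon_j$ represents the measurements for the parameter sample $m_j^f$ with error $\varepsilon_j$.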
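The ES update can be written compactly with ensemble matrices. The following is a minimal NumPy sketch under the assumption that parameters and model responses are stored column-wise (one column per ensemble member); the function name and argument layout are hypothetical, and the linear solve replaces the explicit matrix inverse purely for numerical convenience.

```python
import numpy as np


def es_update(M_f, D_f, d_obs, C_D, rng):
    """One ensemble-smoother update.

    M_f  : (n_par, N_e) prior parameter ensemble
    D_f  : (n_obs, N_e) model responses f(m_j) for each member
    d_obs: (n_obs,) measurements
    C_D  : (n_obs, n_obs) covariance of the measurement errors
    """
    n_e = M_f.shape[1]
    dM = M_f - M_f.mean(axis=1, keepdims=True)
    dD = D_f - D_f.mean(axis=1, keepdims=True)
    C_MD = dM @ dD.T / (n_e - 1)     # cross-covariance of parameters and responses
    C_DD = dD @ dD.T / (n_e - 1)     # auto-covariance of responses
    # Perturbed measurements d_j = d + eps_j, one column per ensemble member
    eps = rng.multivariate_normal(np.zeros(d_obs.size), C_D, size=n_e).T
    d_pert = d_obs[:, None] + eps
    # Solve (C_DD + C_D) X = d_j - f(m_j) rather than forming the inverse
    innovation = np.linalg.solve(C_DD + C_D, d_pert - D_f)
    return M_f + C_MD @ innovation
```

For a linear forward model this single update already approximates the Gaussian posterior; the iterative and local variants discussed in this section address the nonlinear case.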
On the basis of Bayesian inference and the ES method, the implementation of ILUES can be summarized in the following steps and its framework is represented by the flowchart in Figure 1.
Step 1: Initialization.
To begin with, set the iteration counter to 0. $N_e$ samples are randomly drawn from the prior distribution of model parameters as the initial input ensemble $M^f$. The corresponding output ensemble $D^f$ is then obtained by evaluating the forward model for each sample.
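Step 2: Determination of local ensemble.
The local ensemble of the sample $m_j^f$ is defined based on a comprehensive measure of the distance ($J$) that combines the model responses ($J_1$) and the model parameters ($J_2$):

$$J(m) = \frac{J_1(m)}{J_1^{\max}} + \frac{J_2(m)}{J_2^{\max}}, \qquad J_1(m) = [f(m) - d]^T C_D^{-1} [f(m) - d], \qquad J_2(m) = (m - m_j^f)^T C_{MM}^{-1} (m - m_j^f) \tag{9}$$

where $J_1(m)$ denotes the distance between the measurements $d$ and the model responses $f(m)$, $J_2(m)$ denotes the distance between the selected sample $m_j^f$ and the model parameters $m$, $J_1^{\max}$ and $J_2^{\max}$ are the maximum values of $J_1(m)$ and $J_2(m)$, and $C_{MM}$ is the auto-covariance matrix of the model parameters.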
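The distance measure can be evaluated for the whole ensemble at once. The sketch below assumes column-wise ensembles and takes the maxima over the current ensemble; the function name is hypothetical and the guard against a zero maximum is an added safeguard that is not part of the formulation in the text.

```python
import numpy as np


def combined_distance(M, D, m_j, d_obs, C_D, C_MM):
    """J for every ensemble member relative to the selected sample m_j.

    M: (n_par, N_e) parameters; D: (n_obs, N_e) model responses.
    """
    r_d = D - d_obs[:, None]          # response residuals f(m) - d
    r_m = M - m_j[:, None]            # parameter offsets m - m_j
    # Quadratic forms x^T C^{-1} x evaluated column by column
    J1 = np.einsum("ij,ij->j", r_d, np.linalg.solve(C_D, r_d))
    J2 = np.einsum("ij,ij->j", r_m, np.linalg.solve(C_MM, r_m))
    # Added safeguard: avoid division by zero if all members coincide
    J1_max = max(J1.max(), np.finfo(float).tiny)
    J2_max = max(J2.max(), np.finfo(float).tiny)
    return J1 / J1_max + J2 / J2_max
```

In practice `C_MM` could be estimated from the ensemble itself, for example with `np.cov(M)`; members with small `J` are both close to the selected sample and consistent with the measurements.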
According to the original ILUES algorithm proposed by Zhang et al. [55], the local ensemble of $m_j^f$ is determined by selecting the $N_l = \alpha N_e$ samples with the lowest $J$ values, expressed as $M_j^{l,f}$. In this paper, a linear ranking selection scheme [56] is additionally used to determine the local ensemble according to the $J$ values. The selection probability of each sample is expressed as:

$$p_i = \frac{1}{N_e}\left[\eta^- + \left(\eta^+ - \eta^-\right)\frac{i - 1}{N_e - 1}\right] \tag{10}$$

for $i = 1, \ldots, N_e$, where $\eta^+ = 2 - \eta^-$, $\eta^- \geq 0$, and $i$ represents the rank of each individual after sorting by the fitness value of $J$.
To better explore the parameter space, this rank-based selection operator is used together with the original selection scheme in ILUES, which means that the first 80% of the samples in the local ensemble are those with the lowest $J$ values and the remaining 20% are determined by the linear ranking selection scheme.
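The hybrid selection described above can be sketched as follows. Several details are assumptions of this example rather than statements from the text: the value `eta_minus=0.5`, the convention that the sample with the lowest $J$ receives the highest rank, and drawing the rank-based 20% without replacement from the samples not already chosen.

```python
import numpy as np


def linear_ranking_probabilities(J, eta_minus=0.5):
    """Selection probability per sample; the lowest J receives the highest rank."""
    n = J.size
    eta_plus = 2.0 - eta_minus
    order = np.argsort(-J)                  # worst (largest J) first
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(1, n + 1)       # rank 1 = worst, rank n = best
    return (eta_minus + (eta_plus - eta_minus) * (rank - 1) / (n - 1)) / n


def select_local_ensemble(J, n_local, rng, greedy_fraction=0.8, eta_minus=0.5):
    """Indices of the local ensemble: lowest-J samples plus rank-based draws."""
    n_greedy = int(round(greedy_fraction * n_local))
    by_J = np.argsort(J)
    greedy = by_J[:n_greedy]                # original ILUES choice
    rest = by_J[n_greedy:]                  # candidates for the rank-based part
    p = linear_ranking_probabilities(J, eta_minus)[rest]
    drawn = rng.choice(rest, size=n_local - n_greedy, replace=False, p=p / p.sum())
    return np.concatenate([greedy, drawn])
```

Because the rank-based draws can pick samples with larger `J`, the local ensemble occasionally includes members farther from the selected sample, which is the mechanism the text relies on for broader exploration.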
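Step 3: Update the local ensemble.
Update the local ensemble with the basic ES method, written as:

$$m_{j,i}^{l,a} = m_{j,i}^{l,f} + C_{MD}^{l,f}\left(C_{DD}^{l,f} + C_D\right)^{-1}\left[d_i - f(m_{j,i}^{l,f})\right] \tag{11}$$

for $i = 1, \ldots, N_l$.
In the above equation, $C_{MD}^{l,f}$ is the cross-covariance matrix between $M_j^{l,f}$ and $D_j^{l,f}$, $C_{DD}^{l,f}$ is the auto-covariance matrix of $D_j^{l,f}$, and $d_i = d + \varepsilon_i$ represents the measurements for the local sample $m_{j,i}^{l,f}$ with error $\varepsilon_i$.
Then choose a random sample from the updated local ensemble $M_j^{l,a}$ as the updated sample $m_j^a$ of $m_j^f$. The updated global ensemble $M^a$ is obtained after applying this procedure to every sample.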
The framework of the ILUES algorithm is summarized in the flowchart of Figure 1.
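Steps 2 and 3 can be combined into a single iteration, sketched below. For brevity this example uses only the original lowest-$J$ selection (not the hybrid rank-based variant), assumes column-wise ensembles, and re-evaluates a user-supplied `forward` callable at the end of the iteration; all names are hypothetical and the sketch is not the implementation used in the case studies.

```python
import numpy as np


def ilues_iteration(M_f, D_f, d_obs, C_D, forward, alpha, rng):
    """One ILUES iteration with the original lowest-J local ensembles."""
    n_par, n_e = M_f.shape
    n_l = max(2, int(round(alpha * n_e)))          # local ensemble size N_l
    C_MM = np.atleast_2d(np.cov(M_f))              # parameter auto-covariance
    r_d = D_f - d_obs[:, None]
    J1 = np.einsum("ij,ij->j", r_d, np.linalg.solve(C_D, r_d))
    M_a = np.empty_like(M_f)
    for j in range(n_e):
        # Step 2: local ensemble of sample j from the combined distance J
        r_m = M_f - M_f[:, [j]]
        J2 = np.einsum("ij,ij->j", r_m, np.linalg.solve(C_MM, r_m))
        J = J1 / J1.max() + J2 / J2.max()
        local = np.argsort(J)[:n_l]
        M_l, D_l = M_f[:, local], D_f[:, local]
        # Step 3: ES update restricted to the local ensemble
        dM = M_l - M_l.mean(axis=1, keepdims=True)
        dD = D_l - D_l.mean(axis=1, keepdims=True)
        C_MD = dM @ dD.T / (n_l - 1)
        C_DD = dD @ dD.T / (n_l - 1)
        eps = rng.multivariate_normal(np.zeros(d_obs.size), C_D, size=n_l).T
        resid = d_obs[:, None] + eps - D_l
        M_l_a = M_l + C_MD @ np.linalg.solve(C_DD + C_D, resid)
        # A random member of the updated local ensemble replaces sample j
        M_a[:, j] = M_l_a[:, rng.integers(n_l)]
    # Forward runs for the updated ensemble (the expensive part in practice)
    D_a = np.column_stack([forward(M_a[:, j]) for j in range(n_e)])
    return M_a, D_a
```

Repeating `M, D = ilues_iteration(M, D, d_obs, C_D, forward, alpha, rng)` for a fixed number of iterations gives the overall loop. Each iteration costs $N_e$ forward-model runs, which is why the ensemble size dominates the computational cost.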
4. Conclusions
In this study, an iterative local updating ensemble smoother (ILUES) method is employed to jointly estimate the hydraulic parameters, biochemical parameters and contaminant source characteristics of a reactive transport model. To better explore the parameter space, the original ILUES algorithm is modified by determining the local ensemble partly with a linear ranking selection scheme [56]. The reactive transport model is constructed to simulate the sequential anaerobic biodegradation process of tetrachloroethene (PCE). The applicability and accuracy of ILUES are evaluated by three numerical case studies. In all case studies, the uncertainty in model parameters is significantly reduced with the implementation of ILUES, which demonstrates the validity of ILUES for joint parameter estimation.
The inversion of unknown model parameters is of vital importance to predict the fate and transport of reactive contaminants in groundwater. Markov Chain Monte Carlo (MCMC) is the most widely used method to approximate the posterior distribution of unknown parameters. However, as MCMC methods typically require a large number of model evaluations, they are considered to be computationally intensive and may not be applicable to high-dimensional inverse modeling of reactive transport. The ensemble Kalman filter (EnKF) and ensemble smoother (ES), which are based on the theory of data assimilation, are more efficient alternatives to MCMC for parameter estimation in nonlinear systems. Nevertheless, for highly nonlinear problems, neither the standard ES nor the standard EnKF method can provide inversion results with sufficient accuracy. Under these circumstances, the ILUES algorithm was developed to improve the accuracy and efficiency of parameter estimation for highly nonlinear reactive transport models.
The modified ILUES algorithm is also compared, in terms of the estimation accuracy of model parameters, with the ensemble smoother with multiple data assimilation (ES-MDA), which is an efficient iterative ensemble smoother method. Although the ES-MDA method does reduce the parameter uncertainty of the model, the ILUES algorithm provides inversion results with higher accuracy, which demonstrates the advantage of ILUES.
The ensemble size $N_e$ and the factor $\alpha$ have obvious impacts on the estimation accuracy of model parameters. When $\alpha$ is either too small or too large, the inversion results are far from satisfactory; for a given $N_e$, the estimation accuracy is usually better at intermediate values of $\alpha$. Additionally, the estimation accuracy usually improves as $N_e$ increases, but so does the computational cost. Therefore, methods for determining the optimal balance between estimation accuracy and computational cost are worthy of further study.
In this paper, the reactive transport model is based on the sequential anaerobic biodegradation of PCE. The results can provide valuable references for solving inverse problems of other reactive contaminants (e.g., ionic reactions between inorganic contaminants). Although the ILUES algorithm proved to be a valid method for solving inverse problems in this study, it may be less useful in some practical contamination assessments. For example, when dealing with high-dimensional nonlinear problems and large-scale inverse modeling, the computational cost can be extremely high and even prohibitive. To address this issue, constructing a surrogate model that has similar accuracy at a low computational cost is considered to be a promising solution.
The estimation accuracy of ILUES may also be limited by the lack of information, because only two types of measurements (contaminant concentration and hydraulic head) are assimilated in this study. To further improve the estimation accuracy of unknown parameters, future work may focus on the simultaneous assimilation of multiple types of measurements (e.g., porosity, hydraulic conductivity and geophysical data).