Joint Estimation of Hydraulic and Biochemical Parameters for Reactive Transport Modelling with a Modiﬁed ILUES Algorithm

: Multicomponent reactive transport modeling is a powerful tool for the comprehensive analysis of coupled hydraulic and biochemical processes. The performance of the simulation model depends on the accuracy of related model parameters whose values are usually di ﬃ cult to determine from direct measurements. In this situation, estimates of these uncertain parameters can be obtained by solving inverse problems. In this study, an e ﬃ cient data assimilation method, the iterative local updating ensemble smoother (ILUES), is employed for the joint estimation of hydraulic parameters, biochemical parameters and contaminant source characteristics in the sequential biodegradation process of tetrachloroethene (PCE). In the framework of the ILUES algorithm, parameter estimation is realized by updating local ensemble with the iterative ensemble smoother (IES). To better explore the parameter space, the original ILUES algorithm is modiﬁed by determining the local ensemble partly with a linear ranking selection scheme. Numerical case studies based on the sequential biodegradation of PCE are then used to evaluate the performance of the ILUES algorithm. The results show that the ILUES algorithm is able to achieve an accurate joint estimation of related model parameters in the reactive transport model.


Introduction
Tetrachloroethene (PCE), a common chlorinated solvent, has been widely used as a dry-cleaning solvent and a metal degreaser. It is among the most widespread and persistent organic contaminants in groundwater resources [1]. PCE is also known as a dense non-aqueous phase liquid (DNAPL) due to its high density and low solubility in groundwater. As a result of these properties, traditional remediation approaches, such as pump-and-treat and vapor stripping, are very expensive and sometimes inapplicable for PCE contaminants in subsurface environments [2]. Therefore, in-situ bioremediation is considered to be a promising method for site remediation of chlorinated solvents, because it requires less intervention and is more cost-effective [3][4][5][6].
Multicomponent reactive transport modeling is an important tool for predicting the fate and transport of reactive contaminants in groundwater by integrating simulations of water flow, biochemical reaction and solute transport [7,8]. However, the uncertainty in related parameters of the reactive transport model poses a major challenge to reliable simulations [9]. In practice, the spatially variable aquifer properties such as hydraulic conductivity and porosity have significant effects on groundwater flow and solute transport modeling [10][11][12]. Additionally, the reactive transport model that characterizes the reaction and migration of contaminants in subsurface aquifers is quite sensitive to biochemical parameters (e.g., rate constant) and contaminant source characteristics (e.g., source strength, source location) [13,14]. Therefore, the simultaneous estimation of hydraulic parameters, biochemical parameters and contaminant source characteristics are of vital importance to achieve satisfactory accuracy of the forward model.
Direct determination of specific model parameters is usually infeasible in heterogeneous subsurface environments. To solve this problem, information obtained from indirect measurements (e.g., contaminant concentration and hydraulic head) is used to estimate uncertain model parameters by solving inverse problems [15]. Many researchers have focused on the calibration of key parameters of reactive transport models using inverse methodologies in the past few years. Bailey and Baù [14] employed an ensemble Kalman filter (EnKF) method to estimate the spatial distribution of the first-order rate constant of a reactive transport model by assimilating the concentration data. Bailey et al. [16] estimated the spatially variable denitrification rates in an agricultural groundwater system by assimilating the concentration and mass data of nitrate, which is implemented with the help of the Ensemble Smoother (ES). Dai and Samper [17] investigated the ability of a general methodology for solving inverse problems in a column experiment and found it useful in the estimation of model parameters. Carniato et al. [18] compared the performance of weighted least squares (WLS), weighted least squares with weight estimation (WLS(we)) and a multivariate (MV) approach on parameter estimation of reactive transport models and found that residual correlation did not evidently affect the predictive uncertainty. Nevertheless, the joint estimation of various parameters of reactive transport models is still limited to a few studies [19][20][21].
In recent years, Bayesian inference has become a prevalent method for parametric uncertainty quantification [22][23][24]. According to Bayes' theorem, the unknown parameters are treated as random variables whose posterior probability distributions can be calculated by updating the prior probability with information provided by the observations [25]. For highly nonlinear systems, the posterior distributions of unknown variables are difficult and in most cases impossible to evaluate analytically [26]. To address this issue, Markov Chain Monte Carlo (MCMC) methods [27,28] have been widely used to approximate the posterior distribution of unknown parameters by generating random samples from the probability distribution. Cao et al. [29] integrated the multi-try differential evolution adaptive Metropolis (MT-DREAMzs) algorithm with the nested sampling estimator (NSE) to improve the performance of marginal likelihood estimation. Shi et al. [30] tested the validity of Gaussian assumptions for the parametric uncertainty quantification in a reactive transport model. Vrugt et al. [31] proposed the Shuffled Complex Evolution Metropolis algorithm (SCEM-UA) to inverse the hydrologic model parameters and found it more efficient than traditional Metropolis-Hastings samplers. Yan et al. [32] developed a Bayesian-based integrated approach, in which a Kriging surrogate model (KSM) is used to improve the computational efficiency to identify the groundwater contamination sources. Nevertheless, MCMC methods are considered to be computationally expensive because a large number of model evaluations are required for high-dimensional parameter space exploration [33]. Consequently, the applications of MCMC methods to inverse modeling of reactive transport are still limited [34,35].
The ensemble Kalman filter (EnKF), which is a Monte Carlo implementation of the Kalman filter [36], is a computationally efficient alternative to the MCMC methods. The EnKF algorithm proposed by Evensen [37] is able to update model parameters and state variables through sequential data assimilation of measurements. Recently, the EnKF algorithm has been widely used for high-dimensional nonlinear data assimilation in geophysical [38], atmospheric [39] and hydrological [40][41][42][43][44] modeling. However, the application of EnKF to numerical models that involve multiple processes would be inconvenient because the implementation of EnKF requires the modification of restart files and the update of model parameters simultaneously at each assimilation step [45]. The ensemble smoother (ES) proposed by van Leeuwen and Evensen [46] is a more efficient alternative to EnKF when dealing with high-dimensional parameter estimation problems. Instead of implementing the sequential data assimilation scheme as EnKF, ES computes a global update of parameters by assimilating all observations simultaneously [45]. Skjervheim and Evensen [47] compared the performance of ES and EnKF for solving history matching problems of reservoirs and found that ES could provide similar results to EnKF. The difference is that ES works more efficiently than EnKF because it can be implemented without restarting the simulation model multiple times.
However, both the standard ES and standard EnKF algorithm cannot provide satisfactory data matches when dealing with highly nonlinear problems [48][49][50]. To further improve the performance and efficiency of ES, the iterative forms of ES have been developed recently. Chen and Oliver [51] proposed an iterative ensemble smoother (IES) algorithm based on the Levenberg-Marquardt (LM) method, in which the ensemble randomized maximum likelihood (EnRML) was used as the smoother. Emerick and Reynolds [52] developed an ensemble smoother with multiple data assimilation (ES-MDA) to solve the reservoir history-matching problem. Li and Davis [53] compared the performance of ES and IES in groundwater modeling and their results showed that IES works much better than the standard ES by continuously updating parameters with the available measurements for all time steps. Ma et al. [54] developed an adaptive IES method (ES-LM) based on the Levenberg-Marquardt algorithm which can reduce the computational complexity of ES. Zhang et al. [55] proposed an iterative local updating ensemble smoother (ILUES) for parameter estimation in hydrologic models. In the framework of ILUES, an iterative form of ES is used to update the local ensemble of each sample, which is defined based on a comprehensive measure of distance to the sample and the measurements. The performance of the ILUES algorithm was evaluated with several numerical examples and the results showed that ILUES could provide an accurate estimation of model parameters with multimodal distributions. Thus, the ILUES algorithm is of great potential to solve highly nonlinear problems such as the inverse modeling of multicomponent reactive transport.
In this study, the ILUES algorithm is employed to estimate the key parameters of a multicomponent reactive transport model. The numerical model, which characterizes the sequential biodegradation of tetrachloroethene (PCE), describes the coupled simulation of groundwater flow, biochemical reaction and solute transport. As the subsurface heterogeneity has significant effects on the biodegradation process, the key model parameters, including the hydraulic paramters, biochemical parameters and contaminant source characteristics, are jointly estimated using the ILUES algorithm [55]. The major objective of this study is to verify the effectiveness and efficiency of the ILUES algorithm for solving the inverse problem of the multicomponent reactive transport model.
To the best of our knowledge, this is the first study that focuses on the application of the ILUES algorithm to joint estimation of hydraulic parameters, geochemical parameters and contaminant source characteristics in a sequential biodegradation process. Although model parameter estimation with inverse methods has been investigated recently, related studies seldom focus on the reactive solute transport process, which is difficult to analyze due to the high dimensionality. Furthermore, the ILUES algorithm in this study is modified by determining the local ensemble partly with a linear ranking selection scheme [56] which is able to more extensively explore the parameter space. The performance of the ILUES algorithm is then evaluated through three different numerical case studies. A comparison between ILUES and ES-MDA is also made to demonstrate the advantage of ILUES on estimation accuracy.
The organization of the paper is as follows: in Section 2, the basic theory of the sequential biodegradation of PCE is explained. Then the general framework and mathematical formulation of the ILUES algorithm is introduced. In Section 3, three different numerical case studies based on the biodegradation of PCE are utilized to test the performance of the ILUES algorithm. The results and discussion of corresponding case studies are also presented in this section. The conclusions of the current study are drawn in Section 4.

Sequential Anaerobic Biodegradation of PCE
In this study, PCE contaminants undergo a sequential reductive dechlorination reaction whose degradation kinetics are assumed to be first order in nature. In the process of dechlorination, microorganisms can utilize PCE as the electron acceptor, sequentially removing chlorine atoms from PCE to form trichloroethene (TCE). TCE is degraded to dichlorethene (DCE) and then vinyl chloride (VC) in turn. Finally, non-toxic ethylene (ETH) is produced by the degradation of VC. The typical sequential reductive dechlorination framework of PCE under anaerobic conditions can be written as: where k i denotes the first-order rate constants of PCE and its daughter products. The governing equations which characterize the transformation and transport of PCE and its daughter products are represented by the following partial differential equations: where R i is the retardation factor; D ij denotes the hydrodynamic dispersion coefficient contaminants in the dechlorination reaction at source point ML −3 ; K i is the first-order anaerobic degradation rate T −1 ; and Y T/P , Y D/T , Y V/D are chlorinated compound yield values under anaerobic reductive dechlorination conditions. In this study, groundwater flow and reactive transport models are simulated using MODFLOW-2000 [57] and RT3D [58,59], respectively.

Parameter Estimation
Parameter estimation of reactive transport models aim to obtain the accurate estimations of key model parameters of groundwater flow and solute transport by solving inverse problems. Bayesian inference is an important method in inverse modeling and has been widely used in hydrologic science.
Considering the Bayesian inference for a nonlinear model, the relationship between model parameters and measurements can be expressed in the following form: where d is the vector for measurements, m is the vector for model parameters, f (m) is the prediction from the forward model, and ε is the vector for Gaussian-distributed measurement errors. According to Bayes' theorem, posterior distribution is proportional to likelihood times prior distribution, which can be written as: Water 2020, 12, 2161 5 of 23 where p( m|d) is the posterior probability distribution, p( d|m) is the prior distribution and p(m) is the likelihood function that measures the goodness of fit of the model to observations of the unknown parameters. In this study, the analytical evaluation of the posterior distribution of unknown parameters is unavailable due to the high nonlinearity of the model. For parameter estimation problems in nonlinear models, ensemble smoother (ES) is a prevalent method because of its effectiveness and efficiency. However, when dealing with highly nonlinear problems, the standard ES method is unable to obtain satisfactory estimations. In this paper, an iterative local updating ensemble smoother (ILUES) is employed, which is proposed by Zhang et al. [55], to jointly estimate various parameters of the reactive transport model. In comparison with ES, the ILUES algorithm updates the local ensemble of each sample with the iterative form of ES, instead of globally assimilating parameter realizations in the ensemble. In this way, ILUES has a better performance on highly nonlinear problems such as the inverse modeling of reactive transport.

Iterative Local Updating Ensemble Smoother
The basic updating equation of ES can be written as: for j = 1, 2, · · · , N e , N e denotes the size of ensemble members.
According to Equations (1) and (8) On the basis of Bayesian inference and the ES method, the implementation of ILUES can be summarized in the following steps and its framework is represented by the flowchart in Figure 1.
To begin with, set the iteration counter i to 0. N e samples are randomly drawn from the prior distribution of model parameters as the initial input ensemble. Therefore, the corresponding output ensemble can be obtained by evaluating the forward model.
Step 2: Determination of local ensemble.
The local ensemble of the sample m f j ( j = 1, 2, · · · , N e ) is defined based on a comprehensive measure of the distance (J) from aspects of the model responses (J 1 ) and the model parameters (J 2 ).
where J 1 (m) denotes the distance between the measurements d and the model responses f (m), J 2 (m) denotes the distance between the selected sample m the maximum values of J 1 (m) and J 2 (m), C MM is the auto-covariance matrix of the model parameters.
In the above equation, is the cross-covariance matrix between and , is the auto-covariance matrix of measurement , , represents the th measurement of model parameters with error of . Then choose a random sample , from the updated local ensemble , = ( , , ⋯ , , ) as the updated sample of ( = 1,2, ⋯ , ). The updated global ensemble = ( , , , , ⋯ , , ) can be obtained after this procedure.
The framework of the ILUES algorithm can be generalized into the following flowchart.  . In this paper, a linear ranking selection scheme [56] is combined to determine the local ensemble according to the J values. The selection probability of each sample is expressed as: for i = 1, · · · , N e Where P min = 0.1, P max = 0.9, rank(θ i ) represents the rank of each individual after sorted by the fitness value of θ i .
To better explore the parameter space, this rank-based selection operator is used together with the original selection scheme in ILUES, which means that the first 80% of N loc samples are determined by Step 3: Update the local ensemble. Update the local ensemble with the basic ES method, written as: for k = 1, · · · , N loc .
In can be obtained after this procedure. The framework of the ILUES algorithm can be generalized into the following flowchart.

Illustrative Case Studies
In this section, three numerical case studies for joint parameter estimation of the reactive transport model are introduced to evaluate the performance of the modified ILUES algorithm. These studies are based on the sequential reductive dechlorination of tetrachloroethene (PCE) under anaerobic conditions which follows a series of chain reactions. The dechlorination reaction and solute transport can be simulated by a multicomponent reactive transport model simultaneously. A one-dimensional case is firstly used to demonstrate the validity of the ILUES algorithm for estimating key parameters of the reactive transport model. Then, the performance of the ILUES algorithm is further tested by jointly estimating the source strength and rate constants of the model in a two-dimensional heterogeneous aquifer. The effects of ensemble size and factor α on the prediction accuracy of ILUES are also investigated in Case 2. Finally, the complexity of the problem is increased by adding various unknown parameters, including the hydraulic conductivity and source location. The numerical model constructed in this case is based on the two-dimensional aquifer in Case 2. The third case focuses on the joint estimation of hydraulic parameters, biochemical parameters and contaminant source characteristics in the sequential biodegradation process of PCE.

Case 1: A One-Dimensional Case
The first case aims to test the effectiveness of the ILUES algorithm for estimating key parameters of the reactive transport model. In this case, the sequential biodegradation process of PCE and the transport of reactive solutes are considered in a one-dimensional confined aquifer with steady-state groundwater flow. As shown in Figure 2, the aquifer is assumed to be 100 m in length and 5 m in width. The model domain is discretized into equal size grids of 5 m × 5 m and has constant heads of 100 m and 99.8 m at the right and left boundaries, respectively. No flow condition is prescribed at the upper and lower boundaries. The PCE contaminant with unknown strength is released at the location of (87.5 m, 2.5m) and an observation well is set at the location of (27.5 m, 2.5 m). The simulation time is 1000 days with 10 equal time steps, so each time step lasts 100 days. More details of related model parameters are listed in Table 1. width. The model domain is discretized into equal size grids of 5 m × 5 m and has constant heads of 100 m and 99.8 m at the right and left boundaries, respectively. No flow condition is prescribed at the upper and lower boundaries. The PCE contaminant with unknown strength is released at the location of (87.5 m, 2.5m) and an observation well is set at the location of (27.5 m, 2.5 m). The simulation time is 1000 days with 10 equal time steps, so each time step lasts 100 days. More details of related model parameters are listed in Table 1.    The major objective of this case study is to obtain the joint estimation of hydraulic conductivity (K), contaminant source strength (S S ) of PCE and degradation rate constants (k 1 , k 2 , k 3 , k 4 ) of the chain reaction using the ILUES algorithm. Measurements of the concentrations of PCE, TCE, DCE and VC are collected from the observation well at t = (2, 4, 6, 8, 10) * 100 days to provide information for the estimation. The measurement errors considered in this case follow a normal (Gaussian) distribution with mean µ = 0 and variance σ 2 = 0.005 2 . The prior distributions of the contaminant source strength (S S ) and the first-order rate constants (k 1 , k 2 , k 3 , k 4 ) are assumed to follow uniform distributions. Their ranges and true values are listed in Table 2.

Parameters
Prior Distribution True Value Considering the spatial heterogeneity of the aquifer, the log-conductivity field Y = ln(K) is assumed to follow a Gaussian distribution, with mean µ Y = 3 and variance σ Y 2 = 1. The identification of hydraulic conductivity is computationally expensive because the number of unknown parameters increases a lot. To reduce the dimensionality of the heterogeneous hydraulic conductivity field and efficiently obtain the converged inversion solutions, the Karhunen-Loève (KL) expansion [60] is utilized to parameterize the log-conductivity field (Y). The log hydraulic conductivity Y = ln(K) at two arbitrary spatial locations (x 1 , y 1 ) and (x 2 , y 2 ) in the model are assumed to be exponentially correlated. Their relationship can be expressed by the following covariance function: where σ Y 2 =1 is the variance, λ x = 5 m and λ y = 50 m are the correlation length along x and y directions, respectively. The KL expansion of the log-conductivity field Y = ln(K) can be expressed as: where ξ i are independent standard Gaussian random variables, and τ i and f i (x) are eigenvalues and eigenfunctions of the covariance function C Y , respectively. In this case, approximately 92% of the conductivity field variance can be preserved by keeping the first five KL terms (i.e., N KL = 5), which means that an approximative distribution of the conductivity field can be calculated using Equation (17). Therefore, the identification of the whole hydraulic conductivity field is then simplified to the estimation of only five parameters. To sum up, there are total 10 unknown parameters to be estimated in this case. For joint estimation of the unknown parameters, the ILUES algorithm is implemented with the ensemble size N e = 500 and the iteration number N Iter = 7. The factor α is set to 0.1, which is in the range of [0, 1]. The contaminant source strength (S S ) and the rate constants of the chain reaction (k 1 , k 2 , k 3 , k 4 ) can be accurately estimated within seven iterations, as shown in Figure 3.  Meanwhile, with the five estimated KL terms, the identification of the hydraulic conductivity field is realized by calculating the KL expansion equation. To evaluate the performance of the ILUES algorithm on the conductivity field identification, the mean and the standard deviation of the logconductivity field at selected iterations (Iteration 1, 3, 5 and 7) are plotted in Figure 4. The true values and randomly generated and initial values are also presented in Figure 4 as the reference. As illustrated in Figure 4a, the mean of the log-conductivity field gets closer to the true values as the number of iterations increases and it greatly approximates the true values after seven iterations. In Figure 4b, the standard deviation of the log-conductivity field is relatively high at the early iterations. However, it decreases dramatically with the increase in the iteration number and gets close to zero after seven iterations. The results shown in Figure 4 verify the effectiveness of the ILUES algorithm Meanwhile, with the five estimated KL terms, the identification of the hydraulic conductivity field is realized by calculating the KL expansion equation. To evaluate the performance of the ILUES algorithm on the conductivity field identification, the mean and the standard deviation of the log-conductivity field at selected iterations (Iteration 1, 3, 5 and 7) are plotted in Figure 4. The true values and randomly generated and initial values are also presented in Figure 4 as the reference. As illustrated in Figure 4a, the mean of the log-conductivity field gets closer to the true values as the number of iterations increases and it greatly approximates the true values after seven iterations. In Figure 4b, the standard deviation of the log-conductivity field is relatively high at the early iterations. However, it decreases dramatically with the increase in the iteration number and gets close to zero after seven iterations. The results shown in Figure 4 verify the effectiveness of the ILUES algorithm for the conductivity field identification.
represents the trace plot of the source strength ( ). (b-e) represents the trace plot of the four rate constants ( , , , ), respectively. The black dashed lines denote the reference true values of corresponding model parameters.
Meanwhile, with the five estimated KL terms, the identification of the hydraulic conductivity field is realized by calculating the KL expansion equation. To evaluate the performance of the ILUES algorithm on the conductivity field identification, the mean and the standard deviation of the logconductivity field at selected iterations (Iteration 1, 3, 5 and 7) are plotted in Figure 4. The true values and randomly generated and initial values are also presented in Figure 4 as the reference. As illustrated in Figure 4a, the mean of the log-conductivity field gets closer to the true values as the number of iterations increases and it greatly approximates the true values after seven iterations. In Figure 4b, the standard deviation of the log-conductivity field is relatively high at the early iterations. However, it decreases dramatically with the increase in the iteration number and gets close to zero after seven iterations. The results shown in Figure 4 verify the effectiveness of the ILUES algorithm for the conductivity field identification.  To further demonstrate the estimation accuracy of the ILUES algorithm, the performance of data match is also evaluated using the measurements (i.e., contaminant concentrations at the observation well). The concentration data of the initial and the last ensemble are presented in Figure 5. The true values of the concentration data simulated by the forward model are also shown in Figure 5 as the reference. As illustrated in Figure 5, the concentration data of the last ensemble approximates the true values of the measurements after seven iterations. Therefore, the ILUES algorithm can obtain a satisfactory data match on this problem. To further demonstrate the estimation accuracy of the ILUES algorithm, the performance of data match is also evaluated using the measurements (i.e., contaminant concentrations at the observation well). The concentration data of the initial and the last ensemble are presented in Figure 5. The true values of the concentration data simulated by the forward model are also shown in Figure 5 as the reference. As illustrated in Figure 5, the concentration data of the last ensemble approximates the true values of the measurements after seven iterations. Therefore, the ILUES algorithm can obtain a satisfactory data match on this problem.
To further demonstrate the estimation accuracy of the ILUES algorithm, the performance of data match is also evaluated using the measurements (i.e., contaminant concentrations at the observation well). The concentration data of the initial and the last ensemble are presented in Figure 5. The true values of the concentration data simulated by the forward model are also shown in Figure 5 as the reference. As illustrated in Figure 5, the concentration data of the last ensemble approximates the true values of the measurements after seven iterations. Therefore, the ILUES algorithm can obtain a satisfactory data match on this problem.  The estimation accuracy of ILUES can also be evaluated by the root-mean-square error (RMSE), which is a value that measures the differences between values predicted by an estimator and the values observed. In general, the dataset with lower RMSE (closer to 0) indicates better prediction accuracy. The RMSE of parameters estimated in this case is defined by the following formula: whereŷ n denotes the predicted value of measurements, y n denotes the reference true value of measurements and N e is the ensemble size. As shown in Figure 6, the RMSE of model parameters estimated in this case decreases evidently as the number of iterations increases and all RMSE values approach 0 after seven iterations, which demonstrate the accuracy of ILUES for parameter estimation.
In summary, the ILUES algorithm is an effective method for the joint estimation of unknown parameters of the multicomponent reactive transport model. The estimation accuracy of ILUES can also be evaluated by the root-mean-square error (RMSE), which is a value that measures the differences between values predicted by an estimator and the values observed. In general, the dataset with lower RMSE (closer to 0) indicates better prediction accuracy. The RMSE of parameters estimated in this case is defined by the following formula: where denotes the predicted value of measurements, denotes the reference true value of measurements and is the ensemble size. As shown in Figure 6, the RMSE of model parameters estimated in this case decreases evidently as the number of iterations increases and all RMSE values approach 0 after seven iterations, which demonstrate the accuracy of ILUES for parameter estimation. In summary, the ILUES algorithm is an effective method for the joint estimation of unknown parameters of the multicomponent reactive transport model. In this case, the validity of the ILUES algorithm is tested in a two-dimensional model. The coupled simulation of groundwater flow, biodegradation reaction and solute transport based on the sequential reductive dechlorination of PCE is considered in a two-dimensional confined aquifer with steady state groundwater flow and heterogeneous hydraulic conductivity. As shown in Figure 7, the

Case 2: A Two-Dimensional Case with Heterogeneous Hydraulic Conductivity Field
In this case, the validity of the ILUES algorithm is tested in a two-dimensional model. The coupled simulation of groundwater flow, biodegradation reaction and solute transport based on the sequential reductive dechlorination of PCE is considered in a two-dimensional confined aquifer with steady state groundwater flow and heterogeneous hydraulic conductivity. As shown in Figure 7, the aquifer is assumed to be 500 m in length and 300 m in width, with constant-head boundaries of 100 m and 99 m on the left and right side, respectively. No-flow boundaries are specified along the upper and lower side of the model domain. The model domain is discretized into uniform grids of 10 m × 10 m. The PCE contaminant is released from a point source located at (95 m, 145 m) and an observation well is set at the location of (295 m, 145 m). The total simulation time is 1200 days and is divided into 15 equal time steps. Other related parameters used in the simulation model are listed in Table 3. The second case study focus on the joint estimation of unknown model parameters, including the contaminant source strength ( ) and the first-order rate constants ( , , , ) of the degradation reaction. As the information required for parameter estimation, the contaminant concentrations of PCE, TCE, DCE and VC are obtained from the observation well at t = 6, 8, 10,12,14] * 80 days. The measurement errors in this case are assumed to follow a Gaussian distribution with zero mean and variance = 0.005 . To increase the complexity of the model, the aquifer is assumed to be heterogeneous and the log-conductivity field = ln( ) of the model is randomly generated from a Gaussian distribution with mean = 4 and variance = 0.25. The prior distributions of the contaminant source strength ( ) and the first-order rate constants ( , , , ) are uniform distributions. Their ranges and reference true values are listed in Table 4.
In total, there are five unknown parameters to be estimated in Case 2.   The second case study focus on the joint estimation of unknown model parameters, including the contaminant source strength (S S ) and the first-order rate constants (k 1 , k 2 , k 3 , k 4 ) of the degradation reaction. As the information required for parameter estimation, the contaminant concentrations of PCE, TCE, DCE and VC are obtained from the observation well at t = [6,8,10,12,14] * 80 days. The measurement errors in this case are assumed to follow a Gaussian distribution with zero mean and variance σ 2 = 0.005 2 . To increase the complexity of the model, the aquifer is assumed to be heterogeneous and the log-conductivity field Y = ln(K) of the model is randomly generated from a Gaussian distribution with mean µ Y = 4 and variance σ Y 2 = 0.25. The prior distributions of the contaminant source strength (S S ) and the first-order rate constants (k 1 , k 2 , k 3 , k 4 ) are uniform distributions. Their ranges and reference true values are listed in Table 4. In total, there are five unknown parameters to be estimated in Case 2.
During the procedure of data assimilation, determination of the local ensemble is a crucial step that can evidently affect the inversion results. According to the equation N loc = αN e mentioned in Section 2, the size of the local ensemble is affected by two factors, α and N e , where N e denotes the size of global ensemble and α denotes the ratio of local ensemble over the global ensemble. To investigate the impacts of the two factors on the accuracy of estimation, different combinations of α and N e are evaluated by the RMSE of the measurements.
To facilitate the comparison between different combinations of α and N e , the log-transformed RMSE is introduced in Figure 8.

Parameters Prior Distribution True Value
During the procedure of data assimilation, determination of the local ensemble is a crucial step that can evidently affect the inversion results. According to the equation = mentioned in Section 2, the size of the local ensemble is affected by two factors, and , where denotes the size of global ensemble and denotes the ratio of local ensemble over the global ensemble. To investigate the impacts of the two factors on the accuracy of estimation, different combinations of and are evaluated by the RMSE of the measurements. To facilitate the comparison between different combinations of and , the log-transformed RMSE is introduced in Figure 8. is. This indicates that the ILUES algorithm is unable to obtain satisfactory estimations in both situations. If is too small, the local ensemble is so small that it may miss the samples that are close to the observations, as pointed out in Section 2. Meanwhile, if is too large, the size of the local ensemble will be close to the global ensemble, so the estimation scheme will be similar to the ensemble smoother (ES) without local updating. Furthermore, the datasets with = 0.1 and = 0.2 usually give relatively lower log-RMSE values of the measurements, regardless of the value of . Therefore, As shown in Figure 8, when α is very small or very large (e.g., α = 0.05 or α = 1), the log-transformed RMSE are obviously larger than datasets with other α values, no matter what the value of N e is. This indicates that the ILUES algorithm is unable to obtain satisfactory estimations in both situations. If α is too small, the local ensemble is so small that it may miss the samples that are close to the observations, as pointed out in Section 2. Meanwhile, if α is too large, the size of the local ensemble will be close to the global ensemble, so the estimation scheme will be similar to the ensemble smoother (ES) without local updating. Furthermore, the datasets with α = 0.1 and α = 0.2 usually give relatively lower log-RMSE values of the measurements, regardless of the value of N e . Therefore, in this case, rather than using a constant α value in the original ILUES algorithm, α that randomly generated in the range of [0.1, 0.2] is selected at each iteration to determine the local ensemble.
As for the factor N e , when the ensemble size N e is too small (e.g., N e = 50), limited ensemble updating may underestimate the uncertainty in model parameters and give biased inversion results. In Figure 8, when the value of α is fixed, the datasets with larger N e values usually give relatively lower log-RMSE values, which also indicates better prediction accuracy. However, the larger ensemble size could result in a higher computational burden and the involvement of more measurement errors in the model. To balance the computational efficiency and the estimation accuracy, a combination of α, which is randomly generated in the range of [0.1, 0.2], and N e = 300 is selected as the optimal setting of the ILUES algorithm in the subsequent parameter estimation for this case study.
After determining the optimal setting of α and N e , the model parameter estimation is implemented with the ILUES algorithm. The number of iterations is set to 10 in this case. As shown in Figure 9, accurate joint estimations of the contaminant source strength and the degradation rate constants can be obtained within 10 iterations, which demonstrates the ability of ILUES to solve inverse problems. To further evaluate the estimation accuracy of ILUES, the RMSE of model parameters estimated in this case study is introduced. As shown in Figure 10, the RMSE values of all parameters decrease and gradually approximate 0 with the increase of the iteration number, which means that the estimation error gets lower and lower with the increase of the iteration number. Therefore, model parameter estimation can achieve sufficient accuracy when the number of iterations is large enough. To summarize, for high-dimensional inverse problems, the iteration number and ensemble size of ILUES should be enlarged appropriately. The investigation into the impact of α on estimation accuracy provides a reference for the subsequent case study.
Water 2020, 12, x FOR PEER REVIEW 14 of 24 in this case, rather than using a constant value in the original ILUES algorithm, that randomly generated in the range of [0.1, 0.2] is selected at each iteration to determine the local ensemble. As for the factor , when the ensemble size is too small (e.g., = 50), limited ensemble updating may underestimate the uncertainty in model parameters and give biased inversion results. In Figure 8, when the value of is fixed, the datasets with larger values usually give relatively lower log-RMSE values, which also indicates better prediction accuracy. However, the larger ensemble size could result in a higher computational burden and the involvement of more measurement errors in the model. To balance the computational efficiency and the estimation accuracy, a combination of , which is randomly generated in the range of [0.1, 0.2], and = 300 is selected as the optimal setting of the ILUES algorithm in the subsequent parameter estimation for this case study. After determining the optimal setting of and , the model parameter estimation is implemented with the ILUES algorithm. The number of iterations is set to 10 in this case. As shown in Figure 9, accurate joint estimations of the contaminant source strength and the degradation rate constants can be obtained within 10 iterations, which demonstrates the ability of ILUES to solve inverse problems. To further evaluate the estimation accuracy of ILUES, the RMSE of model parameters estimated in this case study is introduced. As shown in Figure 10, the RMSE values of all parameters decrease and gradually approximate 0 with the increase of the iteration number, which means that the estimation error gets lower and lower with the increase of the iteration number. Therefore, model parameter estimation can achieve sufficient accuracy when the number of iterations is large enough. To summarize, for high-dimensional inverse problems, the iteration number and ensemble size of ILUES should be enlarged appropriately. The investigation into the impact of on estimation accuracy provides a reference for the subsequent case study.

Case 3: A Two-Dimensional Case with Unknown Contaminant Source and Hydraulic Conductivity Field
In Case 3, a parameter estimation problem with a large number of unknown parameters is studied to evaluate the performance of the ILUES algorithm on joint estimation of model parameters. The sequential reductive dechlorination of PCE is considered in a two-dimensional confined aquifer with steady state groundwater flow, whose numerical model has the same model domain and boundary conditions as Case 2, as shown in Figure 11. The total simulation time is still 1200 days with 15 equal time steps. However, in addition to contaminant source strength and degradation rate constants, the contaminant source location and the hydraulic conductivity are assumed to be uncertain in this case, which significantly increases the complexity of the model. The PCE contaminant is released from a potential area depicted by the white dashed rectangle in Figure 11 and the source strength is assumed to be an unknown constant. More details of related model parameters are listed in Table 5.

Case 3: A Two-Dimensional Case with Unknown Contaminant Source and Hydraulic Conductivity Field
In Case 3, a parameter estimation problem with a large number of unknown parameters is studied to evaluate the performance of the ILUES algorithm on joint estimation of model parameters. The sequential reductive dechlorination of PCE is considered in a two-dimensional confined aquifer with steady state groundwater flow, whose numerical model has the same model domain and boundary conditions as Case 2, as shown in Figure 11. The total simulation time is still 1200 days with 15 equal time steps. However, in addition to contaminant source strength and degradation rate constants, the contaminant source location and the hydraulic conductivity are assumed to be uncertain in this case, which significantly increases the complexity of the model. The PCE contaminant is released from a potential area depicted by the white dashed rectangle in Figure 11 and the source strength is assumed to be an unknown constant. More details of related model parameters are listed in Table 5.

Case 3: A Two-Dimensional Case with Unknown Contaminant Source and Hydraulic Conductivity Field
In Case 3, a parameter estimation problem with a large number of unknown parameters is studied to evaluate the performance of the ILUES algorithm on joint estimation of model parameters. The sequential reductive dechlorination of PCE is considered in a two-dimensional confined aquifer with steady state groundwater flow, whose numerical model has the same model domain and boundary conditions as Case 2, as shown in Figure 11. The total simulation time is still 1200 days with 15 equal time steps. However, in addition to contaminant source strength and degradation rate constants, the contaminant source location and the hydraulic conductivity are assumed to be uncertain in this case, which significantly increases the complexity of the model. The PCE contaminant is released from a potential area depicted by the white dashed rectangle in Figure 11 and the source strength is assumed to be an unknown constant. More details of related model parameters are listed in Table 5.   The unknown parameters estimated in this case can be classified into three categories. The first category is the contaminant source characteristics which include the contaminant source strength (S s ) and contaminant source location S x , S y . The second category is comprised of the four first-order rate constants (k 1 , k 2 , k 3 , k 4 ) of the dechlorination reaction, known as the biochemical parameters. The third category consists of the KL terms which are drawn from the KL expansion of the log-conductivity field Y = ln(K).
To characterize the heterogeneity of the aquifer, the log-conductivity field Y = ln(K) is assumed to follow a Gaussian distribution, with mean µ Y = 4 and variance σ Y 2 = 1. The identification of the hydraulic conductivity field can be realized by estimating the conductivity of each grid, so there are 1500 unknown parameters to be estimated. As the number of unknown parameters increases greatly, the computational cost can grow tremendously. To reduce the dimensionality and simplify the estimation of the hydraulic conductivity field, the KL expansion [60] is used to parameterize the log-conductivity field (Y), as used in Case 1. In this case, approximately 95% of the field variance for conductivity can be preserved by keeping the first 68 KL terms (i.e., N KL = 68), which means that an approximative distribution of the conductivity field can be obtained with these 68 parameters. To obtain ample information from the measurements, eight observation wells are set in the model domain, as marked with red squares in Figure 11. The observation data, including the concentration of multiple contaminants (PCE, TCE, DCE and VC) and the hydraulic head of the specific grid, are collected from the eight observation wells at t = [6,8,10,12,14] * 80 (days). As a result, 168 observation data in total are collected from the observation wells with measurement errors ε ∼ N 0, 0.005 2 .
In total, there are 75 uncertain parameters to be estimated in Case 3. The seven parameters which belong to the first and second parameter categories are uniformly distributed. Their corresponding ranges and reference true values are listed in Table 6. The prior distributions of the 68 KL terms follow the standard normal distribution. The joint estimation of the three types of model parameters is implemented with the ILUES algorithm.  Owing to the large quantity of unknown parameters, the ensemble size (N e ) of ILUES in this case is set to 1000. Based on the study in Case 2, instead of setting the factor α as a constant at all times, α is randomly generated in the range of (0.1, 0.2) at each iteration of ILUES. The iteration number (N Iter ) of ILUES is set to 10. To demonstrate the advantage of ILUES in estimation accuracy, the performance of ensemble smoother with multiple data assimilation (ES-MDA) [52] is evaluated with the same numerical model for comparison. The setting of ensemble size (N e ) and iteration number (N Iter ) in ES-MDA is also the same as ILUES. As shown in Figure 12, even though the mean values of some estimated parameters (e.g., S s , k 1 , k 2 , k 3 ) are close to the reference true values, the uncertainty in model parameters is still large in the last few iterations and some estimated parameters at the last iteration (e.g., S x , S y , k 4 ) deviate from the true values after the implementation of ES-MDA. In addition, a large number of samples in the last ensemble are still far from the true value, which means that ES-MDA is unable to achieve sufficient estimation accuracy. In contrast to the inversion results obtained from ES-MDA, the ILUES algorithm can provide more satisfactory results with higher estimation accuracy. As illustrated in Figure 13, the range of estimated values of model parameters gradually decreases as the number of iterations increases and it shrinks to a narrow range around the true value after ten iterations, which means that the uncertainty in model parameters is significantly reduced after ten iterations using ILUES. Furthermore, nearly all estimated values of model parameters approximate the true values after ten iterations, which demonstrates the superior accuracy of ILUES for model parameter estimation. In consequence, the joint estimation of contaminant source characteristics and biochemical parameters can be achieved with the ILUES algorithm.
To further verify the advantage of ILUES, the RMSE between the reference true values and the estimated values of selected model parameters is introduced to evaluate the estimation accuracy.
The RMSE of selected model parameters estimated by ILUES and ES-MDA are listed in Table 7. As clearly shown in Table 7, the RMSE values of estimated model parameters with the implementation of ILUES are all lower than those using ES-MDA. As lower RMSE values indicate higher estimation accuracy, a conclusion can be drawn that ILUES has an obvious advantage over ES-MDA on the joint estimation of contaminant source characteristics and biochemical parameters.    To further verify the advantage of ILUES, the RMSE between the reference true values and the estimated values of selected model parameters is introduced to evaluate the estimation accuracy.
The RMSE of selected model parameters estimated by ILUES and ES-MDA are listed in Table 7. As clearly shown in Table 7, the RMSE values of estimated model parameters with the implementation of ILUES are all lower than those using ES-MDA. As lower RMSE values indicate  The estimation accuracy of ILUES is also evaluated by the RMSE of model parameters considered in this case study. As illustrated in Figure 14, the RMSE values of all parameters decrease evidently with the increase of the iteration number and finally approximate 0 after 10 iterations. Thus, the ILUES algorithm can provide sufficient accuracy for the joint estimation of biochemical parameters and contaminant source characteristics of the reactive transport model. Meanwhile, with the 68 KL terms inversed by ILUES, the hydraulic conductivity filed can be calculated from the KL expansion equation (Equation (17)). In this case, posterior samples in the ensemble of the 10th iteration are chosen to calculate the distribution of the log-conductivity field. Six posterior realizations, the mean estimate and the estimation variance of the log-conductivity field at the 10th iteration are presented in Figure 15. The reference log-conductivity field which is randomly generated at the beginning of the simulation is also presented in Figure 15. As illustrated in the figure, estimations of the log-conductivity field of the six posterior realizations are highly similar to the reference log-conductivity field and the estimation variance is reduced to a relatively low level after assimilation, which demonstrates the effectiveness of ILUES on hydraulic parameter estimation. The slight differences between the reference and the estimated log-conductivity field might be caused by the limited and sparse observation wells for each contaminant. To summarize, the ILUES algorithm can provide accurate joint estimations for the three types of model parameters mentioned above. The estimation accuracy of ILUES is also evaluated by the RMSE of model parameters considered in this case study. As illustrated in Figure 14, the RMSE values of all parameters decrease evidently with the increase of the iteration number and finally approximate 0 after 10 iterations. Thus, the ILUES algorithm can provide sufficient accuracy for the joint estimation of biochemical parameters and contaminant source characteristics of the reactive transport model. Meanwhile, with the 68 KL terms inversed by ILUES, the hydraulic conductivity filed can be calculated from the KL expansion equation (Equation 17). In this case, posterior samples in the ensemble of the 10th iteration are chosen to calculate the distribution of the log-conductivity field. Six posterior realizations, the mean estimate and the estimation variance of the log-conductivity field at the 10th iteration are presented in Figure 15. The reference log-conductivity field which is randomly generated at the beginning of the simulation is also presented in Figure 15. As illustrated in the figure, estimations of the log-conductivity field of the six posterior realizations are highly similar to the reference log-conductivity field and the estimation variance is reduced to a relatively low level after assimilation, which demonstrates the effectiveness of ILUES on hydraulic parameter estimation. The slight differences between the reference and the estimated log-conductivity field might be caused by the limited and sparse observation wells for each contaminant. To summarize, the ILUES algorithm can provide accurate joint estimations for the three types of model parameters mentioned above.

Conclusions
In this study, an iterative local updating ensemble smoother (ILUES) method is employed to jointly estimate the hydraulic parameters, biochemical parameters and contaminant source characteristics of a reactive transport model. To better explore the parameter space, the original ILUES algorithm is modified by determining the local ensemble partly with a linear ranking selection scheme [56]. The reactive transport model is constructed to simulate the sequential anaerobic biodegradation process of tetrachloroethene (PCE). The applicability and accuracy of ILUES is evaluated by three numerical case studies. In all case studies, the uncertainty in model parameters is significantly reduced with the implementation of ILUES, which demonstrates the validity of ILUES for joint parameter estimation.
The inversion of unknown model parameters is of vital importance to predict the fate and transport of reactive contaminants in groundwater. The Markov Chain Monte Carlo (MCMC) is the most widely used method to approximate the posterior distribution of unknown parameters. However, as MCMC methods typically require a large number of model evaluations, they are considered to be computationally intensive and may not be applicable to high-dimensional inverse modeling of reactive transport. The ensemble Kalman filter (EnKF) and ensemble smoother (ES), which are based on the theory of data assimilation, are more efficient alternatives to MCMC for parameter estimation in nonlinear systems. Nevertheless, for highly nonlinear problems, both the

Conclusions
In this study, an iterative local updating ensemble smoother (ILUES) method is employed to jointly estimate the hydraulic parameters, biochemical parameters and contaminant source characteristics of a reactive transport model. To better explore the parameter space, the original ILUES algorithm is modified by determining the local ensemble partly with a linear ranking selection scheme [56]. The reactive transport model is constructed to simulate the sequential anaerobic biodegradation process of tetrachloroethene (PCE). The applicability and accuracy of ILUES is evaluated by three numerical case studies. In all case studies, the uncertainty in model parameters is significantly reduced with the implementation of ILUES, which demonstrates the validity of ILUES for joint parameter estimation.
The inversion of unknown model parameters is of vital importance to predict the fate and transport of reactive contaminants in groundwater. The Markov Chain Monte Carlo (MCMC) is the most widely used method to approximate the posterior distribution of unknown parameters. However, as MCMC methods typically require a large number of model evaluations, they are considered to be computationally intensive and may not be applicable to high-dimensional inverse modeling of reactive transport. The ensemble Kalman filter (EnKF) and ensemble smoother (ES), which are based on the theory of data assimilation, are more efficient alternatives to MCMC for parameter estimation in nonlinear systems. Nevertheless, for highly nonlinear problems, both the standard ES and standard EnKF methods cannot provide satisfactory inversion results with sufficient accuracy. Under these circumstances, the ILUES algorithm is developed to improve the accuracy and efficiency of parameter estimation for highly nonlinear reactive transport models.
It is noteworthy that the modified ILUES algorithm is compared with the ensemble smoother with multiple data assimilation (ES-MDA), which is an efficient iterative ensemble smoother method, in terms of the estimation accuracy of model parameters. Although the ES-MDA method can actually reduce the parameter uncertainty of the model, the ILUES algorithm provides better inversion results with higher accuracy, which demonstrates the advantage of ILUES.
The ensemble size N e and factor α have obvious impacts on the estimation accuracy of model parameters. When α is either too small or too large, the inversion results are far from satisfactory. The estimation accuracy is usually better with α = 0.1 and α = 0.2 when the value of N e is the same. Additionally, the estimation accuracy usually enhances with the increase of N e , but the computational cost will increase in the meantime. Therefore, methods for determining the optimal balance between estimation accuracy and computational cost are worthy of further study.
In this paper, the reactive transport model is based on the sequential anaerobic biodegradation of PCE. The results can provide valuable references for solving inverse problems of other reactive contaminants (e.g., ionic reactions between inorganic contaminants). Although the ILUES algorithm is proved to be a valid method for solving inverse problems in this study, it may not be so useful in some practical contamination assessments. For example, when dealing with high-dimensional nonlinear problems and large-scale inverse modeling, the computational cost can be extremely high and even prohibitive. To address this issue, construction of the surrogate model which has a similar accuracy and a low computational cost is considered to be a hopeful solution.
The estimation accuracy of ILUES may also be hindered by the lack of information because only two types of measurements (contaminant concentration and hydraulic head) are assimilated in this study. In order to further improve the estimation accuracy of unknown parameters, our future works may focus on the simultaneous assimilation of multiple measurements (e.g., porosity, hydraulic conductivity and geophysical data).