1. Introduction
Pumping and tracer tests are frequent site-investigation techniques used to determine aquifer parameters, notably hydraulic conductivity (K). The data retrieved with such tests in their conventional layout yield average characteristics of the subsurface material being investigated and usually give limited information of subsurface heterogeneity. However, resolving the spatial variability in hydraulic conductivity is critical, for example in the design of efficient groundwater remediation systems, in which solute transport is partially controlled by small-scale variations in hydraulic properties [
1].
The limitations of conventional hydraulic testing have motivated the development of more sophisticated methods such as tomographic aquifer experiments, in which large datasets representative of different parts (and different processes) of the subsurface can be collected. The idea behind a tomographic aquifer test is to sequentially stress the aquifer at multiple isolated sections and measure the response at many observation points. The observations are then inverted to estimate aquifer parameters. The most common application of an aquifer tomographic method is hydraulic tomography, which involves a sequence of pumping tests with multiple injection/extraction wells and many observation locations to monitor hydraulic head changes. Hydraulic tomography has been applied successfully in theoretical, laboratory, and field studies [
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27], however, the diffusive nature of groundwater-flow limits the information contained in the observed data, no matter how sophisticated the chosen test layout is [
28,
29,
30,
31]. Recent studies suggest that this limitation can be overcome by integrating data of different types (e.g., hydraulic head and tracer concentrations) in parameter estimation. The different sensitivity patterns of each data type help improve the resolution and reduce the uncertainty of the estimate [
32,
33,
34,
35].
The relevance of tracer data for aquifer characterization has motivated the adaptation of traditional tracer tests into a tomographic layout. In analogy to hydraulic tomography, tracer tomography involves a series of tracer tests with tracer injection at different locations or in different depths of the aquifer. The concept of tracer tomography has been investigated mainly in numerical [
34,
35,
36,
37,
38,
39] and laboratory [
33,
40] studies. The logistic effort and associated costs have hindered its field application in real aquifers. Doro et al. [
41] presented a field method to perform tomographic tracer tests and showed its viability with a field experiment performed at the hydrogeological research site Lauswiesen, in Germany. Somogyvári and Bayer [
42] obtained a hydraulic-conductivity tomogram based on tracer data collected during a field tracer tomography experiment at the Widen field site, in Switzerland. Both studies used heat as tracer.
The datasets retrieved from combined tomographic experiments require inverse models that can use the information at reasonable computational costs. Different inverse methods have been proposed [
43,
44,
45], but the large datasets recorded and the high spatial resolution aimed for the estimated parameter fields limit the application of many approaches. Gauss–Newton methods, for example, require the sensitivity of all observations with respect to all parameters. This may be achieved by solving as many adjoint problems as there are measurements [
46,
47,
48,
49]. This approach, however, imposes prohibitive computational costs if a highly discretized model and many observations are used.
An efficient alternative is data assimilation, in which real and simulated observations at the current state of a system are merged, the model states and/or parameters are corrected to minimize their differences, model simulations are run forward in time until new observations are available and the differences are evaluated again. A particularly popular data assimilation method in hydrogeology is the ensemble Kalman filter [
50], due to its simplicity in implementation and efficiency for correcting (or updating) model states and/or parameters, for which sensitivities are not explicitly calculated. The covariance matrices needed in the update step are evaluated by ensemble-averaging within Monte Carlo simulations. The premise is that the efficiency of data assimilation methods in updating model parameters and/or states may enable resolving aquifer heterogeneity at higher resolutions, given that enough information is available. As the main objective of aquifer tomographic tests is to provide comprehensive information about subsurface heterogeneity, it seems a natural choice to explore the performance of data assimilation schemes in the estimation of aquifer parameters with tomographic data sets. Although different applications of ensemble-Kalman filter based methods in groundwater problems can be found in the literature [
51,
52,
53,
54,
55,
56,
57], their application to data from combined hydraulic and tracer tomographic experiments has not yet been reported.
This work is the follow up to a synthetic study [
58] on the application of two data assimilation methods to the estimation of spatially distributed hydraulic conductivity: (1) the Ensemble Kalman Filter (EnKF) and (2) the Kalman Ensemble Generator (KEG). In the preceding synthetic study [
58], we showed that the EnKF and KEG are well-fitted for the estimation of spatially distributed hydraulic conductivity fields from hydraulic- and tracer-tomographic data. In this paper, we applied these techniques to field experiments and to the estimation of aquifer parameters with the data collected in real tomographic aquifer tests.
The approach for the assimilation of field data follows the methods presented in the synthetic study. We used the same numerical models to simulate groundwater flow and solute transport but extended the problem to three spatial dimensions. To minimize repetition, we refer the reader to the previous paper [
58] and the references therein for a more in-depth description of the theory behind groundwater flow, solute transport, and the application of ensemble-Kalman filter based methods to groundwater problems. The experimental work presented here was applied at the hydrogeological research site Lauswiesen in Germany.
This work is motivated by the constant evolution of modern measurement devices that have led to better, smaller, and affordable sensors. Unlike previous field applications [
41,
42], in which heat was used as tracer, the experimental setup developed in this work uses fluorescein as conservative tracer, avoiding issues related to the high diffusivity of heat, density and viscosity effects caused by changes in water temperature, and the technical difficulties associated with the injection of water for long time periods at a temperature different than that of the ambient groundwater.
Like in the synthetic study [
58], we applied the standard update scheme of the EnKF to assimilate transient drawdown data. In this scheme the parameters (and/or model states) are updated by evaluating the model performance from the current state to one step forward in time. For the assimilation of tracer concentrations, we applied a restart scheme of the EnKF, in which parameters are updated in the analysis of a time step and solute transport is resimulated from the initial time until the next available measurement time step [
59,
60,
61,
62]. Although computationally more demanding, the restart scheme is necessary to honor mass conservation and consistency between hydraulic parameters and model states during transport simulations [
58]. In contrast, parameter updating with the KEG was performed by assimilating all available information simultaneously, with an iterative approach in which the same data are reused to update all ensemble members. The update scheme of the KEG requires full forward simulations, posing prohibitive computational costs for three-dimensional transient groundwater flow and solute transport simulations of finely discretized numerical models. To reduce the computational burden, we directly solved for steady-state flow and the first two temporal moments of tracer concentration and used only steady-state hydraulic heads and mean breakthrough times as data for the update.
We estimated effective aquifer parameters by type-curve matching of analytical and semianalytical solutions of the groundwater flow [
63,
64,
65] and advection–dispersion equations [
66,
67]. This information was considered, together with information from previous investigations, in the geostatistical models applied for the generation of the ensemble of parameters.
The work is structured as follow.
Section 2 is dedicated to the underlying theory and governing equations describing groundwater flow and solute transport.
Section 3 summarizes the theory of ensemble-Kalman filter based methods for parameter estimation.
Section 4 describes the general hydrogeological setting of the aquifer at the research field site Lauswiesen and the experimental design of the aquifer tomographic experiments.
Section 5 presents the dataset obtained with the field experiments and describes the general methodology applied to denoise and process the raw data.
Section 6 contains a description of the numerical models used to simulate the aquifer tomographic experiments and the settings of the EnKF and KEG applied for the assimilation of the field data.
Section 7 presents the results of the parameter estimation using both ensemble-Kalman filter based methods. This work ends in
Section 8 with conclusions, a discussion of the limitations of the proposed methodology, and recommendations to improve solute-tracer tomography on the field scale.
2. Governing Equations
We consider the spatial field of isotropic hydraulic conductivity
K(
) [
] and the transient hydraulic-head field
h(t,
) [
L] as described by the groundwater flow equation:
subject to:
in which
[
L] is the vector of spatial coordinates,
[
] is the specific storage coefficient,
t [
T] denotes time,
[
T] refers to the initial time,
[
] describes volumetric sources or sinks (e.g., injection/extraction wells),
and
are the Dirichlet boundaries at the in- and outflow of the domain, respectively, and
represents the Neumann boundaries with defined normal flux
[
].
is the unit vector normal to the boundary pointing outwards.
[
L],
[
L], and
[
L] denote the initial and boundary values of hydraulic head, and the specific discharge
[
] follows Darcy’s law (
).
We also assume solute transport described by the dual-domain advection–dispersion equation, which separates the pore space of an aquifer into mobile and immobile continuous domains and assumes that the mass transfer between the two domains is proportional to the corresponding concentration difference [
68,
69]:
subject to:
in which
[
] and
[
] refer to the tracer concentrations in the mobile and immobile domains, respectively,
[
] and
[
] are the corresponding porosities,
[
] is the dispersion tensor,
[
] is the first-order solute exchange coefficient between the mobile and immobile zones,
represents all domain boundaries, and
[
] is a known concentration along the inflow boundary
. If the mobile porosity equals the total porosity and the immobile porosity is set to zero, the model described by Equations (
6) and (
7) simplifies to the classical advection–dispersion equation. The seepage velocity
[
] is defined as the specific discharge divided by the porosity (
).
Additionally, we consider temporal moments of tracer concentration, which can be computed by temporal-moment generating equations [
70]:
in which
is the
k-th raw temporal moment of the tracer concentration
c. Equations (
11) and (
12) imply that the zeroth moments
and
of the mobile and immobile concentrations are identical, and the first moment of the mobile concentration does not differ between the standard advection–dispersion model and the dual-domain case with two porosities. In our analysis, we used mean tracer arrival times, that is, the ratio of the first over the zeroth temporal moment of concentration, rather than the moments themselves as observations.
3. Parameter Estimation with Ensemble-Kalman Based Methods
We applied the same ensemble-Kalman filter based methods for parameter estimation as those used in the synthetic study of the companion paper [
58]: the Ensemble Kalman Filter (EnKF) and the Kalman Ensemble Generator (KEG). Ensemble-Kalman filter based methods are recursive processes with a forecast-update cycle defined by Equation (
13) (forecast) and Equation (
14) (update) [
71]. Both the EnKF and KEG share the same underlying theory and differ only in the way observations are analyzed.
in which
is the ensemble index, and
is the number of ensemble members,
is the model that simulates the model states
(e.g., drawdown or concentrations) for given parameters (e.g., hydraulic conductivity) and preceding states
, and
t [
T] is the time index,
and
are the real and simulated observations, respectively. Measurement noise
drawn from a multi-Gaussian probability distribution with zero mean and covariance matrix
is added to the simulated observations [
72]. The subscripts
u and
c refer to the unconditional and conditional state and/or parameter vector
, respectively, and
is a damping factor that ranges between 0 and 1, which reduces the step size of parameter updates. The matrices
(
) and
(
) are the auto-covariance matrix of modeled observations and the cross-covariance matrix between model parameters and modeled observations, respectively, computed from the model ensemble.
and
refer to the number of parameters and available observations, respectively.
We applied the EnKF with a standard update, damping, and normal-score transformations to assimilate transient hydraulic data, whereas the restart EnKF scheme was implemented without data transformations for concentrations. The normal-score transformation, or Gaussian anamorphosis, is a univariate transformation that modifies the marginal distributions of the variables to standard normal distributions without changing the statistical dependence and thus without ensuring multi-Gaussianity after the transformation. We built an individual transformation function for each measurement location and applied it to both the perturbed modeled observations (
) and the corresponding reference observation
[
51,
73]. In the KEG, we updated parameters by assimilating all observations simultaneously, as it is regularly done in a batch calibration. This scheme follows an iterative approach, in which the same data are reused to update those ensemble members for which the likelihood of the data does not meet an acceptance criterion. In the application of the KEG, we solved only for steady-state flow and the zeroth and first temporal moments of tracer concentration and used the steady-state heads and mean breakthrough times as data. Neither damping nor data transformation were applied during parameter updating with the KEG.
5. Data Processing and Preliminary Aquifer Characterization
To increase the signal-to-noise ratio, correct signal shifts and/or remove outliers, background values, and trends introduced by the sensors in the collected dataset, we used smoothing functions and low-pass filters to remove high frequency noise and automated outlier detection algorithms based on the modified Z-scores [
86].
To account for the anomalous transport observed at the research site Lauswiesen [
25,
78], which is characterized by early breakthrough and long tailing, we fitted the generalized inverse Gaussian (GIG) distribution [
87] to the tracer data, and used the fitted curve in those cases in which the parametric model and the field data showed a good agreement. The objective of fitting a parametric function to noisy data is to find a smooth curve that follows the trend of the data but is fully characterized by a few parameters. While a parametric model has the disadvantage that it predefines the general shape of the curve, it has the advantage that it can be used for extrapolation, which may be relevant in the estimation of transport parameters using temporal moments. If the fitted curve preserves the main features of the original data, e.g., first arrival, peak, and observed part of tail of a breakthrough curve, it may be considered instead of the actual data. If the main features are not described well enough, they should not be used.
The GIG distribution
is defined as:
in which
is the modified Bessel function of the second kind and order
p, and
a and
b are non-negative parameters. The GIG distribution includes the inverse Gaussian distribution, which is the analytical solution for the 1-D advection–dispersion equation with uniform coefficients, as special case (
) but adding another coefficient facilitates distributions that are more skewed than the inverse Gaussian. The version of the GIG distribution we present in Equation (
15) includes a
and
for shifts in both dimensions, accounting for delayed arrival times and background values in the observed signal, and a factor
that scales the distribution to the units of the data. These parameters regulate the signal recovery, offsets in the baseline of the online fluorometer, and time shifts. The fitting was performed using nonlinear least squares with the Levenberg–Marquardt algorithm. As an example,
Figure 3 shows the raw data and the corresponding fitted distribution for the breakthrough curve measured at extraction well B6 during tracer test 3b. Despite the high noise level in the raw signal (gray line), the resulting fitted curve (black dashed line) is a noise-free breakthrough curve with all relevant features of the original data.
To estimate effective values of hydraulic conductivity and specific storage coefficient, we fitted the Theis solution [
63] to selected drawdown curves, accounting for multiple injection and extraction wells. Even though the aquifer under investigation is an unconfined system, previous investigations have shown that assuming confined conditions is acceptable for the time scales of the current experiments and for pumping tests in which the drawdown is considerably smaller than the thickness of the aquifer. Effective porosities and dispersivities were estimated by fitting a travel-time based semianalytical solution of the advection–dispersion equation (ADE, Equation (
6)) [
67], to the breakthrough curves measured at extraction wells B6 and B7. The latter can represent one-dimensional advection and dispersion processes, combined with a dual-porosity domain and a first-order mass transfer reaction between the two regions. The analytical solution exists only in the Laplace domain and the final result is obtained via numerical inverse Laplace transformation [
88]. Type-curve matching of the Theis solution to the drawdown data was performed using the Nelder–Mead simplex algorithm that evaluates the fit using direct function evaluations, rather than derivatives. We fitted the travel-time based semianalytical solution to the breakthrough curves using the global search algorithm DREAM-ZS, which is an advanced Markov chain Monte Carlo sampler that provides the estimation together with the associated uncertainty bounds [
89].
The full processed dataset of the tracer tomography test consists of 52 drawdown curves and 46 breakthrough curves (
Figure 4,
Figure 5 and
Figure 6). Hydraulic head changes observed during test 2a could not be retrieved due to technical problems with the data loggers. We selected 27 characteristic time points from each processed drawdown curve, for both the estimation of effective aquifer parameters with the Theis solution and for model inversion. The number of characteristic points selected from the breakthrough curves varied between 27 and 34 characteristic points, depending on the length of each test.
Hydraulic responses to injection and extraction of water were similar throughout the five tests shown in
Figure 4. The small differences between tests were attributed to the slightly different injection and extraction rates applied. For tests 1a and 1b, steady state was achieved after 5000 s, whereas in all other tests steady-state conditions were not observed before 7000 s. Multiple points taken from the steady-state portion of the drawdown curves would not provide additional information for parameter estimation; therefore, the curves were trimmed at the first time point at which steady-state flow was observed at most locations.
Figure 5 shows the tracer breakthrough curves, normalized by the injected mass, measured at extraction wells B6 and B7 during the six tracer tests of the tracer tomography experiment. The tracer was detected in both extraction wells; however, the concentrations observed in well B7 were one order of magnitude smaller than those of well B6. The difference in peak arrival times observed in tests with tracer injection in the same section (e.g., tracer tests 1a and 1b) is attributed to the slightly different pumping rates applied. Please note that the breakthrough curves measured at pumping well B6 during tests 1b and 2a (
Figure 5 top) still show some level of noise, which means that it was not possible to fit the GIG parametric function in a satisfactory way and instead the smoothed curve had to be preserved.
The recovered tracer mass was calculated for the breakthrough curves measured at wells B6 and B7 with temporal moments, where the 0th temporal moment represents the effective tracer mass (mass per discharge). The recovered tracer mass indicates the reliability of the recorded data. However, the estimation is highly affected by all processing steps required to transform the raw and noisy breakthrough curves into a smooth time series in units of concentration. We estimated a tracer recovery of 87% and 62% for tracer tests 2a and 2b, respectively. The lowest tracer recovery was estimated for tracer test 3b (32%), indicating that the majority of the tracer was lost into ambient flow or remained somewhere in the domain. Disturbances of the steady-state hydraulic field were not observed, and the reason for the low recovered mass could not be identified. A larger measurement uncertainty was assigned to this dataset when it was used for parameter estimation. The overestimation of the recovered mass for tracer tests 1a, 1b, and 3a (115%, 107%, and 117%, respectively) was attributed to complications during water sampling and uncertainties in the laboratory measurements, leading to errors when the data was scaled to units of concentration.
The breakthrough curves measured in all additional observation wells are shown in
Figure 6, normalized by the injected mass. The three plots on the left correspond to the first test series (series
a), and the plots on the right are the breakthrough curves of the second test series (series
b). A visual inspection of all curves reveals that independent of the tracer injection location, the tracer plume was well distributed horizontally and vertically in the investigated area. Breakthrough curves were observed even at the wells located at the corners of the observation well grid (e.g., in wells ow4, ow13, and ow16) and also at different depths of the multilevel observation wells (e.g., in wells cmt1, cmt2, and cmt3). Except for test 1a, tracer concentrations decreased with a shift of the tracer injection towards the bottom of the aquifer. The highest concentrations were measured in the observation point cmt1-3, which is located at a depth of 4m and directly next to the injection well B3. The vertical location of this point coincides with the top tracer injection section. High tracer concentrations were observed at this point also in tests 3a and 3b, indicating a large and fast vertical spread of the plume. Breakthrough curve tailing was accentuated in the tests with tracer injection at the lower sections of the aquifer. This might indicate a higher degree of heterogeneity and zones with lower hydraulic conductivity.
Figure 7 exemplarily illustrates the fitting of the analytical and semianalytical solutions to some drawdown (top) and breakthrough curves (bottom), and
Figure 8 summarizes with violin plots the statistical distribution of the estimated aquifer parameters. In addition to the mean and median (solid and dotted black lines in
Figure 8, respectively), violin plots show the shape of the full distribution of a dataset.
In general, the Theis model matches well the hydraulic response to the injection and extraction of water and fits the steady-state values observed during the pumping tests. A mean hydraulic conductivity
K of
ms
was estimated after averaging the individual values calculated for all drawdown curves of all tests. The corresponding violin plot for
K (
Figure 8, first column) shows that although the majority of the estimated values cluster around the median, the values distribute more or less uniformly across the estimated range. These estimates agree with previous research performed at the research site Lauswiesen [
77,
80,
81,
82]. The observed drawdown is representative of the volume of the domain being investigated. This volume expands with pumping time, producing hydraulic responses that are increasingly averaged [
90]. This effect is evidenced in the relatively small coefficient of variation (CV) of
calculated for the Theis curve derived hydraulic conductivities. We estimated a mean specific storage coefficient
of
m
which is typical of confined systems. The large variability in the estimation of
(CV = 127%) can be seen as an aliasing effect in which unresolved conductivity appears as a variation of aquifer storativity and is a direct consequence of the application of methods derived under the assumption of homogeneity in heterogeneous systems [
90,
91,
92]. The violin plot for
(
Figure 8, second column) shows that despite the wide range observed in the estimated values of
, the most probable values of
are well grouped near the median.
Figure 7 (bottom) shows the semianalytical model fitted to the breakthrough curves observed at extraction well B6 during tests 2b (gray solid line) and 3b (orange solid line), together with the associated uncertainty bounds (shaded areas) for a
confidence interval. The semianalytical model reproduces the overall shape of the observed breakthrough curves well, with the uncertainty bounds covering the measured data at almost every point in the curve. The largest misfits occur at the first arrival and tail of the curves, for which the field data fall outside the uncertainty bounds. This might indicate the unresolved small-scale variability of aquifer heterogeneity by a semianalytical model. Similar fitted curves were obtained for the rest of the breakthrough curves measured at wells B6 and B7 during all additional tests, and the statistics of all the estimated transport parameters (first-order mass transfer coefficient, mobile and immobile porosities, and dispersivity) are shown in the last four columns of
Figure 8.
After fitting all breakthrough curves, we estimated a mean mass transfer coefficient (
) and longitudinal dispersivity (
) of
× 10
m
and 1.3 m, respectively. These estimates are in agreement with values previously reported for the Lauswiesen site [
25,
78]. We also estimated mean mobile (
) and immobile (
) phase porosities of
and
, respectively. However, the violin plots for the porosities (
Figure 8 columns 4 and 5) show that mean porosity values are strongly affected by values close to
. Although the median is less affected by extreme values, the small median values estimated for the mobile and immobile porosities (
and
) are unrealistically low. We have found similar mobile porosity values in preceding studies after fitting an analytical solution to tracer data collected at the same field site, but in those studies total porosity values close to
were needed to improve transport simulations [
25]. To account for the uncertainty observed in the porosity estimates, spatially uniform mobile and immobile porosities were also updated during the assimilation of the tracer data.
7. Results and Discussion
In the following we first compare the hydraulic conductivity fields estimated with both ensemble-Kalman filter based methods, and then discuss model performance based on the ensemble simulations of the pumping and tracer tests.
Figure 10 shows details of the estimated mean log-hydraulic conductivity field (A and C) and the associated ensemble variances (B and D) after all drawdown and tracer data were assimilated. The top row of
Figure 10 corresponds to the results obtained with the EnKF, and the bottom row shows the results of the KEG.
To help visualize the three-dimensional fields,
Figure 10A,C includes slices enclosing the area of interest, an extra horizontal slice at the middle of the aquifer, and a cross section along the location of the injection and extraction wells. A visual inspection reveals that the hydraulic structure differs between the two estimated 3-D fields, although both log-hydraulic conductivity fields show a similar statistical distribution, as observed in the histograms computed for the full aquifer thickness (
Figure 10E). Higher conductivity values dominate the upper portion of the aquifer for the KEG, while they are more evenly distributed across the entire aquifer for the EnKF (
Figure 10F,G). While the KEG preserves the two different peaks in the distribution enforced in the generation of the parameter fields, a single peak distribution is observed for the EnKF (
Figure 10E). In both cases lower-conductive zones are accentuated at the bottom. These results are consistent with the prior geostatistical distributions assigned for generating the ensemble of parameters and agree with previous studies in which a more conductive upper layer overlaying a less conductive and more heterogeneous lower layer was found [
41,
82].
The exact structure of the aquifer material at the site is unknown, which hampers evaluating the correctness of the estimated hydraulic conductivity fields. Instead of a direct comparison between available in-situ measurements of hydraulic conductivity and estimated values with the two ensemble-Kalman filter based methods, we compared the spatial structure of the estimated mean ensemble hydraulic conductivity fields, with the findings of Lessoff et al. [
82] (see
Table 4). Differences in the support volume of the direct-push measurements and of the pumping and tracer test data render a one-to-one comparison between available point measurements of (relative) hydraulic conductivity and the estimated hydraulic conductivity of the corresponding grid cells in the suboptimal models. We thus subdivided both model grids in two major units—an upper layer with a thickness of 2 m and a 6 m thick lower layer—and estimated the corresponding mean and variance of the mean ensemble log-K fields obtained with both the EnKF and the KEG.
The spatial variability of the ensemble mean of log-hydraulic conductivity at the upper layer is lower than for the lower layer in both the EnKF and KEG, which is in agreement with the statistical analysis of Lessoff et al. [
82]. The similarity between the mean of the ensemble mean log-hydraulic conductivity for the EnKF and the values reported by [
82] (which were used to generate the initial ensemble) demonstrate that the EnKF is not able to shift the distribution away from the prescribed geostatistical parameters, even if the reference data is not honored (see below). While this is usually not a problem in synthetic studies, where the geostatistical parameters used to generate the initial ensemble are known, it is a strong limitation in field applications, where those parameters are unknown.
By the end of the data assimilation, relevant reductions in the ensemble variance of log-hydraulic conductivity were observed across the entire area of interest for both methods but especially for the upper layer (see
Figure 10B,D), indicating filter inbreeding. In both methods, two different zones with distinct mean variance and structure could be observed. For the EnKF, a thin high variance zone was developed at the bottom of the aquifer, whereas the distinction between the two zones in the KEG is closer to the transition between the upper and lower aquifer layers that was enforced during the ensemble generation. Furthermore, a strong reduction in variance for the EnKF (
Figure 10B) is observed all over the upper ∼6 m of the aquifer, even at places far away from sensitive regions, indicating spurious correlations during parameter updating. In contrast, the ensemble variance of the KEG (
Figure 10D) is noticeably reduced only in the regions where the observation wells are located, suggesting fewer and smaller artifacts introduced in the numerical covariances during parameter updating.
With the EnKF, we estimated a final ensemble mean aquifer specific storage coefficient
of 0.04 m
with an associated ensemble variance of 0.3 m
. The estimated mean
falls within the range of values obtained with type-curve matching of the field drawdown curves (see
Figure 8), and can be considered typical of unconfined systems. The high ensemble variance obtained indicates difficulties in updating a unique effective
. With the EnKF, we also estimated a final ensemble mean of mobile and immobile porosities, mass transfer coefficient, and longitudinal dispersivity of 2.0%, 10%,
× 10
s
and 0.5 m, respectively. As for
, no ensemble variance reduction of these effective parameters was observed. We attribute the inability of the filter to reduce the variance to the underrepresented aquifer heterogeneity.
Figure 11a shows a scatter plot of the ensemble-mean and spread of simulated versus measured drawdown for the last pumping test (pumping test 1b) assimilated. Transient drawdown data, which were used by the EnKF are shown in black crosses. Blue dots are the drawdown values simulated at the last assimilation step with the EnKF, which correspond to the steady-state values used by the KEG (red dots). We observe that the EnKF has difficulties simulating properly very low and high drawdown values, underestimating the majority of the measured data. The KEG shows better simulations than the EnKF, with an ensemble spread that covers the measured values at almost all observation points.
Additionally,
Figure 11b,c are the distribution of the normalized errors (by their corresponding error measurement), for the EnKF and the KEG, respectively. The black lines in
Figure 11b,c are normal distributions fitted to the corresponding residuals. While the residuals of the EnKF show a better shape, the corresponding standard deviation of 5.71 indicates that the residuals are considerably larger than explained by measurement error. The residuals of the KEG are skewed , but the distribution is considerably narrower than the distribution of residuals of the EnKF.
A gradual improvement in the EnKF predictions was observed after transient data from each pumping test were included. This is confirmed by a decrease of the RMSE from 15 cm after assimilating data from the first pumping test, to 3.5 cm after all data were assimilated. The larger misfit is observed at early-time drawdown, especially during the assimilation of data from pumping tests 3b and 3a. We attribute this to assimilating a single uniform effective specific storage coefficient, leaving relevant variability of aquifer storativity unresolved. At late times, steady state is achieved, which does not depend on the storativity at all, and an improvement on the simulated drawdown is observed. An alternative explanation could be that small-scale features of the conductivity field are not properly resolved even though we work with realizations exhibiting variability on all scales. The early-time drawdown is heavily affected by hydraulic conductivity in the direct vicinity of the pumping well, whereas the support volume of late-time drawdown measurements is bigger. The KEG achieves a similar reduction of the RMSE, with an RMSE-value of 4 cm at the end of the assimilation of steady-state drawdown. The final RMSE-values obtained with both methods are bigger than the measurement error assumed during updating ( mm).
These results suggest that no real advantage of transient records exist over the steady-state values. Higher computational resources are required by transient simulations, and to stabilize the parameter updates with the EnKF and reduce filter inbreeding, a strong damping factor had to be applied. In contrast, only steady-state simulations were required by the KEG, and no damping was necessary. Transient data would be advantageous in cases when variability in aquifer storativity is considered, which could not be estimated from steady-state information.
Figure 12 shows, for each tracer test assimilated with the EnKF, an example of the ensemble predictions of cumulative concentrations as a function of time. The simulations of the breakthrough curves were performed using the ensemble of parameters obtained at the end of the assimilation of each test. Field measurements are represented with black solid lines, and the ensemble median and 90% confidence interval of the ensemble predictions are shown with dashed lines and as gray zones, respectively.
Large discrepancies were observed between the ensemble predictions and the observed breakthrough curves. Even though there is a slight improvement after each test is assimilated, the EnKF is not capable of fixing the parameters to honor the tracer data. A reason for the failure of the EnKF with tracer data could be attributed to the strong damping applied, which limits the improvement during parameter updates. In addition, a big difficulty lies in the sequential approach of the EnKF itself, and more so when parameters, and not models states, are included in the update. In each update step, the parameters are modified to obtain a good fit at that time, potentially affecting the fitting at earlier times. The latter is not the case for a batch calibration, where parameters are updated considering all data at once. Furthermore, effective parameters fail to resolve local variations in the aquifer material. To improve model simulations, transport parameters such as , and would most likely have to be considered as spatially variable during updating, increasing the computational requirements. The systematic underestimation of the predicted mean cumulative concentrations observed in tracer tests 3a and 2a can be attributed to biases in the estimation of effective transport parameters and errors in the log-K fields.
Figure 13 shows in the upper row, scatter plots of the reference mean tracer arrival times versus the corresponding ensemble mean (dots) and spread (solid lines) obtained with the KEG.
Figure 13 also includes the root mean square errors and mean absolute errors for the mean tracer arrival times predictions normalized by the difference between maximum and minimum arrival times for each test (NRMSE and NMAE, respectively). The histograms in the bottom row of
Figure 13 are the distribution of the normalized errors for each test. The mean of the histograms is close to zero for all three tests, and while the spread in the normalized residuals for tests 3a and 2a is reasonable, for the latter fitting almost perfectly to a standard normal distribution, it increases considerably for tracer test 3b. We expected a lower performance for tracer test 3b, having the lowest recovery rates of all tests, and therefore larger uncertainties associated to the measured data.
In general, acceptable mean ensemble predictions were observed with the KEG, with uncertainty bounds better constrained than by the EnKF. Furthermore, we observed a decrease in the NRMSE and MAE after each tracer test data was assimilated. Mean tracer arrival times are less sensitive to artifacts caused by, e.g., sensor instabilities, and therefore introduce less measurement uncertainty during data assimilation. The largest ensemble spread occurred at the largest mean arrival times. A similar behavior has been observed before [
58] and is related to observation points far away from the injection, at the limits of the study area for which data from only one tracer test were measured (e.g., ow-14 and ow-16). At these locations the ensemble variance of log-hydraulic conductivity is relatively high in comparison to the central region of the domain.
The histogram of the normalized residuals for test 2a shows a reasonable fit to the normal distribution but not for tracer tests 3a and 3b, in which the histograms are skewed. This can be attributed to the non-Gaussian distribution of the mean arrival times, which is bounded on one side by zero. It is not guaranteed that bounded quantities can assume prescribed distributions, especially if the measured values are close to the bounding values.
To validate both conditioned parameter ensembles, we evaluated the steady-state simulations of pumping test 1a, which was not used for updating the parameters.
Figure 14 shows a calibration plot that compares the predicted with the measured steady-state drawdown. The blue and red dots correspond to the simulations performed with the ensemble of parameters updated with the EnKF and KEG, respectively. Acceptable ensemble predictions were observed for the KEG. This is confirmed by an RMSE of 3 cm. In contrast, we estimated an RMSE of 24 cm for the simulations with the parameters conditioned with the EnKF and observed a systematic overestimation of the drawdown. These results emphasize the advantage of compressing the concentration data to mean arrival times, which stabilizes the parameter updates during the assimilation of tracer data at dramatically reduced computational costs.
8. Conclusions and Recommendations
In this study, we have presented the entire workflow for combined field-scale hydraulic and solute-tracer tomography experiments, data preprocessing, preliminary analysis of the data, and inversion by ensemble-Kalman filter based approaches to obtain hydraulic aquifer parameters. We have applied the field method at the hydrogeological research site Lauswiesen in Germany, obtaining a dataset that consists of 52 drawdown curves and 46 breakthrough curves. The preliminary aquifer characterization provided information that was used to define the prior distributions of aquifer parameters needed in stochastic parameter estimation.
For the estimation of the spatially distributed hydraulic-conductivity field, we used the same models and inversion methods as in the virtual tests presented in the Sanchez et al. [
58] but extended the problem to three dimensions. When using the ensemble Kalman filter (EnKF), we applied the standard update scheme to assimilate transient drawdown data and the restart scheme for the transient records of cumulative tracer concentrations. When using the Kalman Ensemble Generator (KEG), by contrast, only steady-state hydraulic heads and mean breakthrough times were used for parameter estimation. The latter excluded the estimation of dual-domain properties of the aquifer.
In contrast to the findings of the companion paper, in which the assimilation of transient drawdown outperformed the steady-state data, we observed no real advantage of using transient records. We partially attribute the poor performance of the EnKF with the transient drawdown data to the unknown specific storage coefficient. While the specific storage coefficient was also updated, we considered only an effective uniform parameter. By ignoring the spatial variability of storativity, the EnKF largely penalizes the hydraulic conductivity for the model misfit at early times, whereas in reality the poor model performance might be caused by the unrepresented variability of storativity. This can lead to erroneous initial parameter updates that propagate to later times. Unresolved variability of aquifer storativity is also reflected in the relatively large error metrics estimated for early-time drawdown data. Model simulation errors are reduced at later update steps, when there is no more dependence of the drawdown on aquifer storativity.
The very small ensemble spread obtained after assimilating both transient drawdown and tracer concentrations by the EnKF indicates severe filter inbreeding caused by erroneous correlations, despite the strong damping applied. This leads to a deterioration in the quality of the update that ultimately translates into large errors in the ensemble simulations. Better behaved covariances were obtained with the KEG, suggesting more stable parameter updates when steady-state and mean breakthrough times were used. For well-controlled hydraulic tests, we thus recommend batch-calibration techniques, in which all data are accounted for simultaneously rather than a classical filter (or smoother) in which sequential parameter updating is performed. Condensing the concentration data to their temporal moments increases the computational efficiency of the inversion but may be challenging if the time series are incomplete.
The ensemble-mean of log-hydraulic conductivity obtained with both ensemble-Kalman filter based methods, preserved the two prescribed aquifer zones. A highly conductive zone at the top of the aquifer and low-conductivity material dominating the bottom. However, the transition between the two layers was shifted to the bottom of the aquifer with the EnKF.
Even though we worked with a relatively dense monitoring network at our site, it remains difficult to capture important features of aquifer heterogeneity at scales smaller than the distance between the observation points (∼3 m in the horizontal directions). Additional information could be gained by installing new wells perpendicular to the main flow direction, that would function as additional and interchangeable injection and extraction wells. By using these wells, groundwater flow would be forced perpendicular to the natural flow direction, producing flow regimes considerably different to those generated with the wells used for our tests, which may also provide insight into hydraulic anisotropy.
Currently, we are advancing the field method to include a salt tracer that can be monitored using cross-hole time-lapse electrical resistivity tomography (ERT). We aim for a design that allows the integration of both tracers in a single tomographic test with minimal additional effort. The sensitivity of the geoelectrical measurements would provide information of aquifer heterogeneity that is not restricted to the direct vicinity of the observation wells and in combination with the hydraulic and concentration data, a better overall picture of the aquifer properties that may be obtained. The inclusion of electrical resistivity data obtained during salt-tracer tests in the ensemble-Kalman filter based methods is straightforward but involves estimating additional parameters and significantly increases the computational effort. As ERT is sensitive to salt both in the mobile and immobile fractions of the pore space, combining concentration measurements in wells with ERT data may also shed additional light onto dual-domain transport behavior.