1. Introduction
The effective management of groundwater contamination requires the joint estimation of contaminant source parameters and hydraulic conductivity fields to accurately assess impacts and design remediation strategies [1,2,3]. Given the inaccessibility and complexity of subsurface systems, direct measurement of these parameters is impractical, necessitating reliance on sparse indirect observations such as hydraulic heads and contaminant concentrations from a limited number of monitoring wells. This inference constitutes a high-dimensional inverse problem that demands repeated execution of a numerical model coupling groundwater flow and solute transport until the simulated responses adequately match the observations [4,5]. Substantial aquifer heterogeneity, compounded by incomplete knowledge of geological structures, requires representing hydraulic properties and release histories with a large number of parameters, which causes the computational cost of joint estimation to grow super-linearly [6,7]. These challenges highlight the need for efficient surrogate modeling and inversion techniques to achieve reliable parameter estimation under realistic observational constraints.
Given such limited data and a vast search space, various methods have been proposed. Traditional optimization-based approaches, such as least-squares regression [8], nonlinear programming [9], and hybrid optimization with genetic algorithms [10,11], can provide acceptable estimates of release histories but often fail to fully quantify the uncertainty in the inverse results [12]. Recent studies have extended traditional optimization to metaheuristic algorithms for parameter estimation in transport models [13,14]. For example, an inverse model based on teaching–learning-based optimization (TLBO) was developed for the continuous time random walk-truncated power law (CTRW-TPL) model; across synthetic, experimental, and direct numerical simulation breakthrough data in porous and fractured media, it showed low sensitivity to initial guesses and higher accuracy in estimating key parameters than the standard CTRW MATLAB R2017a toolbox [13]. In contrast, statistical methods, including data assimilation with extended and ensemble Kalman filters [15,16,17], Bayesian inference based on Markov chain Monte Carlo (MCMC) [18], and geostatistical approaches combined with adjoint methods [19], provide uncertainty estimates. Among these, ensemble-based data assimilation methods have emerged as an efficient alternative, combining Monte Carlo sampling with sequential or batch data incorporation to jointly estimate parameters and their uncertainties at manageable cost [20,21,22]. The Ensemble Smoother (ES) and its iterative variants assimilate all observations simultaneously in a global update, avoiding the state–parameter inconsistencies encountered in the Ensemble Kalman Filter [23]. To enhance robustness in the strongly nonlinear and multimodal problems common in contaminant transport, Zhang et al. (2018) developed the Iterative Local Updating Ensemble Smoother (ILUES), which performs localized ensemble updates based on a combined parameter-and-response distance metric to mitigate ensemble collapse and improve convergence [24].
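In ILUES [24], each ensemble member is updated using a local ensemble of the members that are closest to it under a combined metric, which adds the normalized data misfit to the normalized parameter-space distance. The sketch below shows only this selection step, assuming diagonal observation and prior covariances and the normalized-sum form of the metric; the function `ilues_local_indices`, its arguments, and the local fraction `alpha` are hypothetical names, and the subsequent ensemble-smoother update of each local ensemble is omitted.

```python
import numpy as np


def ilues_local_indices(m_ens, d_pred, d_obs, j, obs_var, prior_var, alpha=0.1):
    """Return indices of the local ensemble used to update member j (hypothetical helper).

    m_ens:     (Ne, Nm) parameter ensemble
    d_pred:    (Ne, Nd) simulated responses g(m) for each member
    d_obs:     (Nd,)    observed data
    obs_var:   (Nd,)    observation-error variances (diagonal C_D assumed)
    prior_var: (Nm,)    prior parameter variances (diagonal C_M assumed)
    alpha:     fraction of the ensemble kept in each local ensemble
    """
    # Data misfit of every member, weighted by the observation-error variances
    j_d = np.sum((d_pred - d_obs) ** 2 / obs_var, axis=1)
    # Parameter-space distance of every member to member j, weighted by prior variances
    j_m = np.sum((m_ens - m_ens[j]) ** 2 / prior_var, axis=1)
    # Combined metric: each term normalized by its maximum over the ensemble
    j_comb = j_d / j_d.max() + j_m / j_m.max()
    n_local = max(1, int(round(alpha * m_ens.shape[0])))
    return np.argsort(j_comb, kind="stable")[:n_local]


# Usage: a 4-member, 1-parameter toy ensemble with an identity forward model
m = np.array([[0.0], [1.0], [2.0], [10.0]])
idx = ilues_local_indices(m, d_pred=m.copy(), d_obs=np.array([0.0]), j=0,
                          obs_var=np.array([1.0]), prior_var=np.array([1.0]), alpha=0.5)
print(idx)  # [0 1]
```

Because both terms are scaled by their ensemble maxima, neither the data misfit nor the parameter distance dominates purely because of its units, which is what lets the local ensembles follow distinct modes in multimodal problems.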
To address these computational burdens, surrogate-based approaches have emerged as cost-effective approximations of the input–output relationships of computationally intensive forward simulators. Recently, deep neural network (DNN)-based surrogates have gained prominence for their ability to capture complex mappings [25,26,27]. Zhu and Zabaras (2018) pioneered image-to-image regression using fully convolutional networks (FCNs) to directly model high-dimensional inputs and outputs as images, transforming surrogate construction into an image regression task and yielding efficient solutions for intricate problems [28]. This idea has since been widely adopted and extended into various efficient encoder–decoder architectures, such as the Attention U-Net for steady-state flow fields [29] and the U-Net for transient flow inversion [30]. These models learn end-to-end mappings, reducing computational time by orders of magnitude while preserving accuracy.
Further developments have focused on handling temporal dynamics in contaminant transport. Mo et al. introduced a deep autoregressive strategy into a dense convolutional encoder–decoder network to build a surrogate for groundwater contaminant transport, coupling it with ILUES for high-dimensional parameter estimation [31], while Bai and Tahmasebi (2022) combined the autoregressive idea with a transformer-based model for spatiotemporal forecasting [25]. Extensions include residual dense networks for improved feature reuse in multi-source scenarios [32], ResNet surrogates with enhanced particle filters for DNAPL characterization [33], BPNN surrogates coupled with optimizers for monitoring under uncertainty [34], and hybrid kernel models for sensitivity-driven inversion [35]. Collectively, these innovations have moved surrogates from static snapshots to dynamic full-field time-series approximations, with improved accuracy in capturing nonlinear dynamics and uncertainties.
Many existing convolutional or recurrent architectures rely on a large number of trainable parameters and dense feature transformations, which substantially increase training cost and memory requirements when high-resolution spatial fields and long temporal horizons are considered [36]. Lightweight convolutional architectures have emerged as a promising alternative. Depthwise separable convolution, originally developed to reduce computational complexity in computer vision tasks, decomposes a standard convolution into channel-wise spatial filtering (depthwise convolution) followed by pointwise (1 × 1) feature fusion, substantially reducing the parameter count and floating-point operations with little loss of representational capacity [37]. When combined with dense connectivity and encoder–decoder structures, depthwise separable convolution enables the efficient extraction and reuse of multiscale spatial features, making it well suited to surrogate modeling of groundwater flow and contaminant transport in spatially heterogeneous media.
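The parameter saving of depthwise separable convolution can be seen by counting the weights of a single layer. The sketch below compares a standard k × k convolution with its depthwise-plus-pointwise factorization and implements the two-step operation in NumPy (stride 1, zero "same" padding, no bias in the forward pass). It illustrates the generic technique only; the 64-channel 3 × 3 layer is a hypothetical example, not the AR-DWCNN layer definition, and the helper names are illustrative. In deep learning libraries the depthwise step is usually expressed as a grouped convolution with the number of groups equal to the number of input channels (for example `groups=c_in` in PyTorch's `nn.Conv2d`).

```python
import numpy as np


def standard_conv_params(c_in, c_out, k):
    # One k x k kernel per (input, output) channel pair, plus one bias per output channel
    return c_in * c_out * k * k + c_out


def separable_conv_params(c_in, c_out, k):
    # Depthwise: one k x k kernel per input channel (+ bias);
    # pointwise: a 1 x 1 convolution mixing c_in channels into c_out (+ bias)
    return (c_in * k * k + c_in) + (c_in * c_out + c_out)


def depthwise_separable_conv(x, w_dw, w_pw):
    """x: (C_in, H, W); w_dw: (C_in, k, k) with k odd; w_pw: (C_out, C_in).
    Stride 1, zero 'same' padding, no bias; cross-correlation as in most DL libraries."""
    c_in, h, w = x.shape
    k = w_dw.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    dw = np.zeros((c_in, h, w))
    # Depthwise step: each channel is filtered only by its own spatial kernel
    for c in range(c_in):
        for i in range(k):
            for j in range(k):
                dw[c] += w_dw[c, i, j] * xp[c, i:i + h, j:j + w]
    # Pointwise step: a 1 x 1 convolution fuses channels independently at every grid cell
    return np.einsum("oc,chw->ohw", w_pw, dw)


print(standard_conv_params(64, 64, 3))   # 36928
print(separable_conv_params(64, 64, 3))  # 4800
```

For this hypothetical layer the separable form needs roughly 7.7 times fewer parameters; the whole-network reduction reported in Section 4.1 (about 48%) depends on the complete AR-DWCNN architecture and is not implied by this single-layer ratio.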
In parallel, accurately capturing temporal dynamics remains a central challenge for surrogate models of contaminant transport. Autoregressive learning strategies provide a flexible framework for modeling transient evolution by explicitly conditioning future states on previously predicted system responses. Compared to recurrent architectures, autoregressive convolutional models avoid sequential backpropagation through time and offer improved numerical stability and scalability for long-term simulations. When integrated with a convolution-based surrogate, this strategy allows the generation of spatially continuous concentration fields over the entire simulation domain at each time step, rather than being limited to sparse observation points.
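A minimal sketch of the autoregressive rollout described above follows. It assumes a one-step map that receives the conductivity field, the previously predicted concentration field, and the current source strength; `step_model` and this input layout are hypothetical stand-ins for the trained network, whose actual inputs are specified in Section 2. The toy step model in the usage line is purely illustrative.

```python
import numpy as np


def autoregressive_rollout(step_model, k_field, sources, c0):
    """Roll a one-step surrogate forward in time (hypothetical interface).

    step_model(k_field, c_prev, s_t) -> c_next : learned one-step map
    k_field: (H, W) conductivity (or log-conductivity) field
    sources: sequence of source strengths, one per time step
    c0:      (H, W) initial concentration field
    """
    c = c0
    fields = []
    for s_t in sources:
        # Each prediction is conditioned on the previously *predicted* field,
        # so errors can propagate, but no recurrent hidden state is needed
        c = step_model(k_field, c, s_t)
        fields.append(c)
    return np.stack(fields)  # (T, H, W): full-field concentrations at every step


# Usage with a toy linear step model standing in for the trained network
toy_step = lambda k, c, s: 0.5 * c + s
out = autoregressive_rollout(toy_step, np.ones((2, 2)), [1.0, 1.0, 0.0], np.zeros((2, 2)))
print(out[:, 0, 0])  # [1.   1.5  0.75]
```

Because the one-step map is a feed-forward network, training can use reference fields as the conditioning input at each step, which avoids backpropagation through time; at prediction time the model consumes its own outputs, as in the loop above.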
Despite the growing adoption of surrogate models, their effective integration with inverse modeling frameworks remains challenging. Ensemble-based data assimilation methods such as ILUES have demonstrated robustness to strong nonlinearity and multimodality by employing localized ensemble updates. However, their convergence can be sensitive to ensemble size and the degree of model nonlinearity, particularly in high-dimensional parameter spaces involving both hydraulic conductivity fields and time-dependent source information. Incorporating Levenberg–Marquardt regularization into the iterative ensemble smoother provides an adaptive balance between gradient-based correction and stabilization, improving convergence behavior and mitigating ensemble collapse in ill-posed inverse problems [38].
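One common way to introduce Levenberg–Marquardt damping into an iterative ensemble smoother, used for example in Levenberg–Marquardt forms of the iterative ensemble smoother, is to inflate the observation-error covariance in the Kalman-type gain: m ← m + C_md (C_dd + (1 + λ) C_D)⁻¹ (d_obs − g(m)). The sketch below implements that generic global update. Whether ILUES-LM uses exactly this form, and how it applies the update to each local ensemble and adapts λ, is described in Section 2 and is assumed rather than reproduced here; `lm_es_update` and its arguments are hypothetical.

```python
import numpy as np


def lm_es_update(m_ens, d_pred, d_obs_pert, obs_cov, lam):
    """One Levenberg-Marquardt-damped ensemble smoother update (generic form, assumed).

    m_ens:      (Ne, Nm) parameter ensemble
    d_pred:     (Ne, Nd) forward (or surrogate) predictions g(m) for each member
    d_obs_pert: (Ne, Nd) observations perturbed with observation noise
    obs_cov:    (Nd, Nd) observation-error covariance C_D
    lam:        damping factor (lam >= 0)
    """
    ne = m_ens.shape[0]
    dm = m_ens - m_ens.mean(axis=0)
    dd = d_pred - d_pred.mean(axis=0)
    c_md = dm.T @ dd / (ne - 1)   # parameter-data cross-covariance, (Nm, Nd)
    c_dd = dd.T @ dd / (ne - 1)   # predicted-data covariance, (Nd, Nd)
    # LM damping inflates the observation-error term: a larger lam gives smaller,
    # more cautious steps, and lam = 0 recovers the standard ensemble smoother update
    a = c_dd + (1.0 + lam) * obs_cov
    innovations = (d_obs_pert - d_pred).T        # (Nd, Ne)
    return m_ens + (c_md @ np.linalg.solve(a, innovations)).T


# Usage: identity forward model, comparing an undamped and a damped step
rng = np.random.default_rng(0)
m = rng.normal(size=(50, 3))
d_obs = np.ones((50, 3))
c_d = 0.1 * np.eye(3)
step_0 = lm_es_update(m, m.copy(), d_obs, c_d, lam=0.0) - m
step_10 = lm_es_update(m, m.copy(), d_obs, c_d, lam=10.0) - m
print(np.linalg.norm(step_10) < np.linalg.norm(step_0))  # True
```

In many LM-type schemes, λ is reduced after an iteration that lowers the data misfit and increased otherwise, so the update moves from cautious, heavily damped steps toward full Gauss–Newton-like steps as the ensemble approaches the data.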
Motivated by these considerations, an autoregressive depthwise convolutional neural network (AR-DWCNN) is developed as a computationally efficient surrogate to replace the forward coupled groundwater flow and contaminant transport model. The proposed network preserves the spatial resolution of the simulation domain and enables continuous temporal prediction through an autoregressive formulation, while substantially reducing the computational cost through depthwise separable convolution. This surrogate is further integrated with an improved Iterative Local Updating Ensemble Smoother with Levenberg–Marquardt regularization (ILUES-LM) to jointly estimate contaminant source characteristics and heterogeneous hydraulic conductivity fields under sparse observation conditions. The combined framework balances predictive accuracy, uncertainty quantification, and computational efficiency, thereby making the joint inversion of high-dimensional groundwater model parameters more practical.
The remainder of this study is organized as follows:
Section 2 describes the methodology, including the governing equations, the construction of the AR-DWCNN surrogate, and the formulation of the ILUES-LM inversion algorithm;
Section 3 presents the case study setup and evaluation metrics;
Section 4 discusses the results; and
Section 5 concludes with key findings and future directions.
4. Results and Discussion
4.1. Performance of the Proposed Surrogate
The proposed autoregressive depthwise separable convolutional neural network (AR-DWCNN) surrogate achieves predictive accuracy comparable to, and slightly better than, the baseline AR-Net on the testing set, while offering substantial advantages in training efficiency and model compactness. Across 10 independent realizations, AR-DWCNN achieves a higher coefficient of determination (R² = 0.989) and a lower root mean square error (RMSE = 0.056) than AR-Net (R² = 0.984, RMSE = 0.067) for concentration fields at the final time step (t = 16 [T]). AR-DWCNN also exhibits a smaller mean bias error (MBE = 0.017 vs. 0.021), a smaller 95% uncertainty (U95 = 0.280 vs. 0.332), and a lower global performance indicator (GPI = 5.004 × 10⁻⁵ vs. 1.405 × 10⁻⁴), indicating improved overall accuracy, stability, and reliability of the surrogate predictions. This fidelity is further illustrated in Figure 5 and Figure 6 for a randomly selected testing sample, which compare plume evolution at selected time steps (t = 4, 8, 12, and 16 [T]) for both surrogates against the reference fields generated by the physics-based forward model. Both AR-DWCNN (Figure 5) and AR-Net (Figure 6) capture the overall shape and migration patterns of the contaminant plume under heterogeneous hydraulic conductivity and time-varying source releases. The predicted plumes closely match the reference in spatial distribution and concentration magnitude at all displayed time steps, confirming the effectiveness of the autoregressive strategy in modeling temporal dependencies.
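The exact metric definitions used in this study are given in Section 3. As a rough guide, the sketch below uses the common textbook forms of R², RMSE, and MBE, evaluated over all grid cells of a field; the MBE sign convention (positive when the surrogate overestimates on average) is an assumption, and U95 and GPI are omitted because their definitions vary between studies.

```python
import numpy as np


def r2(y_true, y_pred):
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    # Root mean square error over all grid cells
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def mbe(y_true, y_pred):
    # Mean bias error; sign convention assumed: positive = surrogate overestimates
    return float(np.mean(y_pred - y_true))


# Usage on a tiny flattened "field"
ref = np.array([0.0, 1.0, 2.0, 3.0])
pred = np.array([0.0, 1.0, 2.0, 4.0])
print(r2(ref, pred), rmse(ref, pred), mbe(ref, pred))  # 0.8 0.5 0.25
```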
Minor discrepancies between the surrogates and the reference are observed primarily near the contaminant sources and in areas where concentration gradients are steep and local variability is high. In these regions, small spatial shifts or the smoothing inherent to convolutional approximations can produce relatively large absolute errors. Such deviations are common in data-driven surrogates trained on limited samples and are more pronounced in areas of rapid concentration change near the sources [29,30]. The visual and quantitative agreement validates the capability of AR-DWCNN to reproduce the full-field groundwater flow and contaminant transport process, providing a reliable and computationally lightweight alternative to validated architectures such as AR-Net.
Figure 7 provides point-wise scatter plots comparing predicted and reference concentrations for all grid cells across the selected testing set at 8 time steps. Both surrogate models perform well in the low-concentration regime (near-zero values), where the scatter points closely follow the 1:1 (45°) reference line at all observation times. This indicates an accurate representation of background concentrations and plume margins, which occupy the majority of the spatial domain. This agreement in low-concentration regions is consistent with the plume morphology comparisons in Figure 5 and Figure 6, demonstrating that both surrogate models capture the overall plume and the low-concentration areas well.
In high-concentration regions, both models exhibit noticeable deviations, with points scattering below the 45° line, indicating a tendency to underestimate peak values. These discrepancies are most evident near the contaminant sources and within plume cores, where time-varying releases and heterogeneous hydraulic conductivity produce steep gradients and rapid temporal changes. The underestimation likely arises from convolutional smoothing and from the limited number of training realizations, which are insufficient to fully resolve extreme localized peaks, a common problem in data-driven surrogates for solute transport. AR-DWCNN shows slightly tighter clustering and higher R² values overall (ranging from 0.995 at t = 2 [T] to 0.983 at later steps) than AR-Net (0.979 at t = 2 [T]), indicating marginally better handling of high-value variability.
The accuracy of the surrogate predictions at observation locations is particularly relevant for subsequent data assimilation, as these locations represent the sparse measurements typically available in real-world inverse problems. Figure 8 displays breakthrough curves at 15 monitoring wells over the full observation period for the same randomly selected test realization shown in Figure 5 and Figure 6. Because the contaminant release has a finite duration (the source stops releasing at t = 8 [T]), observation wells closer to the source or on the main flow path (e.g., Wells 6–9) exhibit complete rise–peak–decline behavior within the simulation time, whereas more distant or transversely offset wells (e.g., Wells 2–5 and 11–15) are still in the rising phase at the end of the 0–16 [T] period and have not yet reached their peak, so their breakthrough curves are incomplete. Both surrogates reproduce the general trends, including peak arrival times, maximum concentrations, and tailing behavior, at most wells. This agreement confirms that both models provide reliable concentration data at observation points, which is important for accurate inversion in data assimilation. AR-DWCNN generally tracks the reference curves more closely than AR-Net, particularly during the rapid rise and peak phases. For instance, at upstream wells influenced by source proximity (e.g., Wells 5, 7, 9), AR-DWCNN better matches peak timings and magnitudes, whereas AR-Net shows slight delays or overestimation in some cases. Downstream wells (e.g., Wells 11, 13, 14) exhibit smoother, lower-amplitude responses due to dispersion; here both surrogates perform well, although AR-DWCNN deviates marginally less in the tailing sections. Across the testing set, the mean RMSE of the breakthrough curves is 0.019 for AR-DWCNN compared with 0.032 for AR-Net, confirming substantially higher prediction accuracy at observation locations despite the lightweight architecture.
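Breakthrough curves are simply the predicted full-field time series sampled at the well cells. The sketch below extracts them and computes a mean breakthrough-curve RMSE; it assumes wells are given as (row, column) grid indices and that the RMSE is taken over time at each well and then averaged over wells. Both the helper names and this averaging order are illustrative assumptions, since the text does not state how the mean RMSE over wells and test realizations was formed.

```python
import numpy as np


def breakthrough_curves(fields, wells):
    """fields: (T, H, W) concentration fields over time; wells: list of (row, col) grid indices.
    Returns an (n_wells, T) array with the concentration history at each well."""
    rows = [r for r, _ in wells]
    cols = [c for _, c in wells]
    # Advanced indexing on the two spatial axes gives shape (T, n_wells)
    return fields[:, rows, cols].T


def mean_btc_rmse(pred_fields, ref_fields, wells):
    """RMSE over time at each well, averaged over wells (averaging order assumed)."""
    pred = breakthrough_curves(pred_fields, wells)
    ref = breakthrough_curves(ref_fields, wells)
    per_well = np.sqrt(np.mean((pred - ref) ** 2, axis=1))
    return float(per_well.mean())


# Usage: the surrogate is off by 1 at the first well and exact at the second
ref = np.zeros((4, 3, 3))
pred = ref.copy()
pred[:, 0, 0] = 1.0
wells = [(0, 0), (2, 2)]
print(mean_btc_rmse(pred, ref, wells))  # 0.5
```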
The training metrics demonstrate the efficiency of AR-DWCNN relative to AR-Net when both models are trained for the same number of epochs (200). Under this identical training budget, AR-DWCNN converges to a lower mean validation loss of 0.012, compared with 0.018 for AR-Net. More importantly, AR-DWCNN contains only 1,813,730 trainable parameters, approximately 48% fewer than the 3,490,020 parameters of AR-Net. This reduction in model complexity translates directly into computational savings: training time on a single NVIDIA GeForce RTX 3090 is reduced by about 37% (1590 s versus 2536 s). After training, single-realization prediction times are comparable for the two models (0.1462 s for AR-DWCNN and 0.1572 s for AR-Net). These efficiency gains arise primarily from replacing standard convolutions with depthwise separable convolutions, which eliminate much of the redundant computation without sacrificing the capacity to capture complex spatiotemporal dependencies in groundwater flow and contaminant transport. Thus, under the same 200-epoch budget, AR-DWCNN delivers comparable or slightly superior predictive accuracy with far fewer parameters and shorter training. This lightweight design is particularly advantageous for constructing surrogates under limited computational resources, enabling faster iteration during model development and more efficient deployment in ensemble-based inverse methods.
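The percentage reductions quoted above follow directly from the reported counts and timings. The short check below recomputes them; `percent_reduction` is an illustrative helper, and all inputs are the values stated in the text.

```python
def percent_reduction(new, old):
    """Relative reduction of `new` with respect to `old`, in percent."""
    return 100.0 * (1.0 - new / old)


params = percent_reduction(1_813_730, 3_490_020)  # trainable parameters
train = percent_reduction(1590, 2536)             # training time in seconds
predict = percent_reduction(0.1462, 0.1572)       # single-realization prediction time in seconds
print(round(params, 1), round(train, 1), round(predict, 1))  # 48.0 37.3 7.0
```

The roughly 7% difference in prediction time is consistent with describing the two models' inference costs as comparable, while the parameter and training-time savings are much larger.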
4.2. Inversion Results of ILUES-LM
The ILUES-LM algorithm is employed to estimate 321 parameters using the AR-DWCNN surrogate model, which is trained on 1500 input–output pairs generated by the forward model. The forward model is fully replaced by this surrogate, so no additional forward simulations are executed during the inversion process. To evaluate the accuracy and computational efficiency of high-dimensional parameter inversion combining the AR-DWCNN surrogate with ILUES-LM (surrogate-based ILUES-LM), the results from ILUES-LM using the original physics-based forward model (physics-based ILUES-LM) serve as a reference.
For both approaches, an ensemble size of 3300 and 10 iterations are used to handle the high-dimensional joint inversion and adequately quantify parameter uncertainty.
Figure 9 illustrates the inversion results for the 4 contaminant source strength parameters over the iterations for physics-based ILUES-LM (left column) and surrogate-based ILUES-LM (right column). In both cases, the ensemble means move toward the true values within the early iterations, accompanied by a progressive reduction in ensemble spread. Compared with the physics-based implementation, surrogate-based ILUES-LM converges slightly more slowly; nevertheless, it evolves steadily and ultimately converges to values close to the reference. The surrogate-based approach therefore exhibits convergence behavior qualitatively consistent with the physics-based inversion, confirming its capability to accurately identify source strengths.
Figure 10 and Figure 11 present the reference hydraulic conductivity field alongside the mean of the final ensemble, a randomly selected posterior realization, and the variance of the final ensemble, obtained from physics-based ILUES-LM and surrogate-based ILUES-LM, respectively. As shown in Figure 10, physics-based ILUES-LM successfully reconstructs the main patterns of the reference field. The mean estimate clearly shows the continuous high-conductivity zones (yellow-orange areas) and the surrounding low-conductivity regions (blue areas). The randomly selected posterior realization preserves the spatial connectivity and variability of these zones. The variance is generally low across most of the domain, with slightly higher values at a few isolated locations, reflecting the limited influence of distant observations there.
Surrogate-based ILUES-LM produces very similar results (Figure 11). The mean estimate captures the overall structure of the high- and low-conductivity zones, with only minor underestimation. The selected posterior realization exhibits heterogeneity and continuity comparable to the physics-based case. The variance field is marginally higher, which can be attributed to residual discrepancies between the surrogate-predicted and true responses at the observation locations; these errors introduce a systematic bias into the ensemble update and, consequently, elevate uncertainty. Another likely contributor is the sparsity of observations relative to the high-dimensional parameter space, which amplifies the impact of surrogate errors on uncertainty propagation.
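The three posterior panels in Figure 10 and Figure 11 (ensemble mean, one random realization, and ensemble variance) are cell-wise summaries of the final ensemble. A minimal sketch follows, assuming the ensemble is stored as an (Ne, H, W) array of (log-)conductivity fields and that the unbiased sample variance is used; `ensemble_summary` is a hypothetical helper.

```python
import numpy as np


def ensemble_summary(ens, rng=None):
    """ens: (Ne, H, W) posterior ensemble of (log-)conductivity fields.
    Returns the cell-wise mean, one randomly drawn member, and the cell-wise variance."""
    rng = np.random.default_rng() if rng is None else rng
    mean = ens.mean(axis=0)
    var = ens.var(axis=0, ddof=1)          # unbiased sample variance in every grid cell
    member = ens[rng.integers(ens.shape[0])]
    return mean, member, var


# Usage: a two-member toy ensemble on a 2 x 2 grid
ens = np.stack([np.zeros((2, 2)), 2.0 * np.ones((2, 2))])
mean, member, var = ensemble_summary(ens, np.random.default_rng(0))
print(mean[0, 0], var[0, 0])  # 1.0 2.0
```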
Executing physics-based ILUES-LM requires 36,300 forward model runs (3300 for the initial ensemble plus 33,000 across 10 iterations), imposing a significant computational burden. In contrast, surrogate-based ILUES-LM yields high-dimensional parameter estimates comparable to those of physics-based ILUES-LM while requiring no forward runs during inversion; only 2000 simulations (1500 for training and 500 for testing) are used for surrogate construction. This dramatic reduction underscores the efficiency gains enabled by the AR-DWCNN surrogate while maintaining inversion accuracy nearly identical to the physics-based reference.
To quantify the efficiency gains, the total computational costs are compared. For physics-based ILUES-LM, the cost is TF = Tf × Nf, where Tf is the time per forward model run and Nf = 36,300. For surrogate-based ILUES-LM, the cost includes the forward runs needed to generate the surrogate samples (Tf × N0, with N0 = 2000) plus the surrogate training time Tt = 1590 s, giving Ts = Tf × N0 + Tt. Assuming Tf ≈ 10 s (a conservative estimate for the physics-based simulator in this numerical case), TF ≈ 363,000 s and Ts ≈ 21,590 s, an acceleration of approximately 17 times. Even larger speedups are expected for higher-dimensional problems that require larger ensembles or more iterations, or for more expensive forward models, because the fixed surrogate training cost is amortized over the inversion process.
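The cost comparison above can be reproduced with a few lines of arithmetic. The sketch encodes TF and Ts exactly as defined in the text, with Tf = 10 s taken from the text's assumption. The optional `t_pred` argument is an addition not included in the text's estimate: it charges a per-evaluation surrogate prediction time to each of the Nf ensemble evaluations performed during inversion. `speedup` is an illustrative helper name.

```python
def speedup(t_f, n_f, n_0, t_t, t_pred=0.0):
    """Ratio of physics-based to surrogate-based inversion cost.

    t_f:    seconds per forward model run (Tf)
    n_f:    forward runs in physics-based ILUES-LM (Nf)
    n_0:    forward runs used to build the surrogate (N0)
    t_t:    surrogate training time in seconds (Tt)
    t_pred: optional seconds per surrogate evaluation during inversion (0 in the text)
    """
    cost_physics = t_f * n_f                          # TF = Tf x Nf
    cost_surrogate = t_f * n_0 + t_t + t_pred * n_f   # Ts = Tf x N0 + Tt (+ inference)
    return cost_physics / cost_surrogate


n_f = 3300 + 10 * 3300  # initial ensemble + 10 iterations = 36,300 runs
print(round(speedup(10.0, n_f, 2000, 1590.0), 1))                 # 16.8
print(round(speedup(10.0, n_f, 2000, 1590.0, t_pred=0.1462), 1))  # 13.5
```

Setting `t_pred` to the reported single-realization prediction time of 0.1462 s lowers the ratio to about 13.5; batched inference on the GPU would usually make this overhead smaller. As Tf grows, the ratio approaches Nf/N0 = 18.15, because the training time becomes negligible relative to the forward runs.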
These results align with the trends in recent surrogate-coupled ensemble frameworks, where lightweight deep learning approximations dramatically cut forward evaluations without compromising posterior fidelity [28,29,30]. The AR-DWCNN surrogate thus enables the practical application of data assimilation to complex real-world contaminant source identification tasks that would otherwise be computationally prohibitive.
4.3. Limitations and Potential Improvements
While the proposed AR-DWCNN surrogate model coupled with the ILUES-LM algorithm demonstrates promising performance in terms of predictive accuracy and computational efficiency for the joint inversion of hydraulic conductivity fields and time-varying contaminant source strengths, several limitations inherent to the current study should be acknowledged.
First, the numerical experiments are conducted in a synthetic two-dimensional domain under steady-state saturated flow conditions. This setup serves primarily as a proof-of-concept, allowing focused evaluation of the surrogate’s ability to capture spatiotemporal transport dynamics under high-dimensional parameter uncertainty and sparse observations, while keeping the computational demands manageable for extensive ensemble-based inversion. However, real-world aquifers are typically three-dimensional, exhibit transient flow regimes, and are influenced by complex recharge–discharge processes, vertical flow components, and multi-scale geological structures. These simplifications limit the direct transferability of the results to field-scale applications and may underestimate the challenges associated with increased dimensionality, data scarcity, and non-stationarity in practical scenarios.
Second, the solute transport model considers only advection and mechanical dispersion (Equations (4) and (5)), neglecting molecular diffusion, adsorption, decay, and biogeochemical reactions. This assumption is reasonable for conservative contaminants in the synthetic cases examined and aligns with many previous inverse modeling studies focusing on source identification and heterogeneity. However, in real-world groundwater systems, if the reactive parameters of the contaminant are known a priori from site characterization or laboratory data, the proposed surrogate modeling and inversion framework holds strong application potential: by simply incorporating the relevant reaction terms into the high-fidelity forward simulations used to generate the training dataset for the AR-DWCNN, the surrogate can learn to approximate the full reactive transport dynamics without requiring fundamental changes to the network architecture or inversion algorithm. This extension represents a direction for future research to enhance the method’s applicability to more complex reactive contaminant problems.
Third, the AR-DWCNN is designed as a data-driven surrogate without explicit incorporation of governing physical equations as hard constraints, in contrast to physics-informed neural networks (PINNs). While this choice prioritizes computational efficiency and scalability for high-dimensional problems, it may lead to minor deviations from fundamental physical principles in regions that are sparsely represented in the training data or characterized by sharp spatial gradients. Future research will therefore focus on incorporating physical mechanisms into the learning framework, for instance by introducing physics-based regularization terms into the loss function or by developing hybrid data–physics modeling strategies. Such extensions are expected to further improve the physical consistency, interpretability, and robustness of the surrogate model, particularly under more challenging extrapolation and generalization scenarios.
Finally, the current case does not account for pumping or injection stresses, which are common in managed aquifers and can significantly alter flow paths and contaminant migration patterns. This omission was intentional to isolate the effects of natural-gradient heterogeneity and time-varying source release, but it reduces the framework’s immediate relevance to actively managed groundwater systems. However, pumping rates can be incorporated as additional time-varying input channels in the training data and inversion parameters without changing the network architecture or algorithm. Extending the framework to actively managed sites constitutes another direction for future investigation.
Despite these limitations, the developed lightweight surrogate-assisted inversion framework provides a robust and efficient foundation for addressing high-dimensional groundwater inverse problems. Future research will focus on extending the methodology to three-dimensional transient systems, incorporating reactive transport processes, embedding physical constraints into the surrogate model, and including anthropogenic stresses such as pumping to enhance its applicability to real-world contaminated sites.
5. Conclusions
This study introduces a computationally efficient and robust framework for the joint inversion of heterogeneous hydraulic conductivity and time-varying contaminant source strengths under sparse and noisy observational data. The key innovation is a lightweight autoregressive depthwise convolutional neural network (AR-DWCNN) that serves as a high-fidelity surrogate for coupled groundwater flow and solute transport. By replacing standard convolutions with depthwise separable convolutions and incorporating dense connectivity within an encoder–decoder architecture, AR-DWCNN achieves predictive accuracy comparable to the established AR-Net while using only 1,813,730 trainable parameters (a 48% reduction) and requiring 37% less training time. Under an identical training budget of 200 epochs, the proposed model converges to a lower validation loss than AR-Net (0.012 vs. 0.018), demonstrating that depthwise separable convolutions effectively reduce computational cost without sacrificing the ability to capture complex spatiotemporal dependencies.
The surrogate model is integrated with an enhanced Iterative Local Updating Ensemble Smoother incorporating adaptive Levenberg–Marquardt regularization (ILUES-LM), which improves convergence stability in nonlinear, high-dimensional inverse problems. In a synthetic two-dimensional heterogeneous aquifer case, surrogate-assisted ILUES-LM yields posterior distributions of hydraulic conductivity and source parameters that closely match those obtained with the numerical forward model, while reducing the overall computational cost by more than an order of magnitude (approximately 17-fold). The framework reproduces plume evolution, identifies time-varying source strengths, and provides uncertainty-aware inverse estimates despite the challenges posed by high-dimensional parameters and limited observational data.
The combination of AR-DWCNN and ILUES-LM offers an effective balance between predictive accuracy, computational efficiency, and uncertainty quantification. The results, while obtained in a synthetic setting, validate the main methodological contributions: the lightweight surrogate design, the autoregressive formulation for transient dynamics, and the regularized local ensemble updating strategy for robust inversion in nonlinear and high-dimensional problems. The proposed framework has the potential to be extended to more realistic and challenging hydrogeological scenarios. Future research will focus on extending it to three-dimensional transient systems, incorporating reactive transport processes, embedding physical constraints (e.g., through regularization or hybrid approaches), and integrating pumping stresses as additional input channels or parameters, thereby enhancing its scalability, robustness, and real-world applicability.