1. Introduction
Data assimilation optimally merges observational data with numerical model simulations to reduce system uncertainty, thereby enhancing the model’s predictive capabilities [1]. Consequently, it is widely used in various fields such as meteorology, hydrology, and geology [2,3,4]. In petroleum engineering, this concept is applied through automatic history matching [5], which seeks to infer geological parameters by matching simulation results to historical production data. This process not only facilitates more reliable prediction of future reservoir dynamics but also enables computational evaluation of different development strategies, making it a key component of closed-loop reservoir management. Nevertheless, a comprehensive data assimilation workflow typically requires hundreds to thousands of numerical simulations in the forward process. For large-scale engineering problems, this computational burden is often prohibitively expensive, thereby limiting the broader application of such methods.
Surrogate models are an effective strategy for mitigating this computational burden. They establish fast input-output mappings that provide accurate approximations of expensive numerical simulations [6,7]. These techniques fall into two main groups: physics-driven and data-driven approaches. Physics-driven methods include projection-based reduced-order models [8,9] and multi-fidelity models [10,11]. Because these methods rest on strong simplifying assumptions, their applicability may be restricted. Conversely, data-driven approaches are designed to capture nonlinear input-output mappings by directly identifying patterns from the data [12]. Traditional methodologies such as Gaussian processes [13,14], radial basis functions [15], and polynomial chaos expansions [16,17] are recognized for their computational efficiency. However, their utility is generally limited to low-dimensional and relatively simple reservoir models, as their performance often deteriorates when applied to high-dimensional data and strongly nonlinear problems [18].
The rapid development of deep learning and data-driven methods in other engineering fields also provides useful context for surrogate modeling. In structural and infrastructure engineering, such methods have been applied to nonlinear response prediction, such as temperature-induced bearing displacement prediction of long-span bridges using DCNN-LSTM models [19]. They have also been used for response reconstruction, such as reconstructing structural acceleration responses under environmental temperature effects using CNN-BiGRU with a squeeze-and-excitation module [20]. In addition, recent studies have reviewed data processing and behavior monitoring techniques for dam health monitoring [21], and missing measurement data recovery methods have been investigated in structural health monitoring [22]. These studies illustrate the broad use of deep learning and data-driven techniques for prediction, reconstruction, monitoring, and data recovery in complex engineering systems. This broader progress provides useful methodological motivation for reservoir surrogate modeling, where the objective is to construct an efficient mapping from high-dimensional geological parameter fields to dynamic production responses.
In recent years, data-driven, deep learning-based surrogate models have developed rapidly [23,24,25,26]. Tang et al. [27] utilized a residual U-Net to establish a nonlinear mapping from geological parameters to saturation and pressure fields, thereby predicting reservoir dynamics. In a similar vein, Zhang et al. [28] introduced a dual-channel surrogate based on a densely connected encoder-decoder network to predict saturation and pressure fields from both spatial and vector data. However, these models do not directly provide the production data necessary for history matching. An additional step applying Peaceman’s equation is therefore needed, and a substantial volume of intermediate data must be stored.
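For context, the additional step can be written out. The expression below is the standard textbook form of Peaceman's well model for a single phase and a vertical well in a uniform, isotropic square grid; the symbols are introduced here for illustration only and are not taken from the cited works.

```latex
q = \mathrm{WI}\,\frac{k_r(S)}{\mu}\,\bigl(p_{\mathrm{block}} - p_{\mathrm{wf}}\bigr),
\qquad
\mathrm{WI} = \frac{2\pi k h}{\ln\left(r_e / r_w\right) + s},
\qquad
r_e \approx 0.2\,\Delta x
```

Here $q$ is the phase rate at reservoir conditions, $k$ the permeability, $h$ the perforated thickness, $k_r(S)$ the saturation-dependent relative permeability, $\mu$ the viscosity, $p_{\mathrm{block}}$ the well-block pressure, $p_{\mathrm{wf}}$ the bottom-hole flowing pressure, $r_w$ the wellbore radius, $s$ the skin factor, and $\Delta x$ the grid-block size. Because $q$ depends on both $p_{\mathrm{block}}$ and $S$ at every report step, a field-predicting surrogate has to output and retain those fields before production rates can be computed.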
To overcome this limitation, Ma et al. [29] proposed an end-to-end framework. Their model first extracts spatial features using two-dimensional densely connected convolutional layers and then employs stacked recurrent neural networks (RNNs) to perform regression directly on the production time-series data, thus significantly simplifying the implementation process. Building upon this framework, Zhang et al. [30] adapted the Vision Transformer (ViT) architecture [31] by replacing the RNN with a multi-layer Transformer decoder [32] for the temporal regression module. More recently, Zhang et al. [33] designed a dual surrogate framework that explicitly models the surrogate predictive error and incorporates this error estimate into the prediction and optimization processes. Furthermore, Zhang et al. [34] enhanced the surrogate accuracy by developing a fully Transformer-based encoder-decoder, replacing the convolutional and RNN modules with a unified architecture for processing both spatial and temporal features.
Although existing deep learning-based surrogate models have shown promising performance in automatic history matching, several challenges remain. One major limitation is their reliance on large labeled training datasets, whose generation through numerical simulation is computationally expensive. By contrast, generating reservoir parameter realizations is relatively inexpensive, so these realizations provide an abundant source of unlabeled data that can potentially be exploited to improve data efficiency. In addition, many existing architectures first extract spatial features using two-dimensional convolutions or Transformer-based encoders and then pass them to a separate temporal regression module. Such a design usually requires an additional feature transformation step between spatial encoding and production sequence prediction, for example flattening, pooling, or feature replication across time steps. As a result, feature extraction and temporal regression are often implemented as two relatively separate stages, which may increase model complexity and training cost.
In this work, we develop a data-efficient surrogate modeling framework to address these challenges. First, we reformulate the two-dimensional grid input into a flattened representation and employ one-dimensional convolutions for feature extraction. This representation provides a direct interface between parameter encoding and downstream production sequence prediction, avoiding an additional spatial-to-temporal feature transformation stage. Unlike feature-replication-based image-to-sequence designs, the proposed encoder directly generates a sequence of learned feature representations from the flattened parameter field, so that the temporal regression module receives non-replicated feature inputs before sequence modeling. Second, to better exploit inexpensive unlabeled parameter realizations, we introduce a self-supervised pre-training stage based on an autoencoder and use the pre-trained encoder to initialize the surrogate model for supervised regression.
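The difference between a replicated and a non-replicated feature interface can be illustrated with a minimal NumPy sketch. It is not the architecture of this paper: the grid size, channel counts, kernel size, stride, and random weights are all hypothetical, and the 2D field stands in for the flattened Brugge parameters. It only shows how strided one-dimensional convolutions over a flattened field yield a sequence whose steps differ, whereas pooling followed by replication feeds the same vector to every step.

```python
import numpy as np

def conv1d(x, w, b, stride):
    """Valid 1D convolution. x: (c_in, length), w: (c_out, c_in, k), b: (c_out,)."""
    c_out, c_in, k = w.shape
    n_out = (x.shape[1] - k) // stride + 1
    out = np.empty((c_out, n_out))
    for i in range(n_out):
        window = x[:, i * stride : i * stride + k]  # (c_in, k)
        out[:, i] = np.tensordot(w, window, axes=([1, 2], [0, 1])) + b
    return out

rng = np.random.default_rng(0)

# Hypothetical 16 x 12 parameter field, flattened to one channel of length 192.
field = rng.normal(size=(16, 12))
x = field.reshape(1, -1)

# Two strided layers with random (untrained) weights, assumed for illustration.
w1, b1 = rng.normal(size=(8, 1, 4)) * 0.1, np.zeros(8)
w2, b2 = rng.normal(size=(16, 8, 4)) * 0.1, np.zeros(16)
h = np.tanh(conv1d(x, w1, b1, stride=4))   # (8, 48)
h = np.tanh(conv1d(h, w2, b2, stride=4))   # (16, 12)

# Non-replicated interface: each of the 12 steps has its own 16-dim feature.
seq = h.T                                   # (12, 16)

# Replication-based interface: pool to one vector, then copy it to every step.
pooled = h.mean(axis=1)                     # (16,)
replicated = np.tile(pooled, (12, 1))       # (12, 16), all rows identical

print(seq.shape, replicated.shape)
```

Both interfaces hand a `(steps, features)` array to a downstream temporal module; only the first carries step-specific information before sequence modeling begins.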
The proposed framework is evaluated on the large-scale three-dimensional Brugge benchmark and further incorporated into a surrogate-assisted automatic history matching workflow with adaptive differential evolution. Experimental results demonstrate that the proposed architecture, when combined with autoencoder-based pre-training, improves data efficiency and achieves competitive predictive performance, particularly when labeled simulation data are limited.
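As a rough illustration of how a surrogate enters such a workflow, the sketch below minimizes a data-mismatch objective with a classic DE/rand/1/bin scheme. It is a simplified stand-in: the control parameters `F` and `CR` are fixed rather than adapted, the `surrogate` function is a hypothetical two-parameter decline curve instead of a trained network, and the observations are synthetic and noise-free.

```python
import numpy as np

def surrogate(m):
    """Hypothetical stand-in for a trained surrogate: parameters -> production series."""
    t = np.linspace(0.0, 1.0, 20)
    return m[0] * np.exp(-m[1] * t)

def mismatch(m, d_obs, sigma):
    """Sum of squared residuals, scaled by an assumed measurement error sigma."""
    r = (surrogate(m) - d_obs) / sigma
    return float(r @ r)

def differential_evolution(obj, bounds, pop_size=30, n_gen=200, F=0.6, CR=0.9, seed=0):
    """Classic DE/rand/1/bin with fixed F and CR (not the adaptive variant)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = len(lo)
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    cost = np.array([obj(p) for p in pop])
    for _ in range(n_gen):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True      # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            trial_cost = obj(trial)
            if trial_cost <= cost[i]:            # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = int(np.argmin(cost))
    return pop[best], float(cost[best])

m_true = np.array([2.0, 1.5])
d_obs = surrogate(m_true)                        # synthetic "history"
bounds = np.array([[0.5, 4.0], [0.1, 3.0]])
m_best, c_best = differential_evolution(lambda m: mismatch(m, d_obs, 0.05), bounds)
print(m_best, c_best)
```

Every objective evaluation calls only the surrogate, which is what makes population-based search with thousands of evaluations affordable compared with running the full simulator each time.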
The contributions of this work can be summarized as follows:
A flattened input representation combined with one-dimensional convolutions is introduced to provide a more direct spatial-to-temporal feature interface for production prediction. Compared with feature-replication-based designs, this architecture reduces intermediate transformation operations and provides non-replicated, learned feature inputs to the temporal regression module.
The proposed surrogate model achieves competitive predictive accuracy while requiring shorter training time than the baseline architecture on the Brugge benchmark.
A pre-training strategy based on unlabeled parameter realizations is incorporated to improve the data efficiency of surrogate modeling under limited labeled simulation data.
The remainder of this paper is organized as follows. Section 2 introduces the surrogate-based automatic history matching method. Section 3 presents the proposed surrogate architecture and pre-training strategy. Section 4 describes the experimental setup and reports the corresponding results. Section 5 and Section 6 provide the discussion and conclusions, respectively.
5. Discussion
In this study, we developed a data-efficient surrogate modeling framework with autoencoder-based pre-training for automatic history matching. The framework addresses the high cost of labeled simulation data by combining two components: a flattened one-dimensional representation for production forecasting and a self-supervised pre-training stage based on unlabeled parameter realizations. The former provides a direct route from parameter encoding to production-sequence prediction, while the latter improves the use of readily generated prior realizations before supervised regression.
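The two-stage idea (reconstruction on unlabeled realizations, then supervised regression starting from the pre-trained encoder) can be sketched with a deliberately small linear model in NumPy. The dense linear layers, data sizes, learning rate, and synthetic data below are assumptions made for illustration; the actual framework uses a convolutional encoder and simulated production data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_unlabeled, n_labeled, n_cells, n_latent, n_steps = 500, 20, 64, 8, 10
LR = 1e-3  # assumed step size for plain gradient descent

# Hypothetical data: parameter fields drawn from a low-dimensional prior.
basis = rng.normal(size=(n_latent, n_cells))
unlabeled_x = rng.normal(size=(n_unlabeled, n_latent)) @ basis
labeled_coeff = rng.normal(size=(n_labeled, n_latent))
labeled_x = labeled_coeff @ basis
# Stand-in for simulated production curves (a real workflow would run a simulator).
labeled_y = np.tanh(labeled_coeff @ rng.normal(size=(n_latent, n_steps)))

def pretrain_autoencoder(x, n_epochs=300):
    """Stage 1: fit encoder/decoder to reconstruct unlabeled inputs."""
    w_enc = rng.normal(size=(n_cells, n_latent)) * 0.01
    w_dec = rng.normal(size=(n_latent, n_cells)) * 0.01
    losses = []
    for _ in range(n_epochs):
        z = x @ w_enc
        err = z @ w_dec - x
        losses.append(float(np.mean(err ** 2)))
        grad_dec = z.T @ err / len(x)
        grad_enc = x.T @ (err @ w_dec.T) / len(x)
        w_enc -= LR * grad_enc
        w_dec -= LR * grad_dec
    return w_enc, losses

def fine_tune(w_enc_init, x, y, n_epochs=200):
    """Stage 2: regression with the encoder initialised from pre-training."""
    w_enc = w_enc_init.copy()                   # reuse pre-trained weights
    w_head = np.zeros((n_latent, n_steps))      # new, untrained regression head
    losses = []
    for _ in range(n_epochs):
        z = x @ w_enc
        err = z @ w_head - y
        losses.append(float(np.mean(err ** 2)))
        grad_head = z.T @ err / len(x)
        grad_enc = x.T @ (err @ w_head.T) / len(x)
        w_enc -= LR * grad_enc
        w_head -= LR * grad_head
    return w_enc, w_head, losses

w_enc_pre, ae_losses = pretrain_autoencoder(unlabeled_x)
w_enc_ft, w_head, ft_losses = fine_tune(w_enc_pre, labeled_x, labeled_y)
print(ae_losses[0], ae_losses[-1], ft_losses[0], ft_losses[-1])
```

The decoder is discarded after the first stage; only the encoder weights carry information from the unlabeled realizations into the supervised stage.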
The proposed architecture is not claimed to be universally superior to CNN-RNN or Transformer-based surrogate models. Transformer-based models may have advantages in capturing complex temporal dependencies. The main focus of this study is instead the spatial-to-temporal feature interface before the temporal regression module. By generating a non-replicated feature sequence from the flattened reservoir parameter representation, the proposed model provides a lightweight alternative to feature-replication-based image-to-sequence surrogate designs. This architectural design is further combined with autoencoder-based pre-training to improve data efficiency when labeled simulation data are limited.
The experimental results indicate that these two components are beneficial in the Brugge benchmark setting. In particular, the proposed framework achieves competitive surrogate accuracy with fewer intermediate feature transformation operations, and its advantage becomes more evident when the number of labeled training samples is limited. The pre-training strategy provides a useful initialization for the downstream supervised regression task, especially when labeled simulation data are scarce. For history matching, the resulting surrogate-assisted workflow achieves results comparable to those of the baseline model trained with a larger dataset. These findings suggest that improving data efficiency can help reduce the number of expensive simulation samples required for surrogate construction. More broadly, the proposed framework may also be combined with other strategies for further improving surrogate-assisted history matching.
Nevertheless, the proposed method still has several limitations. First, the relationship between the size of the pre-training dataset and the final model performance is not strictly monotonic, which makes the choice of pre-training sample size less straightforward. A possible explanation is that the unlabeled pre-training samples are generated from the same PCA-based prior distribution. Although increasing the number of such samples enlarges the pre-training dataset, it may not proportionally increase the diversity of geological patterns represented in the data. When the pre-training dataset becomes excessively large, the encoder may become more specialized to the reconstruction objective of the autoencoder, rather than learning representations that are most transferable to the downstream production-regression task. This mismatch between the self-supervised reconstruction objective and the supervised production-prediction objective may lead to diminishing returns or slight performance degradation after fine-tuning. Therefore, the pre-training sample size should be regarded as a tunable factor rather than a parameter that always improves performance when increased. Further investigation of pre-training data diversity, reconstruction objectives, and their relationship with downstream regression performance will be considered in future work. Second, because the model is entirely data-driven, its performance remains dependent on the distribution of the available training data, which may affect its robustness in practical history matching applications. Third, although the flattened representation is effective for production forecasting in the Brugge benchmark, its applicability to tasks that require stronger preservation of local spatial structure, such as direct prediction of spatial pressure or saturation fields, still requires further investigation. 
Future work will therefore investigate the sensitivity of the proposed representation to geological structure continuity and explore hybrid encoders that combine the proposed one-dimensional feature interface with spatial-structure-preserving modules.
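One facet of the diversity argument for the PCA-based prior can be checked numerically. The sketch below uses a hypothetical random ensemble and an assumed number of retained components, not the Brugge prior. It shows that, for a truncated PCA parameterization, every sample lies in the same low-dimensional linear subspace, so drawing more samples does not increase the rank of the pre-training set.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prior, n_cells, n_keep = 50, 200, 10   # hypothetical sizes

# Hypothetical prior ensemble used to build the PCA basis.
ensemble = rng.normal(size=(n_prior, n_cells))
mean = ensemble.mean(axis=0)
_, s, vt = np.linalg.svd(ensemble - mean, full_matrices=False)

# Scaled principal directions, truncated to n_keep components.
basis = (s[:n_keep, None] * vt[:n_keep]) / np.sqrt(n_prior - 1)   # (n_keep, n_cells)

def sample(n):
    """Draw n realizations: mean + standard normal coefficients times the basis."""
    xi = rng.normal(size=(n, n_keep))
    return mean + xi @ basis

ranks = []
for n in (100, 1000, 5000):
    x = sample(n)
    ranks.append(int(np.linalg.matrix_rank(x - mean)))

print(ranks)  # each entry equals n_keep
```

Linear rank is only a coarse proxy for geological diversity, so this supports the plausibility of the explanation rather than confirming it.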