Adaptive Surrogate Estimation with Spatial Features Using a Deep Convolutional Autoencoder for CO2 Geological Sequestration

This paper develops a reliable deep-learning framework to extract latent features from spatial properties and investigates adaptive surrogate estimation for sequestering CO2 into heterogeneous deep saline aquifers. Our deep-learning architecture includes a deep convolutional autoencoder (DCAE) and a fully-convolutional network, not only to reduce computational costs but also to extract dimensionality-reduced features that conserve spatial characteristics. The workflow integrates two different spatial properties within a single convolutional system, and it also achieves accurate reconstruction performance. This approach significantly reduces the number of parameters to 4.3% of the original number required, e.g., the number of three-dimensional spatial properties needed decreases from 44,460 to 1920. The successful dimensionality reduction is accomplished by the DCAE system regarding all inputs as image channels from the initial stage of learning, using the fully-convolutional network instead of fully-connected layers. The DCAE reconstructs spatial parameters such as permeability and porosity while conserving their statistical values, i.e., their mean and standard deviation, achieving R-squared values of over 0.972 with a mean absolute percentage error of their mean values of less than 1.79%. The adaptive surrogate model, using the latent features extracted by the DCAE, the well operations, and the modeling parameters, is able to accurately estimate CO2 sequestration performance. The model shows R-squared values of over 0.892 for testing data not used in training and validation. The DCAE-based surrogate estimation exploits the reliable integration of various spatial data within the fully-convolutional network and allows us to evaluate the flow behavior occurring in a subsurface domain.


Introduction
Data science has revolutionized engineering analytics in the oil and gas industry. Data-driven analyses are now assisting the decision-making process, making it more reliable as well as more efficient [1][2][3][4][5]. Despite these high-end computer-assisted methods delivering efficient solutions, establishing reliable standard forms has been challenging. Uncertainty quantification requires reliable integration of all available scale-dominant spatiotemporal properties. Rock properties are spatially heterogeneous, which makes fluid flow in subsurface domains uncertain. A grid in a heterogeneous geo-model contains a large amount of scale-variant, non-linearly-connected, spatiotemporal information that influences flow behavior, e.g., porosity, permeability, capillary pressure, and fluid saturations. These properties are assigned to individual grid cells but are non-linearly linked, so the governing equations integrate them closely. Quantifying the non-linearity of grid properties is essential to simulate flow behavior in a realistic manner. Integrating all available scale-dependent, uncertain, spatiotemporal properties requires large amounts of computing resources; thereby, the technical demand for adaptive surrogate models is increasing. Surrogate models, i.e., proxy models, replace these time-consuming forward simulations to estimate the final results or to provide answers to the user's interests, such as production performance observed at the surface [6][7][8][9].
Deep learning, that is, neural-network machine learning, has been applied with increasing regularity to reservoir simulation [9][10][11]. A typical neural network consists of a multilayer perceptron; the perceptron's size can be increased in order to obtain more accurate non-linear relationships between input and output data. There are other factors, besides simply increasing the units of the hidden layers and their depth, that make neural networks complex; e.g., when increasing the units and the depth of a multi-layered perceptron as part of a fully-connected-layer system, the number of parameters increases dramatically and thus reduces the computational efficiency [12][13][14]. Flattening the data to use fully-connected layers ignores the spatial distributions of the input rock properties [15,16].
Modern deep learning tries to develop more efficient and simpler architectures using the more relevant factors that maintain the representative characteristics of the original data: convolutional neural networks (CNNs), autoencoders, and hybrid architectures including both. The CNN was introduced to restore spatial features during image processing [17]. It typically consists of one or more pooling, convolutional, and fully-connected layers. The neurons in a convolutional layer are connected to a small region of the previous layer and share their weights. This technique allows the system to adaptively learn spatial hierarchies or patterns from input images and decreases the number of parameters required by the network, which, in turn, curtails the possibility of overfitting [18,19]. The pooling operation, e.g., max pooling or average pooling, has the purpose of reducing the dimensionality of the previous layer, while fully-connected layers are placed at the end of the network as a high-level feature learner or classifier [16,18,19]. Chu et al. [11] implemented several CNNs to integrate simulation parameters for a surrogate model determining optimal well placements. Razak and Jafarpour [20] classified appropriate geological scenarios corresponding to observed flow responses and generated calibrated models using a CNN. Tang et al. [9] suggested a deep-learning-based surrogate model using a CNN and principal component analysis to parameterize 2D facies models. CNNs are generally known to exhibit better recognition as they become deeper with a larger number of kernels; however, the computational burden increases exponentially as the kernels increase [17][18][19][20].
Some researchers that have implemented autoencoders for modeling fluid flow have shown that dimensionality reduction (feature extraction) can improve computational efficiency and, thereby, estimate oil-gas-water production behavior more accurately [18,21-26]. The autoencoder consists of a recognition network (encoder) and a generative network (decoder): the encoder uses dimensionality reduction to compress the original inputs into a smaller set of parameters (the extracted features), while the decoder is used to reconstruct plausible outputs that correspond suitably with the inputs. These systems are symmetric and typically have several hidden layers. Reducing the number of parameters increases computational efficiency, while the reconstruction helps validate the results from deep neural networks. Ahn et al. [21] introduced a stacked autoencoder in an inverse method and showed that the stacked autoencoder improved computational efficiency and geo-model retrieval. Kim et al. [22] updated the training dataset of deep neural networks using a distance-based clustering scheme and improved the predictability of history matching. A convolutional autoencoder (CAE) is considered state-of-the-art technology for extracting spatial features from geological models [23][24][25][26][27][28]. Masci et al. [23] proposed a CAE in the field of computer vision that can efficiently reduce the dimensionality of data while preserving the features of spatial images. Liu and Grana [28] used a deep CAE (DCAE), i.e., a further deepened CAE, to update porosity and permeability values to avoid ensemble collapse.
One of the challenges that hinders deep-learning-based surrogate estimation from being applied more regularly to real-world problems is how to reduce the high-dimensional data and the associated computational burden of 3D geo-models. This paper develops an adaptive surrogate model based on a DCAE for CO2 sequestration into deep saline aquifers that conserves the spatial distribution of rock properties such as permeability and porosity. A fully-convolutional network is newly introduced, not only to mitigate the computational inefficiency of fully-connected layers but also to effectively conserve spatial features. Observation of the statistical distributions of the rock properties reconstructed by the decoding process examines the reliability of conserving the spatiotemporal values.

3D Heterogeneous Aquifer Models and CO2 Sequestration
3D heterogeneous geo-models are generated geostatistically using Petrel property modeling (Schlumberger, Houston, TX, USA). A slightly-confined aquifer is located between 840 m and 1090 m in true vertical depth, i.e., the aquifer thickness is 250 m; its pressure is assumed to be 9000 kPa near 840 m. The aquifer size is 6310 m × 7076 m × 250 m in the respective (x, y, z) directions, and the grid numbers are (N_x, N_y, N_z) = (38, 45, 13), i.e., the number of grid cells is 22,230. A total of 600 saline-aquifer models that are heterogeneous in terms of porosity and permeability are constructed; the 600 geo-models are divided into three categories: 500 for training data, 50 for validation data, and 50 for test data. The uncertain properties are assumed to be porosity and permeability; thereby, we have 44,460 (=38 × 45 × 13 × 2 = 22,230 × 2) parameters that are spatially distributed in each geo-model. Table 1 summarizes the key parameters used to generate the aquifer models. The numerical simulations for CO2 sequestration are carried out using a compositional simulator, the generalized equation-of-state model (GEM; Computer Modelling Group, Calgary, AB, Canada). Although determining the optimal bottom hole pressure (CO2 injection pressure) is challenging, this work assumes, as a guideline, a minimum miscibility pressure (P_MMP; lb/in²) used in typical CO2-enhanced-oil-recovery in mature oil fields. The minimum miscibility pressure has been estimated as 10,512 kPa (=10.5 MPa) according to an empirical equation assuming the temperature (T; °F) given in this work (Equation (1)) [29].
One well to inject CO2 is located at the center of the aquifer and is perforated below 975 m (the lower part of the aquifer) in the z direction. The operating conditions are the minimum bottom hole pressure (MPa) and the maximum injection rate (million cubic meters per day; ×10^6 m³/day). They are randomly selected to be in the range from 13 to 17 MPa for the minimum bottom hole pressure, and from 0.5 to 0.7 (×10^6 m³/day; MMm³/day) for the maximum injection rate. CO2 injection takes place over 15 years, and the CO2 trapping behavior is observed for 50 years after the first CO2 injection. Figure 1 depicts examples of the heterogeneous distributions of permeability (Figure 1a) and porosity (Figure 1b) and shows the CO2 saturation observed at the end of the simulation (50 years after the first CO2 injection) (Figure 1c). Figure 1a,b show the heterogeneous distributions of permeability and porosity; these two distinct distributions are generated according to the facies models (sandstone and shale). Figure 1c confirms the reliability of the CO2 transport behavior, showing that CO2, with a lower density than that of brine, moves upward and is distributed laterally.


Design of the Deep Convolutional Autoencoder
In this section, we develop a deep-learning architecture with a deep convolutional autoencoder/decoder that conserves the spatial features of permeability and porosity. It integrates several sub-techniques into the fully-convolutional network: batch normalization, network in network (NIN), strided convolution, dilated convolution, and transposed convolution. This study connects batch normalization layers after the convolutional operations. Batch normalization is used to mitigate the internal covariate shift problem: the update of network parameters changes the statistical distribution of the network activations, which leads to an unstable training process. Batch normalization attempts to keep the distributions of the output layer unchanged to prevent gradient vanishing or explosion [30]. This technique normalizes the layer input using statistical values, i.e., the mean and the variance, of mini-batches. A small constant, ε, is added to the mini-batch variance (σ²_n) for numerical stability, as shown in Equation (2), while two variables (γ and β) are introduced in the backpropagation process (see Equation (3)).
In Equations (2) and (3), x̂_i represents the ith normalized layer input; x_i is the ith input parameter; μ_n is the mini-batch mean; σ²_n is the mini-batch variance; y_i is the output value of batch normalization; γ is the scale factor; and β is the shift factor. Batch normalization makes the training faster by allowing us to set a large learning rate. It also helps to suppress overfitting through its regularization effect.
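The bodies of Equations (2) and (3) were lost in extraction; given the symbol definitions above and the standard batch-normalization formulation [30], they presumably read:

```latex
\hat{x}_i = \frac{x_i - \mu_n}{\sqrt{\sigma_n^2 + \epsilon}} \quad (2)
\qquad\qquad
y_i = \gamma\,\hat{x}_i + \beta \quad (3)
```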
NIN enables us to reduce the number of filters used during feature extraction. Its strength is in its effectiveness to extract better features by adding a non-linear activation function and thereby reducing the size of the feature map and the number of parameters [31]. The NIN process is equivalent to a convolution operation with unit size kernel (1 × 1), i.e., cascaded cross-channel pooling. The NIN reduces calculation costs and the number of parameters while maintaining high accuracy by reducing the channel dimensions [31,32]. Our work places a layer consisting of unit-sized kernels to prevent any excessive computation as the convolutional layers deepen.
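Because a 1 × 1 kernel touches only one spatial position, the NIN operation reduces to a per-pixel linear mix of channels. A minimal NumPy sketch (function name and shapes are ours, not the paper's):

```python
import numpy as np

def conv1x1(feature_map, weights):
    """Unit-size (1x1) convolution, i.e., cascaded cross-channel pooling:
    each pixel's channel vector is linearly mixed into fewer channels."""
    # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)
    return feature_map @ weights

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 64))   # 64-channel feature map
w = rng.standard_normal((64, 16))        # compress 64 -> 16 channels
out = conv1x1(fmap, w)
print(out.shape)                         # (8, 8, 16)
```

Note the parameter economy: this channel compression costs only 64 × 16 = 1024 weights, regardless of the spatial extent of the feature map.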
The pooling operation can reduce the size of the feature map by representing a specific area as a single value to identify important features in large-scale data [33]. However, a lot of information loss occurs because this down-sampling method comes from some predetermined interpolations. That is, this approach makes it difficult to restore the original grid properties, and causes the reconstructed data to become blurred. This study adjusts the size of the feature maps using strided convolution, which is a convolutional operation where the step size of the kernel is greater than 1, which is in contrast to the typical pooling operations. Its objective is to incorporate the resizing process into the network training operation. Zero-padding maintains the size of the feature map by excluding the layers in the decoder for restoring the odd dimension of the original data. Figure 2 depicts the DCAE architecture of both the encoder ( Figure 2a) and the decoder ( Figure 2b). The dilated kernel traverses the input data, which delivers a wider view with the same computational cost by defining a space between the kernel weights. The convolutional layer with the unit-sized kernel is placed next to the layer that has more than 64 kernels. In this way, the grid properties in the three-dimensional aquifer (with 44,460 pieces of data) can be compressed to the latent features and then these extracted features are used to reconstruct the grid properties with the same dimension of original data through transposed convolutional layers (see Figure 2b). The transposed convolution can be incorporated into the up-sampling process for the latent features during the training workflow, which stretches the layer input by inserting blank (dummy) rows and columns and then performs the normal convolutional operation.
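Assuming the standard size arithmetic for these three operations (the helper names below are ours), the resizing behavior described above can be sketched as:

```python
def strided_out(n, k, s, p=0):
    """Output length of a convolution with kernel k, stride s, padding p."""
    return (n + 2 * p - k) // s + 1

def dilated_kernel_extent(k, d):
    """Effective receptive field of a kernel of size k with dilation rate d."""
    return k + (k - 1) * (d - 1)

def transposed_out(n, k, s, p=0):
    """Output length of a transposed convolution (inverse of strided_out)."""
    return (n - 1) * s - 2 * p + k

# A strided convolution (kernel 2, stride 2) halves a 38-cell axis: 38 -> 19
print(strided_out(38, 2, 2))         # 19
# A 3-wide kernel with dilation 2 sees a 5-cell span at the same cost
print(dilated_kernel_extent(3, 2))   # 5
# A transposed convolution restores the axis in the decoder: 19 -> 38
print(transposed_out(19, 2, 2))      # 38
```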
Three kinds of indicators are applied to evaluate the deep-learning performance: the coefficient of determination (R² value; R-squared; Equation (4) [34]), the root mean squared error (RMSE; ε_RMSE in Equation (5) [35]), and the mean absolute percentage error (MAPE; ε_MAPE in Equation (6) [36]). In Equations (4)-(6), n is the number of predicted values, y_i is the ith true value (or observed value), and ŷ_i is the ith predicted value.
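Equations (4)-(6) are the standard definitions of these three metrics; a NumPy sketch (our naming) makes the computation explicit:

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination (Equation (4))."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean squared error (Equation (5))."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error in percent (Equation (6))."""
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

y = np.array([10.0, 20.0, 30.0])      # illustrative true values
y_hat = np.array([11.0, 19.0, 30.0])  # illustrative predictions
print(round(rmse(y, y_hat), 3))       # 0.816
```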
Figure 2. The DCAE architecture proposed in this study: (a) the encoder extracts latent features from the spatial properties (permeability and porosity); (b) the decoder conducts the inverse mapping of the latent features to the original dimensions and evaluates the reconstruction performance. Input data are also traversed by dilated kernels at the first convolution stage (gray); the feature map size is adjusted by strided convolution (red), transposed convolution (purple), and zero-padding (green); the cross-channel pooling with a unit-sized kernel is introduced to reduce the parameter number.


Adaptive Surrogate Estimation with Spatial Feature and Data Integration
This section presents the data-driven surrogate estimation of CO2 sequestration performance proposed in this work. Figure 3 shows a conceptual diagram of the adaptive surrogate estimation. The estimated values are the structural trap volume, the dissolved trap volume, and the total CO2 injection volume. The input data consist of the well operating conditions, the modeling parameters, and the latent features extracted by the DCAE. Two different operating conditions, the maximum bottom hole pressure and the maximum injection rate, are assumed; however, controlling these two criteria does not lead to a constant rate of CO2 injection into the aquifer over time. Thus, the cumulative injection amount varies quite unpredictably. The modeling parameters, i.e., the temperature and the salinity, represent the general aquifer conditions related to CO2 dissolution.
The detailed workflow of our feature-based adaptive surrogate estimation can be summarized as follows:
1. Generate 3D geo-models stochastically and split them into the training, validation, and test datasets.
2. Carry out flow simulations to obtain the dynamic responses from CO2 sequestration, such as the trapped volume and the CO2 amount injected.
3. Normalize the inputs and the outputs.
4. Build and train the DCAE for encoding/decoding the spatial data (permeability and porosity).
5. Develop the adaptive surrogate model to give estimates of the trapped CO2 volume and the injection amount without the need for time-consuming numerical simulations.

In the final stage, global average pooling (a structural regularizer) is implemented in the convolutional layers to replace the fully-connected layers. It generates three output neurons according to the allocated feature maps, i.e., the structural trap volume, the dissolved trap volume, and the total CO2 injection volume. It is not necessary to optimize this model for interpreting the feature maps; as such, the overfitting problem becomes more avoidable [31,32].
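Global average pooling collapses each channel of the final feature map into a single neuron by averaging over its spatial extent. A minimal NumPy sketch (the shapes are illustrative, not the paper's exact layer sizes):

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Average each channel over its full spatial extent, producing one
    output neuron per channel in place of a fully-connected layer."""
    # (D, H, W, C) -> (C,)
    return feature_maps.mean(axis=(0, 1, 2))

rng = np.random.default_rng(1)
# A final layer with 3 channels, one per target (structural trap volume,
# dissolved trap volume, total CO2 injection volume).
fmap = rng.standard_normal((5, 6, 2, 3))
outputs = global_average_pooling(fmap)
print(outputs.shape)   # (3,)
```

Because the pooling has no trainable weights, it interprets the feature maps directly, which is the structural-regularization effect noted above.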


Reconstruction Performances of Rock Properties
The parameter size is examined in comparison with a typical fully-connected-layer-type (multi-layered-perceptron-type; MLP-type) autoencoder [37,38]. Table 2 shows the number of parameters used by a typical MLP-type autoencoder if we set the latent features to be the same as those of our DCAE, i.e., the 1920 parameters that represent 4.3% of the input data (the original 44,460 three-dimensional spatial properties). Notwithstanding the fact that the details of designing an MLP-type autoencoder may differ according to the user's purpose, a general architecture is configured: a symmetrical structure with the node number of each layer set to 1/3 or more of the previous layer to prevent any excess information loss. The total number of parameters would be about 1.5 billion for the encoding and decoding processes, and thereby it is not practical to train this kind of network from the viewpoint of computational efficiency. Figure 4 explains the DCAE architecture in detail; Figure 4a demonstrates the encoding process that reduces the (38 × 45 × 13 × 2) dimensions to a (5 × 6 × 2 × 32)-dimensional latent code, and Figure 4b shows the decoding workflow to reconstruct the spatial properties (permeability and porosity) with the same dimensions as the original data (38 × 45 × 13 × 2). The latent features significantly reduce the input data, decreasing from 44,460 to 1920 parameters (see Table 2). Our fully-convolutional network consists of 27 convolutional layers and 18 batch normalizations. The 'Adam' optimizer is chosen after testing the Adam, RMSprop, Adamax, and Adagrad optimizers. The activation function is 'ReLU (rectified linear unit)' and the He normal initializer is used [39,40]. The loss function is based on the mean squared error. The normalization of the input image is carried out based on the maximum pixel value, and thus all inputs range between 0 and 1.
The DCAE reconstructs the spatial properties reliably. Table 5 summarizes the reconstruction performance in terms of R-squared, RMSE, and MAPE. Figure 6 compares the mean and the standard deviation of the permeabilities from the original input and the reconstructed geo-models. For example, 22,230 (=38 × 45 × 13) permeability values are input; the DCAE reconstructs these for each geo-model, and thereby one mean and one standard deviation are evaluated for each geo-model. The DCAE training operation (for 500 geo-models) matches the mean permeability up to 0.999 for the R-squared value, 0.872 millidarcy for RMSE, and 0.18% for MAPE. The results for the mean permeability from the validation and test sets have slightly lower R-squared values than the training set, i.e., 0.997 for the validation set (50 geo-models) and 0.996 for the test set (50 geo-models), but these differences are small enough to confirm the reliability of the DCAE. The standard deviations of the reconstructed permeabilities are slightly lower than those of the original inputs, which likely results from the inevitable information loss during dimension compression. All performance metrics related to the standard deviations of the training, validation, and test sets show similar trends. The R-squared values of the standard deviations of the permeabilities are 0.992 for the training set, 0.983 for validation, and 0.980 for the test set. Figure 7 depicts a comparison of the porosities between the original input and the reconstructed data. In a similar way to the results already shown in Figure 6, the porosity prediction is accurate and efficient. The R-squared values for the mean porosity after reconstruction are 0.999 for the training set, 0.979 for the validation set, and 0.989 for the test set.
The lowest R-squared value is 0.972 for the standard deviation of the porosities with the test set. Analyzing Figures 6 and 7, as well as Table 5, it is clear that the DCAE is able to extract latent features and then reconstruct the spatial properties accurately.
In short, after evaluating the performance of the DCAE-based feature extraction, it can be concluded that the developed architecture can reduce the number of parameters required for reconstruction to just 2,303,466 for both encoding and decoding operations, which is only 0.155% of what a typical symmetric-autoencoder would require. The extracted features are reliable, which is shown by the small errors and the high R-squared values after reconstructing the original spatial values. Another notable conclusion is that the DCAE can integrate different spatial properties within a single architecture and also regenerate both spatial values accurately from a statistical viewpoint. The results confirm that the developed DCAE-based feature extraction can be applied efficiently for upscaling and downscaling spatial properties.
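The text does not give the exact layer widths of the reference MLP autoencoder; assuming a symmetric design in which each layer shrinks to roughly one third of the previous one (44,460 → 14,820 → 4940 → 1920), a quick count lands near the reported figures:

```python
def dense_params(layer_sizes, bias=True):
    """Trainable parameters of a fully-connected (MLP-type) network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + (n_out if bias else 0)
    return total

# Hypothetical symmetric MLP autoencoder obeying the "1/3 or more per
# layer" rule, compressing 44,460 inputs to 1920 latent features.
encoder = [44460, 14820, 4940, 1920]
decoder = encoder[::-1]
mlp_total = dense_params(encoder) + dense_params(decoder)

dcae_total = 2_303_466  # DCAE parameter count reported above
print(f"MLP-type autoencoder: ~{mlp_total / 1e9:.2f} billion parameters")
print(f"DCAE fraction: {100 * dcae_total / mlp_total:.3f}%")
```

Under these assumed widths the MLP total is about 1.48 billion parameters and the DCAE fraction about 0.155%, consistent with the values quoted in the text.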


Surrogate Estimation for CO2 Geological Sequestration
An adaptive surrogate model is constructed to estimate CO2 sequestration as an alternative to time-consuming simulations. Figure 8 and Table 6 explain the detailed architecture of the adaptive surrogate estimation using the features extracted from the spatial data on permeability and porosity, the modeling parameters such as salinity and temperature, as well as the operating conditions such as the minimum well bottom hole pressure and the maximum CO2 injection rate. The input dimensions are (5 × 6 × 2 × 36), while there are three outputs after global average pooling. The model consists of a fully-convolutional neural network; we do not implement fully-connected layers, but seven convolutional layers and seven batch normalizations are included. The number of parameters used for the adaptive surrogate estimation is 348,239.

Figure 9 shows the training and validation losses with epochs. The y-axis values are expressed as RMSE with units of million cubic meters to improve the understanding of the training performance. The graph shows stable convergence and confirms that the model is efficient enough to converge to the final solutions without any remarkable fluctuations after 100 epochs. The computing time required is 0.09 s/epoch, no overfitting issues are experienced, and thus the surrogate model with the fully-convolutional network developed in this paper converges stably to the final optimality.
Energies 2021, 14, x FOR PEER REVIEW 15 of 19 matters are experienced, and thus we can see that the surrogate model with the fully-convolutional network developed in this paper converges stably to the final optimality.   (Figure 10a), the dissolved trap amount (Figure 10b), and the total CO2 volume injected into the aquifer (Figure 10c). Table  7 summarizes the results of the indicators used to evaluate the prediction performance. The highest R-squared value, 0.989, is observed for predicting the dissolved trap amount; a possible reason for this would be assigning salinity and temperature as the input units that influence CO2 dissolution. The heterogeneity of aquifer properties more strongly affects the structural trap amount and the total CO2 injected volume. However, the MAPEs are lower than that of the dissolved trap amount, which is 3.02% as shown in Table 7. Thus, we can conclude that both the spatial heterogeneity and operating conditions influence the predictability. A notable result is that the errors are less than 3.02% of the MAPE so that the adaptive surrogate estimation is accurate without the need for time-consuming physically-based simulations.
The large number of parameters involved in the deep-learning workflow cannot be free from 'the curse of dimensionality'. The dimensionality reduction is essential to solve the following problems: convergence and computation efficiency. The high dimensions have sparse characteristics, the distance between training samples would be far, the extrapolation to forecast unknown values is unstable, and thus the overfitting and the poor convergence matters become more likely. In addition, the large number of operations, e.g., dot products between the kernel and the feature map, and the training parameters decrease the computational efficiency; the computing cost of adaptive surrogate estimation is 0.09 s/epoch. If analyzing the approximate computation time, the adaptive surrogate estimation requires 1005 seconds including the time for training over 500 epochs and estimating 50 test geo-models (=500 × (1.92 for DCAE + 0.09 for surrogate estimation); the prediction time is negligible. In contrast, the conventional flow simulations need 13,940 seconds to complete the 50 test geo-models(=278.8 × 50). Thus, the adaptive surrogate estimation can dramatically reduce the computational time required to approximately 7% of the time for the conventional flow simulation.   (Figure 10a), the dissolved trap amount (Figure 10b), and the total CO 2 volume injected into the aquifer (Figure 10c). Table 7 summarizes the results of the indicators used to evaluate the prediction performance. The highest R-squared value, 0.989, is observed for predicting the dissolved trap amount; a possible reason for this would be assigning salinity and temperature as the input units that influence CO 2 dissolution. The heterogeneity of aquifer properties more strongly affects the structural trap amount and the total CO 2 injected volume. However, the MAPEs are lower than that of the dissolved trap amount, which is 3.02% as shown in Table 7. 
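The timing comparison above is simple arithmetic over the per-epoch and per-model costs reported in the text, and can be reproduced as follows:

```python
# Per-epoch training costs reported in the text (seconds/epoch).
DCAE_COST = 1.92
SURROGATE_COST = 0.09
EPOCHS = 500

# Conventional flow simulation: 278.8 s per geo-model, 50 test geo-models.
SIM_COST_PER_MODEL = 278.8
N_TEST_MODELS = 50

# Prediction time is negligible, so training dominates the surrogate cost.
surrogate_total = EPOCHS * (DCAE_COST + SURROGATE_COST)
conventional_total = SIM_COST_PER_MODEL * N_TEST_MODELS

print(round(surrogate_total, 1))      # 1005.0 s
print(round(conventional_total, 1))   # 13940.0 s
print(round(surrogate_total / conventional_total, 3))  # ~0.072, i.e., ~7%
```
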
This study reduces the size of the feature map and the number of convolutional operations required, thereby improving computational efficiency. The methodology exploits data integration within a single architecture, unlike typical multimodal deep-learning approaches, and thus our method successfully achieves reliable reconstruction of the spatial properties as well as efficient dimensionality reduction.
An original methodological aspect of the developed adaptive model is that all inputs are integrally regarded as image channels from the initial stage of learning, whereas typical multimodal models integrate the dataset within the flattened layers. Another key point is the use of the fully-convolutional network and global average pooling instead of fully-connected layers, which minimizes data loss by avoiding flattening the data to a one-dimensional array, and also mitigates overfitting. Designing proper hyper-parameters (such as the size of the feature map, the number of convolutional layers, the kernel size, and the number of kernels) dominates the computational effort, and thus finding optimal designs that combine the sub-sections of deep-learning remains challenging for reaching the optimality of adaptive surrogate estimations. Although this paper shows the successful estimation of three key results related to CO2 sequestration performance without using physically-based simulations, the spatiotemporal responses, e.g., the saturation distribution and the sequestration amount over time, need to be evaluated accurately in order to completely replace time-consuming subsurface flow simulations.
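The benefit of global average pooling over a fully-connected head can be illustrated with a parameter count. The feature-map and output sizes below are hypothetical assumptions for illustration, not the paper's exact architecture:

```python
# Hypothetical final feature map of a convolutional encoder:
# channels x depth x height x width (illustrative sizes only).
C, D, H, W = 64, 5, 15, 15
K = 3  # regression outputs, e.g., the three CO2 performance indicators

# Option 1: flatten the feature map and attach a fully-connected layer.
fc_params = (C * D * H * W) * K + K  # weights + biases

# Option 2: 1x1x1 convolution to K channels, then global average pooling.
# GAP itself has no trainable parameters and preserves spatial structure.
conv1x1_params = C * K + K  # weights + biases
gap_params = 0

print(fc_params)                    # 216003 parameters for the dense head
print(conv1x1_params + gap_params)  # 195 parameters for the conv + GAP head
```

Because the dense head scales with the full feature-map volume while the 1x1 convolution scales only with the channel count, the fully-convolutional design also keeps the parameter count independent of the grid resolution.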

Conclusions
This paper presented an adaptive surrogate estimation based on a DCAE with a fully-convolutional network that reduces the number of parameters needed without using typical fully-connected layers or flattening the spatial data. The DCAE-based feature extraction integrated different spatial data within one fully-convolutional network and reliably reconstructed the original data. The fully-convolutional network significantly reduced the number of parameters, to 0.155% of what a typical autoencoder would require. The reconstruction performance for the spatial properties, i.e., permeability and porosity, proved to be reliable, achieving R-squared values higher than 0.972 for the mean and standard deviation of the test dataset.
Adaptive surrogate estimation was carried out using the spatial features extracted by our DCAE, the modeling parameters (salinity and temperature), and the operating conditions (well bottom-hole pressure and CO2 injection rate). The surrogate model accurately estimated CO2 sequestration performances, with R-squared values higher than 0.892 and a mean absolute percentage error of less than 3.02% for the structural trap amount, the dissolved trap amount, and the total CO2 injection. The proposed DCAE-based surrogate estimation can be utilized for regression problems as well as for the integration of scale-dependent spatial datasets.