1. Introduction
The magnitude of CO2 emissions generated by oil and gas operations is substantial. Total global energy-related CO2 emissions reached approximately 36.8 billion tonnes (Gt) in 2022, underscoring the contribution of the oil and gas sector to worldwide greenhouse gas emissions [1]. In recent years, awareness of and concern about the continuous buildup of greenhouse gases in the atmosphere have increased significantly. This buildup is primarily driven by human activities such as fossil fuel combustion, deforestation, and industrial processes that release greenhouse gases like carbon dioxide (CO2), methane, and nitrous oxide [2]. The rising concentration of these gases has contributed to global temperature increases, triggering significant climate changes that pose a serious threat to the planet [3]. In 2022, oil and gas operations contributed roughly 15% of global energy-related emissions, amounting to an estimated 5.1 billion tonnes of greenhouse gases [1]. This contribution highlights the critical need for action within the sector. The adoption of the Paris Agreement in 2015 was a landmark moment in global climate efforts, establishing ambitious goals to limit global warming to well below 2 °C above pre-industrial levels, with an emphasis on striving to restrict the increase to 1.5 °C [4]. As countries work toward meeting these climate goals, CO2 storage has become a crucial tool in combating climate change. Carbon capture and storage (CCS) technology can eliminate 90–99% of CO2 emissions from industrial sources, making it an essential solution for decarbonizing hard-to-abate sectors and supporting nations in fulfilling their climate commitments [5]. The 2023 COP28 conference in Dubai reinforced the urgency of addressing climate change, highlighting the need to expedite the transition to clean energy and significantly increase climate financing efforts [6]. The conference underscored the critical role of innovative technologies such as CCS in achieving the Paris Agreement goals, emphasizing that sustained government backing and international collaboration are essential for unlocking the full potential of CO2 storage in the fight against global warming [7].
Field tests have traditionally played a crucial role in evaluating the feasibility and performance of CO2 storage in subsurface formations. These tests, often conducted at pilot or demonstration sites, provide valuable insights into reservoir behavior, injectivity, plume migration, and trapping mechanisms under real conditions [8,9]. Notable examples include the Sleipner and Snøhvit projects in Norway [10,11,12], which have offered decades of observational data. While field tests offer high-fidelity results and ground-truth validation, they are typically time-consuming, expensive, and limited in spatial and temporal resolution. In contrast, reservoir simulations offer a more cost-effective and flexible approach to studying a wide range of reservoir conditions and scenarios.
Reservoir simulations help predict how CO2 will spread, interact with rock and fluids, and become trapped within the reservoir. These models provide critical insights into the movement of CO2 plumes over time, enabling engineers to assess storage capacity, evaluate seal integrity, and identify potential risks such as leakage or pressure buildup. By simulating complex geological conditions and fluid behaviors, reservoir simulations aid in optimizing injection strategies, ensuring efficient utilization of storage space, and enhancing the long-term safety and reliability of carbon storage projects. Additionally, they play a key role in validating monitoring plans and supporting regulatory compliance, contributing to the overall success of CCS initiatives. As noted by [13], “Reservoir simulation is a direct numerical modeling method to model fluid flow in a reservoir or in a better description in the porous medium”. These simulations can represent various trapping mechanisms, such as structural, residual, solubility, and mineral trapping, each functioning over different time scales [14].
In this study, we developed a base reservoir model to simulate CO2 injection in a grid containing a centrally located injection well, flanked by two production wells positioned on either side. This well arrangement was chosen to examine the influence of nearby production wells on the reservoir’s pressure distribution, CO2 saturation levels, injectivity, and trapping efficiency under varying geomechanical and geochemical conditions. This setup facilitates a thorough assessment of CO2 storage behavior within a more realistic reservoir configuration, offering insights into the complex interactions affecting CO2 containment and mobility in subsurface storage projects [15]. The complexity of the base model was increased incrementally with each new set of simulations, enabling the characterization of specific trapping mechanisms. We began by simulating hysteresis and residual gas trapping to understand their roles in CO2 immobilization. Subsequently, we incorporated an infinite-acting aquifer into the model to assess its impact on pressure distribution, flow patterns, CO2 plume behavior, saturation levels, and overall trapping efficiency in the reservoir. CMG Builder was used to build the two-dimensional grid, and WinProp (Computer Modelling Group Ltd., Calgary, AB, Canada), CMG’s equation-of-state multiphase equilibrium property package, was used for fluid component characterization.
In addition to field tests and reservoir simulations, proxy models have become valuable tools in reservoir engineering, especially for simplifying complex simulations. These models, often built using machine learning techniques, are designed to replicate the behavior of detailed physics-based simulators by learning the relationships between key input parameters and resulting outputs. In recent years, they have received growing attention for their ability to generate fast and reliable predictions while supporting tasks like sensitivity analysis and operational optimization. Important contributions to this field include smart proxy modeling frameworks developed by Mohaghegh [16], data-driven modeling approaches based on numerical simulation by Alabboodi [17], and the application of ensemble learning methods such as random forest and gradient boosting to forecast CO2 storage performance, as demonstrated by Li et al. [18]. Together, these studies show that proxy models can effectively mirror simulation outcomes, making them useful for decision-making in carbon storage projects.
Building upon this initial modeling framework, we aim to investigate complex reservoir interactions in a realistic subsurface configuration involving the centrally positioned CO2 injector and adjacent producer wells. This arrangement allows us to explore critical phenomena such as pressure distribution effects, CO2 plume dynamics, injectivity constraints, and trapping efficiencies under varied geomechanical and geochemical scenarios. Despite considerable progress in CCS research, significant knowledge gaps persist, particularly in understanding the integrated effects of geological heterogeneities, operational injection strategies, and nearby producer wells on long-term CO2 containment security. Traditional numerical reservoir simulations, although insightful, remain computationally demanding and too inflexible for rapid scenario analysis or real-time decision-making.
To address these challenges, our study integrates advanced reservoir simulations with machine learning-based smart proxy models (SPMs). SPMs serve as simplified yet accurate data-driven approximations of complex numerical models, capturing the essential relationships between critical reservoir parameters and key outputs [17]. Leveraging tree-based learning algorithms, including random forest, gradient boosting, and decision trees, these smart proxy models rapidly assess a broad range of CCS scenarios, substantially reducing computational effort compared to traditional numerical simulations [16]. These models replicate detailed numerical simulations accurately while operating efficiently on standard computational resources, facilitating comprehensive sensitivity analyses and robust uncertainty quantification [16]. Thus, the methodological novelty of our research lies in systematically coupling physics-based reservoir modeling with machine learning techniques, enhancing predictive capability and scenario assessment efficiency and, ultimately, supporting the optimization of secure and sustainable long-term CO2 storage strategies.
2. Materials and Methods
In this study, three advanced machine learning models were utilized: random forest, gradient boosting, and decision trees. These models were deployed as intelligent proxy models to enhance the efficiency and accuracy of predicting the volume of CO2 trapped within the reservoir. The primary motivation behind using these models was to significantly reduce the simulation and computational time required for analyzing large datasets, while maintaining robust predictive capabilities. By leveraging these machine learning approaches, the study aimed to capture the complex, nonlinear relationships inherent to the simulation results obtained from CO2 injection scenarios. The models were trained and validated using data generated from simulations, ensuring their ability to generalize across various conditions. Additionally, these proxy models provided insights into key reservoir parameters influencing CO2 trapping, enabling faster decision-making and the optimization of injection strategies.
2.1. Model Setup
To provide a clear overview of the methodology employed in this study, a structured workflow was developed outlining the key stages of the research process. This includes the initial construction of a 2D reservoir model, the simulation of various CO2 trapping mechanisms, and the integration of additional reservoir features. It also highlights the development of smart proxy models and the final analysis of simulation results. The complete workflow is illustrated in Figure 1.
2.1.1. Base Case Model
The base model was developed using CMG’s 2023.10 software packages—Builder, GEM, and WinProp—to create a Cartesian grid with dimensions of 100 blocks in the i-direction, 1 block in the j-direction, and 20 blocks in the k-direction, as outlined in
Table 1. Each layer in the k-direction has a thickness of 5 m, with the top of the grid starting at 1200 m, giving the reservoir a total thickness of 100 m, extending from 1200 m at the top to 1300 m at the bottom, as shown in
Figure 2 below. The reservoir was modeled with isotropic permeability, meaning permeability values are consistent in all directions, with a uniform single porosity of 18% throughout. All measurements in this study adhere to S.I. units. The reservoir pressure and temperature were set to 11,800 kPa and 50 °C, respectively.
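As a quick sanity check on the grid description above, the following minimal sketch rebuilds the layer depths and compares the stated initial pressure with a hydrostatic estimate at the grid top. The fresh-water density and standard gravity are assumptions made for illustration and are not values reported in this study; a brine column would be somewhat denser.

```python
# Grid geometry as described in the text.
N_LAYERS = 20            # blocks in the k-direction
LAYER_THICKNESS_M = 5    # thickness of each k-layer, m
TOP_DEPTH_M = 1200       # depth of the grid top, m

# Assumed values (not from the study): fresh-water density and standard gravity.
WATER_DENSITY_KG_M3 = 1000.0
GRAVITY_M_S2 = 9.80665

# Top and bottom depth of every layer, from shallowest to deepest.
layer_tops = [TOP_DEPTH_M + k * LAYER_THICKNESS_M for k in range(N_LAYERS)]
layer_bottoms = [top + LAYER_THICKNESS_M for top in layer_tops]


def hydrostatic_pressure_kpa(depth_m):
    """Gauge pressure exerted by a static water column of the given height, in kPa."""
    return WATER_DENSITY_KG_M3 * GRAVITY_M_S2 * depth_m / 1000.0


p_top_kpa = hydrostatic_pressure_kpa(TOP_DEPTH_M)
print(f"Grid spans {layer_tops[0]} m to {layer_bottoms[-1]} m")
print(f"Hydrostatic estimate at grid top: {p_top_kpa:.0f} kPa")
```

Under these assumptions the estimate is roughly 11,770 kPa, close to the 11,800 kPa initial reservoir pressure, which is consistent with a near-hydrostatic initial state.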
For the base case simulation, the Peng–Robinson equation of state (PR EOS) [19] was applied, as shown in Equation (1). The PR EOS establishes a relationship between thermodynamic properties and phase equilibrium for CO2, determining densities, fugacities, and equilibrium phase compositions based on input conditions. This model also incorporates a generalized equation of state for pure components in GEM, as presented in Equation (3). The reservoir fluid components were modeled in WinProp, with carbon dioxide (CO2), the injection fluid, specified as the secondary component. Methane (CH4), by contrast, is specified as a primary component and reservoir fluid with a mole fraction of 0.999, but the simulator treats it as a trace component to maintain a gas phase in each block. The inclusion of methane as a trace component is essential for initializing aqueous solubility equations across the blocks. It ensures that a minimal amount of gas is present for phase calculations without allowing methane to dissolve in water, thereby facilitating accurate gas property calculations. The Rowe–Chou aqueous density correlation and the Kestin aqueous viscosity correlation were selected, being the recommended correlations for aqueous density and viscosity calculations in CCS simulations.
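To illustrate how a cubic equation of state turns pressure and temperature into a density, the sketch below evaluates the standard Peng–Robinson form for pure CO2 at the stated reservoir conditions (11,800 kPa, 50 °C). It is an independent, simplified calculation and not the WinProp or GEM implementation; the critical constants, acentric factor, and molar mass are textbook values assumed here rather than taken from this study.

```python
import math

R = 8.314462618  # universal gas constant, J/(mol K)

# Textbook constants for CO2 (assumed here, not taken from the study).
TC_K = 304.13
PC_PA = 7.3773e6
OMEGA = 0.2239
MOLAR_MASS_KG_MOL = 0.04401


def pr_z_factor(pressure_pa, temperature_k):
    """Compressibility factor of pure CO2 from the Peng-Robinson EOS.

    Written for T > Tc only, where the cubic in Z has a single real root.
    """
    if temperature_k <= TC_K:
        raise ValueError("this sketch only handles supercritical temperatures")
    kappa = 0.37464 + 1.54226 * OMEGA - 0.26992 * OMEGA**2
    alpha = (1.0 + kappa * (1.0 - math.sqrt(temperature_k / TC_K))) ** 2
    a = 0.45724 * R**2 * TC_K**2 / PC_PA * alpha
    b = 0.07780 * R * TC_K / PC_PA
    A = a * pressure_pa / (R * temperature_k) ** 2
    B = b * pressure_pa / (R * temperature_k)

    def cubic(z):
        return (z**3 - (1.0 - B) * z**2 + (A - 3.0 * B**2 - 2.0 * B) * z
                - (A * B - B**2 - B**3))

    # The cubic equals -2*B**2 at Z = B and is positive for large Z,
    # so bisection on [B, 10] brackets the physical root.
    lo, hi = B, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cubic(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def pr_density_kg_m3(pressure_pa, temperature_k):
    """Mass density from rho = P * M / (Z * R * T)."""
    z = pr_z_factor(pressure_pa, temperature_k)
    return pressure_pa * MOLAR_MASS_KG_MOL / (z * R * temperature_k)


z_res = pr_z_factor(11.8e6, 323.15)
rho_res = pr_density_kg_m3(11.8e6, 323.15)
print(f"Z = {z_res:.3f}, density = {rho_res:.0f} kg/m3")
```

The resulting compressibility factor is well below one, reflecting the dense, supercritical state of CO2 at these conditions.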
Water gas contact was set to 1150 m, which indicates that the whole reservoir is under water. The injection time was set to 10 years from the year 2024, after which the well is shut in, while monitoring was carried out for 40 additional years post-injection, leading to a total of 50 years. The constraints governing the operation of the injection well and producer wells are detailed in
Table 2 below.
2.1.2. Residual Gas Trapping
The residual gas trapping simulation used two-phase relative permeability hysteresis based on Carlson and Land’s model. This trapping mechanism, a physical process of CO2 immobilization, relies on CO2 movement and interaction with the aqueous phase, governed by saturation hysteresis and capillary forces. Relative permeability, which defines the ease with which a phase moves through the porous medium, depends on various rock and fluid properties, including wettability, flow rate, interfacial tension, and saturation history [20]. During CO2 injection, the flow initially follows the gas relative permeability drainage curve, as shown in Figure 3. Upward CO2 migration results from convective currents and gravity differences between CO2 and brine. This migration reduces water saturation near the injection well, followed by an increase in water saturation as the CO2 plume moves away. This increase causes a transition from drainage to imbibition, altering the gas flow path, as depicted in Figure 3. KrgD refers to the drainage curve, while KrgI refers to the imbibition curve. These curves govern how Land’s method operates in the model. The Krg values supplied to the simulator are automatically treated as the drainage curve. Through hysteresis, CO2 trapping occurs as residual gas saturation shifts from the drainage to the imbibition phase, reducing CO2 mobility [21]. This shift minimizes the risk of CO2 migrating towards upper reservoir layers, enhancing gas containment within the storage formation [22,23]. This trapping mechanism, often referred to as residual or hysteresis trapping, is vital for secure, long-term CO2 sequestration. Hysteresis was activated in the model with an Sgt,max value of 0.4. Equations (1) and (2) illustrate the underlying parameters that influence Land’s model, which assists in predicting drainage curves accurately.
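The role of Sgt,max in Land’s trapping relation can be sketched as follows. This uses the commonly quoted form of Land’s model with zero critical gas saturation, which may differ in detail from the simulator’s implementation. The maximum gas saturation of 0.8 is a hypothetical value chosen only for illustration; only Sgt,max = 0.4 comes from the text.

```python
def land_coefficient(sgt_max, sg_max):
    """Land's trapping coefficient, C = 1/Sgt,max - 1/Sg,max."""
    return 1.0 / sgt_max - 1.0 / sg_max


def trapped_gas_saturation(sg_hist, c):
    """Trapped gas saturation after imbibition that starts from the
    historical maximum gas saturation sg_hist: Sgt = Sg / (1 + C * Sg)."""
    return sg_hist / (1.0 + c * sg_hist)


SGT_MAX = 0.4  # from the text
SG_MAX = 0.8   # hypothetical maximum gas saturation, for illustration only

C = land_coefficient(SGT_MAX, SG_MAX)
for sg in (0.2, 0.4, 0.6, 0.8):
    print(f"Sg,hist = {sg:.1f} -> Sgt = {trapped_gas_saturation(sg, C):.3f}")
```

A block that reached a higher gas saturation during drainage therefore retains more trapped gas, up to Sgt,max when the historical saturation equals Sg,max.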
2.1.3. Hysteresis with Infinite-Acting Aquifer Support
At this stage of the simulation, to simulate the presence of an infinite-acting aquifer, we applied a pore volume multiplier to expand the size of the lowest reservoir layer, as illustrated in
Figure 4. By increasing the volume in this region, the model can simulate the effects of an endless water influx from an aquifer, allowing for a more realistic analysis of pressure behavior, saturation changes, and fluid movement over time. The increased volume in the aquifer region effectively supports the surrounding layers, replicating how an infinite aquifer would behave in a real-world scenario by continuously supplying water to maintain reservoir pressure. This approach effectively mimics the characteristics of an infinite aquifer, allowing us to observe how this additional water influx would impact reservoir conditions over time. We analyzed key parameters, including changes in pressure, water saturation, gas saturation, and gas density across different time intervals. This setup enabled a detailed investigation into how the aquifer’s influence alters reservoir dynamics, providing insights into fluid flow, gas migration, and pressure maintenance throughout the simulation.
2.1.4. Solubility Trapping
In this study, our solubility simulations were based on Henry’s law and Harvey’s correlation [25]. According to Henry’s law, at a constant temperature, the amount of gas that dissolves in a liquid is directly proportional to the partial pressure of that gas above the liquid. Thus, as pressure increases, CO2 solubility also increases. However, an increase in temperature and salinity reduces solubility [26]. This approach allows us to capture the conditions under which CO2 dissolves effectively in brine, contributing to long-term storage stability in saline aquifers.
Gas solubility from Henry’s law [
27]:
HCO2 = Henry’s law constant of CO2 (a function of pressure, temperature, and salinity)
yiw = mole fraction of CO2 in the aqueous phase
Equation (4) represents Henry’s law applied to CO2 solubility, where the amount of CO2 dissolved in water (expressed as the mole fraction) depends on the Henry’s law constant and the equilibrium fugacity of CO2 in both phases. This relationship helps predict how much CO2 can dissolve in the brine, depending on reservoir conditions like pressure, temperature, and salinity.
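For reference, a form of Henry’s law that is consistent with the symbol definitions and the description above can be written as follows. The fugacity symbols f_ig and f_iw (gas-phase and aqueous-phase fugacity of component i = CO2) are notation assumed here for illustration.

```latex
f_{ig} = f_{iw}, \qquad f_{iw} = y_{iw}\, H_{i}, \qquad i = \mathrm{CO_2}
```

Rearranged as y_iw = f_iw / H_i, this shows that the dissolved mole fraction grows with CO2 fugacity (and hence pressure) and falls as the Henry’s law constant increases.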
2.1.5. Mineralization Trapping
This geochemical trapping mechanism involves converting CO2 into stable mineral phases, primarily carbonate minerals such as calcite, dolomite, and siderite. Additionally, CO2 can adsorb onto clay minerals within the reservoir. The mineral trapping of CO2 is governed by two main classes of chemical reactions: those occurring between components in the aqueous phase and those between minerals and aqueous components. These reactions are represented in Equations (5)–(7) for aqueous reactions and Equations (8)–(10) for mineral dissolution and precipitation. For this simulation, the Transition State Theory (TST) model was applied to simulate mineral reactions, while the B-dot model was chosen to govern the aqueous reactions. The B-dot model is preferred for CCS modeling, as it provides greater accuracy for aqueous species compared to the Ideal Solution model, which is typically used as the default in GEM. It is important to note that mineral dissolution and precipitation can alter the pore volume in the porous medium, leading to changes in porosity and potential variations in permeability.
The geochemical database utilized for the simulation was ‘thermo.dat’ [
28], which provides thermodynamic data essential for modeling mineral–water interactions in the context of CO
2 storage.
Aqueous chemical equilibrium reactions [
29]:
Mineral dissolution and precipitation reactions [
29]:
2.1.6. Structural Trapping (Caprock Integrity)
Many reservoirs are initially enclosed by a competent, sealing caprock. A pressure differential typically exists across this caprock, and during CO
2 injection, if this pressure differential becomes sufficiently large, the caprock seal may be compromised. This breach could lead to CO
2 leakage into overlying layers, potentially contaminating nearby freshwater resources. Once a breach occurs, further seal degradation may follow, allowing for an increased outflow of injected fluid.
Figure 5 illustrates the typical behavior of pressure with increasing depth, including hydrostatic pressure, fracture gradient, and lithostatic stress, which help determine safe injection limits.
Figure 6 below illustrates a schematic of rock stress and fluid pressure, showing that deeper layers can exist in overpressure regimes. The fracture gradient indicates the pressure at which rock fracturing can occur, which often approaches the maximum stress level. To maintain caprock integrity, injection pressure must remain below the fracture pressure.
The purpose of this geomechanical simulation was to examine potential caprock leakage from injecting CO
2 over 10 years, comparing a single centralized injector scenario with a configuration using one injector and two adjacent producers. The objective was to evaluate pressure distribution, the likelihood of caprock fracturing, and trapping efficiency in each scenario. The Barton–Bandis model was applied to simulate the opening of a conductive fracture in the event of tensile failure. For this model, a natural fracture system was predefined, and a very low fracture permeability was assigned to simulate an effective no-flow boundary. As pore pressure increases under constant total stress, the rock’s effective normal stress decreases. If the effective normal stress drops sufficiently, tensile failure can occur [
30]. This simulation aimed to determine whether and when tensile failure would occur, particularly with the influence of two producer wells adjacent to the CO
2 injector at the center of the grid, as illustrated in
Figure 7 below.
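The effective-stress argument above can be illustrated with a minimal check. This is a generic sketch of the principle (effective stress equals total stress minus pore pressure) and not the Barton–Bandis formulation used by the simulator. The total stress of 20,000 kPa, the pore pressures, the zero tensile strength, and the Biot coefficient of one are all hypothetical values, with compression taken as positive.

```python
def effective_normal_stress(total_stress_kpa, pore_pressure_kpa, biot=1.0):
    """Effective normal stress (compression positive), in kPa."""
    return total_stress_kpa - biot * pore_pressure_kpa


def tensile_failure(total_stress_kpa, pore_pressure_kpa,
                    tensile_strength_kpa=0.0, biot=1.0):
    """True once the effective normal stress becomes more tensile than
    the rock's tensile strength."""
    sigma_eff = effective_normal_stress(total_stress_kpa, pore_pressure_kpa, biot)
    return sigma_eff < -tensile_strength_kpa


TOTAL_STRESS_KPA = 20000.0  # hypothetical constant total normal stress

# Pore pressure rising during injection (hypothetical values).
for pore_pressure in (11800.0, 16000.0, 20500.0):
    sigma_eff = effective_normal_stress(TOTAL_STRESS_KPA, pore_pressure)
    failed = tensile_failure(TOTAL_STRESS_KPA, pore_pressure)
    print(f"p = {pore_pressure:7.0f} kPa, effective stress = {sigma_eff:7.0f} kPa, "
          f"tensile failure: {failed}")
```

In this picture, producers that relieve pore pressure near the caprock keep the effective normal stress further from the failure threshold.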
Figure 5.
Schematic demonstration of Pressure behavior with respect to depth [
31].
Figure 6.
Cross-sectional view of the reservoir model with the producer wells present while illustrating permeability distribution across layers. The blue section represents a layer of extremely low permeability, modeled to imply the presence of a caprock seal, which acts as a barrier to upward CO2 migration.
Figure 7.
Cross-sectional permeability profile of the layers in the grid with just the injector well and no producers present.
To model the caprock layers, we divided the topmost layer into 13 additional sub-layers along the K-direction, adjusting their properties to represent overburden. Among these, two layers (Layers 6 and 12) serve as caprock layers, as shown in Figure 6 and Figure 7. These caprock layers have a thickness of 4.5 m each, compared to the 15.25 m thickness of the remaining 11 layers and 7.5 m in the rest of the grid. The entire grid was assigned a porosity of 0.13, while the caprock layers were given an I-direction permeability of 1 × 10⁻⁷ md.
In the model, the layers immediately above, below, and including the caprock layers were specified as ‘active’ for the Barton–Bandis model, enabling fracture reactivation in and around the caprocks, as illustrated in
Figure 6. For geomechanical properties, the first rock type was modeled using the Mohr–Coulomb Elasto-Plastic model with a Young’s modulus of 4.9987 × 10⁶ kPa, a Poisson’s ratio of 0.25, and a cohesion of 689,476 kPa. The second rock type was simulated using the Drucker–Prager Elasto-Plastic model with a Young’s modulus of 861,845 kPa, a Poisson’s ratio of 0.3, and the same cohesion value of 689,476 kPa. Two additional rock types were generated, modeled with properties identical to Rock Types 1 and 2.
We specified the deformation rock type for each individual layer along the K-direction, defining certain zones as rock compaction regions. These regions were assigned a reference pressure of 24,476.4 kPa for calculating rock compressibility, with a rock compressibility value of 1.28213 × 10⁻⁶ 1/kPa. The parameters used for activating the Barton–Bandis fracture model are detailed in Table 3 below.
2.2. Machine Learning Methods
2.2.1. Random Forest
The random forest (RF) algorithm operates on the principle of ensemble learning, specifically using a technique called bagging (bootstrap aggregating). RF creates multiple decision trees, each trained on a bootstrap sample, that is, a sample drawn randomly from the training data with replacement. At each node of a decision tree, RF selects a random subset of features to consider for splitting. This introduces additional randomness and helps decorrelate the trees [32]. At each split node, a binary test is conducted on this subset, directing each sample to either the left or right sub-node. The test identifies the feature and threshold value that minimize the mean square error to determine the optimal branch. This process can be represented as follows [32]:
Here, Ei represents the true value of the i-th sample, while C1 and C2 denote the predicted values for the left and right subspaces, D1 and D2, respectively. Each tree is grown to its maximum depth or until a stopping criterion is met. The trees are not pruned, allowing them to capture complex patterns in the data. For regression tasks, the average prediction across all trees is given by Equation (12) below:
The combined regression model is depicted as f(x).
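The split search and the averaging step described above can be sketched in a few lines. This is a simplified single-feature illustration with made-up data, not the library implementation used in this study; the function names are hypothetical.

```python
def sum_squared_error(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold on one feature that minimizes the combined squared
    error of the left and right subsets, each predicted by its own mean."""
    pairs = sorted(zip(xs, ys))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical feature values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost


def forest_predict(tree_predictions):
    """Regression forest output: the average of the individual tree predictions."""
    return sum(tree_predictions) / len(tree_predictions)


# Made-up data with an obvious break between x = 3 and x = 10.
threshold, cost = best_split([1, 2, 3, 10, 11, 12], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(threshold, cost)
print(forest_predict([1.0, 2.0, 3.0]))
```

In a full random forest, this search is repeated over a random subset of features at every node, and each tree is trained on its own bootstrap sample.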
RF can capture complex, non-linear relationships in the data, which is crucial for modeling complex systems like oil and gas reservoirs. It also provides a measure of feature importance, helping identify the most influential parameters in the model [33]. Random forest regression has been used to predict CO2-WAG (water alternating gas) performance, including oil production, CO2 storage amount, and storage efficiency, which further demonstrates its suitability as a smart proxy model [18].
2.2.2. Gradient Boosting
A gradient boosting model starts with an initial prediction, often the mean of the target variable for regression tasks. In each iteration, the model calculates the residuals (differences between actual values and current predictions) [34]. A new weak learner (e.g., a decision tree) is trained to predict these residuals. The predictions of this new learner are added to the ensemble’s predictions, scaled by a learning rate. The algorithm aims to minimize a specified loss function (e.g., mean squared error for regression) at each step; by iteratively reducing the loss, the model improves its predictive accuracy. Techniques such as shrinkage (a small learning rate) and subsampling are used to prevent overfitting.
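The residual-fitting loop described above can be sketched from scratch for squared-error loss. This is a minimal single-feature illustration that uses depth-one trees (stumps) as weak learners and made-up data; it is not the library implementation used in this study, and the function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-one regression tree on one feature and return it as a function.

    Requires at least two distinct feature values.
    """
    best = None
    distinct = sorted(set(xs))
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = 0.5 * (lo + hi)
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        cost = (sum((r - left_mean) ** 2 for r in left)
                + sum((r - right_mean) ** 2 for r in right))
        if best is None or cost < best[0]:
            best = (cost, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosting for squared-error loss: each stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial prediction: mean of the target
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mean_squared_error(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Made-up non-linear data.
xs = list(range(10))
ys = [float(x * x) for x in xs]

model = gradient_boost(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [model(x) for x in xs])
print(f"baseline MSE = {baseline_mse:.1f}, boosted MSE = {boosted_mse:.1f}")
```

A smaller learning rate makes each correction more cautious, which is the shrinkage effect mentioned above; it usually calls for more boosting rounds.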
2.2.3. Decision Trees
A decision tree consists of nodes (decision points) and branches (possible outcomes). The topmost node is called the root node, internal nodes represent features, and leaf nodes represent the final decisions or predictions, as shown in
Figure 8 below. Each node evaluates a feature and splits the dataset based on a threshold value. The objective is to maximize the homogeneity of resulting subsets or minimize the loss function (e.g., mean squared error for regression). The dataset is split recursively based on feature values that optimize a chosen metric.
Common splitting metrics include the following:
Gini impurity: measures the probability of incorrect classification of a randomly chosen element.
Entropy (for information gain): measures the disorder or uncertainty in the dataset.
Variance reduction: used in regression tasks to minimize variance within subsets.
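The three splitting metrics listed above can be computed directly, as in the following minimal sketch. The labels and values are made up for illustration.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Probability of misclassifying a random element labeled according to
    the class frequencies in the subset."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(parent, left, right):
    """Drop in variance achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted


balanced = ["a", "a", "b", "b"]
print(gini_impurity(balanced), entropy(balanced))
print(variance_reduction([1.0, 1.0, 5.0, 5.0], [1.0, 1.0], [5.0, 5.0]))
```

Gini impurity and entropy apply to classification targets, while variance reduction is the regression counterpart and is the relevant criterion when the target is a continuous quantity such as trapped CO2 volume.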
To prevent overfitting, trees can be pruned by removing branches that provide little predictive power. Decision tree-based models have been used to predict CO
2 solubility in aqueous solutions, which is crucial for understanding CO
2 behavior in reservoirs, as seen in [
35]. Decision tree algorithms and their ensemble variants offer a powerful and interpretable approach to modeling complex CO
2 injection scenarios. Their ability to capture non-linear relationships, provide feature importance, and make rapid predictions makes them valuable tools for developing smart proxy models in reservoir simulation and optimization [
36].
2.2.4. Model Development
To enhance the analysis of the simulation results, three machine learning models were developed: gradient boosting, random forest, and decision trees, tailored for each trapping mechanism (excluding caprock). These models were employed as smart proxy models to predict the amount of CO2 trapped by each mechanism, capturing the nonlinear relationships among key reservoir parameters. The input features included Time, Pressure (kPa), Effective Porosity, Residual Gas Saturation for Krg Hysteresis, Current Net Pore Volume, Gas Saturation, Gas Saturation for Krg Hysteresis, Gas Relative Permeability, and Gas Mass Density.
The machine learning models provided robust predictions by capturing complex interactions between these parameters and their impact on CO2 trapping. We also conducted feature importance analysis for each trapping mechanism to identify the most influential parameters driving CO2 trapping. This helped explain which reservoir properties had the greatest impact on the efficiency of each mechanism, based on historical data matching. The performance and efficiency of the smart proxy models were evaluated using R², the mean absolute error (MAE), and the mean relative percentage error (MRPE) on both the training and the testing datasets. The latter two metrics offer complementary perspectives: MAE assesses absolute prediction error in SCF, while MRPE quantifies the average percentage deviation from observed values, which aids interpretability across varying magnitudes. The results for each unique case and model prediction are presented below, showcasing the reliability and accuracy of these models in predicting CO2 trapping behavior. This comprehensive approach not only enhanced the understanding of hysteresis and trapping mechanisms in the reservoir but also provided a framework for optimizing CO2 injection and storage strategies.
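For clarity, the three evaluation metrics can be written out as follows, where y_i is the simulated value, ŷ_i the proxy prediction, ȳ the mean of the simulated values, and n the number of samples. The R² and MAE expressions are the standard definitions; the MRPE expression is an assumed form consistent with the description above, since the exact definition is not restated here.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|,
\qquad
\mathrm{MRPE} = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|
```

MAE carries the units of the target, whereas R² and MRPE are dimensionless, which is why the three are reported together.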
Model Training and Evaluation Workflow
The following steps highlight the breakdown of how the models were built, from the data preprocessing to predictions and visualization:
Data pre-processing: The dataset was loaded and meticulously preprocessed to ensure compatibility with machine learning algorithms. The Time feature was transformed into a numerical format (seconds since epoch) to enable meaningful calculations. Input features (X) and the target variable (y) were coerced into numeric data types to handle any potential formatting inconsistencies. Rows containing missing or invalid values were removed to maintain data integrity and ensure a robust analysis.
Feature selection: The input features selected for modeling included critical geophysical and reservoir properties such as porosity, relative permeability, saturation, density, and pressure. These variables were chosen based on their known relevance to the CO2 trapping process, ensuring the model captured the most influential factors affecting the target variable.
Train–test split: To evaluate model performance on unseen data, the dataset was split into training (80%) and testing (20%) subsets. This split was essential to prevent overfitting and to provide an unbiased estimate of the model’s ability to generalize to new data.
Model training: The three regression models—random forest (RF), gradient boosting (GB), and decision tree (DT)—were trained using the training data. Hyperparameter optimization was applied where appropriate. For RF and GB models, a comprehensive grid search (GridSearchCV) was employed to identify the optimal combination of parameters, such as max_depth, n_estimators, and min_samples_split. For DT models, key parameters like max_depth, min_samples_split, and min_samples_leaf were manually fine-tuned to balance complexity and prevent overfitting.
Performance evaluation: Model performance was evaluated using R2 scores on both training and testing datasets. These metrics quantified the model’s accuracy and predictive capabilities. Additionally, the inclusion of training and testing R2 scores in the prediction plots provided a visual representation of model performance and highlighted any potential overfitting issues.
Feature importance: Feature importance scores were extracted to identify the most influential input variables driving the predictions. These scores were visualized using a bar plot, offering valuable insights into the physical parameters most critical to predicting CO2 trapping.
Predictions and visualization: The model’s predictions were incorporated into the original dataset to facilitate comparisons with actual values. A time-series plot was generated to visually compare the original and predicted CO2 trapping values, offering a clear depiction of the model’s performance over time.
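The steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration for the random forest case only, run on synthetic data. The column names, value ranges, target relationship, and parameter grid are all hypothetical stand-ins for the simulator output and settings used in this study, and the plotting steps are omitted.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for simulator output (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Time": pd.date_range("2024-01-01", periods=n, freq="D").astype(str),
    "Pressure": rng.uniform(11000, 14000, n),
    "Porosity": rng.uniform(0.10, 0.20, n),
    "GasSaturation": rng.uniform(0.0, 0.6, n),
})
df["TrappedCO2"] = (1e6 * df["GasSaturation"] * df["Porosity"]
                    + 1e4 + rng.normal(0, 500, n))

# Pre-processing: time to seconds since epoch, coerce to numeric, drop bad rows.
timestamps = pd.to_datetime(df["Time"])
df["Time"] = (timestamps - pd.Timestamp("1970-01-01")).dt.total_seconds()
df = df.apply(pd.to_numeric, errors="coerce").dropna()

# Feature selection and 80/20 train-test split.
X = df.drop(columns="TrappedCO2")
y = df["TrappedCO2"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Model training with a small grid search (hypothetical grid).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, 8],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(RandomForestRegressor(random_state=0),
                      param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)
model = search.best_estimator_

# Performance evaluation on both subsets.
pred_test = model.predict(X_test)
r2_train = r2_score(y_train, model.predict(X_train))
r2_test = r2_score(y_test, pred_test)
mae_test = mean_absolute_error(y_test, pred_test)
mrpe_test = float(np.mean(np.abs((y_test - pred_test) / y_test)) * 100)

# Feature importance, sorted from most to least influential.
importances = pd.Series(model.feature_importances_,
                        index=X.columns).sort_values(ascending=False)

# Predictions added back to the dataset for comparison with actual values.
df["PredictedCO2"] = model.predict(X)

print(f"R2 train = {r2_train:.3f}, R2 test = {r2_test:.3f}")
print(f"MAE = {mae_test:.1f}, MRPE = {mrpe_test:.2f}%")
print(importances)
```

A large gap between the training and testing R² values in such a run would point to overfitting, which is why both are reported.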
To improve clarity and facilitate comparison across all scenarios, Table 4 summarizes the setup, key features, and objectives of the different simulation cases presented in this study.
3. Results
The following results were obtained from the comprehensive simulations conducted to investigate various CO2 trapping mechanisms and the development of smart proxy models aimed at predicting CO2 trapping efficiencies. This section details the outcomes from the series of simulations that explored the different scenarios of CO2 behavior under varying subsurface conditions. The presence of an infinite-acting aquifer and two adjacent producer wells significantly influenced CO2 plume distribution and storage efficiency. The aquifer’s pressure support extended the lateral migration of CO2, leading to a more dispersed plume. The producers introduced localized pressure sinks, which altered the CO2 movement, pulling the plume toward the production wells and modifying the expected containment efficiency. This effect was most pronounced in the post-injection phase, where the producers continued to influence plume stabilization. During the active injection phase (pre-2040), the plume exhibited rapid expansion and increased trapping via structural and residual mechanisms. However, post-injection (post-2040), plume stabilization was prolonged due to the combined influence of pressure depletion from the producers and the aquifer’s continuous support. The competition between these forces dictated CO2 migration pathways, creating a dynamic interplay that influenced long-term trapping efficiency. CO2 saturation maps revealed that pressure-controlled displacement governed the efficacy of structural and residual trapping, with the aquifer enhancing vertical migration while the producers promoted horizontal movement. This suggests that aquifer connectivity plays a crucial role in determining storage permanence and optimizing CO2 containment strategies.
From Figure 9 below, the base case (green curve) exhibits the least amount of CO2 trapping, stabilizing early at approximately 5.0 × 10^7 SCF. This scenario represents a reservoir without enhanced trapping mechanisms such as hysteresis, solubility, or aquifer effects. As a benchmark, the base case demonstrates limited trapping efficiency, underscoring the critical role of advanced mechanisms in improving CO2 storage capacity and security. After injection stops in 2035, CO2 trapped under the hysteresis mechanism increases significantly due to residual trapping, where capillary forces immobilize CO2 in the pore spaces. Between 2040 and 2075, the curve gradually approaches a plateau, indicating that most residual trapping occurs shortly after injection ceases, with the trapping stabilizing at approximately 7.0 × 10^7 SCF. Hysteresis is a critical mechanism for ensuring secure long-term storage by effectively immobilizing CO2 within the reservoir.
The combination of hysteresis and the infinite aquifer (purple curve) leads to a sharp rise in trapped CO2 immediately after injection stops in 2035, initially surpassing all other scenarios. The aquifer enhances residual trapping by maintaining pressure and encouraging brine displacement, which increases the amount of CO2 immobilized. From 2040 to 2075, the trapped CO2 continues to rise slowly, ultimately achieving the highest final trapped value of approximately 7.8 × 10^7 SCF. This combination proves to be the most effective for maximizing storage capacity, although the increased pressure from the aquifer necessitates careful monitoring of caprock integrity to ensure long-term stability.
The solubility trapping mechanism (brown curve) shows a steady increase in trapped CO2 between 2035 and 2040 as CO2 dissolves into brine, reducing the free gas phase. However, the rate of increase is slower compared to hysteresis-driven mechanisms. From 2040 to 2075, the trapping stabilizes at approximately 5.5 × 10^7 SCF, reflecting the limit of CO2 solubility in the formation brine. Although solubility is a slower mechanism, it contributes to secure CO2 trapping by reducing mobile CO2 and enhancing chemical stability within the reservoir.
Harvey’s solubility (orange curve) shows a steady increase in trapped CO2 between 2035 and 2040, similar to the standard solubility mechanism, as CO2 dissolves into brine. From 2040 to 2075, the trapping plateaus at approximately 6.0 × 10^7 SCF, slightly higher than the standard solubility case, likely due to improved brine displacement or enhanced dissolution conditions. This scenario demonstrates a marginal improvement over standard solubility trapping, providing slightly increased storage efficiency.
Mineralization trapping (blue curve) remains negligible throughout the monitoring period from 2035 to 2075, with almost no increase in trapped CO2. This indicates that mineralization is a very slow process requiring significantly longer timescales to contribute meaningfully to storage. While mineralization offers secure trapping, it is not a significant mechanism within the timescales of this simulation. A clearer visual representation of this information is provided in Figure 10a,b, with Figure 10b specifically highlighting the storage efficiency of the geomechanical trapping forces within the caprock.
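The approximate plateau values quoted above can be put on a common footing by expressing each as a percentage gain over the base case. The short sketch below does only this arithmetic. The inputs are the rounded values stated in the text, so the resulting percentages are indicative rather than exact, and the function name is illustrative.

```python
# Approximate plateau values (SCF) quoted in the text for each scenario.
plateau_scf = {
    "base case": 5.0e7,
    "hysteresis": 7.0e7,
    "hysteresis + infinite aquifer": 7.8e7,
    "solubility": 5.5e7,
    "Harvey's solubility": 6.0e7,
}


def gain_over_base(values, base_key="base case"):
    """Return the percentage gain of each scenario relative to the base case."""
    base = values[base_key]
    return {
        name: round(100.0 * (value - base) / base, 1)
        for name, value in values.items()
        if name != base_key
    }


gains = gain_over_base(plateau_scf)
for scenario, pct in gains.items():
    print(f"{scenario}: +{pct}% relative to the base case")
```

On these rounded figures, hysteresis alone adds roughly 40% to the trapped volume and hysteresis with the infinite aquifer roughly 56%, whereas the two solubility cases add about 10% and 20%.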
The smart proxy models were employed to predict CO2 trapping efficiency under various conditions. The models successfully replicated the impact of aquifer support and producer wells on CO2 trapping efficiency, providing rapid insights into reservoir behavior. The interaction between aquifer influx and producer-induced pressure depletion was effectively modeled, emphasizing the dynamic nature of CO2 storage in reservoirs with active pressure interference.
Table 5 presents the performance comparison of the smart proxy models in predicting CO2 trapping across different scenarios. The Train R2 and Test R2 scores were used to evaluate the models’ predictive accuracy and generalization ability. Across all trapping mechanisms, gradient boosting consistently achieved the highest Test R2 scores, indicating superior performance in capturing nonlinear dependencies and reservoir dynamics.
The smart proxy models identified key reservoir parameters that significantly influence CO2 trapping efficiency. Among these, pressure (kPa), gas saturation, net pore volume, and time emerged as the dominant factors controlling CO2 retention. Pressure fluctuations played a crucial role in stabilizing the CO2 plume, where higher pressures favored increased solubility trapping by enhancing the dissolution of CO2 into the formation brine. Gas saturation was equally critical, particularly in determining residual trapping efficiency, with hysteresis effects amplifying CO2 immobilization in regions of high saturation. Net pore volume dictated the reservoir’s overall capacity to accommodate CO2, reinforcing the role of rock properties in ensuring long-term containment feasibility. Additionally, time was essential in capturing cumulative trapping effects, highlighting the need for prolonged monitoring to evaluate storage integrity over extended periods. Taken together, these dependencies suggest that pressure depletion from adjacent producer wells can either enhance or diminish CO2 storage efficiency, depending on the permeability and porosity distribution within the reservoir.
4. Discussion
The simulation results highlight the profound influence of aquifer support and adjacent producer wells on CO2 plume distribution and storage efficiency. The aquifer’s sustained pressure support extended lateral CO2 migration, while localized pressure sinks from producer wells redirected portions of the plume, altering containment dynamics. These interactions played a critical role in both pre- and post-injection phases, with the post-injection period showing prolonged plume stabilization due to the combined effects of pressure depletion and aquifer influx. This finding highlights the necessity of reservoir pressure management strategies to maintain long-term CO2 containment, as uncontrolled migration could compromise storage security.
The analysis of CO2 trapping mechanisms revealed distinct efficiencies in securing CO2 over time. The base case demonstrated limited trapping efficiency (5.0 × 10^7 SCF), serving as a benchmark that emphasizes the necessity of enhanced trapping mechanisms for improving storage performance. Hysteresis trapping, governed by capillary forces, significantly increased CO2 immobilization within the pore spaces, stabilizing at 7.0 × 10^7 SCF, which is crucial for preventing CO2 remobilization and potential leakage. The combination of hysteresis and aquifer effects provided the highest overall storage efficiency (7.8 × 10^7 SCF) by promoting brine displacement and residual trapping, making it the most effective strategy for long-term CO2 sequestration. Solubility trapping, although a slower process, contributed to chemical stabilization, reducing the amount of free CO2 and mitigating the risk of migration. However, mineralization trapping remained negligible, reinforcing the understanding that it is a long-term process requiring extended timescales before it can significantly contribute to CO2 storage security. These findings emphasize the importance of early-stage trapping mechanisms, particularly hysteresis and solubility, in ensuring effective CO2 containment over decades to centuries.
A critical discovery was the role of producer wells in maintaining caprock stability under continuous CO2 injection conditions. Without active production, caprock integrity was compromised, leading to tensile failure within three years. However, the pressure dissipation facilitated by producer wells significantly extended the structural integrity of the caprock, reinforcing the necessity of strategic well placement in carbon storage projects.
The smart proxy models successfully captured CO2 trapping behavior across different scenarios, offering a rapid and effective tool for predicting storage efficiency under various reservoir conditions. Among the models, gradient boosting consistently outperformed decision tree and random forest models, achieving the highest predictive accuracy (Test R2 up to 0.9991). This underscores its robust ability to model nonlinear reservoir dynamics and complex CO2 interactions. The key parameters influencing CO2 retention included reservoir pressure, gas saturation, net pore volume, and time, all of which significantly impacted trapping efficiency. Higher pressures enhanced solubility trapping by increasing CO2 dissolution into formation brine, while gas saturation played a crucial role in residual trapping efficiency. The interaction between pressure depletion from producer wells and aquifer support was found to be a defining factor in CO2 retention, as pressure fluctuations dictated the movement and stabilization of the CO2 plume. These insights reinforce the need for continuous reservoir monitoring and advanced predictive models to ensure optimal CO2 sequestration strategies in both natural and engineered storage sites.
The findings provide critical insights into optimizing CO2 storage strategies and mitigating risks associated with plume migration and leakage. Hysteresis-driven mechanisms, aquifer interactions, and strategic producer well placement are all central to enhancing long-term CO2 containment. Future research should focus on extending simulation timescales to better understand mineralization trapping, refining smart proxy models to incorporate geological heterogeneity, and investigating how that heterogeneity affects CO2 migration pathways.
In addition to R2 scores, the inclusion of mean absolute error (MAE) and mean relative percentage error (MRPE) in model evaluation provided a more comprehensive assessment of predictive accuracy. While R2 evaluates the proportion of variance explained by the model, MAE reflects the average absolute deviation between predicted and actual CO2 trapped volumes, reported in SCF. MRPE, on the other hand, contextualizes model performance as a percentage error relative to actual values, offering a scale-invariant metric that is especially valuable when comparing performance across scenarios with varying magnitudes of CO2 trapping.
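The three metrics can be written out in a few lines of plain Python, as in the sketch below. The MRPE formula is an assumption, since the text names the metric without defining it: it is taken here as the mean of |actual − predicted| / |actual|, expressed as a percentage. The sample values are hypothetical and are not results from this study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation, in the units of the target (here SCF)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_relative_percentage_error(y_true, y_pred):
    """Assumed MRPE definition: mean of |t - p| / |t|, in percent."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical trapped volumes in SCF (illustrative only, not study results).
actual = [5.0e7, 6.0e7, 7.0e7, 8.0e7]
predicted = [5.05e7, 5.94e7, 7.07e7, 7.92e7]

print(f"R2   = {r2_score(actual, predicted):.5f}")
print(f"MAE  = {mean_absolute_error(actual, predicted):.0f} SCF")
print(f"MRPE = {mean_relative_percentage_error(actual, predicted):.2f} %")
```

Because MRPE divides each error by the actual value, it is undefined when an actual value is zero, which matters for scenarios where the trapped volume starts at or near zero.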
Across all scenarios, MRPE values remained consistently below 1%, with many falling in the 0.25–0.75% range, indicating highly reliable predictions with minimal deviation from true values. This exceptionally low relative error reinforces the robustness of the smart proxy models and validates their effectiveness in replicating complex simulation outputs.
Among the evaluated models, random forest emerged as the overall best performer. It consistently achieved the lowest MRPE scores (as low as 0.25% in both the Hysteresis + Infinite Aquifer and Solubility scenarios) and the lowest MAE in four out of the six trapping mechanisms. While gradient boosting occasionally reported higher R2 values, random forest struck the best balance between high explanatory power (R2), low absolute error (MAE), and low relative error (MRPE). These results confirm random forest’s superior generalization ability and precision, making it the most reliable smart proxy model for predicting CO2 trapping efficiency across diverse reservoir conditions.
4.1. Coupled Geochemical–Geomechanical Interactions During CO2 Injection
While this study individually examined geochemical trapping mechanisms (such as solubility and mineralization) and geomechanical stability (such as caprock integrity), the interaction between these processes plays a critical role in determining long-term CO2 storage effectiveness. Geochemical reactions, particularly mineral dissolution and precipitation, can significantly alter the physical properties of the reservoir and caprock, influencing porosity, permeability, and mechanical strength. These changes, in turn, impact CO2 migration pathways, trapping efficiency, and the potential for leakage.
During CO2 injection, the acidification of brine due to CO2 dissolution can trigger mineral dissolution reactions, enlarging pore spaces and increasing porosity. This increase in porosity can lead to enhanced permeability, potentially facilitating CO2 migration and reducing trapping efficiency if unchecked [31].
Conversely, mineral precipitation reactions, such as the formation of carbonate minerals, can occlude pore throats, reduce permeability, and strengthen the rock matrix, thereby promoting CO2 immobilization and enhancing caprock sealing capacity [37].
These geochemical changes directly affect geomechanical properties. For example, increased porosity from dissolution reduces rock stiffness and lowers the effective stress threshold, making formations more susceptible to mechanical failure, including fracture propagation or shear failure [38]. On the other hand, mineral precipitation can enhance rock stiffness and mechanical strength, improving the formation’s ability to resist fracturing under elevated injection pressures. Thus, the interplay between geochemistry and geomechanics determines not only the storage capacity but also the mechanical integrity of the storage complex over time.
Although the present study separately modeled geochemical trapping and mechanical behavior, a fully coupled geochemical–geomechanical simulation would further refine the understanding of these interactions. Future work should integrate dynamic updates to reservoir porosity, permeability, and mechanical properties based on real-time geochemical alterations. This would enable more accurate prediction of CO2 plume evolution, trapping efficiencies, and long-term storage security under evolving reservoir conditions.
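As an illustration of what such a dynamic update could look like, the sketch below rescales permeability after a porosity change using the Kozeny–Carman relation. This relation is an assumption introduced here for illustration only: the present study does not specify a porosity–permeability law, and the reference permeability and porosity values are hypothetical.

```python
def kozeny_carman_permeability(k0, phi0, phi):
    """Scale a reference permeability k0 (at porosity phi0) to a new porosity phi.

    Uses the Kozeny-Carman form
        k = k0 * (phi / phi0)**3 * ((1 - phi0) / (1 - phi))**2,
    assumed here as one common choice; other relations are possible.
    """
    if not (0.0 < phi0 < 1.0 and 0.0 < phi < 1.0):
        raise ValueError("porosities must lie strictly between 0 and 1")
    return k0 * (phi / phi0) ** 3 * ((1.0 - phi0) / (1.0 - phi)) ** 2


# Hypothetical reference state: 100 mD at 20% porosity.
k0_md, phi0 = 100.0, 0.20

# Dissolution raises porosity; precipitation lowers it (illustrative values).
for label, phi in [("after dissolution", 0.22), ("after precipitation", 0.18)]:
    k = kozeny_carman_permeability(k0_md, phi0, phi)
    print(f"{label}: porosity={phi:.2f}, permeability={k:.1f} mD")
```

In a coupled simulation, an update of this kind would be applied cell by cell after each geochemical step, so that the flow and geomechanical solvers see the altered rock properties.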
Acknowledging these couplings is critical because ignoring geochemical–geomechanical feedback could lead to underestimating risks associated with permeability evolution, caprock integrity loss, or unintended CO2 migration. Incorporating these processes into predictive frameworks and smart proxy models represents an important step toward developing safer, more efficient carbon sequestration strategies.
4.2. Qualitative Validation Using Literature-Based Benchmarks
In the absence of direct access to field-scale data or site-specific experimental measurements, it becomes essential to establish the credibility of simulation outcomes through alternative means. To address this limitation, a qualitative cross-validation approach was adopted, leveraging well-documented findings from peer-reviewed literature and benchmark case studies. This strategy enables a contextual comparison of the trapping behaviors, spatial CO2 distribution, and overall reservoir response observed in our simulations with those reported in field-scale projects (such as Sleipner and the Illinois Basin Decatur Project) and widely recognized numerical models.
By aligning our simulation results with these established studies, we aim to demonstrate that the modeled mechanisms in this study—structural trapping, residual (hysteresis) trapping, solubility trapping, and mineral trapping—exhibit consistent behaviors and trends observed in real-world or benchmarked scenarios. This literature-based validation framework not only supports the robustness of our methodology but also emphasizes the value of conceptual and sensitivity-driven modeling in enhancing understanding of CO2 storage dynamics when direct calibration is not feasible.
4.2.1. Qualitative Validation of Structural Trapping and Caprock Integrity
The robustness of caprock integrity presented in this study qualitatively aligns with documented experiences from the Sleipner CO2 storage project. As detailed by [39], the Nordland Shale caprock at Sleipner, characterized by a substantial thickness of 50–100 m and a high capillary entry pressure, effectively prevents upward migration of CO2, validating our findings regarding the critical role of caprock thickness and sealing capacity. The Sleipner studies further support our emphasis on comprehensive geological characterization and continuous monitoring, confirming that extensive seismic, micro-seismic, and geomechanical analyses are essential for accurate prediction of migration pathways and assurance of caprock stability. Our observations about pressure management through strategic producer well placements align with Sleipner’s successful pressure containment strategies, demonstrating consistency with proven practices in managing reservoir pressure to maintain structural integrity and storage security.
These qualitative parallels substantiate the conclusions drawn from our simulations, reinforcing that our model appropriately captures the fundamental mechanisms contributing to geological storage security observed in well-established field projects.
4.2.2. Qualitative Validation of Residual (Hysteresis) Trapping
The residual trapping mechanism demonstrated in our simulations qualitatively aligns with established radial simulation results by Ennis-King and Paterson [40]. Our findings indicate a rapid increase in CO2 immobilization shortly after injection ceases, driven by capillary forces that promote residual trapping. Over time, the trapped CO2 stabilizes, reaching a plateau that signifies effective long-term containment.
This behavior mirrors the patterns observed in Ennis-King and Paterson’s work, where gas saturation distributions show that residual trapping becomes effective immediately after injection. Their results highlight how capillary forces immobilize CO2 within pore spaces, preventing upward migration due to buoyancy over timescales ranging from hundreds to thousands of years. Specifically, their simulations demonstrate that, following the injection phase, CO2 becomes increasingly immobilized through residual and solubility mechanisms, leaving a minimal mobile fraction.
Further support comes from [14], whose simulations show that, while buoyancy-driven flow may initially expand the CO2 plume post-injection, residual trapping and dissolution gradually dominate, significantly reducing the proportion of mobile CO2. This emphasizes the importance of residual trapping as a primary immobilization mechanism during early post-injection periods, a trend that was also reflected in our simulation results.
Together, these qualitative comparisons reinforce the validity of our hysteresis-induced residual trapping model and affirm the critical role of capillary forces in securing long-term CO2 storage in geological formations.
4.2.3. Qualitative Validation of Solubility Trapping
The solubility-trapping behavior observed in our simulations aligns closely with findings from [10], who investigated long-term CO2 storage performance using coupled geochemical transport modeling. Their study identified solubility trapping as a dominant mechanism in the early post-injection period, driven by convective mixing and molecular diffusion processes that gradually increase the amount of CO2 dissolved in the formation brine.
This qualitative agreement supports our simulation results, which similarly show a progressive accumulation of dissolved CO2 over time. The continued dissolution contributes significantly to storage security during the early to mid-term phase, effectively reducing the mobile gas phase and lowering the risk of buoyant migration. These consistent trends validate our representation of solubility trapping as a critical mechanism in the intermediate timeframe following injection.
4.2.4. Qualitative Validation for Mineralization and Geochemical Trapping
The minimal contribution of mineral trapping observed in our study is consistent with a wide body of literature that highlights the slow nature of geochemical reactions leading to mineral formation. Ennis-King and Paterson [40] emphasize that, while mineralization offers the most permanent form of CO2 storage, its impact is typically negligible during the first few decades to centuries post-injection. [41] further explain that the kinetics of carbonate and silicate precipitation are slow and highly dependent on site-specific factors such as brine composition, mineral surface area, and pH buffering capacity.
Additional validation is provided by [10], whose results show that mineral trapping evolves gradually, with aqueous CO2 converting to solid carbonate phases over extended timescales. These transformations are influenced by geochemical conditions, such as brine chemistry and rock mineralogy, and are often observed only over hundreds to thousands of years. Together, these studies reinforce our interpretation of mineralization as a secondary, time-delayed trapping mechanism with a limited short-term impact but critical importance for long-term containment and storage permanence.
5. Conclusions
This study provides new insights into the mechanisms governing CO2 trapping in subsurface reservoirs, emphasizing the complex interplay between aquifer dynamics, pressure sinks from producer wells, and geochemical and geomechanical processes. Through an integrated modeling framework combining detailed numerical simulations and machine learning-based smart proxy models, we demonstrated how strategic reservoir configurations and operational strategies influence plume behavior, trapping efficiency, and caprock integrity.
The inclusion of adjacent producer wells was found to be a pivotal factor in managing reservoir pressure and enhancing structural stability, particularly in mitigating the risk of caprock failure during prolonged injection. Meanwhile, the infinite-acting aquifer played a crucial role in supporting pressure maintenance, which in turn facilitated enhanced residual and solubility trapping. These findings underscore the importance of understanding dynamic reservoir responses for designing safe and efficient carbon storage operations.
A major scientific contribution of this work is the development and validation of smart proxy models, which were evaluated using a comprehensive set of performance metrics: R2, mean absolute error (MAE), and mean relative percentage error (MRPE). While gradient boosting achieved the highest R2 scores in some scenarios, random forest consistently demonstrated superior performance across most trapping mechanisms, achieving the lowest MAE and MRPE values. These results confirm random forest as the most reliable and generalizable model for predicting CO2 trapping behavior in a variety of reservoir conditions. The low MRPE values—below 1% in all cases—further validate the high predictive accuracy of the proxy models and their value in replicating complex simulation outcomes.
Beyond technical performance, the broader implications of this study lie in its relevance for scalable and sustainable carbon sequestration. As regulatory and economic pressures mount to reduce industrial CO2 emissions, optimizing storage site configurations and improving model-based forecasts become essential. Our results highlight the feasibility of integrating machine learning with traditional reservoir engineering tools to accelerate this process, offering a rapid, data-driven approach for scenario evaluation and decision-making in CCS project design. Importantly, the simulation outcomes were qualitatively validated against established field studies and benchmark modeling efforts, confirming that the modeled trapping mechanisms (structural, residual, solubility, and mineralization) align with trends observed in real-world systems. This added layer of validation enhances confidence in our methodology and further supports the applicability of our findings in practical CCS design.