Article

Machine Learning Prediction of CO2 Diffusion in Brine: Model Development and Salinity Influence Under Reservoir Conditions

by Qaiser Khan 1, Peyman Pourafshary 1,*, Fahimeh Hadavimoghaddam 2,3 and Reza Khoramian 1

1 School of Mining and Geosciences, Nazarbayev University, 010000 Astana, Kazakhstan
2 Chemical Engineering Department, Ufa State Petroleum Technological University, 450000 Ufa, Russia
3 Institute of Unconventional Oil & Gas, Northeast Petroleum University, Daqing 163318, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8536; https://doi.org/10.3390/app15158536
Submission received: 22 May 2025 / Revised: 27 June 2025 / Accepted: 3 July 2025 / Published: 31 July 2025

Abstract

The diffusion coefficient (DC) of CO2 in brine is a key parameter in geological carbon sequestration and CO2-Enhanced Oil Recovery (EOR), as it governs mass transfer efficiency and storage capacity. This study employs three machine learning (ML) models, Random Forest (RF), Gradient Boosted Regression (GBR), and Extreme Gradient Boosting (XGBoost), to predict DC based on pressure, temperature, and salinity. The dataset, comprising 176 data points, spans pressures from 0.10 to 30.00 MPa, temperatures from 286.15 to 398.00 K, salinities from 0.00 to 6.76 mol/L, and DC values from 0.13 to 4.50 × 10−9 m2/s. The data were split into 80% for training and 20% for testing to ensure reliable model evaluation. Model performance was assessed using R2, RMSE, and MAE. The RF model demonstrated the best performance, with an R2 of 0.95, an RMSE of 0.03, and an MAE of 0.11 on the test set, indicating high predictive accuracy and generalization capability. In comparison, GBR achieved an R2 of 0.925, and XGBoost achieved an R2 of 0.91 on the test set. Feature importance analysis consistently identified temperature as the most influential factor, followed by salinity and pressure. This study highlights the potential of ML models for predicting CO2 diffusion in brine, providing a robust, data-driven framework for optimizing CO2-EOR processes and carbon storage strategies. The findings underscore the critical role of temperature in diffusion behavior, offering valuable insights for future modeling and operational applications.

1. Introduction

Anthropogenic greenhouse gas (GHG) emissions have recently reached critically high levels, significantly contributing to global warming and causing long-term changes in Earth’s climate system. To mitigate these effects, reducing atmospheric CO2 concentrations has become an urgent priority [1]. One effective solution is carbon capture and storage (CCS), where CO2 is either utilized for enhanced oil recovery (EOR) or permanently stored in deep saline aquifers. During this process, CO2 diffuses into the surrounding brine due to concentration gradients, following the natural drive toward thermodynamic equilibrium [2]. This dissolution reduces the buoyancy-driven vertical migration of CO2, leading to greater plume stability and more secure sequestration. A detailed understanding of the diffusion process enhances the accuracy of CO2 transport models, supports optimized storage design, and reduces the potential for long-term leakage—ultimately improving the environmental and operational safety of CCS and EOR initiatives [3].
The diffusion coefficient (DC) is a crucial parameter in determining the rate of CO2 movement into the brine. Higher DC values correspond to faster and more efficient diffusion, contributing to improved storage stability [4]. Since DC governs the transition of CO2 from a free to a dissolved phase, its accurate determination is essential for evaluating sequestration efficiency. Precise measurement under varying reservoir conditions (particularly temperature, pressure, and salinity) is vital for predicting CO2 behavior in saline aquifers. Three main approaches are used to determine DC: (a) experimental methods [5], (b) empirical or semi-empirical correlations [6], and (c) molecular simulations [7]. In addition, artificial intelligence (AI) techniques offer a powerful alternative by integrating data from all three sources to build robust predictive models capable of handling diverse and non-linear conditions.
Several experimental methods have been developed to measure the DC of gases, as reviewed in multiple studies. These methods, illustrated in Figure 1, include Taylor dispersion [8], laser-induced fluorescence [9], and pressure decay techniques [10]. Among them, the pressure decay method is widely used due to its reliability, simplicity, and suitability for controlled environments. Since the 1930s, the PVT method has also been extensively applied, often in combination with pressure decay, to assess CO2 diffusivity [11]. Experimental results show that DC is sensitive to pressure, temperature, and salinity. For example, Zhang et al. (2015) [12] reported values between 1.3 × 10−9 m2/s and 2.7 × 10−9 m2/s in NaCl solutions over 0.5–2.0 MPa and 25–70 °C. Yang and Gu (2006) [13] observed much higher DC values (170.7–269.8 × 10−9 m2/s) in brines at 2.6–7.5 MPa and 27–58 °C, likely due to natural convection. Lu et al. (2013) [14], using Raman spectroscopy at 10–45 MPa, found nearly pressure-independent behavior. Zarghami et al. (2017) [15] showed that increased salinity (20–80 ppm NaCl) at 68 °C significantly reduces diffusivity.
The CO2 diffusion coefficient in brine and pure water has been widely measured and reported. For example, Cadogan et al. (2014) [16] used the Taylor dispersion method and reported diffusivity values approximately 16% higher than those by Ratcliff and Holdcroft (1963) [17], who applied the wetted sphere absorber technique, particularly under low salinity (1 mol·L−1) and similar temperature-pressure conditions. Zhang et al. (2015) [12] studied CO2 diffusion in 3 wt% brine under offshore conditions (0.1–5 MPa, 286.15–303.15 K), reporting values from 1.3 to 2.7 × 10−9 m2/s and observing a linear increase with both pressure and temperature. Sell et al. (2013) [18], using microfluidics, showed that higher salinity inhibits diffusion due to stronger ion–water interactions. The Taylor dispersion method has been especially useful across a broad range of temperatures and pressures, as shown by Cadogan et al. (2014) [16], who highlighted the positive effect of temperature on diffusivity due to enhanced molecular motion. Tewes and Boury (2005) [19] employed the pendant drop technique to investigate gas–liquid interfacial dynamics, revealing key diffusion behaviors. Hirai et al. (1997) [20] used laser-induced fluorescence (LIF) to validate theoretical predictions under high-pressure conditions. Pressure decay methods have also shown significant sensitivity to ionic strength: Azin et al. (2013) [21] and Yang and Gu (2006) [22] found that increased salinity and ionic strength reduce CO2 diffusivity by acting as physical barriers to molecular transport.
Although experimental methods provide accurate CO2 DC values, they are time-consuming, costly, and require specialized instrumentation. As a result, empirical correlations have become a practical alternative, particularly when direct measurements are unavailable. For example, the Stokes–Einstein relation, as applied by Cadogan et al. (2015) [23], uses viscosity, temperature, and solute radius but performs poorly in high-salinity systems (>5 M NaCl) due to ion pairing and viscosity effects. While it works for simple NaCl brines, it fails in multicomponent systems with divalent ions like Mg2+ and Ca2+. Similarly, the Wilke and Chang (1955) correlation [24] incorporates temperature, viscosity, solute size, and an association parameter but is limited to dilute, non-electrolytic solutions. It neglects ionic strength and solute hydration, leading to significant errors in concentrated brines (>1 M salinity) [24,25,26]. Overall, these correlations struggle with complex reservoir conditions and cannot reliably predict DC in realistic, high-salinity brine systems.
Beyond experiments and empirical correlations, molecular dynamics (MD) simulations offer atomic-scale insights into CO2 diffusion in brine. Studies by Garcia-Rates et al. (2012) [25] and Omrani et al. (2022) [27] highlight the role of ion hydration and salinity in controlling diffusivity. While powerful, MD simulations require validation under extreme reservoir conditions and are often combined with experimental methods like Raman spectroscopy and pressure decay to improve reliability. These hybrid approaches help capture complex interactions involving multivalent ions and ionic strength effects.
Traditional methods, including correlations and MD simulations, provide useful insights but face challenges with large datasets and non-linear interactions. Experimental techniques [28,29,30] are also time- and resource-intensive. Machine learning (ML) models address these issues by integrating experimental and simulation data to predict CO2 diffusivity with high accuracy and efficiency [31]. Feng et al. (2019) [32] and Bemani et al. (2020) [33] developed ML models—MKSVM-GA and PSO-ANFIS—for predicting CO2 diffusivity in brines. However, both used limited datasets with narrow salinity and pressure ranges. While high accuracy was reported, the small sample sizes risk overfitting and limit generalization. Their model parameters and performance metrics are summarized in Table 1.
The present study addresses limitations of previous works by developing a more robust and diverse predictive framework tailored for CCS applications. It aims to predict CO2 diffusion coefficients in brine using three machine learning models, Random Forest (RF), Gradient Boosted Regression (GBR), and XGBoost, trained on a dataset of 176 experimental and simulation data points. The data span pressures of 0.1–30 MPa, temperatures of 286.15–398 K, and salinities of 0–6.76 mol/L. RF was chosen for its robustness against overfitting, while GBR and XGBoost employ sequential learning to capture complex patterns. These models, applied here for the first time in this context, use temperature, pressure, and salinity as inputs. Results confirm salinity as the second most influential factor after temperature due to its effect on solvent viscosity, density, and molecular mobility. The workflow is summarized in Figure 2, and the findings provide a robust, data-driven approach to support CCS and CO2-EOR design.

2. Data Description

In this study, a dataset of 176 data points was gathered from studies reported in the literature, including experimental and MD simulation data points [3,16,18,21,25,30,36,37,38,39,40,41]. The experimental data were selected for their high quality, public availability, and strong relevance to the study objectives, while simulation data were incorporated to address the scarcity of experimental measurements. Simulation offers a controlled environment in which variables can be systematically manipulated, enabling exploration across a wider parameter space, particularly in areas where experimental data are difficult, costly, or time-consuming to obtain.
Three input parameters were considered in the data modeling process: pressure (MPa), temperature (K), and salinity (mol/L). The descriptive statistics for these parameters, presented in Table 2, highlight the diversity of the collected dataset. Notably, the mean values of pressure, temperature, salinity, and DC are 12.15 MPa, 320.06 K, 2.35 mol/L, and 2.07 × 10−9 m2/s, respectively, with corresponding standard deviations reflecting significant variability, especially for salinity (2.42 mol/L) and pressure (7.99 MPa). The dataset covers 0.10 MPa to 30.00 MPa for pressure, 286.15 K to 398.00 K for temperature, and 0 mol/L to 6.76 mol/L for salinity. Quartile analysis provides additional insights into the data distribution, with Q1, Q2 (median), and Q3 values indicating lower, central, and upper thresholds, as detailed in Table 2. Taken together, these descriptive statistics indicate that the dataset is suitable for ML modeling and analysis.
To support the performance comparison of the developed models, a radar chart was created using R2, RMSE, and MAE values for each model. These metrics were normalized to a 0–1 scale to enable fair and consistent visual comparison. The chart was constructed in Python 3.9 with the pandas and matplotlib libraries. In addition, model interpretability was enhanced using Shapley Additive Explanations (SHAP). SHAP values were calculated for the RF model using the SHAP library (version 0.45.0), allowing for the quantitative assessment of each input feature’s impact on the predicted diffusion coefficient. The SHAP plots, which are presented in Section 3, help to identify and visualize the influence of temperature, pressure, and salinity on model output. All ML models were implemented using Python 3.9. The RF and GBR models were developed using the scikit-learn library (version 1.2.2), while the XGBoost model was implemented using the XGBoost library (version 1.7.5).
The Pearson correlation coefficients among pressure, temperature, salinity, and DC are visualized in Figure 3 as a heatmap. The correlation coefficients range between +1 and −1, where values near +1 or −1 indicate strong positive or negative linear relationships, respectively. The DC shows a moderate positive correlation with temperature (0.30), while salinity has a weak negative correlation (−0.21), and pressure (P) exhibits an almost negligible correlation (0.02) with DC. These weak correlations suggest that simple linear relationships may not fully explain the interactions between these input variables and output DC. Therefore, advanced ML models such as RF, GBR, and XGBoost are needed to capture complex, non-linear relationships and improve the prediction of the DC.
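As a minimal sketch of the statistic behind such a heatmap, the Pearson coefficient can be computed directly from its definition. The function name `pearson_r` and all numbers below are placeholders chosen for illustration and are not taken from the study's dataset. The second series is deliberately symmetric to show how a strongly non-linear dependence can still yield a coefficient near zero, which is the motivation given above for using non-linear ML models.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Placeholder values for illustration only; NOT the study's data.
temperature_K = [290.0, 310.0, 330.0, 350.0, 370.0]
dc_linear = [1.0, 1.5, 2.0, 2.5, 3.0]     # exactly linear in temperature
dc_nonlinear = [2.0, 1.0, 0.5, 1.0, 2.0]  # symmetric, clearly non-linear

r_linear = pearson_r(temperature_K, dc_linear)
r_nonlinear = pearson_r(temperature_K, dc_nonlinear)
print(f"linear: r = {r_linear:.2f}; non-linear: r = {abs(r_nonlinear):.2f}")
```

In practice the same matrix is usually obtained with `pandas.DataFrame.corr()`, which computes Pearson coefficients by default.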
Figure 4 displays violin plots that reveal the distribution of each variable, highlighting potential outliers. To make the dataset suitable for developing ML models, the parameter values were normalized because their scales differ across variables. Figure 5 presents the normalized data as histograms, enabling a detailed examination of each variable’s distribution. The distribution of DC shows a pronounced central peak, suggesting limited variability and a strong clustering of values around the mean. Conversely, the salinity parameter exhibits a multimodal distribution with broader variability, indicative of distinct subsets or measurement inconsistencies within the dataset. The pressure and temperature parameters reveal asymmetrical distributions, pointing to potential skewness or non-uniform sampling across experimental conditions. The boxplots embedded within each violin highlight the interquartile range, median, and potential outliers, offering complementary insights into central tendency and spread. Such detailed exploratory data analysis is indispensable for understanding the dataset’s underlying structure and ensuring informed decisions in subsequent analytical or modeling workflows.
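The text does not state which normalization was applied. The sketch below assumes simple min-max scaling to the 0–1 interval, which is one common choice. The function name `min_max_scale` is hypothetical, the first and last value of each column are the range endpoints reported for the dataset, and the middle value of each column is an arbitrary placeholder.

```python
def min_max_scale(values):
    """Scale a sequence linearly so that its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Endpoints are the reported dataset ranges; middle values are arbitrary placeholders.
columns = {
    "pressure_MPa": [0.10, 15.0, 30.00],
    "temperature_K": [286.15, 340.0, 398.00],
    "salinity_molL": [0.00, 3.0, 6.76],
}

scaled = {name: min_max_scale(col) for name, col in columns.items()}
for name, col in scaled.items():
    print(name, [round(v, 3) for v in col])
```

If a different scheme was used (for example, standardization to zero mean and unit variance), only the body of the scaling function would change.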

2.1. Modeling with RF, GBR, and XGBoost

In this study, RF, GBR, and XGBoost models were selected due to their ensemble learning capabilities and ability to handle complex data patterns, non-linear relationships, and diverse input features, which are essential for predicting CO2 DC in brine systems. RF requires no feature normalization and can handle both numerical and categorical data [42]. RF reduces overfitting by averaging multiple decision trees trained on random data subsets, providing robust predictions [43]. GBR and XGBoost build models sequentially by adding multiple weak learners, each correcting errors from the previous iteration. This approach captures complex patterns while reducing bias, as each new weak learner incrementally improves model performance. XGBoost further improves efficiency and accuracy through advanced regularization and optimization techniques [44]. The main aim of this study is to compare the performance of RF, GBR, and XGBoost on the given dataset and to determine which algorithm yields the highest predictive performance for the CO2 DC in brine systems. The dataset was randomly split into two subsets: a training set (80%) for developing the models and a testing set (20%) for model validation. Hyperparameter tuning was performed for each model to optimize performance. The independent testing set was used to evaluate each model’s accuracy on unseen data, providing a reliable measure of its generalizable predictive capability.
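A random 80/20 split of 176 samples can be sketched as follows. The random seed and the rounding rule (test-set size rounded up) are assumptions, since the text does not specify them, and `train_test_split_indices` is a hypothetical helper name. With these assumptions the split yields 140 training and 36 testing samples.

```python
import math
import random

def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) for a random hold-out split.

    The test-set size is rounded up, which is an assumed convention.
    """
    n_test = math.ceil(test_fraction * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(176)
print(len(train_idx), len(test_idx))  # 140 36
```

Fixing the seed makes the split reproducible, which matters on a dataset this small because different random splits can shift the test metrics noticeably.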

2.2. Model Evaluation Metrics

To assess the performance of the developed models, several statistical indices were used. These metrics not only measure the performance of each model but also enable quantitative comparison and ranking of the models. The indices used in this study are presented in Table 3, together with their general equations.
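For reference, the three indices can be written out in a few lines using their standard textbook definitions, which are assumed here to match the equations in Table 3. The function names and the observed and predicted values are placeholders for illustration, not results from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Placeholder observed and predicted values (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

scores = {"MAE": mae(y_true, y_pred), "RMSE": rmse(y_true, y_pred), "R2": r2(y_true, y_pred)}
print({name: round(value, 4) for name, value in scores.items()})
```

A useful sanity check that follows from these definitions: for the same set of residuals, RMSE is never smaller than MAE, and the two are equal only when all absolute errors are identical.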

3. Results and Discussion

3.1. Model Development and Evaluation

In the present study, three regression models, namely RF, GBR, and XGBoost, were used to predict the DC of CO2 in brine systems. The models were evaluated for their effectiveness and accuracy in predicting DC from three input parameters: pressure, temperature, and salinity. To evaluate the performance of the developed models, the statistical metrics R2, RMSE, and MAE were calculated and reported in Table 4 for both the test and train datasets. To obtain reliable results, the dataset was divided into 80% for training and 20% for testing, and all models were optimized through hyperparameter tuning. Figure 6 shows the performance of the developed models using clustered column charts for all three performance evaluation metrics (R2, RMSE, and MAE), highlighting their predictive capabilities and reliability for both the test and train datasets.
The hyperparameters for each ML model were tuned to enhance accuracy and generalization. The optimized parameters are summarized in Table 5. The RF model employed 500 estimators, a maximum depth of 10, and “auto” for feature selection, achieving robust performance. XGBoost utilized 500 estimators, a learning rate of 0.05, and a maximum depth of 3 to effectively capture non-linear patterns in the dataset. GBR incorporated advanced regularization with 1000 estimators, a learning rate of 0.03, and a subsample ratio of 0.8, balancing flexibility and generalization of the model. All models demonstrated excellent predictive accuracy, with RF achieving the highest R2 of 0.96 and 0.95 for the train and test datasets, respectively.
Among all ML models developed in this study, the RF model emerged as the most effective overall in terms of R2 and RMSE, as shown in Figure 6c and Table 4. The RF model outperformed the XGBoost and GBR models, delivering superior evaluation metrics that demonstrated higher accuracy and better performance for both the test and train datasets.
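The reported RF and GBR settings can be expressed as scikit-learn estimators roughly as follows. This is a sketch under stated assumptions rather than the authors' script: the inputs are synthetic values drawn uniformly over the reported ranges, the target is an arbitrary toy function, and the random seeds are arbitrary. Because `max_features="auto"` was removed in later scikit-learn releases, the sketch uses `max_features=1.0`, which is what "auto" meant for regressors in version 1.2.x (all features considered at each split).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 176

# Synthetic placeholder inputs over the reported ranges; NOT the study's data.
X = np.column_stack([
    rng.uniform(0.10, 30.00, n),     # pressure, MPa
    rng.uniform(286.15, 398.00, n),  # temperature, K
    rng.uniform(0.00, 6.76, n),      # salinity, mol/L
])
# Arbitrary toy target used only to exercise the estimators; not a physical model.
y = 0.02 * (X[:, 1] - 286.15) - 0.2 * X[:, 2] + rng.normal(0.0, 0.05, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    # Reported RF settings: 500 estimators, maximum depth 10, "auto" features.
    "RF": RandomForestRegressor(
        n_estimators=500, max_depth=10, max_features=1.0, random_state=0
    ),
    # Reported GBR settings: 1000 estimators, learning rate 0.03, subsample 0.8.
    "GBR": GradientBoostingRegressor(
        n_estimators=1000, learning_rate=0.03, subsample=0.8, random_state=0
    ),
    # The reported XGBoost settings would map to
    # xgboost.XGBRegressor(n_estimators=500, learning_rate=0.05, max_depth=3);
    # it is omitted here to keep the sketch free of the extra dependency.
}

test_r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_r2[name] = model.score(X_test, y_test)  # R2 on the held-out 20%

print({name: round(value, 3) for name, value in test_r2.items()})
```

The scores printed by this sketch describe the toy target only and say nothing about the accuracy reported in Table 4.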

3.2. Visual Validation and Trend Analysis

To assess the predictive fidelity of the developed models, scatter plots were constructed to compare actual and predicted DC values (Figure 7). A model’s accuracy is reflected by the proximity of data points to the 1:1 diagonal line; the closer the alignment, the stronger the prediction. Among the models, the Random Forest (RF) exhibited the tightest clustering around the diagonal for both training and testing datasets, indicating minimal prediction bias and high generalization capability. The Gradient Boosted Regression (GBR) model also demonstrated strong performance, though with slightly greater deviation in the test data. XGBoost, while performing adequately in the training set, exhibited a more pronounced spread in the test set, suggesting reduced robustness compared to RF and GBR.
To further evaluate model performance across multiple error and accuracy dimensions, radar charts were employed, as shown in Figure 8. These charts facilitate a comprehensive comparison by representing each performance metric—R2, RMSE, and MAE—on a separate axis. In the radar plots, metrics with lower error values (MAE, RMSE) are positioned closer to the center, while higher R2 values are located further outward, indicating superior predictive strength. The RF model consistently outperformed both GBR and XGBoost across all metrics for both training and test sets, reaffirming its robustness and superior generalization. GBR followed closely, while XGBoost displayed relatively lower predictive consistency.
Molecular dynamics studies, such as Omrani et al. (2022) [27], highlight the effect of salinity on DC of CO2 in brine at different pressures and temperatures. At temperature 323 K and pressure 100 MPa, the CO2 DC decreased from 3.8327 × 10−9 m2/s in pure water to 3.1553 × 10−9 m2/s (17.68% reduction) at 1 mol/L NaCl, 2.49 × 10−9 m2/s (35.08% reduction) at 3 mol/L, and 1.34 × 10−9 m2/s (64.92% reduction) at 6 mol/L, due to increased salinity and solute–ion interactions. Similar trends were observed in our study: increasing the salinity at constant pressure (10 MPa) and temperature (323 K) resulted in a decrease in DC of CO2 as shown in Table 6. Figure 9 represents the relationship trends of predicted DC at different salinities and at constant temperature and pressure using ML algorithms. For example, at temperature 310 K and pressure 10 MPa, the CO2 DC decreased from 2.69 × 10−9 m2/s in pure water to 2.53 × 10−9 m2/s (5.94% reduction) at 1 mol/L NaCl, 1.48 × 10−9 m2/s (44.98% reduction) at 4 mol/L, and 1 × 10−9 m2/s (62.82% reduction) at 6 mol/L, due to increased salinity and solute–ion interactions.
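The percentage reductions quoted for the ML predictions at 310 K and 10 MPa follow from a one-line formula, (reference − value) / reference × 100. The short check below uses only the DC values given in the text (in units of 10−9 m2/s); the function and variable names are illustrative. The results agree with the quoted percentages to within rounding in the second decimal place.

```python
def percent_reduction(reference, value):
    """Relative decrease of `value` with respect to `reference`, in percent."""
    return 100.0 * (reference - value) / reference

# Predicted DC at 310 K and 10 MPa, in units of 1e-9 m2/s (values from the text).
dc_pure_water = 2.69
dc_by_salinity = {1: 2.53, 4: 1.48, 6: 1.00}  # keys: salinity in mol/L NaCl

reductions = {
    salinity: percent_reduction(dc_pure_water, dc)
    for salinity, dc in dc_by_salinity.items()
}
for salinity, reduction in reductions.items():
    print(f"{salinity} mol/L: {reduction:.2f}% reduction")
```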

3.3. Performance Comparison with Previous Studies

Previous research has successfully demonstrated the application of ML hybrid models such as PSO-ANFIS and MKSVM-GA in different studies. Feng et al. (2019) [32] and Bemani et al. (2020) [33] made important contributions to predicting CO2 diffusion in brine using advanced models like MKSVM-GA and PSO-ANFIS, achieving high R2 values of 0.9960 and 0.9993 for the training dataset, respectively. While their test results were valid, the studies relied on small datasets (92 and 86 data points, respectively), which increases the risk of overfitting. Hybrid models require large datasets to perform reliably; when applied to small datasets, they are prone to overfitting [45]. In contrast to hybrid models, our study employed RF, GBR, and XGBoost using 176 data points. Although the best R2 achieved in our study is 0.96 and 0.95 for the train and test datasets, respectively, we suggest that our model can be more reliable than those of Feng et al. (2019) [32] and Bemani et al. (2020) [33] because it relies on wider parameter intervals, a larger dataset, and more closely spaced data points, which help the ML model identify the relationship between the input parameters (pressure, temperature, and salinity) and the output (DC). Additionally, most studies selected viscosity and density as input parameters, which are important, but salinity has a more direct relationship with the DC [3]. Therefore, our study prioritized salinity as an input parameter alongside pressure and temperature to capture its relationship with the DC. The RF, GBR, and XGBoost models displayed high predictive accuracy, as shown by their high R2 values (e.g., 0.95 for RF testing) along with small errors (0.11 MAE and 0.03 RMSE for RF) on both training and testing data.
For the evaluation of any ML model, it is essential to analyze different performance metrics such as R2, RMSE, and MAE. Relying on a single evaluated metric may lead to misleading conclusions. Kouhi et al. (2025) [35] also predicted CO2 DC in the brine case with models including MLP, CFNN, RNN, and GEP with high RMSE values of 3.5452, 5.2872, 4.9287, and 5.5611, respectively, indicating significant errors in prediction performance although their R2 is very high, as shown in Table 1 [35]. Their input parameters were temperature, pressure, and density. In contrast, the present study achieved considerably lower RMSE values, as shown in Table 7 and Figure 10, indicating improved accuracy and reliability in the predictions.
Another important advantage of our models is their interpretability and computational efficiency. Tree-based ensemble models like RF and GBR offer valuable insights into feature importance through SHAP values, illustrating how each input parameter (temperature, salinity, and pressure) affects the DC. In contrast, hybrid models like PSO-ANFIS and MKSVM-GA are mostly difficult to interpret, resource-intensive, and time-consuming. These combined strengths demonstrate that our models provide a balance of accuracy, interpretability, and efficiency for reliable predictions of DC across varied conditions.

3.4. Input Variables Significance

To determine the effect of input parameters on DC, SHAP (Shapley Additive Explanations) values were utilized, as shown in Figure 11a. The RF model was chosen for this analysis due to its high accuracy, interpretability, and ability to manage complex relationships within data. RF effectively ranks feature importance by aggregating results across multiple decision trees [46]. When combined with SHAP values, it provides a clear explanation of each parameter’s (e.g., pressure, temperature, and salinity) contribution to the target variable (e.g., DC), making it an ideal choice for accurately evaluating feature influence. As shown in Figure 11a, temperature has the most significant impact on predicting the DC, followed by salinity and pressure. Notably, salinity, a fluid-related parameter (brine), demonstrates the second-highest influence on estimating the DC.
The “bee swarm” plot illustrated by SHAP values in Figure 11b ranks variables by their mean absolute SHAP values in descending order, with the most important parameters appearing at the top. Each point on the plot represents a data instance, plotted against its impact on the predicted DC value. The color of the points indicates the relative magnitude of the feature values, ranging from low (blue) to high (red). For instance, higher salinity values (red points with negative SHAP values) correspond to lower predicted DC values, demonstrating a negative relationship. In contrast, temperature, the most significant feature, shows a broader range of SHAP values, with predominantly positive impacts on DC as its value increases. Similarly, pressure exhibits a moderate influence, contributing less significantly than temperature and salinity.
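To make the quantity behind these plots concrete, the sketch below computes exact Shapley values for a three-feature toy model by averaging each feature's marginal contribution over all feature orderings, then ranks features by mean absolute value as the bee swarm plot does. Everything here is a simplification under stated assumptions: `toy_model` and its coefficients are hypothetical and stand in for the trained RF model, a single baseline point (the dataset means reported in Section 2) replaces the background distribution that the SHAP library uses, and the SHAP library's tree explainer relies on a different, faster algorithm. The resulting ranking reflects only the toy coefficients.

```python
from itertools import permutations

FEATURES = ("pressure", "temperature", "salinity")

def toy_model(x):
    """Hypothetical stand-in for a trained regressor; NOT the study's RF model."""
    return (0.02 * (x["temperature"] - 286.15)
            - 0.2 * x["salinity"]
            + 0.001 * x["pressure"])

def shapley_values(model, x, baseline):
    """Exact Shapley values relative to a single baseline point."""
    orders = list(permutations(FEATURES))
    phi = {f: 0.0 for f in FEATURES}
    for order in orders:
        current = dict(baseline)
        previous = model(current)
        for f in order:
            current[f] = x[f]  # switch feature f from baseline to its actual value
            updated = model(current)
            phi[f] += (updated - previous) / len(orders)
            previous = updated
    return phi

# Baseline: dataset means reported in Section 2 (MPa, K, mol/L).
baseline = {"pressure": 12.15, "temperature": 320.06, "salinity": 2.35}
# Illustrative conditions inside the studied ranges.
instances = [
    {"pressure": 10.0, "temperature": 310.0, "salinity": 0.0},
    {"pressure": 10.0, "temperature": 310.0, "salinity": 6.0},
    {"pressure": 30.0, "temperature": 398.0, "salinity": 1.0},
]

all_phi = [shapley_values(toy_model, x, baseline) for x in instances]
mean_abs = {f: sum(abs(phi[f]) for phi in all_phi) / len(all_phi) for f in FEATURES}
ranking = sorted(FEATURES, key=lambda f: mean_abs[f], reverse=True)
print(ranking)
```

By construction, the Shapley values of each instance sum to the difference between the model output for that instance and the output at the baseline, which is the additivity property that gives SHAP its name.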
To better understand the impact of each parameter on DC and to validate the results obtained from the RF model with the Tornado and SHAP charts, dependency plots for the three key parameters were also generated using RF, which was chosen for its high accuracy. In Figure 12c, it is evident that higher salinity values have a negative impact on the DC. This aligns with the physical understanding that an increase in salinity leads to a reduction in DC [3,27]. Additionally, since salinity inherently reflects the interaction between the dissolved ions and CO2, this further underscores the importance of salinity as a significant factor influencing the diffusion behavior of CO2 in brine.

3.5. Future Research Directions

To build on the findings of this study and further advance the understanding of CO2 diffusion in brine systems, several important research directions are recommended. First, a deeper investigation into long-term CO2–brine–rock interactions is necessary. These include geochemical and mineralogical changes, wettability alterations, and pore-scale structural evolution, all of which directly influence multiphase flow dynamics and storage integrity. Future studies should also focus on improving diffusion coefficient measurement techniques under reservoir conditions, especially by incorporating density-driven convection effects and exploring the role of advanced materials such as green nanofluids on interfacial behavior. Machine learning models capable of predicting CO2–brine interfacial tension would significantly enhance the reliability of diffusion estimations in heterogeneous systems.
Moreover, addressing subsurface heterogeneity and associated uncertainties remains a key priority. This can be achieved through stochastic modeling that incorporates spatial variability in porosity and permeability, as well as improved formation characterization techniques that reduce prediction errors in CO2 injection scenarios. Operational variables such as CO2–brine co-injection strategies, salinity variations, and salt precipitation effects also require further study, particularly in how they influence injectivity and storage efficiency. In parallel, the development of advanced monitoring tools for early detection of CO2 leakage and assessment of its impact on groundwater quality is critical for long-term risk management.
Finally, special attention should be paid to hydrate formation and stability in depleted gas reservoirs under CO2 injection, especially near the wellbore region where pressure and thermal gradients are prominent. Addressing these multifaceted challenges will not only close existing knowledge gaps but also enhance the accuracy, scalability, and field relevance of predictive models. These efforts will collectively support the safe and efficient deployment of carbon capture and storage technologies, contributing to global climate mitigation strategies.

4. Conclusions

In this study, 176 data points from the literature were collected and used to develop ML models for predicting and evaluating the effects of salinity, temperature, and pressure on the DC of CO2 in brine. Three advanced ML models (RF, GBR, and XGBoost) were selected due to their ability to reduce the risk of overfitting. The data were split into 80% for training and 20% for testing. Among these, the RF model demonstrated superior accuracy and efficiency in estimating CO2 DC based on input parameters, significantly reducing the time required compared to laboratory experiments and molecular simulations. The results revealed that temperature is the most influential factor, positively correlating with DC, followed by salinity and pressure, as determined through three different techniques: Tornado chart, SHAP value analysis, and dependency plot. Salinity exhibited a negative correlation with DC across all models, with RF showing the highest accuracy and smallest error (R2 of 0.95 and RMSE value of 0.03 for the test dataset) in predicting DC values and sensitivity to salinity changes. While the proposed models effectively captured diffusion behavior within the studied parameter range (pressures of 0.1–30 MPa, temperatures of 286.15–398 K, and salinities up to 6.76 mol/L), their applicability is limited to these conditions. For future work, it would be advisable to include the molecular structure of the salt and to use advanced ML models capable of capturing molecular-level features, which would help to elucidate different mechanisms at the molecular scale. Moreover, most ML models still lack information about the composition of the injected CO2; including this information would further improve predictions of the diffusion coefficient. Most importantly, expanding the dataset to include additional experiments and input features such as permeability or porosity could further enhance model performance and robustness.
These findings underscore the potential of ML models to provide a robust, data-driven framework for optimizing CO2 capture, storage, and enhanced oil recovery operations by accurately predicting diffusion behavior in brine systems.
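Because the models are only validated inside the sampled parameter range, a prediction pipeline may want to flag inputs that fall outside it. The following is a minimal sketch under that assumption, not the authors' code: the bounds are the minimum and maximum values reported in Table 2, while the names `TRAINING_RANGE` and `out_of_range` and the dictionary keys are hypothetical.

```python
# Hypothetical guard: flag inputs outside the range of the 176-point dataset
# (bounds taken from the Min/Max rows of Table 2), where the models are not validated.
TRAINING_RANGE = {
    "pressure_MPa": (0.10, 30.00),
    "temperature_K": (286.15, 398.00),
    "salinity_mol_per_L": (0.00, 6.76),
}


def out_of_range(sample):
    """Return the names of the features in `sample` that lie outside the training range."""
    flagged = []
    for name, (low, high) in TRAINING_RANGE.items():
        if not (low <= sample[name] <= high):
            flagged.append(name)
    return flagged


# Illustrative inputs only; these are not points from the dataset.
inside = {"pressure_MPa": 10.0, "temperature_K": 310.0, "salinity_mol_per_L": 2.0}
outside = {"pressure_MPa": 45.0, "temperature_K": 310.0, "salinity_mol_per_L": 7.5}

print(out_of_range(inside))   # []
print(out_of_range(outside))  # ['pressure_MPa', 'salinity_mol_per_L']
```

A check of this kind only covers the per-variable limits; it does not detect combinations of pressure, temperature, and salinity that are individually in range but sparsely represented in the data.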

Author Contributions

Conceptualization, Q.K., P.P. and F.H.; methodology, Q.K.; software, Q.K.; validation, Q.K., P.P., R.K. and F.H.; formal analysis, Q.K.; investigation, Q.K.; resources, Q.K., P.P. and F.H.; data curation, Q.K.; writing—original draft preparation, Q.K.; writing—review and editing, P.P., F.H. and R.K.; visualization, Q.K. and R.K.; supervision, P.P. and F.H.; project administration, P.P.; funding acquisition, P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Nazarbayev University under the Faculty Development Competitive Research Grant (Grant No. 201223FD2608).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to express their sincere gratitude to Nazarbayev University for supporting this research through the Faculty Development Competitive Research Grant. We also acknowledge the contributions of previous researchers whose published datasets enabled this study. Special thanks go to the anonymous reviewers for their comments and suggestions, which significantly improved the quality of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Holtz, M.H.; Nance, P.K.; Finley, R.J. Reduction of Greenhouse Gas Emissions through CO2 EOR in Texas. Environ. Geosci. 2001, 8, 187–199.
  2. Mosavat, N.; Abedini, A.; Torabi, F. Phase Behaviour of CO2–Brine and CO2–Oil Systems for CO2 Storage and Enhanced Oil Recovery: Experimental Studies. Energy Procedia 2014, 63, 5631–5645.
  3. Wang, H.; Li, Y.; Li, C.; Zhu, H.; Li, Z.; Wang, L.; Medina-Rodriguez, B.X. Unveil the controls on CO2 diffusivity in saline brines for geological carbon storage. Geoenergy Sci. Eng. 2025, 244, 213483.
  4. Zhang, Y.; Geng, W.; Chen, M.; Xu, X.; Jiang, L.; Song, Y. Experimental Measurements of the Diffusion Coefficient and Effective Diffusion Coefficient of CO2-Brine under Offshore CO2 Storage Conditions. Energy Fuels 2023, 37, 19695–19703.
  5. Rezk, M.G.; Foroozesh, J.; Abdulrahman, A.; Gholinezhad, J. CO2 Diffusion and Dispersion in Porous Media: Review of Advances in Experimental Measurements and Mathematical Models. Energy Fuels 2022, 36, 133–155.
  6. Renner, T.A. Measurement and Correlation of Diffusion Coefficients for CO2 and Rich-Gas Applications. SPE Reserv. Eng. 1988, 3, 517–523.
  7. Feng, Q.; Xing, X.; Wang, S.; Liu, G.; Qin, Y.; Zhang, J. CO2 diffusion in shale oil based on molecular simulation and pore network model. Fuel 2024, 359, 130332.
  8. Secuianu, C.; Maitland, G.C.; Trusler, J.P.M.; Wakeham, W.A. Mutual diffusion coefficients of aqueous KCl at high pressures measured by the Taylor dispersion method. J. Chem. Eng. Data 2011, 56, 4840–4848.
  9. Jimenez, M.; Dietrich, N.; Cockx, A.; Hébrard, G. Experimental study of CO2 diffusion coefficient measurement at a planar gas–liquid interface by planar laser-induced fluorescence with inhibition. AIChE J. 2013, 59, 325–333.
  10. Riazi, M.R.; Whitson, C.H. Estimating Diffusion Coefficients of Dense Fluids. Ind. Eng. Chem. Res. 1993, 32, 3081–3088.
  11. Kumar, N.; Sampaio, M.A.; Ojha, K.; Hoteit, H.; Mandal, A. Fundamental aspects, mechanisms and emerging possibilities of CO2 miscible flooding in enhanced oil recovery: A review. Fuel 2022, 330, 125633.
  12. Zhang, W.; Wu, S.; Ren, S.; Zhang, L.; Li, J. The modeling and experimental studies on the diffusion coefficient of CO2 in saline water. J. CO2 Util. 2015, 11, 49–53.
  13. Yang, C.; Gu, Y. Accelerated mass transfer of CO2 in reservoir brine due to density-driven natural convection at high pressures and elevated temperatures. Ind. Eng. Chem. Res. 2006, 45, 2430–2436.
  14. Lu, W.; Guo, H.; Chou, I.M.; Burruss, R.C.; Li, L. Determination of diffusion coefficients of carbon dioxide in water between 268 and 473 K in a high-pressure capillary optical cell with in situ Raman spectroscopic measurements. Geochim. Cosmochim. Acta 2013, 115, 183–204.
  15. Zarghami, S.; Boukadi, F.; Al-Wahaibi, Y. Diffusion of carbon dioxide in formation water as a result of CO2 enhanced oil recovery and CO2 sequestration. J. Pet. Explor. Prod. Technol. 2017, 7, 161–168.
  16. Cadogan, S.P.; Maitland, G.C.; Trusler, J.P.M. Diffusion coefficients of CO2 and N2 in water at temperatures between 298.15 K and 423.15 K at pressures up to 45 MPa. J. Chem. Eng. Data 2014, 59, 519–525.
  17. Ratcliff, G.A.; Holdcroft, J.G. Diffusivities of gases in aqueous electrolyte solutions. Trans. Inst. Chem. Eng. 1963, 41, 315–319.
  18. Sell, A.; Fadaei, H.; Kim, M.; Sinton, D. Microfluidic Approach for Reservoir-Specific Analysis. Environ. Sci. Technol. 2013, 47, 71–78.
  19. Tewes, F.; Boury, F. Formation and rheological properties of the supercritical CO2–Water pure interface. J. Phys. Chem. B 2005, 109, 3990–3997.
  20. Hirai, S.; Okazaki, K.; Yazawa, H.; Ito, H.; Tabe, Y.; Hijikata, K. Measurement of CO2 diffusion coefficient and application of LIF in pressurized water. Energy 1997, 22, 363–367.
  21. Azin, R.; Mahmoudy, M.; Raad, S.M.J.; Osfouri, S. Measurement and modeling of CO2 diffusion coefficient in saline aquifer at reservoir conditions. Cent. Eur. J. Eng. 2013, 3, 585–594.
  22. Yang, C.; Gu, Y. A New Method for Measuring Solvent Diffusivity in Heavy Oil by Dynamic Pendant Drop Shape Analysis (DPDSA). 2006. Available online: https://onepetro.org/SPEATCE/proceedings-abstract/03ATCE/03ATCE/SPE-84202-MS/137562 (accessed on 8 February 2025).
  23. Cadogan, S.P.; Hallett, J.P.; Maitland, G.C.; Trusler, J.P.M. Diffusion coefficients of carbon dioxide in brines measured using 13C pulsed-field gradient nuclear magnetic resonance. J. Chem. Eng. Data 2015, 60, 181–184.
  24. Wilke, C.R.; Chang, P. Correlation of diffusion coefficients in dilute solutions. AIChE J. 1955, 1, 264–270.
  25. Garcia-Ratés, M.; De Hemptinne, J.C.; Avalos, J.B.; Nieto-Draghi, C. Molecular modeling of diffusion coefficient and ionic conductivity of CO2 in aqueous ionic solutions. J. Phys. Chem. B 2012, 116, 2787–2800.
  26. Numbere, D.; Brigham, W.E.; Standing, M.B. Correlations for Physical Properties of Petroleum Reservoir Brines. Master’s Thesis, Stanford University, Stanford, CA, USA, 1977.
  27. Omrani, S.; Ghasemi, M.; Mahmoodpour, S.; Shafiei, A.; Rostami, B. Insights from molecular dynamics on CO2 diffusion coefficient in saline water over a wide range of temperatures, pressures, and salinity: CO2 geological storage implications. J. Mol. Liq. 2022, 345, 117868.
  28. Tewes, F.; Boury, F. Dynamic and rheological properties of classic and macromolecular surfactant at the supercritical CO2–H2O interface. J. Supercrit. Fluids 2006, 37, 375–383.
  29. Moghaddam, R.N.; Rostami, B.; Pourafshary, P. A method for dissolution rate quantification of convection-diffusion mechanism during CO2 storage in saline aquifers. Spec. Top. Rev. Porous Media 2013, 4, 13–21.
  30. Belgodere, C.; Dubessy, J.; Vautrin, D.; Caumon, M.-C.; Sterpenich, J.; Pironon, J.; Robert, P.; Randi, A.; Birat, J.-P. Experimental determination of CO2 diffusion coefficient in aqueous solutions under pressure at room temperature via Raman spectroscopy: Impact of salinity (NaCl). J. Raman Spectrosc. 2015, 46, 1025–1032.
  31. Helmy, T.; Al-Azani, S.; Bin-Obaidellah, O. A machine learning-based approach to estimate the CPU-burst time for processes in the computational grids. In Proceedings of the AIMS 2015, 3rd International Conference on Artificial Intelligence, Modelling and Simulation, Kota Kinabalu, Malaysia, 2–4 December 2015; pp. 3–8.
  32. Feng, Q.; Cui, R.; Wang, S.; Zhang, J.; Jiang, Z. Estimation of CO2 diffusivity in brine by use of the genetic algorithm and mixed kernels-based support vector machine model. J. Energy Resour. Technol. 2019, 141, 041001.
  33. Bemani, A.; Baghban, A.; Mosavi, A.; S, S. Estimating CO2-Brine diffusivity using hybrid models of ANFIS and evolutionary algorithms. Eng. Appl. Comput. Fluid Mech. 2020, 14, 818–834.
  34. Amar, M.N.; Ghahfarokhi, A.J. Prediction of CO2 diffusivity in brine using white-box machine learning. J. Pet. Sci. Eng. 2020, 190, 107037.
  35. Kouhi, M.M.; Kahzadvand, K.; Shahin, M.; Shafiei, A. New connectionist tools for prediction of CO2 diffusion coefficient in brine at high pressure and temperature – implications for CO2 sequestration in deep saline aquifers. Fuel 2025, 384, 134000.
  36. Raad, S.M.J.; Azin, R.; Osfouri, S. Measurement of CO2 diffusivity in synthetic and saline aquifer solutions at reservoir conditions: The role of ion interactions. Heat Mass Transf. 2015, 51, 1587–1595.
  37. Ahmadi, H.; Jamialahmadi, M.; Soulgani, B.S.; Dinarvand, N.; Sharafi, M.S. Experimental study and modelling on diffusion coefficient of CO2 in water. Fluid Phase Equilib. 2020, 523, 112584.
  38. Tamimi, A.; Rinker, E.B.; Sandall, O.C. Diffusion Coefficients for Hydrogen Sulfide, Carbon Dioxide, and Nitrous Oxide in Water over the Temperature Range 293–368 K. J. Chem. Eng. Data 1994, 39, 330–332.
  39. Polat, H.M.; Coelho, F.M.; Vlugt, T.J.H.; Franco, L.F.M.; Tsimpanogiannis, I.N.; Moultos, O.A. Diffusivity of CO2 in H2O: A Review of Experimental Studies and Molecular Simulations in the Bulk and in Confinement. J. Chem. Eng. Data 2023, 69, 3329.
  40. Basilio, E.; Addassi, M.; Al-Juaied, M.; Hassanizadeh, S.M.; Hoteit, H. Improved pressure decay method for measuring CO2-water diffusion coefficient without convection interference. Adv. Water Resour. 2024, 183, 104608.
  41. Mutoru, J.W.; Leahy-Dios, A.; Firoozabadi, A. Modeling infinite dilution and Fickian diffusion coefficients of carbon dioxide in water. AIChE J. 2011, 57, 1617–1627.
  42. Caiola, G.; Reiter, J.P. Random Forests for Generating Partially Synthetic, Categorical Data. Trans. Data Priv. 2010, 3, 27–42.
  43. Probst, P.; Boulesteix, A.-L. To Tune or Not to Tune the Number of Trees in Random Forest. J. Mach. Learn. Res. 2018, 18, 1–18.
  44. Rathakrishnan, V.; Beddu, S.B.; Ahmed, A.N. Predicting compressive strength of high-performance concrete with high volume ground granulated blast-furnace slag replacement using boosting machine learning algorithms. Sci. Rep. 2022, 12, 9539.
  45. Rather, I.H.; Kumar, S.; Gandomi, A.H. Breaking the data barrier: A review of deep learning techniques for democratizing AI with small datasets. Artif. Intell. Rev. 2024, 57, 226.
  46. Fang, Y.; Gao, S.; Tai, D.; Middaugh, C.R.; Fang, J. Identification of properties important to protein aggregation using feature selection. BMC Bioinform. 2013, 14, 314.
Figure 1. Overview of direct and indirect methods used to measure CO2 DC in aqueous systems.
Figure 2. Workflow of the machine learning framework for CO2 DC prediction.
Figure 3. Correlation heatmap illustrating relationships among P, T, salinity, and Diffusion Coefficient (DC).
Figure 4. Violin plots depicting the distributions of (a) pressure (P) in MPa, (b) temperature (T) in K, (c) salinity in mol/L, and (d) diffusion coefficient.
Figure 5. Histograms showing the frequency distributions of the input parameters, pressure (MPa), temperature (K), and salinity (mol/L), and of the output parameter, DC (10−9 m2/s).
Figure 6. Comparative plots showcasing the performance of the RF, GBR, and XGBoost models in terms of (a) RMSE, (b) MAE, and (c) R2 metrics across the training and testing datasets.
Figure 7. Predicted vs. actual DC for (a) RF, (b) GBR, and (c) XGBoost models.
Figure 8. Radar chart comparing RF, GBR, and XGBoost models on RMSE, MAE, and R2 metrics for training and testing datasets, highlighting their prediction accuracy.
Figure 9. Predicted DC (10−9 m2/s) as a function of salinity (mol/L) at constant temperature (310 K) and pressure (10 MPa).
Figure 10. Comparison of the test RMSE values of previous ML models with those of the present study.
Figure 11. (a) Tornado chart showing the importance of each parameter obtained from the RF model during hyperparameter tuning and (b) SHAP summary plot illustrating parameter importance.
Figure 12. Dependency plots showing the impact of (a) pressure (MPa), (b) temperature (K), and (c) salinity (mol/L) on the diffusion coefficient, as predicted by the RF model.
Table 1. Comparison of models and evaluation metrics for predicting CO2 diffusion performance.

| Source | Model | Data Points (Train/Test) | Parameters (Ranges) | Train Metrics | Test Metrics |
|---|---|---|---|---|---|
| Feng et al. (2019) [32] | MKSVM-GA | 92 (72/20) | T: 273–473.15 K; P: 0.1–49.3 MPa; µ: 0.139–1.950 mPa·s | R2: 0.9975; MAE: 0.1112 × 10−9 m2/s; RMSE: 0.1527 × 10−9 m2/s; MARE: 7.17% | R2: 0.9910; MAE: 0.2028 × 10−9 m2/s; RMSE: 0.3028 × 10−9 m2/s; MARE: 10.55% |
| Bemani et al. (2020) [33] | PSO-ANFIS | 86 (N/A) | T: 273–473.15 K; P: 0.1–49.3 MPa; µ: 0.139–1.950 Pa·s | R2: 0.9993; MARE: 2.0945%; RMSE: 0.0869 | R2: 0.9978; MARE: 2.7188%; RMSE: 0.113 |
| | GA-ANFIS | | | R2: 0.9957; MARE: 4.2591%; RMSE: 0.2156 | R2: 0.9932; MARE: 4.9245%; RMSE: 0.1976 |
| | ACO-ANFIS | | | R2: 0.9924; MARE: 5.9726%; RMSE: 0.2877 | R2: 0.9854; MARE: 6.6933%; RMSE: 0.3161 |
| | BP-ANFIS | | | R2: 0.9862; MARE: 12.2787%; RMSE: 0.3905 | R2: 0.9738; MARE: 12.787%; RMSE: 0.398 |
| | DE-ANFIS | | | R2: 0.9708; MARE: 14.545%; RMSE: 0.633 | R2: 0.9514; MARE: 15.965%; RMSE: 0.633 |
| Amar et al. (2020) [34] | GEP | 92 (72/20) | T: 273–473.15 K; P: 0.1–49.3 MPa; µ: 0.139–1.950 mPa·s | R2: 0.9980; AARD: 3.8584%; RMSE: 0.1427 × 10−9 m2/s | R2: 0.9978; AARD: 6.0035%; RMSE: 0.1245 × 10−9 m2/s |
| | GMDH | | | R2: 0.9943; AARD: 8.6269%; RMSE: 0.2479 × 10−9 m2/s | R2: 0.9874; AARD: 5.6292%; RMSE: 0.2271 × 10−9 m2/s |
| Kouhi et al. (2025) [35] | MLP | 191 (80/20) | P: 0.10–100 MPa; T: 210–673 K; Brine Density: 98.38–1400 kg/m3; DC: 0.0007–285 × 10−9 m2/s | R2: 0.9979; RMSE: 2.7521; MAE: 1.6421 | R2: 0.9965; RMSE: 3.4812; MAE: 2.3647 |
| | CFNN | | | R2: 0.9968; RMSE: 3.6024; MAE: 2.4597 | R2: 0.9949; RMSE: 5.2113; MAE: 3.9210 |
| | RNN | | | R2: 0.9974; RMSE: 2.9021; MAE: 1.8890 | R2: 0.9958; RMSE: 4.8735; MAE: 3.2241 |
| | GEP | | | R2: 0.9938; RMSE: 5.1432; MAE: 4.0023 | R2: 0.9918; RMSE: 5.4981; MAE: 4.3184 |
Table 2. Descriptive statistics for the employed dataset of all models.

| Statistic | P (MPa) | T (K) | Salinity (mol/L) | DC (10−9 m2/s) |
|---|---|---|---|---|
| Count | 176 | 176 | 176 | 176 |
| Mean | 12.15 | 320.06 | 2.35 | 2.07 |
| Std Dev | 7.99 | 26.35 | 2.42 | 0.97 |
| Min | 0.10 | 286.15 | 0.00 | 0.13 |
| 25% | 5.66 | 300.15 | 0.51 | 1.47 |
| Median | 10.00 | 313.00 | 1.00 | 1.81 |
| 75% | 19.79 | 341.15 | 4.00 | 2.73 |
| Max | 30.00 | 398.00 | 6.76 | 4.50 |
Table 3. Summary of evaluation metrics used to assess model performance.

| Metric | Expression | Description | Good Range |
|---|---|---|---|
| Coefficient of Determination (R2) | $R^2 = 1 - \dfrac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$ | Measures the proportion of variance in observed data explained by the model. Higher values (closer to 1) indicate a better fit. R2 = 1 represents perfect fit, while R2 = 0 indicates no explanatory power. | R2 > 0.75 (Very Good) |
| Root Mean Square Error (RMSE) | $\mathrm{RMSE} = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$ | Reflects the average magnitude of prediction errors, penalizing larger deviations more heavily. Lower values indicate better accuracy. | RMSE → 0.15 (Lower is Better) |
| Mean Absolute Error (MAE) | $\mathrm{MAE} = \dfrac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$ | Represents the average absolute difference between predicted and observed values. Less sensitive to outliers than RMSE. Lower values indicate better performance. | MAE → 0.15 (Lower is Better) |
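The three metric definitions in Table 3 can be written out directly. The sketch below is a plain-Python illustration and not the authors' implementation; the function names and the four observed/predicted values are made up for the example.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


# Made-up DC values (10^-9 m^2/s), for illustration only
y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

print(f"R2   = {r2_score(y_obs, y_hat):.3f}")  # R2   = 0.980
print(f"RMSE = {rmse(y_obs, y_hat):.3f}")      # RMSE = 0.158
print(f"MAE  = {mae(y_obs, y_hat):.3f}")       # MAE  = 0.150
```

RMSE and MAE carry the units of the target variable, so values from studies that report DC on different scales are not directly comparable.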
Table 4. Performance metrics for RF, GBR, and XGBoost models.

| Model | Set | MAE | RMSE | R2 |
|---|---|---|---|---|
| RF | Train | 0.10 | 0.02 | 0.96 |
| | Test | 0.11 | 0.03 | 0.95 |
| GBR | Train | 0.18 | 0.16 | 0.973 |
| | Test | 0.19 | 0.026 | 0.925 |
| XGBoost | Train | 0.12 | 0.184 | 0.964 |
| | Test | 0.13 | 0.389 | 0.91 |
Table 5. Optimized hyperparameters for RF, XGBoost, and GBR.

| Hyperparameters | RF | XGBoost | GBR |
|---|---|---|---|
| Number of Estimators (n_estimators) | 500 | 500 | 1000 |
| Learning Rate (learning_rate) | - | 0.05 | 0.03 |
| Maximum Depth (max_depth) | 10 | 3 | 4 |
| Minimum Samples Split (min_samples_split) | 2 | - | 8 |
| Minimum Samples Leaf (min_samples_leaf) | 1 | - | 4 |
| Maximum Features (max_features) | auto | - | - |
| Subsample (subsample) | - | - | 0.8 |
| Random State (random_state) | 42 | 42 | 42 |
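For readers who want to reuse the settings in Table 5, they can be written as keyword-argument dictionaries. This is a sketch under stated assumptions rather than the authors' configuration files: it assumes the table labels correspond to the underscore-separated parameter names used by scikit-learn and XGBoost, and the dictionary names are hypothetical.

```python
# Table 5 hyperparameters written as constructor keyword arguments.
# Assumption: labels such as "n estimators" map to the usual n_estimators, etc.
RF_PARAMS = {
    "n_estimators": 500,
    "max_depth": 10,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    # Table 5 lists "auto"; scikit-learn >= 1.3 no longer accepts that string,
    # and 1.0 (use all features) is the equivalent setting for a regressor.
    "max_features": 1.0,
    "random_state": 42,
}

XGB_PARAMS = {
    "n_estimators": 500,
    "learning_rate": 0.05,
    "max_depth": 3,
    "random_state": 42,
}

GBR_PARAMS = {
    "n_estimators": 1000,
    "learning_rate": 0.03,
    "max_depth": 4,
    "min_samples_split": 8,
    "min_samples_leaf": 4,
    "subsample": 0.8,
    "random_state": 42,
}

for name, params in [("RF", RF_PARAMS), ("XGBoost", XGB_PARAMS), ("GBR", GBR_PARAMS)]:
    print(name, params)
```

If scikit-learn and XGBoost are installed, each dictionary can be unpacked into the matching estimator, for example `RandomForestRegressor(**RF_PARAMS)`, `XGBRegressor(**XGB_PARAMS)`, or `GradientBoostingRegressor(**GBR_PARAMS)`.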
Table 6. Comparison of DC values predicted by the ML models at different salinities and at constant pressure and temperature.

| Salinity (mol/L) | RF DC (10−9 m2/s) | GBR DC (10−9 m2/s) | XGBoost DC (10−9 m2/s) |
|---|---|---|---|
| 0 | 2.69 | 2.63 | 2.41 |
| 1 | 2.53 | 2.39 | 2.61 |
| 2 | 2.19 | 2.1 | 2 |
| 4 | 1.48 | 1.37 | 1.29 |
| 6 | 1 | 0.91 | 1.07 |
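The salinity sensitivity in Table 6 can be summarized with a short calculation. The sketch below uses only the 0 and 6 mol/L rows of the table and computes, for each model, the relative drop in predicted DC and the average slope over that interval; the variable names are hypothetical, and it is an assumption that these predictions were made at the 310 K and 10 MPa stated in the Figure 9 caption.

```python
# Predicted DC (10^-9 m^2/s) from the 0 and 6 mol/L rows of Table 6.
# Assumed conditions: constant temperature and pressure (310 K, 10 MPa per Figure 9).
dc_at_0 = {"RF": 2.69, "GBR": 2.63, "XGBoost": 2.41}
dc_at_6 = {"RF": 1.00, "GBR": 0.91, "XGBoost": 1.07}

sensitivity = {}
for model in dc_at_0:
    drop = dc_at_0[model] - dc_at_6[model]
    percent_drop = 100.0 * drop / dc_at_0[model]  # relative decrease from 0 to 6 mol/L
    avg_slope = -drop / 6.0                       # 10^-9 m^2/s per mol/L
    sensitivity[model] = (percent_drop, avg_slope)
    print(f"{model}: {percent_drop:.1f}% lower DC, average slope {avg_slope:.3f}")
```

With these inputs the relative decrease is roughly 63% for RF, 65% for GBR, and 56% for XGBoost, which is consistent with the negative salinity correlation reported for all three models.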
Table 7. Comparison of the test RMSE of previously published ML models with that of the current study.

| Author | Model | Data Points | RMSE (Test) |
|---|---|---|---|
| Kouhi et al. (2025) [35] | MLP | 191 | 3.5452 |
| | CFNN | | 5.2872 |
| | RNN | | 4.9287 |
| | GEP | | 5.5611 |
| Bemani et al. (2020) [33] | PSO-ANFIS | 86 | 0.113 |
| Amar and Jahanbani Ghahfarokhi (2020) [34] | GMDH | 92 | 0.2271 |
| | GEP | | 0.1245 |
| Feng et al. (2019) [32] | MKSVM-GA | 92 | 0.3028 |
| Current Study | RF | 176 | 0.03 |
| | GBR | | 0.026 |
| | XGBoost | | 0.389 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Khan, Q.; Pourafshary, P.; Hadavimoghaddam, F.; Khoramian, R. Machine Learning Prediction of CO2 Diffusion in Brine: Model Development and Salinity Influence Under Reservoir Conditions. Appl. Sci. 2025, 15, 8536. https://doi.org/10.3390/app15158536
