Data-Driven Analyses of Low Salinity Waterflooding in Carbonates

Low salinity water (LSW) injection is a promising Enhanced Oil Recovery (EOR) technique that has the potential to improve oil recovery and has been studied by many researchers. LSW flooding in carbonates has been widely evaluated by coreflooding tests in prior studies. A closer look at the literature on LSW in carbonates indicates a number of gaps and shortcomings. It is difficult to understand the exact relationship between different controlling parameters and the LSW effect in carbonates. The active mechanisms involved in oil recovery improvement are still uncertain and more analyses are required. To predict LSW performance and study the mechanisms of oil displacement, data collected from available experimental studies on LSW injection in carbonates were analyzed using data analysis approaches. We used linear regression to study the linear relationships between single parameters and the incremental recovery factor (RF). Correlations between rock, oil, and brine properties and tertiary RF were weak or negligible. Subsequently, we analyzed the effect of oil/brine parameters on LSW performance using multivariable linear regression. Relatively strong linear correlations were found for a combination of oil/brine parameters and RF. We also studied the nonlinear relationships between parameters by applying nonlinear machine learning (ML) models, such as artificial neural network (ANN), support vector machine (SVM), and decision tree (DT). These models showed better data fitting results compared to linear regression. Among the applied ML models, DT provided the best correlation for oil/brine parameters, as ANN and SVM showed overfitting and performed poorly on the testing data. Finally, different mechanisms involved in the LSW effect were analyzed based on the changes in the effluent PDI concentration, interfacial tension, pH, zeta potential, and pressure drop.


Introduction
Estimates show that approximately 60% of the world's oil reserves are held in carbonate reservoirs [1]. The amount of oil that can be produced from these reservoirs by natural production is below 30%. This small value of oil recovery is due to the heterogeneity, low matrix permeability, presence of fractures, and oil-wet conditions in carbonates. Hence, enhanced oil recovery (EOR) methods are required to reduce the residual oil and increase oil production. Low salinity water (LSW) flooding is one of the promising techniques for EOR in carbonate formations. It is a process of injecting low saline water with an optimized ion composition into the reservoir in order to recover incremental oil [2].
Most of these mechanisms result in the alteration of wettability in carbonate rock, which is the most desirable and widely accepted reason for improving oil recovery by LSW. It is believed that some rock/fluid and fluid/fluid properties control oil recovery improvement using LSW, which should be considered to achieve a successful outcome.
Most carbonates are observed to be neutral or oil-wet [20]. This wettability can be attributed to the retention of negatively charged carboxylic groups from heavy oil components on the positively charged rock surface. Injection of LSW with specific ions, and the interaction between the injected active ions (called potential determining ions, PDIs) and the rock surface, may alter the initial wettability, resulting in oil detachment and incremental oil recovery. PDIs are primarily sulfate, calcium, and magnesium ions (SO₄²⁻, Ca²⁺, Mg²⁺) that interact with the carbonate surface. Hence, the amounts of active ions in the injected brine and in the porous medium are essential for altering fluid/rock interactions and for LSW performance [21-25]. The concentrations of inactive ions, such as Na⁺ and Cl⁻, are also critical in influencing different mechanisms [26,27].
The acid number (AN) is defined as the amount of KOH (in mg) required to neutralize 1 g of oil [28]. AN is another controlling parameter during LSW injection, as it indicates the amount of carboxylic groups in the crude oil, which has a major influence on carbonate wettability. The effect of the base number (BN), which quantifies the basic components in oil, is smaller than the effect of AN [29]. The importance of temperature during LSW flooding in limestones was investigated in different studies [21,30,31]. Measurements from several studies have identified that wettability alteration in carbonates is also accompanied by a change in the effluent pH during LSW injection [32,33]. Hence, AN, temperature, and pH are additional parameters affecting the success of LSW flooding in carbonate formations.
A closer look at the literature on LSW in carbonates indicates a number of gaps and shortcomings. It is challenging to understand the relationship between various parameters and the low salinity effect in carbonates. The mechanisms involved in increasing oil recovery are still not clear and more analyses are required. The data available from the literature can be analyzed, using data analysis methods, to predict the performance of LSW in carbonates and study the active mechanisms of oil displacement. By data analysis, it is possible to develop linear and nonlinear relationships between the variables and the recovery factor. Machine learning (ML) can be applied as a powerful tool to develop these models. ML methods have been successfully implemented in different aspects of "Exploration & Production" operations, such as analyzing LSW flooding in sandstones [34,35], fracture pressure predictions [36], relative permeability estimation [37], liquid holdup modeling in two-phase fluid flow [38], and phase classification problems [39].
Different parameters affect LSW performance (such as rock properties, oil acidity, injected water composition, temperature, and pH) but the relationships between them and oil recovery remain unclear. The influences of these parameters on the active mechanisms have not been clarified. Hence, a comprehensive study is required to examine different oil displacement studies to answer these questions. In this work, the available data of oil displacement at the core scale are collected and the effect of different parameters on active mechanisms and the performance of LSW are analyzed.

Data Collection and Cleaning
Experimental studies of oil displacement by LSW in carbonates, available in the literature, were carefully studied and the relevant core flooding tests were extracted. Fluid/rock properties and experimental results were collected in an unbiased manner, from the tables and graphs in various sources [3,17,21,24,30-33]. Each data entry corresponds to a core flooding test. Both secondary and tertiary modes of LSW flooding in carbonates were considered in the data extraction process. The data from 145 core flooding tests were extracted and compiled. The laboratory experiments of oil displacement tests by LSW injection in limestone cores were categorized to extract information about the injection mode, injection sequences, and the controlling parameters that affect oil recovery.
The rock/fluid properties that control the performance of LSW flooding are shown in Table 1, including the number of available data points for each controlling parameter. Because not all of the parameters were reported in every LSW flooding experiment, a significant amount of data is missing for some parameters, which affects the accuracy of our models. Table 2 shows the minimum, maximum, and mean values of the controlling parameters.
The low number of secondary core flooding tests in the literature indicates the greater importance of applying LSW in tertiary mode as an EOR approach. Hence, this study focused on analyzing data collected from tertiary core flooding experiments. There are 117 data points that show the incremental oil recovery by LSW injection in the tertiary mode. These data points range from 0 to 42% of OOIP (original oil in place), with a mean of 6.17% and a standard deviation of 7.6%. Figure 1 shows the distribution of incremental oil recovery in tertiary mode.
The collected data points were organized and prepared for regression analysis. Parameters such as the composition of brines, total salinities, temperature, and pressure drop were reported in different units in the literature. Hence, all data were converted to a unified unit system to allow comparative analyses. Dimensionless numbers, such as dimensionless sulphate concentration (DS), cations concentration (DC), salinity (DTDS), acidity of oil (AB), and recovery factor (DRF), were developed to scale the controlling parameters while preserving their physical significance, as shown in Equations (1)-(5), where HS and LS denote the high salinity and low salinity water conditions, respectively.
SO₄²⁻ is the concentration of sulphate (ppm), TDS is the total dissolved salts (ppm), Cations is the concentration of cations, AN is the acid number of crude oil (mg KOH/g), BN is the base number of crude oil (mg KOH/g), $RF_3$ is the recovery factor after tertiary flooding (% OOIP), $RF_2$ is the recovery factor after secondary flooding (% OOIP), and $S_{wi}$ is the initial water saturation (%).
The active mechanisms that explain the positive effect of LSW injection on oil recovery enhancement are difficult to establish. To study the relationships between the proposed mechanisms and the conditions required for LSW to work, data collected from available coreflooding tests were statistically analyzed. Mechanisms such as multi-ion exchange (MIE), rock dissolution, interfacial tension (IFT) reduction, electrical double layer (EDL) expansion, and micro-dispersions were evaluated. The occurrence of these mechanisms was investigated by statistical analysis of the controlling parameters, including effluent PDI concentration, wettability, pressure drop, IFT, pH of effluent brine, and zeta potential.

Data Analysis Methods
Machine learning (ML) methods were used to analyze the effect of single and multiple controlling parameters on the incremental oil recovery by LSW. Linear and nonlinear correlations were developed between different independent variables (such as the dimensionless rock/fluid properties) and the oil recovery factor. The correlation coefficients were calculated to quantify the correlation strength.
Simple and multivariable linear regression models were applied to analyze the data. Simple linear correlations were developed by the least-squares method, which minimizes the sum of the squared vertical distances between the actual and the predicted recovery factor values for each independent variable. Multivariable linear regression models were used to study the correlation between a group of independent variables and the recovery factor.
Nonlinear regression methods can be more accurate in data analysis because they do not assume that the relationship between the variables is linear, which is more realistic in many cases. Machine learning algorithms (such as decision tree (DT), artificial neural network (ANN), and support vector machine (SVM)) were applied to assess the contribution of different parameters to the LSW effect. Sensitivity analysis was conducted to select the number of simulations, and 5000 simulations were run based on the results of the analysis (Table 3). The best training and testing coefficients for the ML models were obtained with 5000 simulations.
ANN is based on the analysis of connections between components, which are also known as neurons, in input, hidden, and output layers [65]. Variables called weights are assigned to the connections to represent the contribution of the input variables to the output [65]. The structure of the ANN model was chosen based on the sensitivity analysis (Table 4). Oil/brine parameters based on 500 data entries were used in the analysis. In the ANN used here, one hidden layer with four neurons was chosen.
The coefficient of determination, R², is the squared correlation coefficient, R. Its value varies from 0 to 1, representing no linear relationship and a strong linear relationship, respectively. R² is calculated by:
$$R^2 = 1 - \frac{\sum_i \left(RF_i - \widehat{RF}_i\right)^2}{\sum_i \left(RF_i - \overline{RF}\right)^2}$$
where $RF_i$ is the actual RF, $\widehat{RF}_i$ is the predicted RF, and $\overline{RF}$ is the mean of RF.
The DT method is a supervised ML algorithm based on a group of nodes called root, decision, and leaf nodes [66]. Each leaf represents a numeric value for an important independent variable [67]. The SVM model is also a supervised learning method that is used in handling regression and classification problems [67]. This model can fit variables using a nonlinear transformation equation to predict responses of predictor data [68,69].
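As an illustration of the simple least-squares regression and the coefficient of determination described in this section, the following is a minimal sketch in plain Python. The function names and the input values are hypothetical and are not taken from the collected data set; the sketch only shows how a single-parameter fit and its R² could be computed.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(y_actual, y_predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_actual)
    ss_res = sum((ya - yp) ** 2 for ya, yp in zip(y_actual, y_predicted))
    ss_tot = sum((ya - y_bar) ** 2 for ya in y_actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, for illustration only (not from the data set).
x = [0.0, 1.0, 2.0, 3.0, 4.0]   # a scaled controlling parameter
rf = [1.0, 2.0, 4.0, 4.0, 6.0]  # incremental recovery factor, % OOIP

slope, intercept = fit_simple_linear(x, rf)
rf_predicted = [intercept + slope * xi for xi in x]
r2 = r_squared(rf, rf_predicted)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R2={r2:.3f}")
```

For a single independent variable, the R² returned here equals the square of the Pearson correlation coefficient between the parameter and RF.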
The p-value and the coefficients of correlation and determination (R and R², respectively) were calculated for each regression model. The p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis (no relationship between the variables) were true; it indicates whether the fitted relationship is statistically significant. A low p-value is preferable for regression models.
When the number of variables increases, R² usually increases, even with the same data set. The adjusted R² is used to minimize the impact of the number of variables, and it is calculated by:
$$R^2_{adj} = 1 - \frac{\left(1 - R^2\right)\left(n - 1\right)}{n - k - 1}$$
where n is the number of data points and k is the number of independent variables. To compare the results, we used the qualitative interpretation of the relationship strength based on the correlation coefficient [34].
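The penalty that the adjusted R² applies for additional variables can be seen in a short sketch. It assumes the standard definition, 1 - (1 - R²)(n - 1)/(n - k - 1), with n the number of data points; the R² value of 0.30 is hypothetical, while n = 42 is simply borrowed from the number of oil/brine entries for scale.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n data points and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 data points")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical R^2; the same raw fit quality is penalized more
# when it is reached with more independent variables.
r2_raw = 0.30
adj_k2 = adjusted_r_squared(r2_raw, n=42, k=2)
adj_k4 = adjusted_r_squared(r2_raw, n=42, k=4)
print(f"k=2: {adj_k2:.3f}, k=4: {adj_k4:.3f}")
```

A model with more variables is therefore preferred only if its raw R² rises by more than this penalty.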

Results and Discussion
Using data analysis approaches, we studied different rock/fluid properties to identify the effective parameters. We also analyzed the effect of the rock, fluid, and crude oil properties (both individually and together) on the incremental oil recovery achieved by LSW flooding. This section presents our findings on the effective parameters and the active mechanisms during LSW flooding.

Effect of LSW Governing Parameters on Oil Recovery
Injection of LSW affects the interaction between rock and fluids and alters parameters such as wettability and surface charges. Although the dependence of LSW performance on various controlling parameters in carbonates has been demonstrated in different experimental studies, some contradictions are reported in the literature. The preliminary data analysis of the incremental oil recovery by LSW shows that the crucial oil/brine/rock parameters are permeability, salinity of brines, cation concentration, SO₄²⁻ concentration, AN, BN, and temperature. Table 5 summarizes the single-variable linear regression of each parameter against incremental oil recovery. Only the correlation coefficients obtained for temperature and BN indicated weak linear relationships. Among the brine parameters, the linear regression of SO₄²⁻ concentration against RF fitted the data better than the others. However, it is not enough to explain the variance of the LSW effect. The linear relationships between single parameters and the improved recovery factor are mostly negligible, so a single parameter cannot explain the LSW performance in carbonates. It is thus inferred that the LSW effect is probably the synergistic result of several properties.
Since linear regression between single variables and incremental RF failed to explain LSW performance, the combined effects of controlling parameters on the LSW effect were investigated. For this purpose, we analyzed the oil/brine effect using multivariable regression and nonlinear regression techniques. The dimensionless numbers shown in Equations (1)-(5) were applied to preserve the physical significance of controlling parameters. The predicted RF values are calculated from the independent variables using the estimated regression coefficients.

Linear Multivariable Regression
A total of 96 data points reported both PDI concentrations and salinity in the available experimental studies. If the acid number is also considered, the number of data points reduces to 42. We compared the effect of different combinations of these parameters by multivariable linear regression, as shown in Table 6. Including salinity (TDS) improved the regression model: the adjusted R² increased and the p-values for the ion concentration variables decreased, suggesting a better fit. Figures 2 and 3 show the predicted RF from linear regression models for two cases against actual RF values.

Nonlinear Multivariable Regression
Linear regression analysis did not show acceptable results to explain the relationship between governing parameters and the LSW effect. The strengths of the relationships from the multivariable linear regression model for different sets of variables were found to range from negligible to weak. Hence, no linear relationship between parameters and RF was established. We applied ML approaches and nonlinear regression models to further analyze these parameters. Data analyses were conducted using three different ML models: SVM, ANN, and DT. Data points were randomly divided into training and testing groups in a proportion of 0.7 to 0.3. Average correlation coefficients were obtained from 5000 simulations. The best-fitting model was then used to interpret the LSW performance.
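The resampling procedure described above (random 0.7/0.3 division, repeated many times, with averaged correlation coefficients) can be sketched as follows. This is only an outline under stated assumptions: the data are synthetic, the function names are hypothetical, and a one-variable least-squares line stands in for the SVM, ANN, and DT models so that the example needs nothing beyond the Python standard library. Any model exposing the same `fit` interface could be substituted.

```python
import random
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def fit_line(x, y):
    """Stand-in model (assumption): least-squares line; returns a predictor."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return lambda xi: y_bar + slope * (xi - x_bar)


def average_r(x, y, fit, n_runs=5000, train_fraction=0.7, seed=0):
    """Average training and testing R over repeated random splits."""
    rng = random.Random(seed)
    indices = list(range(len(x)))
    n_train = round(train_fraction * len(x))
    train_r, test_r = [], []
    for _ in range(n_runs):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        predict = fit([x[i] for i in train], [y[i] for i in train])
        train_r.append(pearson_r([y[i] for i in train],
                                 [predict(x[i]) for i in train]))
        test_r.append(pearson_r([y[i] for i in test],
                                [predict(x[i]) for i in test]))
    return mean(train_r), mean(test_r)


# Synthetic data for illustration only (not the collected data set).
data_rng = random.Random(1)
x = [data_rng.uniform(0.0, 1.0) for _ in range(42)]
y = [10.0 * xi + data_rng.gauss(0.0, 2.0) for xi in x]

# Fewer runs than the 5000 used in the study, to keep the example fast.
r_train, r_test = average_r(x, y, fit_line, n_runs=200)
print(f"average R: training={r_train:.2f}, testing={r_test:.2f}")
```

A large gap between the averaged training and testing R is the symptom of overfitting that is used to compare the models.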
Oil/brine parameters were analyzed in this section. Different models and average coefficients of correlation obtained from three ML techniques are shown in Table 7. The best interpretation of LSW flooding based on brine parameters was achieved by DT with a minimum leaf size of 10, for which the correlation coefficients for training and testing data were the highest among all ML models. A set of oil/brine properties, including dimensionless brine parameters and AN, was analyzed based on 42 data entries. All three ML models showed strong to very strong relationships between oil/brine parameters and RF; the fit was better than in the case with only brine parameters. For brine properties, DT provided the best fit, as the average values of R are considered to be strong and moderate for training and testing data, respectively. For oil/brine parameters, ANN showed the highest average training R (0.75) but performed poorly on the testing data because of overfitting. DT yielded high correlation coefficients (0.68 for training and 0.63 for testing) with negligible overfitting, exhibiting good performance. Figures 4 and 5 show predicted RF values from the DT model and actual RF values for brine and oil/brine parameters. The average values and ranges of R, obtained from 5000 simulations, are illustrated in Figures 4 and 5. Table 8 shows the strengths of nonlinear relationships for these ML models.
Using linear regression, we showed that the LSW effect could not be modeled based on a single parameter; it is the result of combined contributions by several parameters. Therefore, we made predictions of LSW performance based on a set of main parameters and found that the best prediction was made using oil/brine properties. ML models helped us to achieve better results in explaining the connection between a set of controlling parameters and the incremental RF by LSW flooding.

Linking Mechanisms to Parameters
In previous studies, different mechanisms were proposed for governing LSW performance in carbonates [6,9,12,14]. The change in PDI concentration (Ca²⁺, Mg²⁺, and SO₄²⁻ ions) between the injected and effluent brine can be used to study the MIE and rock dissolution mechanisms. When MIE is dominant, Ca²⁺, Mg²⁺, and SO₄²⁻ decrease due to the adsorption of ions onto the rock surface. In contrast, the rock dissolution mechanism involves a rise in the effluent Ca²⁺ and SO₄²⁻ concentrations. Alterations in IFT values explain the IFT reduction mechanism. A change in zeta potential can be evidence of the EDL expansion mechanism.
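The qualitative rule stated above for distinguishing MIE from rock dissolution can be written as a small decision function. This is a simplified sketch, not a method taken from the source studies: the function name, the sign convention (effluent minus injected concentration), and the `tolerance` parameter are all assumptions introduced for illustration.

```python
def infer_mechanism(delta_ca, delta_so4, delta_mg=None, tolerance=0.0):
    """Suggest a mechanism from PDI concentration changes.

    Each delta is (effluent - injected) concentration, in any consistent
    unit. delta_mg is optional because Mg2+ is reported less often.
    """
    def sign(delta):
        if delta > tolerance:
            return 1
        if delta < -tolerance:
            return -1
        return 0

    reported = [d for d in (delta_ca, delta_so4, delta_mg) if d is not None]
    # All reported PDIs decrease: ions adsorb onto the rock surface.
    if all(sign(d) < 0 for d in reported):
        return "MIE"
    # Ca2+ and SO4 2- both rise in the effluent: the rock is dissolving.
    if sign(delta_ca) > 0 and sign(delta_so4) > 0:
        return "rock dissolution"
    return "inconclusive"


# Hypothetical concentration changes in ppm, for illustration only.
case_adsorption = infer_mechanism(-50.0, -30.0, delta_mg=-10.0)
case_dissolution = infer_mechanism(40.0, 25.0)
case_mixed = infer_mechanism(40.0, -25.0)
print(case_adsorption, case_dissolution, case_mixed)
```

Mixed signals are returned as inconclusive, which reflects the point that PDI changes need to be read together and alongside other measurements.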
A total of 24 data recordings of Ca²⁺ concentration changes in the effluent were found in the literature. RF values observed for cases with an increase, a decrease, or no change in effluent Ca²⁺ concentration are compared in Figure 6. A relatively equal number of rises and reductions of Ca²⁺ concentration were found in the experimental studies (13 and 10 data points). A similar RF is achieved when either an increase or a decrease in Ca²⁺ concentration was measured, which suggests that MIE and rock dissolution mechanisms have almost the same strength in detaching and recovering oil. There are nine data points containing the effluent Mg²⁺ concentration collected from experimental studies. Approximately the same number of data points reported an increase and a decrease in Mg²⁺ concentrations. Average RF and temperature were higher when Mg²⁺ ion concentrations decreased in the effluent (Figure 7). This can be explained by the effect of temperature on increased Mg²⁺ activity towards the carbonate surface, which results in the adsorption of cations [23]. SO₄²⁻ concentration change in the effluent brine was reported in 14 coreflooding tests, as shown in Figure 8. A reduction of SO₄²⁻ concentration in the effluent was recorded in seven tests and an increase in anion concentration was measured in six experiments. Higher average RF corresponded to decreases in SO₄²⁻ concentration which, in combination with Ca²⁺ reduction, supports the MIE mechanism.
Changes in the concentrations of all PDIs should be analyzed together to evaluate the possible active mechanism. Table 9 shows the recordings of ion changes in the effluent brine and proposed mechanisms for these cases. Alteration in PDIs is an indicator of the MIE/rock dissolution mechanism. It is generally accepted that a simultaneous decrease in cation and anion concentrations in the effluent indicates ion exchange on the carbonate rock surface. On the other hand, an increase in these ions is due to the dissolution of the carbonate surface.
Brine/oil interfacial tension reduction was suggested as a mechanism that is active during LSW flooding, which affects the capillary force and reduces the residual oil. There were 17 measurements of IFT collected from experiments after the secondary and tertiary stages of flooding. In the coreflooding tests, measurements mostly indicated a decrease in IFT. We also analyzed RF for different ranges of IFT decrease and noticed that higher incremental oil is recovered with greater changes in IFT, as shown in Figure 9. Hence, this mechanism can be considered effective only if the change in IFT is large enough.
Changes in zeta potential (due to the reduced concentration of cations, such as Mg²⁺ and Ca²⁺, in LSW) result in the predominance of repulsive forces [9]. Thus, the EDL expands, and water-wet films become thicker and more stable. As a result, oil components are desorbed, and oil recovery is improved [9]. There are 14 experimental measurements of the zeta potential of carbonate rock before and after contact with LSW; 12 of them reported that the zeta potential changed by more than 6 mV and became more negative, as shown in Figure 10. Even a small change in zeta potential can yield a noticeable improvement in oil recovery. Hence, zeta potential alone cannot identify the active LSW mechanism.
During LSW injection, pressure drop is expected to decrease due to a change in relative permeability as a result of switching from high to low salinity brines [56]. We collected the recordings of pressure change and found 56 data points reporting decreases in pressure drop and no change was observed in only two experiments. Approximately the same average RF was obtained for both cases, which shows that pressure drop should be analyzed along with other parameters to evaluate the performance of LSW. Alteration of the wettability to a more water-wet state was also observed in almost all experimental studies, by comparing the contact angle and wettability index values before and after LSW tests. From a total of 61 data points, 57 core flooding tests reported a change of wettability toward a more water-wet state. Alteration toward more oil-wet conditions was only found in four experiments. More water-wet conditions were achieved due to alteration in the rock surface by MIE, rock dissolution, and expansion of EDL mechanisms that detach the oil from the rock.
Different mechanisms have been suggested by researchers. There are 54 recordings of mechanisms proposed for LSW flooding tests in the literature. Figure 11 compares the number of tests that mentioned different mechanisms. The most popular mechanisms were rock dissolution and EDL expansion. MIE was suggested as an active mechanism for LSW injection on eleven occasions, based on PDI concentration measurements. IFT reduction is the least popular mechanism in experimental studies. Reduction in IFT is not large enough to significantly change the capillary number. LSW performance cannot be explained by one mechanism, as different parameters (such as PDI concentration change, IFT reduction, and zeta potential) are found to have a correlation with oil recovery. However, all of these mechanisms contribute to the wettability alteration and after a change of wettability toward a water-wet state, oil recovery is improved by LSW.

Conclusions
• Different single parameters (such as salinity, contrast in salinity change, PDI concentration, oil acidity, base number of crude oil, permeability, and temperature) were individually analyzed using linear regression to study their correlation with the incremental oil recovery by LSW flooding. Negligible and weak relationships indicate that a single parameter is not sufficient to explain the performance of LSW injection.
• Among groups of parameters, a set of oil/brine parameters that includes AN, alteration in salinity, and SO₄²⁻ and cation concentrations showed the best, but still weak, correlation. Linear correlations are therefore insufficient to forecast LSW potential.
• A nonlinear relationship between parameters and RF was observed using ML models. Among the ML models, DT produced the best correlation for brine-only parameters; the correlation coefficients for training and testing data were 0.57 and 0.35, respectively. For oil/brine parameters, all models showed strong to very strong relationships. However, ANN and SVM showed unsatisfactory results for testing data due to overfitting. In contrast, less overfitting was observed with DT, where the correlation coefficients for training and testing data were 0.68 and 0.63, respectively.
• Several mechanisms are involved in the LSW process, and the LSW effect cannot be explained by a single mechanism. MIE and rock dissolution are the most widely accepted mechanisms in the literature. These mechanisms result in wettability alteration in coreflooding tests in carbonates. Our studies showed that, by analyzing oil/brine parameters, a better understanding of the active mechanisms during LSW can be achieved, and it is possible to predict the mechanism by analyzing parameters such as salinity, ion concentrations, pH, and IFT.
• Future research should confirm these findings with a larger data set. In addition, with more experimental data, other parameters should be added to the model to capture fluid/fluid interactions.
Author Contributions: Conceptualization, methodology, investigation, writing-original draft preparation, R.S.; writing-review and editing, supervision, P.P. and L.W. All authors have read and agreed to the published version of the manuscript.

Funding:
The authors would like to acknowledge Nazarbayev University for aiding this research through the NU Faculty Development Competitive Research Grants program (Award number: 110119FD4541).