On the Evaluation of Interfacial Tension (IFT) of CO 2 –Parafﬁn System for Enhanced Oil Recovery Process: Comparison of Empirical Correlations, Soft Computing Approaches, and Parachor Model

: Carbon dioxide-based enhanced oil-recovery (CO 2 -EOR) processes have gained considerable interest among other EOR methods. In this paper, based on the molecular weight of parafﬁns (n-alkanes), pressure, and temperature, the magnitude of CO 2 –n-alkanes interfacial tension (IFT) was determined by utilizing soft computing and mathematical modeling approaches, namely: (i) radial basis function (RBF) neural network (optimized by genetic algorithm (GA), gravitational search algorithm (GSA), imperialist competitive algorithm (ICA), particle swarm optimization (PSO), and ant colony optimization (ACO)), (ii) multilayer perception (MLP) neural network (optimized by Levenberg-Marquardt (LM)), and (iii) group method of data handling (GMDH). To do so, a broad range of laboratory data consisting of 879 data points collected from the literature was employed to develop the models. The proposed RBF-ICA model, with an average absolute percent relative error (AAPRE) of 4.42%, led to the most reliable predictions. Furthermore, the Parachor approach with different scaling exponents (n) in combination with seven equations of state (EOSs) was applied for IFT predictions of the CO 2 –n-heptane and CO 2 –n-decane systems. It was found that n = 4 was the optimum value to obtain precise IFT estimations; and combinations of the Parachor model with three-parameter Peng–Robinson and Soave–Redlich–Kwong EOSs could better estimate the IFT of the CO 2 –n-alkane systems, compared to other used EOSs.


Introduction
The need to recover the amount of oil left behind in oil reservoirs during primary and secondary oil-production operations has raised the tendency to develop different enhanced oil-recovery (EOR) approaches [1][2][3][4][5]. Miscible and/or immiscible injection of various gases, including nitrogen, flue gas, carbon dioxide, and natural gas, are considered as the most conventional EOR techniques [6][7][8]. As an efficient EOR strategy, CO 2 injection can lead to more oil recovery by reducing oil viscosity, swelling, and vaporizing during miscible flooding [9]. Furthermore, CO 2 injection is a widely accepted method due to the reduction of greenhouse gas emissions by sequestrating CO 2 in underground formations for a long term [10][11][12][13][14][15]. Thus, CO 2 injection, which has been introduced for both EOR and CO 2 sequestration purposes, is further intended to decrease the emission of greenhouse gases [12,16].
Among the driven mechanisms in the petroleum industry for gas injection, the miscible injection method is superior to the other techniques. In industrial processes, including distillation, evaporation, and absorption, the mass transfer rate in two-phase oil and gas systems plays a key role. It is clear that when the oil pressure goes below the bubble point, the gas is released, causing more interfacial traction between the two phases; as the pressure decreases further below the bubble pressure, more gas molecules are released from the liquid phase and the IFT value increases. The evolution rate of gas bubbles is also controlled by molecular diffusion. Therefore, modeling molecular diffusion is important, and several research studies have been conducted in this field, such as modeling vapor solvents in heavy oil and miscible gas injection in naturally fractured reservoirs to better understand the governing mechanisms [17][18][19][20][21][22].
IFT is used as a vital parameter in the production from oil reservoirs to determine residual oil remaining in the reservoir, predict the production potential of fractured reservoirs, and evaluate the recovery performance of EOR processes [23][24][25]. IFT, as a critical phase-behavior feature, affects CO 2 flooding efficiency [26]. With the dissolution of CO 2 gas in heavy oil, the IFT value, as well as the viscosity of the heavy crude oil, will decrease, which is favorable for EOR, especially when the viscous forces are the driving forces for the production of the oil trapped in the reservoir [27]. The reservoir temperature, pressure, and fluid composition are the key parameters that influence IFT extent [28,29]. Experimental and correlation-based methods are two distinct ways of determining IFT between two phases [30]. The empirical correlations for IFT estimation are categorized into classic and modern methods [31]. Experimental approaches for IFT measurement include capillary rise ring, drop weight, pendant drop, spinning drop, sessile drop, and Wilhelmy plate [32]. Among the laboratory IFT measurement methods, the pendant drop technique is known as the most reliable technique for two-phase fluid systems [33]. The IFT between the fluids can be determined by the shape of the suspended droplet from the needle [34]. Axisymmetric drop shape analysis-profile is the most suitable method used for IFT measurement [35]. Gasem et al. [36] and Rao [37] used the pendant drop technique to measure the IFT value of systems composed of various oil and gas mixtures at different pressure and temperature conditions. Nagarajan and Robinson [38] used a setup consisting of a gas chromatograph and an IFT view cell to simultaneously determine the IFT and fluid phase behavior of the CO 2 -n-decane system. After that, Shahvar et al. [39] improved the setup used by Nagarani and Robinson and obtained the IFT of CO 2 -n-alkane systems at different ranges of pressures. They applied the method introduced by Roush [40] to convert the droplet profile to Cartesian coordinates; and the Jennings and Pallas [41] methodology was then used for the subsequent estimation of the IFT.
From a modeling perspective, several techniques, including gradient theory [42], thermodynamic correlations [43], corresponding states theory [44], and the Parachor model [45], have been proposed for IFT prediction of the single-and/or multiphase flow systems. Among them, the Parachor model, which is an old-fashioned and logical method, gives adequate information about hydrocarbon mixtures at harsh conditions of reservoirs; and it is also a widely used method in the industry. Weinaug and Katz [46] used the average molar properties to extend the models suggested by Macleod [47] and Sugden [45] for pure components to mixtures. Equation (1) [47] can be used for IFT calculation of mixtures based on component mole fractions: where N stands for the total number of components; P ch,i denotes the Parachor value for component i; x i and y i are the mole fractions of component i in the liquid and gas phases, respectively; ρ l and ρ v introduce the molar density of the liquid and vapor phases, respectively; and E represents the scaling exponent. The value of E ranges between 3.45 and 4. The correlations based on the specific gravity, molecular weight, and/or critical properties of the components for the Parachor values have been suggested in previous studies [48,49]. Hough and Stegemeier [50] modified the correlation developed by Weinaug and Katz [46], and obtained more accurate results for the systems containing multicomponents. They introduced a constant of 3.67 for the scaling exponent in their correlation. A model for complex mixtures also was proposed by Lee and Chain [51]. They introduced the value of 3.91 as an alternative for the scaling exponent, and Parachor was estimated based on the critical properties of components. Firoozabadi et al. [48] proposed that the E = 4 can be applied in asphaltene-free oil reservoirs. They also employed an empirical equation to determine Parachor as a function of the molecular weight of components. Fawcett [52] developed a model to calculate the IFT of condensate systems. It was found that it was not easy to match the model parameters due to the lack of data in the previous studies for condensate systems. Liu et al. [53] studied the behavior of supercritical CO 2 at the n-decane/water interface. They conducted a molecular dynamics simulation; it was shown that the CO 2 gas acted as a surface-active agent to reduce the IFT between the phases. Nourozieh et al. [54] employed the Soave-Redlich-Kwong (SRK) and Peng-Robinson (PR) EOSs to model the equilibrium properties of CO 2 -n-decane and n-octadecane systems. Their work revealed that both EOSs had almost similar estimations for gas solubility, while PR EOS could better predict the density.
As discussed previously, it is of vital importance to attain a precise estimation of the oil-gas IFT, which exhibits an important role in the petroleum and chemical industries. Determination of the IFT through experimental methods is often expensive, tedious, and time-consuming. Using empirical correlations has some problems, such as insufficient accuracy. The theoretical techniques also require a considerable amount of numerical and computational calculations; for instance, applying an EOS as well as the flash calculations are necessary. In previous studies, Ayatollahi et al. [55] developed a least squares support vector machine (LSSVM) model to calculate the IFT values for CO 2 -n-alkanes systems. They used a relatively comprehensive data set containing almost 500 experimental data for n-alkanes (including nC4, nC7, nC10, nC12, nC14, and nC16) and CO 2 systems. After that, Ameli et al. [56] used the same data set and mixtures, which were incorporated in Ayatollahi et al.'s model. It was found that their radial basis function-genetic algorithm (RBF-GA) model surpassed the previously developed model.
In this study, we utilized a comprehensive data set of n-alkane and CO 2 containing 879 laboratory data, including n-alkanes from nC4 to nC20, and modeled the IFT by advanced smart models, namely a multilayer perceptron (MLP) neural network model trained by Levenberg-Marquardt, a radial basis function (RBF) neural network linked with various optimization techniques, as well as group method of data handling (GMDH) neural network to determine the IFT behavior of the hydrocarbon-CO 2 systems more accurately. To the best of our knowledge, this is the first study that benefits such extended ranges of input data and models to obtain accurate predictions/estimations of IFT for the CO 2 -nalkane systems. In addition to advanced hybrid smart models, we employed white-box models to generate explicit correlations for calculation of the IFT of the CO 2 -n-alkanes systems. In addition, the performance of the physical models (i.e., the Parachor model) for IFT prediction of the CO 2 -n-alkane systems, and the combination of these models with different EOSs, were evaluated. The precision and validity of the models were examined using a number of tables and graphical diagrams obtained from the statistical analysis. Finally, the results of the experimental measurements for IFT values of the CO 2 -n-decane and n-heptane systems were compared with those obtained from the developed models, proposed correlations, and the combination of the Parachor equation with various EOSs.

Data Collection
In order to develop a comprehensive model that could precisely predict the IFT value of the CO 2 -n-alkane systems, a thorough literature survey was conducted to collect 879 laboratory data from several open sources [22,[57][58][59][60][61]. The reliability and the operational conditions incorporated in the data bank utilized for the model development had a significant influence on the accuracy, applicability, and validity of the model. Using an unprecedentedly large data set in broad ranges of temperatures (from 297.85 to 443.05 K), pressures (from 0.1 to 23.01 MPa), and n-alkanes (nC4 to nC20), coupled with various reliable models, makes this study novel and provides notable progress in predicting/estimating the IFT values of crude oil-gas systems. Details of the utilized data for model development are tabulated in Table 1.

Multilayer Perceptron (MLP) Neural Networks
The MLP technique is one of the frequently used types of artificial neural networks (ANNs) [65][66][67]. MLP is an intelligent/smart approach for finding system properties; it was developed based on biological nervous systems. The principal features of such systems are to process elements (neurons) and links (interconnections). The interconnections task is to make connections among neurons, which have to process the information [66][67][68][69][70]. This technique is made of at least three layers. The first, second, and third layers of MLP are pertinent to input data, processing stages, and outputs, respectively. The intermediate layer is known as a hidden layer. The total number of variables corresponds to the total number of input neurons [68][69][70][71][72]. The trial-and-error method is employed to attain the optimum number of neurons and layers in the hidden layers. However, in complex systems, it is common to have a large number of neurons and layers for models to make a connection between input and output data. There are interconnections among the neurons of the hidden layers and those of the previous and next layers [70][71][72][73]. By adding the bias term to the corresponding results (neuron values × corresponding weights) in the last layer, the values of neurons in the output and/or hidden layers will be estimated [73][74][75]. An activation function is used for better processing of the data. We considered an MLP model with two hidden layers and activation functions such as tansig (for the first layer), logsig (for the second layer), and purelin (for the outer layer). Equation (2) describes the procedure to determine the MLP model outputs [72][73][74]: where W 1 and W 2 refer to the weight matrices of the two hidden layers; W 3 introduces the outer layer's weight matrix; and b 1 , b 2 , and b 3 are their corresponding bias terms. The algorithm employed for optimization and training the model has a crucial influence on the results of the MLP model. In this study, the Levenberg-Marquardt (LM) technique was employed as an optimization algorithm to optimize the parameters of the MLP neural network model.

Radial Basis Function (RBF) Neural Networks
The RBF neural network is a well-known neural network that is applied in both classification and regression problems. Broomhead and Lowe, for the first time, described the RBF neural network as a feedforward neural network [76]. The theory of function approximation was the origin of the RBF neural network idea. RBF neural networks have been broadly applied in many physical features, approximations, and mathematical investigations [77,78]. These networks can handle arbitrarily distributed data, simply induce to a multidimensional scope, and eventually offer reliable outcomes for the difficult cases [76] of either classification or regression. Generally, an RBF neural network has three feedforward layers, namely input, output, and an intermediate layer that relates the input layer to the output layer [75]. The input layer consists of several input nodes, which correspond to the number of input variables of the model. This layer is responsible for transmitting the input vectors to the hidden layer(s). The main element of the RBF neural network is its hidden layer, which has higher dimensions than the input space [79]. The points within the hidden layer are placed in a specific location with a determined radius, and the gap between their center and the input vector is measured in their neurons [80]. The center vector has cluster centers expressed by c ij , where j stands for the number of center vectors (j = 1, . . . , N). It should be mentioned that the number of input data incorporated in model training should be more than N [81]. In the training phase, the RBF neural network model makes a nonlinear transformation to capture the complexities, as well as to convey the input data into the hidden layer [82]. Some of the functions, including linear, Gaussian, cubic, thin plate spline, multiquadratic, generalized, and inverse multiquadratic, are of utmost application interest for radial basis functions [75,83]. In this study, due to the smoother behavior and extensive application of the RBF models, the Gaussian function was employed as the activation function. The center (c i ) and the spreading coefficient (σ 2 ) characterize the Gaussian function. The Euclidian norm, which is defined by Equation (3) [82], was employed to determine the location of the input vector (x) in terms of the center (c i ) of the Gaussian function: where C ij introduces the centers; and P denotes the number of variables. Eventually, r will be substituted into the Gaussian function as given below [80][81][82][83]: where ϕ is the Gaussian function, while r and σ 2 stand for the Euclidian distance and the spreading coefficient, respectively. The outcomes of the output layer were obtained using the following equation [80][81][82][83]: where N represents the size of the training samples; K n symbolizes the number of neurons in the hidden layer; y i introduces the j th output of input vector x; W i is the bias; and W ji refers to the weight linking the hidden node i to the output layer.
In this study, we proposed the implantation of some algorithms, such as Levenberg-Marquardt (LM), to optimize the MLP neural network model, and the imperialist competitive algorithm (ICA), particle swarm optimization (PSO), gravitational search algorithm (GSA), genetic algorithm (GA), and ant colony optimization (ACO) for optimization of the RBF neural network model. All of the optimization algorithms along with their corresponding flowcharts (see Figure S1a-e) are explained in the Supplementary Materials.

Group Method of Data Handling (GMDH)
Artificial intelligence (AI) approaches, including ANNs, GA, and GMDH, can be used as novel approaches to address complicated computational problems. Also known as the polynomial neural network (PNN), the GMDH is based on a layered framework in which quadratic polynomials are utilized to integrate, through crossing separate neurons over, in each layer to obtain a new generation of neurons in the next layer [84][85][86]. For this purpose, two independent neurons are connected via the quadratic polynomial method. The results of the combination generate new nodes for the next layer [84]. Shankar was the first to consider the GMDH in terms of a self-organizing system [85]. Subsequently, different versions of the algorithm were proposed by scholars from Poland and Japan [86], highlighting the application of GMDH as the superior approach to AI-based pattern recognition and short/long-term forecasts of random processes for complex systems. Reporting the first-ever research on the GDMH, Ivakhnenko focused on the proper selection of the quadratic polynomial expression [87]. As expressed by Equation (6), the Volterra-Kolmogorov-Gabor sequences can be applied to define the link between the inputs and outputs [87]: where Y introduces the model output; x i , x j . . . , and x k are the independent variables; and a, b i , c ij , d ij...k , and so on indicate the model coefficients defined by the algorithm. When two independent parameters are subjected to a quadratic polynomial, the original parameters are converted to new variables designated as Z 1 , Z 2 ,..., Z n , as given below [85][86][87]: Referring to Equation (7), the constants are quantified by the least-squares method (LSM). The LSM attempts to diminish the sum of squares of the error (SSE) to enhance the prediction precision, as expressed below [88]: where Z i represents the i th actual value and y ij denotes the i th prediction/estimation result (corresponding to the i th real value) of the j th polynomial equation, with δ j referring to the SSE of the j th polynomial. It should be noted that the polynomial corresponding to the least SSE is selected as the solution. If the solution is not desired enough, the current-generation parameters are subjected to other operations, including the division or multiplication, to generate new parameters that, together with the previous parameters, comprise a new generation. Hence, the numbers of parameters and equations are subject to change, upon which the process is relaunched right from the first step. The correlations obtained from the GMDH model to estimate the IFT between CO 2 and n-alkanes are provided in the Supplementary Materials in Table S1a-c, and a schematic view of the recommended model for molecular weights lower than 128 g·mol −1 is depicted in Figure 1. As the number of nodes in the GMDH model increases, the accuracy of correlation increases; however, the correlation becomes longer with more terms. In other words, the structure of the correlation becomes more complicated and needs more computational time. For calculating the IFT, it is needed to first calculate all nodes values (N 1 , N 2 , . . . , N 7 ) and then obtain the magnitude of IFT.

Results and Discussion
In the present study, the IFT between CO 2 and n-alkanes (nC 4 to nC 20 ) was scrutinized as a function of the pressure, temperature, and molecular weight of n-alkanes. As discussed in the model development section, the introduced approaches for predicting/estimating the IFT values were RBF-(ICA, GA, ACO, PSO, and GSA), MLP-LM, and GMDH. Smart models for all temperature and pressure ranges exhibited acceptable accuracies. However, to develop the mathematical model using GMDH, the experimental data were classified into three groups based on the molecular weights of n-alkanes, including less than 128 g·mol −1 from C 4 H 10 (58) to C 8 H 18 (114), more than or equal to 128 and less than 170 g·mol −1 from C 9 H 20 (128) to C 11 H 24 (156), and more than or equal to 170 g·mol −1 from C 12 H 26 (170) to C 20 H 42 (282). Furthermore, the real IFT data were compared against the results obtained from the Parachor approaches coupled with a number of equations of state, including the Zudkevitch-Joffe (ZJ), Schmidt-Wenzel (SW), Redlich-Kwong (RK), two-and threeparameter Peng-Robinson (PR2 and PR3, respectively), and two-and three-parameter Soave-Redlich-Kwong (SRK2 and SRK3, respectively). In order to assess the accuracies and reliabilities of the models, the entire set of data was separated into training (80% of the data) and testing subsets (20% of the data). It also should be noted that the data points in each subset were selected randomly.

Accuracy and Validity of the Models
The model accuracy for predicting the IFT of CO 2 -n-alkane systems was evaluated statistically based on the correlation coefficient (R 2 , higher is better) and average percent relative error (APRE, %), average absolute percent relative error (AAPRE, %), root mean square error (RMSE), and standard deviation (SD, lower is better) for the testing and training subsets, as well as the entire data set.
Based on Table 2, no significant difference was noticed between the training and testing subsets regardless of the considered optimization model, implying the possibility of overtraining, a common concern in smart modeling studies. Moreover, Table 2 reveals that the RBF-ICA model resulted in the lowest prediction errors, compared to other predictive techniques, with an APRE, AAPRE, RMSE, R 2 , and SD of −1.47%, 4.42%, 0.54, 0.99, and 0.02, respectively. In this respect, the methodologies used in this work were ranked in terms of accuracy as follows: RBF-ICA > RBF-GA > RBF-GSA > RBF-ACO > GMDH > RBF-PSO > MLP-LM. A comparison between the AAPRE values attained for the models introduced in this study is demonstrated in Figure S2. According to Figure S2, the RBF-ICA model, with an AAPRE of about 3.34%, provided the most accurate prediction for the IFT of the CO 2 -nalkanes systems. However, the MLP-LM model could not lead to accurate estimations, and the AAPRE attributed to this model was determined to be 12.48%. With a suitable approximation, the AAPRE magnitudes of the RBF-GA, RBF-GSA, and RBF-ACO models were close to each other. In addition, a similar AAPRE based on the predictions of the GMDH and RBF-PSO was obtained. It implied that not only the type of model, but also the optimization technique, is important for developing an accurate model. Figure 2 illustrates the cumulative frequency of the AAPRE for various models when applied to the entire database. As shown in Figure 2, the cumulative frequency of the estimation error was smaller than 20% for 95% of the RBF-ICA results, 94% of the RBF-GA results, 93% of the RBF-ACO results, 86% of the RBF-PSO results, 82% of the MLP-LM results, 94% of the RBF-GSA results, and 81% of the GMDH results. This finding implied the superiority of the RBF-ICA model for estimating the IFT of the CO 2 -n-alkane systems.  Figure S3 (panels a-g) depicts the graphical illustration of the APRE of the applied models. The red color indicates the maximum error in the IFT estimation, and the purple color represents the smallest error in model outputs. The peaks indicate the value of absolute percent relative error reported at each temperature and different molecular weights. At a specific point, as the peaks were higher, the error in predictions at that point was greater. Generally, all the proposed models showed acceptable results for IFT estimation of the CO 2 -n-alkane systems, according to their low values of APRE. Nevertheless, as is clear from Figure S3, for an extended range of molecular weight of n-alkanes, the RBF-ICA model resulted in precise estimations for IFT values. It is noteworthy that although the RBF-GA model was superior to the other predictive tools (i.e., RBF-ACO, RBF-GSA, RBF-PSO, MLP-LM, and GMDH), for n-alkanes with 150 g·mol −1 < MW < 200 g·mol −1 and 340 K < T < 380 K (MW is the molecular weight of n-alkane, and T refers to the temperature, respectively), the RBF-ACO model could better estimate the IFT values than the RBF-GA model. In addition, the GMDH model's results for IFT of the CO 2 -n-alkane systems were more precise than those for the MLP-LM model, particularly for n-alkanes with MW < 150 g·mol −1 . Figure 3 presents cross plots of the experimental versus predicted/estimated magnitudes of the CO 2 -n-alkane IFT by RBF-ICA model for the training and testing stages, respectively. These curves prove, graphically, the model precision and efficiency during both the training and testing phases over broad spectrums of pressure, temperature, and molecular weights of the n-alkane studied in this paper. N-alkanes have a crucial influence in controlling the IFT value of gas/crude oil systems, making them appropriate candidates for representing the oil for IFT modeling.  The deviation of the models' results from the laboratory data is visualized through the so-called graphical error analysis. Figure S4 demonstrates such an analysis for both training and testing phases. As the point clouds corresponding to different models are concentrated around the bisector of the diagram, it reveals that the models, the RBF-ICA in particular, successfully produce predictions/estimations that closely resemble corresponding experimental data. Moreover, the models exhibit no bias in the estimations and are rather quite symmetrically distributed around the bisector, implying the absence of significant over/underestimation. Figure 4 describes a graphical analysis of the mathematical model presented by Shang et al. [22]. Due to the large scattering of data around the X = Y line, it can be concluded that the model has low accuracy, which can be due to the validity of this model within certain temperature and pressure ranges.   The distributions of the IFT estimation errors using the RBF-ACO, RBF-ICA, RBF-GA, RBF-PSO, RBF-GSA, MLP-LM, and GMDH models versus the temperature are shown in Figure 5 (panels a-g). As it is clear, the models introduced in this study resulted in near-zero errors, confirming their accuracies and reliabilities.

Analyzing Trend of RBF-ICA Outcomes
In order to examine the impact of pressure on the IFT of the CO 2 -n-alkane binary systems, IFT versus pressure was plotted at various temperatures based on both the experimental data and modeling results. As demonstrated in Figure 6, the IFT of the CO 2 -n-alkane system, at a constant temperature, was reduced as the pressure of the system increased. The observed trend for IFT variation with the pressure was in good agreement with the findings of our previous investigation for CO 2 -n-heptane binary systems at various temperatures (from 313.15 to 393.15 K) [55]. It is broadly believed that the solubility of CO 2 gas into n-heptane increases with increasing pressure, causing a reduction in IFT between the phases [59]. Analyzing the temperature impact on the IFT of the CO 2 -n-heptane binary systems, Figure 7 (panels a and b) highlights an inverse relationship between the IFT and temperature at low pressures, which turned into a direct relationship at high pressures.

Sensitivity Analysis
Using the RBF-ICA model, a sensitivity analysis was conducted. For this purpose, a relevance factor (Equation (9)) in the range of −1 to +1 was considered for each input parameter, with the positive relevance factors corresponding to direct impacts on the model output, and vice versa [89]: where I k represents the k th model input (temperature, pressure, or MW of n-alkane); I k indicates the mean of the k th input; I k,i refers to the i th value of the k th input; and O and O denote the estimated and mean of the modeled IFT values, respectively. A higher value of r for an input parameter implies the higher effect of that parameter on the model outcomes [89]. Figure 8 depicts the outcomes of the sensitivity analysis performed on the input parameters (i.e., the MW of n-alkane, temperature, and pressure). According to Figure 8, the MW of the n-alkane was found to directly affect the predicted IFT, while the pressure and temperature exhibited inverse relationships with the IFT, as the obtained relevance factor had the negative sign. As discussed earlier, at high-pressure conditions, the IFT of CO 2 -n-alkane systems increase as the temperature increases. Contrarily, for low-pressure cases, the IFT will decrease upon an increase in the system temperature. Hence, the negative value of r for temperature revealed that the data points utilized in this study were measured mostly at low-pressure conditions. In addition, the magnitude of the relevance factor was maximal for pressure; this implies that its impact on the IFT was more than other input parameters.

Outlier Detection
An outlier is a data point that is significantly detached from other data points [90]. Usually, the outlier data may emerge in large sets of laboratory data, which could harm the reliability and accuracy of the developed model [70,91,92]. Thus, the exclusion of these data from the training data set is of paramount importance. In this work, we used the leverage statistic for outlier detection, which involves the differences between the experimental data and the corresponding estimated results [90,93]. The differences between the represented/predicted data are named "standardized cross-validated residuals" and are incorporated in a matrix called the Hat matrix [91,93]. The elements on the main diagonal of the Hat matrix are defined as hat values. The outliers can also be detected graphically by drawing the so-called William's plot. Figure 9 displays William's scheme for the results of the proposed RBF-ICA model while determining IFT values of the CO 2 -nalkane systems. As shown in Figure 9, bad high leverage were the points where, regardless of their hat value, their SR value was less than -3 and more than 3. These data were laboratory-suspicious. In addition, the data points with a hat less than hat* and SR between -3 and 3 are called good high leverage. These data are different from most of the data used and may be accurately estimated, but are outside the applicability domain of the model [55,91]. A list of the suspected data points that were out of the applicability of the models is provided in Table 4.

Comparison between Proposed and Pre-Existing Models
In this section, we first evaluate the density and MW of the phases in the CO 2 -n-decane systems. The Parachor approach is then implemented, into which the estimated parameters are incorporated, with different values of n (3.57, 3.66, 3.88, and 4) combined with seven different EOSs to determine IFT values. The Parachor parameters were determined as follows [48]: The EOSs used in this research included RK, PR, SRK, Schmidt-Wenzel (SW), and Zudkevitch-Joffe (ZJ). The RK and two-parameter PR (PR2) are equations of degree 3 with two empirical constants. These EOSs are widely used to obtain the physical properties and equilibrium behaviors of hydrocarbon liquid-vapor systems. RK introduced an EOS in which the bond of molecular tension with temperature is investigated in a manner similar to the Clausius equation. The advantage of this EOS over Clausius is that it does not have a third empirical constant. Soave-Redlich-Kwong suggested that the term a T 1 2 should be replaced with a temperature-dependent term, a T . In the SRK EOS, a T changes with temperature, and most applications of this equation are at a constant temperature. The temperature and pressure, reduced temperature, reduced vapor pressure, and acentric factor are involved in SRK EOS. Molecular attraction in the PR EOS is proposed in different sizes; a T , like in the SRK EOS, is temperature-dependent. The coefficients are also calculated using a number of mathematical relations. The SRK EOS is very accurate for substances with a small acentric factor (ω), while the PR EOS offers better results for substances/chemicals with ω of about 1.3. To reduce the possible error, the Schmidt-Wenzel relation introduces ω as the third parameter of attraction, as given below [48]: By plugging (ω) = 0 and (ω) = 1.3 in the SW equation, the SRK and PR EOSs are generated, respectively. These EOSs accurately predict liquid density. Therefore, it can be concluded that the SW EOS is the general form of the SRK and PR EOSs.
Matching the predicted data with the data measured in the saturation state has been recently used by almost all researchers to determine the parameters of the EOSs. Zudkevitch-Joffe suggested that the parameters need to be measured when required under system conditions (temperature and pressure), and that the relations need to be used to simplify calculations. Using correlations to calculate parameters certainly makes calculations easier. On the other hand, it reduces the accuracy of calculations, because each relationship has some deviation from the available data. In the method proposed by Zudkevitch-Joffe, when used for oil reservoirs, considering that the temperature in the reservoir is mostly constant, the calculations do not change appreciably, and only once is enough to calculate the parameters [94,95].
Using these EOSs coupled with the Parachor model, the IFT values of CO 2 -n alkane systems were estimated. Then, the IFT experimental data and the values predicted by the best intelligent model and the mathematical model presented in this research and the IFT values obtained using the mentioned EOSs were compared at the same thermodynamic conditions. This comparison showed that there was a better agreement between the IFT data predicted by the RBF-ICA algorithm and the experimental data than the values predicted by the EOSs under the same conditions. Figures 10 and 11 further confirm the high accuracy of RBF-ICA in forecasting IFT.
As is evident in Figure 10a, the PR3 and SW EOSs outperformed the other models in predicting the CO 2 -n-decane IFT values. Similar analyses were performed for the CO 2 -n-heptane systems, indicating the superiority of the PR2, PR3, SRK3, and ZJ EOSs (Figure 10b). In Figure 10a,b, the RK and SRK2 EOSs show low accuracy. Thus, it is recommended not to use these two EOSs to predict IFT. Figure 11a-c depict the experimental IFT data of the CO 2 -n-decane along with the predicted/estimated results using the RBF-ICA and GMDH models, as well as those evaluated by the Parachor equation combined with either of three EOSs, namely PR3, SRK3, and ZJ (see Tables 5 and 6). The outcomes suggested superior accuracy of the Parachor equation (at n = 4) combined with any of the PR3, SRK3, and ZJ EOSs. In order to properly check the results of the RBF-ICA and GMDH models and conduct a performance evaluation on other EOSs when coupled with the Parachor equation (at n = 4) for predicting the CO 2 -n-alkanes IFT (i.e., nC 7 and nC 10 ), the results of the models and EOSs were compared to the corresponding experimental data at various pressures. It is clear that the developed models in this study were superior to the Parachor models.    [13].

Equation of State Equation
Zudkevitch-Joffe (ZJ) Table 6. Parameters of the EOSs employed in this research work [13].

Equation of State Parameter
Zudkevitch-Joffe (ZJ) α and b can be determined using pressure and temperature. For intricate mixture.

Equation of State Parameter
Three-parameter Peng-Robinson (PR3)

Conclusions
Modeling the interfacial tension (IFT) of the CO 2 -n-alkanes systems, comprising 879 data points for various temperatures (varying from 297.85 K to 443.05 K), pressures (ranging from 0.10 MPa to 23.01 MPa) and n-alkane molecular weights (nC 4 to nC 20 ), was performed, and the following main outcomes were drawn:

1.
All the developed models for IFT prediction of the CO 2 -n-alkane systems yielded accurate results both for the training stage and the testing stage. The proposed techniques could be sorted in decreasing order of accuracy as follows: 2.

3.
The expected physical trends for the CO 2 -n-alkane systems were successfully followed, and the effects of pressure, temperature, and MW of n-alkanes on IFT behavior of the targeted systems were thoroughly studied. 4.
The higher value of the relevancy factor for pressure, in comparison with the values for temperature and MW of n-alkane, implied the significant impact of the pressure on the IFT of the CO 2 -n-alkane.

5.
By comparing the performance of the RBF-ICA, GMDH, and Parachor models in IFT estimation of the CO 2 -n-heptane and CO 2 -n-decane systems, the superiority of the RBF-ICA model over the other models was obvious. After that, the GMDH model, followed by the Parachor equation (n = 4) combined with the PR3, SRK3, and ZJ EOSs, showed an excellent match with the experimental data for the aforementioned binary systems. 6.
From a statistical viewpoint, both the used laboratory data and the developed models were valid. Only 1.9% of the used data were out of the applicability domain of the suggested RBF-ICA model, proving the high accuracy of the model.  Figure S2: the average absolute relative error based on the models introduced in this study. Figure Figure S4: experimental data versus RBF-ICA predictions of IFT between CO 2 and n-alkane: (a) training subset, and (b) testing subset. Table S1: developed correlations by the GMDH model to estimate the IFT between CO 2 and n-alkanes with different ranges of molecular weights of n-alkanes. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available in the Supplementary file.

Conflicts of Interest:
The authors declare no conflict of interest.