Extreme Learning Machine-Based Model for Solubility Estimation of Hydrocarbon Gases in Electrolyte Solutions

Calculating the solubility of the hydrocarbon components of natural gases is an important issue in operational work in petroleum and chemical engineering. In this work, a novel tool based on the extreme learning machine (ELM) algorithm is proposed for estimating the solubility of hydrocarbon gases (methane, ethane, propane, and butane) in aqueous electrolyte solutions. Comparing the ELM outputs with a comprehensive experimental databank of 1175 solubility points yielded R-squared values of 0.985 and 0.987 for the training and testing phases, respectively. A visual comparison of estimated and actual hydrocarbon solubilities further confirmed the capability of the proposed model. Additionally, a sensitivity analysis was performed on the model's input variables to identify their impact on hydrocarbon solubility. Such a comprehensive and reliable study can help engineers and scientists determine important thermodynamic properties, which are key factors in optimizing and designing industrial units such as refineries and petrochemical plants.


Introduction
The solubility of hydrocarbon and non-hydrocarbon gases (i.e., mixtures of methane, ethane, propane, CO2, and N2) in aqueous phases is an important practical and theoretical challenge in petroleum, geochemical, and chemical engineering. This property plays an important role in processes such as achieving optimum conditions for oil and gas transportation, gas hydrate formation, the design of thermal separation processes, gas sequestration for environmental protection, and coal gasification. Petroleum reservoirs normally contain natural gases in contact with aqueous solutions at high-pressure and high-temperature conditions, which makes gas solubility of direct interest to engineers [1–8]. During the production and transportation of hydrocarbons, the water content of the gas may change phase from vapor to ice or gas hydrates. Gas hydrates are crystalline solid phases that form when small gas molecules are trapped in a lattice of water molecules. Hydrate formation can cause major flow assurance problems during hydrocarbon production and transportation, such as pipeline blockage, corrosion, and many other issues resulting from two-phase flow [1,9–11].
In recent years, investigations of CO2 solubility in aqueous electrolyte solutions have grown significantly because of their relevance to CO2 capture and storage. Emission of CO2 from fossil fuels is the dominant cause of global warming, so its sequestration and disposal in the ocean have been proposed as a reasonable option for mitigating global warming [12–14]. Simulating enhanced oil recovery, designing supercritical extraction, and optimizing CO2 dissolution in the ocean all require comprehensive knowledge of carbon dioxide solubility in aqueous electrolyte solutions [13–15].
Investigating the phase behavior of natural gas in aqueous solutions under different operating conditions is an important industrial issue, with wide application in avoiding problems in the design and optimization of gas processing. Solubility datasets for various gas-liquid systems are available in the literature. These datasets mostly cover the dissolution of hydrocarbons in water/brine systems [1,4,5,9,16–20] and of non-hydrocarbons such as CO2 and N2 in water/brine systems [7,12–14,18,21–24]. A brief summary of the hydrocarbon datasets is given in Table 1. Experimental data on the water content of hydrocarbon and non-hydrocarbon gases are limited because measuring the low water content of gases at high pressure and low temperature is difficult. Mohammadi and coworkers reported that the water content can be estimated accurately from gas solubility data, which avoids the complexities of measuring the water content of natural gases experimentally [1]. Because measured data are limited, extensive efforts have been made to model and describe gas-liquid equilibrium in aqueous electrolyte solutions. Several thermodynamic models use Henry's constant, activity coefficients, and cubic equations of state to obtain information about the equilibrium conditions. At pressures below 5 MPa, the pressure dependence of Henry's constant is negligible and the constant is governed mainly by temperature [19]. This temperature dependence is strong at low temperatures, and a nonlinear decreasing relationship is observed at high temperatures [25]. Furthermore, only a limited number of Henry's constants are available for hydrocarbon systems at low temperatures. Consequently, although Henry's law can predict solubility accurately, applying it has several drawbacks.
For example, it is suitable only for dilute or near-ideal solutions [26]. Additionally, it is valid only for single solutes that do not undergo chemical reactions in the aqueous phase. Another approach is the cubic EOS, which offers several advantages such as a small number of parameters, computational efficiency, and ease of implementation [3,4,21]. EOSs were originally proposed for pure fluids; their application was later extended to mixtures by combining the constants of the different pure components. This extension can be made by different methods, such as Dalton's law of additive partial pressures and Amagat's rule of additive volumes [5]. For complex compounds, EOS accuracy is limited, which makes empirical adjustment through binary interaction parameters important. Determining these parameters requires a reliable source of experimental vapor-liquid equilibrium data, which introduces some uncertainty into EOSs [7].
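Binary interaction parameters usually enter a cubic EOS through its mixing rules. As an illustration only (the text does not state which mixing rule is used, and this is a swapped-in example rather than Dalton's or Amagat's rule), the sketch below implements the classical van der Waals one-fluid mixing rules, $a_m = \sum_i \sum_j y_i y_j \sqrt{a_i a_j}\,(1 - k_{ij})$ and $b_m = \sum_i y_i b_i$. The function name and all numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def vdw_mixing_rules(
    y: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    kij: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Return (a_mix, b_mix) from the van der Waals one-fluid mixing rules.

    y    : mole fractions of the components
    a, b : pure-component EOS attraction and co-volume parameters
    kij  : symmetric binary interaction parameters with kij[i][i] == 0
    """
    n = len(y)
    if not (len(a) == len(b) == len(kij) == n):
        raise ValueError("y, a, b and kij must describe the same components")
    # Quadratic rule for the attraction parameter, corrected by k_ij.
    a_mix = sum(
        y[i] * y[j] * math.sqrt(a[i] * a[j]) * (1.0 - kij[i][j])
        for i in range(n)
        for j in range(n)
    )
    # Linear rule for the co-volume parameter.
    b_mix = sum(yi * bi for yi, bi in zip(y, b))
    return a_mix, b_mix


# Hypothetical two-component example in arbitrary consistent units.
y = [0.5, 0.5]
a = [1.0, 4.0]
b = [1.0, 3.0]
print(vdw_mixing_rules(y, a, b, [[0.0, 0.0], [0.0, 0.0]]))  # (2.25, 2.0)
print(vdw_mixing_rules(y, a, b, [[0.0, 0.1], [0.1, 0.0]]))
```

With $k_{12} = 0.1$ the cross-attraction terms shrink by 10%, lowering $a_m$ from 2.25 to about 2.15. Fitting $k_{ij}$ to measured vapor-liquid equilibrium data is where the dependence on experimental data described above comes in.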
Given the above discussion, an accurate and reliable approach for estimating the solubility of hydrocarbons and non-hydrocarbons in aqueous electrolyte solutions is needed. Machine learning approaches have recently found extensive applications in many fields [27–35]. This work employs a novel artificial intelligence method, the extreme learning machine (ELM), to estimate the solubility of hydrocarbons in aqueous electrolyte mixtures as a function of gas type, gas-phase mole fractions, pressure, temperature, and ionic strength.

Experimental Dataset Collection
To construct a highly accurate and comprehensive model capable of estimating the solubility of hydrocarbon mixtures in aqueous electrolyte solutions, a comprehensive databank was compiled from the existing experimental data listed in Table 1. This databank contains a total of 1175 hydrocarbon solubility points (881 and 294 points for the training and testing phases, respectively; see Table S1 in the Supplementary Materials). According to the literature [1,4,5,9,16–20], gas solubility in these systems depends strongly on the aqueous-phase composition, pressure, temperature, and gas-phase composition. To reduce the dimensionality of the modeling problem, the aqueous-phase composition was expressed as the ionic strength (I), calculated from the salt concentrations. The following equation relates the ionic strength to the valence of each charged ion ($z_i$) and the molar concentration of each ion ($m_i$):

$$I = \frac{1}{2}\sum_{i} m_i z_i^{2}$$
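The conversion of salt concentrations into ionic strength described above can be sketched as follows. The code applies $I = \tfrac{1}{2}\sum_i m_i z_i^2$ to every ion in solution; the `salt_ions` helper, the assumption of complete dissociation, and the example compositions are illustrative and not taken from the databank.

```python
from typing import Iterable, List, Tuple

Ion = Tuple[float, int]  # (concentration m_i, charge z_i)


def ionic_strength(ions: Iterable[Ion]) -> float:
    """I = 0.5 * sum(m_i * z_i**2) over all ions in solution."""
    return 0.5 * sum(m * z * z for m, z in ions)


def salt_ions(conc: float, cation_charge: int, n_cation: int,
              anion_charge: int, n_anion: int) -> List[Ion]:
    """Hypothetical helper: expand one fully dissociated salt into its ions."""
    return [(conc * n_cation, cation_charge), (conc * n_anion, anion_charge)]


nacl = salt_ions(1.0, +1, 1, -1, 1)    # NaCl  -> Na+  + Cl-
cacl2 = salt_ions(1.0, +2, 1, -1, 2)   # CaCl2 -> Ca2+ + 2 Cl-
print(ionic_strength(nacl))            # 1.0
print(ionic_strength(cacl2))           # 3.0
print(ionic_strength(nacl + cacl2))    # 4.0 (mixed brine: ion contributions add)
```

Collapsing a multi-salt brine into the single number I is what lets one model input stand in for many salt-concentration inputs.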
In this study, hydrocarbon solubility is predicted as a function of the concentrations of the components in the gaseous mixture, the ionic strength of the solution, temperature, and pressure.

Extreme Learning Machine
Huang proposed an intelligent method based on the single-layer feedforward neural network (SLFFNN), called the extreme learning machine, to overcome drawbacks of gradient-based algorithms such as slow training and low learning rate. In the ELM algorithm, the hidden-node parameters (input weights and biases) are assigned randomly, and the output weights of the SLFFNN are calculated analytically using the Moore-Penrose generalized inverse [36,37].
The scheme of the ELM algorithm is shown in Figure 1. For N training samples $(x_j, y_j) \in \mathbb{R}^n \times \mathbb{R}^m$ and L hidden nodes, the SLFFNN can be written as

$$\sum_{i=1}^{L} \beta_i \, g\left(a_i \cdot x_j + b_i\right) = y_j, \qquad j = 1, \ldots, N$$

in which $a_i = [a_{i1}, \ldots, a_{in}]^T$ is the input weight vector connecting the input nodes to the i-th hidden node, $\beta_i = [\beta_{i1}, \ldots, \beta_{im}]^T$ is the output weight vector connecting the i-th hidden node to the output nodes, $b_i$ is the bias of the i-th hidden node, and $g(\cdot)$ is the activation function.
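The first step of the model is to assign the input weights and hidden-layer biases randomly for the training phase. With these values fixed, the hidden-layer output matrix $\mathbf{H}$ is computed from the input variables, with entries $H_{ji} = g(a_i \cdot x_j + b_i)$, where $g$ is the activation function and $x_j$ is the j-th training input. SLFFNN training then reduces to the linear least-squares problem $\mathbf{H}\boldsymbol{\beta} = \mathbf{Y}$, whose minimum-norm solution is $\boldsymbol{\beta} = \mathbf{H}^{\dagger}\mathbf{Y}$, where $\mathbf{H}^{\dagger}$ is the Moore-Penrose generalized inverse of $\mathbf{H}$. ELM algorithms also apply regularization theory to define a target function [38–40].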
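The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it uses the sigmoid activation and input weights drawn from (−1, 1), as reported for this study, while the bias range, the random seed, the class name, and the optional ridge term `C` are assumptions. When `C` is given, the output weights solve the commonly used regularized objective $\min_{\boldsymbol\beta} \tfrac12\|\boldsymbol\beta\|^2 + \tfrac{C}{2}\|\mathbf{Y}-\mathbf{H}\boldsymbol\beta\|^2$, whose solution is $\boldsymbol\beta = (\mathbf{H}^T\mathbf{H} + \mathbf{I}/C)^{-1}\mathbf{H}^T\mathbf{Y}$; otherwise the Moore-Penrose pseudoinverse is used.

```python
from typing import Optional

import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


class ELMRegressor:
    """Single-hidden-layer extreme learning machine (minimal sketch)."""

    def __init__(self, n_hidden: int = 30, C: Optional[float] = None, seed: int = 0):
        self.n_hidden = n_hidden
        self.C = C          # ridge parameter; None -> plain pseudoinverse
        self.seed = seed

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Hidden-layer output matrix H, shape (N, L): H[j, i] = g(a_i . x_j + b_i)
        return sigmoid(X @ self.a + self.b)

    def fit(self, X: np.ndarray, Y: np.ndarray) -> "ELMRegressor":
        rng = np.random.default_rng(self.seed)
        n_features = X.shape[1]
        # Step 1: random input weights (columns are a_i) and biases, never retrained.
        self.a = rng.uniform(-1.0, 1.0, size=(n_features, self.n_hidden))
        self.b = rng.uniform(-1.0, 1.0, size=self.n_hidden)  # assumed bias range
        # Step 2: hidden-layer output matrix from the inputs.
        H = self._hidden(X)
        # Step 3: output weights from a linear least-squares problem.
        if self.C is None:
            self.beta = np.linalg.pinv(H) @ Y
        else:
            ridge = np.eye(self.n_hidden) / self.C
            self.beta = np.linalg.solve(H.T @ H + ridge, H.T @ Y)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta


# Usage on synthetic data (not the solubility databank):
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])
model = ELMRegressor(n_hidden=30).fit(X[:150], y[:150])
test_rmse = np.sqrt(np.mean((model.predict(X[150:]) - y[150:]) ** 2))
print(f"test RMSE = {test_rmse:.4f}")
```

Only the output weights are learned, and they come from a single linear solve, which is why ELM training avoids the iterative gradient updates mentioned above. Sweeping `n_hidden` and keeping the value with the lowest testing RMSE mirrors how the hidden-layer size is chosen in the Results section.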

Results and Discussion
In this study, the solubility of hydrocarbons in the aqueous electrolyte phase is determined with the ELM algorithm. The sigmoid function was used as the activation function, and the input weights were initialized randomly in the range (−1, 1). The number of nodes in the hidden layer was set to 30, the value giving the lowest RMSE, as shown in Figure 2. Beyond 30 nodes, increasing model complexity increased the testing error, so the optimum structure has 30 hidden nodes, which avoids overfitting. The statistical results of the hydrocarbon solubility estimation are listed in Table 2; they were computed with the following equations:

Root mean square error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{est}} - y_i^{\mathrm{exp}}\right)^{2}}$$

Mean squared error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{est}} - y_i^{\mathrm{exp}}\right)^{2}$$

where $N$ is the number of data points, and $y_i^{\mathrm{est}}$ and $y_i^{\mathrm{exp}}$ are the estimated and experimental solubilities of the i-th point.

The estimated and experimental hydrocarbon solubilities in aqueous electrolyte solutions are compared in Figure 3, which shows excellent agreement between them. Figure 4 presents the regression plot of actual versus estimated hydrocarbon solubility; the compact cloud of data points around the 45° line indicates the validity and accuracy of the ELM algorithm. Figure 5 shows the distribution of relative deviations between the estimated and actual hydrocarbon solubilities in aqueous solutions: the ELM outputs deviate only slightly from the measured solubilities, and most relative deviations are close to zero. Finally, Figure 6 shows histograms of the relative deviations for the training and testing phases. The frequency diagram confirms that most errors are close to zero, and the cumulative curve shows that the range of deviation is very limited, with its steepest slope occurring near zero.
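The error measures used for Table 2, together with the relative deviations plotted in Figures 5 and 6 and the R-squared values quoted in the abstract, can be computed as in the following sketch. The relative-deviation definition (percent, relative to the experimental value) and the R-squared form $1 - SS_{\mathrm{res}}/SS_{\mathrm{tot}}$ are assumptions, since the text does not give those formulas; the numbers are toy values.

```python
import numpy as np


def mse(y_exp: np.ndarray, y_est: np.ndarray) -> float:
    """Mean squared error between estimated and experimental values."""
    return float(np.mean((y_est - y_exp) ** 2))


def rmse(y_exp: np.ndarray, y_est: np.ndarray) -> float:
    """Root mean square error: square root of the MSE."""
    return float(np.sqrt(mse(y_exp, y_est)))


def relative_deviation_pct(y_exp: np.ndarray, y_est: np.ndarray) -> np.ndarray:
    """Pointwise relative deviation in percent (assumed definition)."""
    return 100.0 * (y_est - y_exp) / y_exp


def r_squared(y_exp: np.ndarray, y_est: np.ndarray) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    ss_res = np.sum((y_exp - y_est) ** 2)
    ss_tot = np.sum((y_exp - np.mean(y_exp)) ** 2)
    return float(1.0 - ss_res / ss_tot)


y_exp = np.array([1.0, 2.0, 3.0, 4.0])   # toy "experimental" values
y_est = np.array([1.1, 1.9, 3.2, 3.8])   # toy "estimated" values
print(round(mse(y_exp, y_est), 6))        # 0.025
print(round(rmse(y_exp, y_est), 6))       # 0.158114
print(round(r_squared(y_exp, y_est), 6))  # 0.98
print(relative_deviation_pct(y_exp, y_est))
```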
The ELM algorithm implemented in this work shows an excellent ability to calculate the solubility of hydrocarbons in aqueous phases. One important factor that can affect model validation is the precision of the data used. To assess the reliability of the solubility databank, the leverage method is applied. This method identifies suspect solubility data by constructing the hat matrix [41–45]:

$$H = U\left(U^{T}U\right)^{-1}U^{T}$$

in which $U$ is the $j \times i$ matrix of training data, with $j$ training points (rows) and $i$ model parameters (columns). These two numbers also determine the critical leverage limit:

$$H^{*} = \frac{3\,(i + 1)}{j}$$

To delimit the reliable zone, the leverage method also uses standardized residual limits of −3 and +3. As shown in Figure 7, the reliable area is bounded by these two residual limits and the critical leverage limit; the critical straight lines are drawn in red and green. This plot, known as the Williams plot, shows the standardized residuals versus the hat values, which are the main diagonal elements of the hat matrix. Most of the solubility data lie within this area, which supports the validity of the hydrocarbon solubility databank.

In most parametric studies, it is valuable to identify the effect of each input on the target. Accordingly, a sensitivity analysis is used to investigate the effects of the concentrations of the components in the gaseous mixture, the ionic strength of the solution, temperature, and pressure on the solubility of hydrocarbons in aqueous electrolyte systems. To this end, the relevancy factor is determined for each input parameter as follows [46–54]:

$$r\left(X_k, Y\right) = \frac{\sum_{i=1}^{n}\left(X_{k,i} - \bar{X}_k\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{k,i} - \bar{X}_k\right)^{2}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^{2}}}$$

in which $Y_i$ and $\bar{Y}$ denote the i-th output and the average output, $X_{k,i}$ and $\bar{X}_k$ denote the i-th value of the k-th input and its average, and $n$ is the number of data points.
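The leverage check described earlier in this section can be sketched as follows. This is a minimal illustration under stated assumptions: `U` holds one row per training point and one column per model parameter (an intercept column plus one input in the toy example), residuals are standardized by dividing by their sample standard deviation (conventions differ between studies), and the critical leverage is $H^* = 3(p+1)/n$ for $p$ parameters and $n$ training points. The function names and data are hypothetical.

```python
import numpy as np


def hat_values(U: np.ndarray) -> np.ndarray:
    """Diagonal of the hat matrix H = U (U^T U)^-1 U^T, one value per row of U."""
    # pinv keeps the computation stable if U^T U is nearly singular.
    H = U @ np.linalg.pinv(U.T @ U) @ U.T
    return np.diag(H).copy()


def critical_leverage(n_points: int, n_params: int) -> float:
    """Critical leverage H* = 3 (p + 1) / n."""
    return 3.0 * (n_params + 1) / n_points


def standardized_residuals(y_exp: np.ndarray, y_est: np.ndarray) -> np.ndarray:
    """Residuals divided by their sample standard deviation (assumed convention)."""
    r = y_exp - y_est
    return r / np.std(r, ddof=1)


def williams_flags(U: np.ndarray, y_exp: np.ndarray, y_est: np.ndarray):
    """Return (inside, hat, std_res); inside marks points in the reliable zone."""
    h = hat_values(U)
    sr = standardized_residuals(y_exp, y_est)
    h_star = critical_leverage(U.shape[0], U.shape[1])
    inside = (h <= h_star) & (np.abs(sr) <= 3.0)
    return inside, h, sr


# Toy example: one input plus an intercept; the last point is far from the rest.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 100], dtype=float)
U = np.column_stack([np.ones_like(x), x])
y_exp = 2.0 * x + 1.0
y_est = y_exp + np.array([0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05, -0.05, 0.0])
inside, h, sr = williams_flags(U, y_exp, y_est)
print(np.round(h, 3))
print(inside)  # only the last (high-leverage) point falls outside
```

Plotting `sr` against `h`, with horizontal lines at ±3 and a vertical line at the critical leverage, reproduces the layout of a Williams plot.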
Figure 8 shows the relevancy factor of each variable affecting hydrocarbon solubility. The relevancy factor lies between −1 and 1; a higher absolute value indicates a greater impact on hydrocarbon solubility, and a positive value indicates a direct relationship between the input and the target. The relevancy factors for pressure, temperature, the index of fraction, ionic strength, and the methane, ethane, propane, and butane mole fractions in the gas phase are 0.52, 0.20, −0.48, −0.16, 0.11, 0.06, −0.19, and −0.07, respectively. These results indicate that the solubility of the investigated hydrocarbons increases with pressure, temperature, and the mole fractions of methane and ethane. Moreover, pressure and the gas-phase mole fraction of ethane are the most and least influential parameters, respectively, for determining hydrocarbon solubility.
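The relevancy factor used for Figure 8 has the same form as the Pearson correlation coefficient between one input column and the output. A minimal sketch, with hypothetical function names and toy data (not the solubility databank), is:

```python
import numpy as np


def relevancy_factor(x, y) -> float:
    """r(X_k, Y): co-variation of input and output over the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))


def relevancy_table(X: np.ndarray, y, names):
    """Relevancy factor of each input column of X with respect to y."""
    return {name: relevancy_factor(X[:, k], y) for k, name in enumerate(names)}


# Toy data: y rises with the first input and falls with the second.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * rng.normal(size=100)
print(relevancy_table(X, y, ["pressure", "ionic_strength"]))
```

A positive value for the first column and a negative value for the second reflect the sign convention described above. Because the measure is linear, it summarizes monotonic trends rather than nonlinear interactions between inputs.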

Conclusions
Hydrocarbon solubility in aqueous electrolyte phases at high temperature and pressure is a major parameter in a variety of applications in the petroleum and chemical engineering industries. In this study, a highly accurate and comprehensive predictive tool based on the extreme learning machine was developed to calculate hydrocarbon solubility over wide ranges of operating conditions. Comparing the ELM outputs with a comprehensive experimental databank of 1175 solubility points yielded R-squared values of 0.985 and 0.987 for the training and testing phases, respectively. The excellent agreement between ELM estimates and measured hydrocarbon solubilities indicates that the ELM algorithm is a valuable tool for the design and optimization of processes involving vapor-liquid equilibrium. Furthermore, this study provides information on the influence of each input parameter on hydrocarbon solubility. Given these results, this work has potential application in commercial software packages such as CMG and ECLIPSE for simulating fluid flow in porous media.