Prediction of CO2 Solubility in Ionic Liquids Based on a Multi-Model Fusion Method

Abstract: Reducing greenhouse gas emissions is a worldwide problem that must be solved urgently for sustainable future development. The solubility of CO2 in ionic liquids is one of the important basic data for capturing CO2. Considering the disadvantages of experimental measurement (time-consuming and expensive), the complex parameters of mechanistic modeling and the poor stability of single data-driven models, a multi-model fusion modeling method is proposed to predict the solubility of CO2 in ionic liquids. Multiple sub-models are built on the training set, and the sub-models with better performance are selected on the validation set. Linear fusion models are then established by minimizing the sum of squared errors and by the information entropy method, respectively. Finally, the performance of the fusion models is verified on the test set. The results show that the prediction effect of the linear fusion models is better than that of the three optimal sub-models, and that the fusion model based on the information entropy method outperforms the one based on the least-squares-error method. This work provides an effective and feasible modeling method for accurately predicting the solubility of CO2 in ionic liquids, and an important basis for evaluating and screening more selective ionic liquids.


Introduction
Nowadays, energy crises and environmental issues are frontier problems of great concern. Reducing CO2 emissions is one of the crucial challenges for sustainable development. Carbon Capture and Storage (CCS) is by far the most mature approach to reducing CO2 emissions. Ionic liquids (ILs) have low volatility, high solubility and high selectivity, which makes them increasingly interesting for capturing CO2 and has led to them being considered a relatively novel type of solvent [1][2][3][4].
The solubility of CO2 in ILs is important information: it not only helps us to study the interaction between CO2 and ILs, but also provides important guidance for designing ILs that meet industrial needs [5,6]. At present, the main methods for obtaining the solubility of CO2 in ILs are experimental measurement and modeling. Because of the non-ideal behavior of the system, the complexity of ionic liquid systems, the limited measurement conditions, and the time and cost of measurements on ILs, it is often impractical to obtain the solubility by experimental measurement for practical applications [7,8]. The modeling methods mainly consist of mechanistic modeling and data-driven modeling.
Thermodynamic models have the advantages of a clear engineering background, strong interpretability and good extrapolation ability. Data-driven models, for their part, have been applied in many fields of chemistry, chemical engineering and economics whose problems are difficult and too complex to solve mechanistically.
The approximation and generalization ability of a BP neural network model strongly depends on the samples [22,23]. Convergence of the training algorithm is slow, and it is easily trapped in a local optimum. In establishing a BP neural network model, both under-training and over-training degrade the prediction performance; it is therefore important to choose the number of hidden layers and the number of neurons in each hidden layer reasonably [24].

Support Vector Machine
Support Vector Machine (SVM) is a supervised learning method developed from statistical learning theory and similar in spirit to a neural network [25]. Its basic idea rests on Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle. Under finite sample information, the complexity and learning ability of the model are balanced by constructing a loss function, yielding a model with better prediction performance.
In developing a support vector machine model, different kernel functions and non-linear mappings map the input patterns into different higher-dimensional feature spaces [26,27]. To obtain a better-performing SVM model, the kernel type must be chosen and the kernel parameters optimized reasonably. The commonly used kernels are the polynomial kernel, the radial basis kernel and the sigmoid kernel.
The polynomial kernel function is

K(x, y) = (x · y + 1)^q, (1)

where the parameter q is the order of the polynomial. The radial basis kernel function is

K(x, y) = exp(−‖x − y‖² / (2σ²)), (2)

where the parameter σ is the kernel width. The sigmoid kernel function is

K(x, y) = tanh(v (x · y) + c), (3)

where v is a scale parameter and c a displacement parameter.
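As an illustration (a Python sketch; the paper's models were built in MATLAB), the three kernels in their standard forms can be written as below. The defaults are the optimized values reported later in the paper (q = 12, σ = 0.002, v = 0.081, c = −1.68); the exact parameterization of the paper's kernels is an assumption.

```python
import math

def poly_kernel(x, y, q=12):
    """Polynomial kernel: K(x, y) = (x . y + 1)^q, with q the polynomial order."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1) ** q

def rbf_kernel(x, y, sigma=0.002):
    """Radial basis kernel: K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def sigmoid_kernel(x, y, v=0.081, c=-1.68):
    """Sigmoid kernel: K(x, y) = tanh(v * (x . y) + c), v a scale and c a displacement."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(v * dot + c)
```

Each function takes two feature vectors of equal length and returns a scalar similarity.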
The SVM model has advantages in solving non-linear, local minima and high dimensional pattern recognition and regression prediction problems. Although it shows certain robustness on sample sets and has little impact on the model when adding or removing samples, it is difficult to be applied to large training sets and has limitations on multi-classification problems.

Extreme Learning Machine
Extreme Learning Machine (ELM) is a network learning algorithm based on an improved traditional neural network [28]. It is a single-hidden-layer feedforward neural network in which the weights between the input and hidden layers are generated randomly (or set manually) and the output weights are determined analytically during learning, with no iterative adjustment. ELM achieves a good balance among learning speed, predictive stability, generalization and so on [29][30][31].
When developing an ELM, the computational cost can be reduced and the stability and accuracy of the model improved by changing the type of activation function and the number of hidden-layer neurons, and by optimizing the input weights and the hidden-layer biases [32]. Compared with traditional algorithms, ELM is easy to use and can theoretically reach a globally optimal solution with much faster learning and good generalization. However, because the relevant parameters are set randomly, some hidden-layer nodes may become ineffective, resulting in poor predictions.
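A minimal ELM sketch in Python (the paper used MATLAB) illustrates the idea: hidden-layer weights and biases are random, and only the output weights are solved analytically. The small ridge term and the toy Gaussian-elimination solver are implementation conveniences, not part of the paper's method.

```python
import math, random

def elm_train(X, y, n_hidden=20, seed=1):
    """Train a minimal ELM: random sigmoid hidden layer, analytic output weights
    via (lightly regularized) normal equations. Returns a prediction function."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]

    def hidden(x):
        # sigmoid activation of the randomly weighted hidden layer
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(Wj, x)) + bj)))
                for Wj, bj in zip(W, b)]

    H = [hidden(x) for x in X]
    lam = 1e-6  # tiny ridge term to keep the normal equations well-posed
    A = [[sum(H[k][i] * H[k][j] for k in range(len(H))) + (lam if i == j else 0.0)
          for j in range(n_hidden)] for i in range(n_hidden)]
    rhs = [sum(H[k][i] * y[k] for k in range(len(H))) for i in range(n_hidden)]
    # Solve A beta = rhs by Gaussian elimination with partial pivoting
    for i in range(n_hidden):
        p = max(range(i, n_hidden), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]; rhs[i], rhs[p] = rhs[p], rhs[i]
        for r in range(i + 1, n_hidden):
            f = A[r][i] / A[i][i]
            for c in range(i, n_hidden):
                A[r][c] -= f * A[i][c]
            rhs[r] -= f * rhs[i]
    beta = [0.0] * n_hidden
    for i in reversed(range(n_hidden)):
        beta[i] = (rhs[i] - sum(A[i][j] * beta[j]
                                for j in range(i + 1, n_hidden))) / A[i][i]
    return lambda x: sum(bj * hj for bj, hj in zip(beta, hidden(x)))
```

Only `beta` is learned; retraining with a different seed changes the random hidden layer, which is why the paper optimizes these random parameters with a genetic algorithm.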

Linear Fusion Method
Each type of sub-model usually captures part of the information about the predicted object; these contributions to the fusion model differ and are partly unique. The basic idea of the multi-model fusion prediction method is to synthetically utilize the information provided by each sub-model. A fusion prediction model established by an appropriate fusion method can therefore be expected to contain more comprehensive prediction information.
The most commonly used sub-model fusion method is linear fusion, in which the reliability of the weight coefficients is crucial to the prediction accuracy and stability of the model. In this paper, two methods for calculating the weight coefficients are presented. The first minimizes the sum of squared errors between the predicted and actual values; the weight coefficients of the fusion model are obtained by this optimization. The second is the information entropy method, in which the weight coefficients are determined by evaluating the prediction effect of each sub-model.

Minimum Squared Error
Assume a set of true values {y_i} (i = 1, 2, ..., n), where n is the total number of samples. Suppose m sub-models are used for prediction; let y_{j,i} (j = 1, 2, ..., m; i = 1, 2, ..., n) be the prediction of the i-th sample by the j-th model, and y_{s,i} the prediction of the i-th sample by the linear fusion model.
The fusion prediction is

y_{s,i} = Σ_{j=1}^{m} ω_j y_{j,i}, (4)

where ω_j is the weighting factor of the j-th sub-model in the linear fusion model and satisfies the constraints

Σ_{j=1}^{m} ω_j = 1, ω_j ≥ 0. (5)

The absolute prediction error of the linear fusion model for the i-th sample is

e_{s,i} = y_i − y_{s,i} = Σ_{j=1}^{m} ω_j (y_i − y_{j,i}) = Σ_{j=1}^{m} ω_j e_{j,i}. (6)

The sum of squared errors of the linear fusion model is

J = Σ_{i=1}^{n} e_{s,i}² = ω^T E ω, (7)

where ω = (ω_1, ω_2, ..., ω_m)^T is the weight column vector of the linear fusion model and E = (E_{j,k})_{m×m} is its prediction error information matrix: for j ≠ k, E_{j,k} = Σ_{i=1}^{n} e_{j,i} e_{k,i} is the prediction error covariance between the j-th and k-th models, and for j = k, E_{j,j} = Σ_{i=1}^{n} e_{j,i}² is the sum of squared errors of the j-th model.
Minimizing the sum of squared prediction errors of the linear fusion model is the objective. The weight calculation can then be cast as a nonlinear programming model:

min J = ω^T E ω  subject to  R^T ω = 1, ω ≥ 0, (8)

where R ∈ R^{m×1} is the column vector whose elements are all one. Solving Equation (8) gives the weight vector of the linear fusion model:

ω = E^{−1} R / (R^T E^{−1} R), (9)

and the optimal value of the corresponding objective function:

J_min = 1 / (R^T E^{−1} R). (10)
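The weight formula of Equation (9) can be sketched in Python (the paper used MATLAB): build the error information matrix E from the per-sample errors of each sub-model, solve E x = R, and normalize x so the weights sum to one. The small linear solver is an implementation convenience.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]; b[i], b[p] = b[p], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def msse_weights(errors):
    """Minimum-sum-of-squared-error fusion weights, Equation (9):
    w = E^{-1} R / (R^T E^{-1} R), with errors[j][i] the error e_{j,i}."""
    m = len(errors)
    E = [[sum(ej * ek for ej, ek in zip(errors[j], errors[k])) for k in range(m)]
         for j in range(m)]
    x = solve(E, [1.0] * m)   # x = E^{-1} R
    s = sum(x)                # s = R^T E^{-1} R
    return [xi / s for xi in x]
```

Note that Equation (9) alone does not enforce ω ≥ 0; if a weight comes out negative, the constrained problem (8) must be solved instead.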

Information Entropy
The entropy in thermodynamics is a measure of the degree of disorder of a system, while the information entropy in information theory is a measure of the degree of order of a system; the two are equal in absolute value but opposite in sign [33]. A system may be in different states; if P_i (i = 1, 2, ..., n) is the probability of occurrence of the i-th state, the information entropy of the system is

H = − Σ_{i=1}^{n} P_i ln P_i. (11)

From the viewpoint of information entropy, the method is based on the variation of the prediction error of each sub-model and takes into account the differences and error factors among sub-models. The information entropy method can therefore be used to compute the weight coefficients of the linear fusion model: the smaller the variation of a sub-model's error information, the larger its weight coefficient in the linear fusion model.
The relative prediction error of the i-th sample in the j-th sub-model is

Re_{j,i} = |(y_i − y_{j,i}) / y_i|. (12)

The ratio of Re_{j,i} to the total relative prediction error over the n samples is

P_{j,i} = Re_{j,i} / Σ_{i=1}^{n} Re_{j,i}. (13)

The information entropy value of the relative prediction error of the j-th sub-model is

h_j = −(1 / ln n) Σ_{i=1}^{n} P_{j,i} ln P_{j,i}. (14)

The coefficient of variation of the relative prediction error of the j-th sub-model is

d_j = 1 − h_j. (15)

The weighting coefficient of the j-th sub-model is then

ω_j = (1 / (m − 1)) (1 − d_j / Σ_{k=1}^{m} d_k). (16)
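The entropy-weighting steps can be sketched in Python as follows. The final normalization uses the common combination-forecasting form ω_j = (1/(m−1))(1 − d_j/Σd_k), which is an assumption about the exact form of the paper's Equation (16).

```python
import math

def entropy_weights(y_true, preds):
    """Entropy-based fusion weights: sub-models whose relative prediction
    errors are spread more unevenly over the samples get smaller weights."""
    n, m = len(y_true), len(preds)
    d = []
    for y_hat in preds:
        # relative prediction errors, Re_{j,i} = |(y_i - y_{j,i}) / y_i|
        re = [abs((t - yp) / t) for t, yp in zip(y_true, y_hat)]
        total = sum(re)  # assumed non-zero (no sub-model is perfect)
        p = [r / total for r in re]
        # normalized information entropy of the error distribution, in [0, 1]
        h = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        d.append(1.0 - h)  # coefficient of variation of the relative error
    sd = sum(d)
    return [(1.0 - dj / sd) / (m - 1) for dj in d]
```

With this normalization the weights always sum to one, and a sub-model whose relative error is nearly uniform across samples (low variation) receives the largest weight.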

Implementation Steps
The preferred BP neural network, support vector machine and extreme learning machine sub-models are used to establish the prediction model by linear fusion method. This process is divided into three steps including data collection and grouping, sub-model training and evaluation, fusion model developing and testing. The implementation process is shown in Figure 1.
The implementation steps are as follows: (1) Data collection and grouping. According to the modeling requirements, the dataset for modeling is collected. The whole dataset is divided into a training set (X1), validation set (X2) and test set (X3) in appropriate proportions. The training set (X1) is selected so that it covers the full ranges of the experimental data and operating conditions.
(2) Sub-model training and evaluation. The implementation process of sub-model training and evaluation is shown in Figure 2. Different types of sub-models are developed on the training set (X1). The BP sub-models (BP-ANN1, BP-ANN2, ..., BP-ANNm) are established by varying the number of hidden-layer nodes. Different kernel functions are chosen to develop the SVM sub-models (SVM1, SVM2, ..., SVMn). Based on different numbers of hidden-layer neurons and different activation functions, the ELM sub-models (ELM1, ELM2, ..., ELMk) are built. The model parameters are optimized by a genetic algorithm (GA) to obtain the best result for each model.
The prediction performance of each sub-model is evaluated on the validation set (X2). According to the performance indicators on the validation set, the optimal sub-model is selected from each kind of sub-model, giving three optimal sub-models: BP-ANNOpt, SVMOpt and ELMOpt.
(3) Fusion models developing and testing. The implementation process of fusion model development and testing is shown in Figure 3. The parameters w1, w2 and w3 represent the combination weights of the three optimal sub-models, respectively. The weights are calculated either by the minimum squared error method (Equation (9)) or by the information entropy method (Equation (16)). The two linear fusion models are then established, and finally their prediction performance is tested on the test set (X3).

Data Collecting and Grouping
Six important parameters of nine imidazolium ionic liquids, namely temperature, pressure, critical temperature (Tc), critical pressure (Pc), molecular weight (M) and acentric factor (w), were taken as the input variables of the prediction model. Theoretically, Tc, Pc, M and w are essential thermodynamic properties of ILs: they distinguish the species of ILs and reflect the characteristics of their structures [20,34], and are listed in Table 1. Temperature and CO2 pressure affect the solubility of CO2 in an ionic liquid; for the same ionic liquid, the solubility of CO2 increases as the temperature decreases or the pressure increases. The solubility of CO2 in the ILs was chosen as the output variable of the model.

In order to develop dependable models, solubilities of CO2 in different ILs were collected from the literature. In this study, 544 samples were collected; a portion is shown in Table 2 [20,[35][36][37][38][39][40]. All CO2 solubilities in this paper were obtained in the equilibrium phase and are expressed as the gas/ionic-liquid molar ratio. The dataset was randomly divided into three sets: 70% of the samples formed the training set, used to generate the sub-models; 15% formed the validation set, used to select the best-performing sub-model of each kind; and the remainder formed the test set, used to test the performance of the fusion models.
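The random 70/15/15 grouping can be sketched as follows (a Python sketch; the paper used MATLAB, and the helper name and seed are illustrative):

```python
import random

def split_dataset(samples, seed=42):
    """Randomly split samples into training (70%), validation (15%) and
    test (remainder) sets, mirroring the grouping used in the paper."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    n_train = int(0.70 * len(samples))
    n_val = int(0.15 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

For the 544 collected samples this yields 380 training, 81 validation and 83 test samples.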

Sub-Models Development
The sub-models of different structures were established on the training set. All sub-models were implemented in MATLAB (version 2016a, MathWorks, Natick, MA, USA). The details are as follows: (1) A BP neural network with a single hidden layer can realize an arbitrary mapping of a continuous nonlinear function [41]. Thus, a series of BP neural network sub-models with different numbers of hidden-layer neurons (from 3 to 10 in sequence) were established using a three-layer structure. To achieve the nonlinear mapping, the transfer function of the hidden layer is the tansig function, and the training algorithm is the Levenberg-Marquardt algorithm. To leave the output range unrestricted, the output layer uses the purelin (linear) transfer function.
(2) Three SVM sub-models were established by choosing the polynomial kernel, the radial basis kernel and the sigmoid kernel, respectively, and a genetic algorithm was used to optimize the parameters of each kernel. The optimized order of the polynomial kernel is 12; the optimized kernel width of the radial basis kernel is 0.002; and the optimized scale and displacement parameters of the sigmoid kernel are 0.081 and −1.68, respectively.
(3) Eight extreme learning machine sub-models were established by selecting different numbers of hidden-layer neurons and different activation functions. The hidden-layer nodes of the five sub-models with the sigmoid activation function were taken from 148 to 152 in turn, and those of the other three sub-models with the sine function were taken from 151 to 153 in order. The genetic algorithm was used to optimize the weights and thresholds of each extreme learning machine sub-model.

Sub-Models Evaluation
The validation set was used to evaluate the performance of the different types of sub-models and to screen out the optimal sub-models for fusion. Four statistical parameters, the mean absolute error (MAE), root mean square error (RMSE), correlation coefficient (R²) and standard deviation (STD), were used (Equations (17)-(20)) to assess the accuracy of the proposed models:

MAE = (1/N) Σ_{i=1}^{N} |x_i − x̂_i|, (17)

RMSE = [ (1/N) Σ_{i=1}^{N} (x_i − x̂_i)² ]^{1/2}, (18)

R² = 1 − Σ_{i=1}^{N} (x_i − x̂_i)² / Σ_{i=1}^{N} (x̂_i − x̄)², (19)

STD = [ (1/(N − 1)) Σ_{i=1}^{N} (x_i − x̄)² ]^{1/2}, (20)

where N is the number of samples, x_i is the predicted value of sample i, x̂_i is the true value of sample i, and x̄ is the average over all samples. The predictive performance of each sub-model on the validation set is shown in Table 3: the BP neural network sub-model with four hidden-layer neurons is the most accurate. The prediction effects of the SVM sub-models with different kernel functions are given in Table 4: the sub-model with the radial basis kernel has the smallest prediction error. As shown in Table 5, the ELM sub-model with 150 hidden-layer neurons and the sigmoid function achieves the best results.
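The four indicators can be computed as follows (a Python sketch; the paper's exact definition of STD is ambiguous and is taken here as the sample standard deviation of the predictions about the mean of the data, an assumption):

```python
import math

def evaluate(pred, true):
    """Return (MAE, RMSE, R^2, STD) for predicted vs. true values."""
    n = len(true)
    x_bar = sum(true) / n  # average of all samples
    mae = sum(abs(p - t) for p, t in zip(pred, true)) / n
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / n)
    r2 = 1.0 - (sum((p - t) ** 2 for p, t in zip(pred, true))
                / sum((t - x_bar) ** 2 for t in true))
    std = math.sqrt(sum((p - x_bar) ** 2 for p in pred) / (n - 1))
    return mae, rmse, r2, std
```

A perfect model gives MAE = RMSE = 0 and R² = 1; MAE and RMSE are in the units of the target (here, mole-fraction-like solubility values).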

Sub-Models Fusion
The weights of the three screened sub-models were obtained by minimizing the sum of squared errors (Equation (9)). The expression of the linear fusion model is

Y = ω₁ y_BP + ω₂ y_SVM + ω₃ y_ELM,

where Y is the output of the linear fusion model; y_BP is the output of the BP neural network sub-model with the 6 × 4 × 1 topology; y_SVM is the output of the SVM sub-model with the radial basis kernel; and y_ELM is the output of the ELM sub-model with 150 hidden-layer neurons and the sigmoid activation function.
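The fused prediction is simply a weighted sum of the sub-model outputs; as a small Python sketch (the helper name is illustrative, and the weights shown in the test are placeholders rather than the paper's fitted values):

```python
def fuse_predictions(weights, sub_outputs):
    """Linear fusion output Y: the weighted sum of the sub-model predictions.
    weights: one weight per sub-model; sub_outputs: one prediction list per
    sub-model, all over the same samples."""
    return [sum(w * y for w, y in zip(weights, ys)) for ys in zip(*sub_outputs)]
```

The same helper serves both fusion models; only the weight vector (Equation (9) or Equation (16)) differs.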
Combining the same selected sub-models, the weights of each sub-model were also calculated by the information entropy method (Equation (16)), yielding a second linear fusion model of the same form.

Fusion Model Testing
The sub-models with better performance were screened through the validation set. The fusion model established by the minimum squared error method is linear fusion model I, and the fusion model using the information entropy method is linear fusion model II. The performance of the five models mentioned above was tested on the test set, and the prediction effects of the various models are shown in Figure 4.

In Figure 4, the horizontal and vertical axes present the experimental and predicted values, respectively. The perfect fit is indicated by the solid line, and the square points show the actual predictions; the closer the square points are to the solid line, the more accurate the correlation. By this measure, Figure 4 shows good correlative capability for all five models. Figure 5 depicts the error histogram between the experimental and predicted values of each model. The percentage error distribution of each model is shown, and the fused models follow the normal distribution curve more closely than the single sub-models, which further demonstrates their superiority. In order to quantitatively describe the prediction effects of the five models, the mean absolute error (MAE), root mean square error (RMSE), correlation coefficient (R²) and standard deviation (STD) of the five models on the test set are given in Table 6. The error performance indicators of each model in Table 6 are shown as a histogram in Figure 6.
It can be seen more intuitively that the error performance indicators of the two linear fusion models are reduced compared with those of the optimal sub-models. Taken together, the charts show that the linear fusion models have better prediction performance: because a linear fusion model combines the characteristics of each sub-model and draws useful information from different perspectives, its accuracy and reliability are improved. Of the two linear fusion models, the one based on the information entropy method performs better than the one established by the minimum squared error method. The minimum-squared-error weights reduce the model prediction error but are sensitive to the samples, which degrades global performance. The information-entropy weights, by contrast, comprehensively consider the differences and error factors between the sub-models.
This method can exploit both the explicit and the implicit information in the samples to spread the prediction risk of the model, thereby improving its prediction accuracy.


Conclusions
In this paper, a fusion modeling method was proposed for predicting the solubility of CO2 in ILs. Firstly, 544 samples covering nine ILs were collected from the literature and divided into a training set, validation set and test set in a 70%/15%/15% proportion. Sub-models of the BP neural network, SVM and ELM types were established on the training set, and the three sub-models with the best evaluation performance were selected on the validation set. Then, linear fusion models were established using the minimum squared error method and the information entropy method, respectively. Finally, the test set was used to compare the prediction performance of the linear fusion models and the optimal sub-models. The results show that the prediction of the linear fusion models is better than that of the single sub-models, and that the fusion model based on the information entropy method outperforms the one based on the minimum squared error method.
Although the prediction model established by the fusion modeling method gives good predictions for the nine imidazolium ionic liquids in this paper, it may not be suitable for predicting the solubility of CO2 in other ILs. Nonetheless, the fusion modeling method avoids the time consumption and high cost of experimental measurement as well as the complexity and limited generality of mechanistic models. It provides an effective method for predicting the solubility of CO2 in ILs, and can also be considered a new method for the prediction of other thermodynamic properties.
Author Contributions: L.X. designed the algorithm. J.W. contributed to collecting datasets and analyzing data. S.L. and Z.L. discussed and explained the results. H.P. took the lead in writing the manuscript. L.X. and J.W. wrote and revised the manuscript.