Article

Predictive Models for the Binary Diffusion Coefficient at Infinite Dilution in Polar and Nonpolar Fluids

CICECO—Aveiro Institute of Materials, Department of Chemistry, University of Aveiro, 3810-193 Aveiro, Portugal
*
Author to whom correspondence should be addressed.
Materials 2021, 14(3), 542; https://doi.org/10.3390/ma14030542
Received: 23 December 2020 / Revised: 7 January 2021 / Accepted: 19 January 2021 / Published: 23 January 2021

Abstract

Experimental diffusivities are scarce, though they are essential to model rate-controlled processes. In this work, various machine learning models to estimate diffusivities in polar and nonpolar solvents (except water and supercritical CO2) were developed. The models were trained on a database of 90 polar systems (1431 points) and 154 nonpolar systems (1129 points) with data on 20 properties. Five machine learning algorithms were evaluated: multilinear regression, k-nearest neighbors, decision tree, and two ensemble methods (random forest and gradient boosted). For both polar and nonpolar data, the best results were found using the gradient boosted algorithm. The model for polar systems contains six variables/parameters (temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and solvent Lennard-Jones energy constant) and showed an average absolute relative deviation (AARD) of 5.07%. The nonpolar model requires five variables/parameters (the same as for polar systems except the Lennard-Jones constant) and presents AARD = 5.86%. These results were compared with four classic models, including the 2-parameter correlation of Magalhães et al. (AARD = 5.19/6.19% for polar/nonpolar) and the predictive Wilke-Chang equation (AARD = 40.92/29.19%). Nonetheless, the Magalhães et al. correlation requires two parameters per system that must be previously fitted to data. The developed models are coded and provided as a command line program.


1. Introduction

Diffusivities are important properties for the proper design, simulation and scale-up of rate-controlled separations and chemical reactions, where they are required for the estimation of dispersion coefficients, convective mass transfer coefficients, and catalyst effectiveness factors [1,2,3]. However, diffusivity data are scarce both in terms of compounds and operating conditions, leading to the need for accurate models capable of predicting diffusivities when no experimental data are available [4].
Currently the Wilke-Chang model [5], proposed in 1955, remains the most widely used equation to estimate binary diffusivities, mainly due to its simplicity. It requires only knowledge of solvent viscosity, solute molar mass, solute volume at normal boiling point and operating conditions like temperature. Other hydrodynamic equations have been proposed, such as Scheibel [6], Tyn-Calus [7], and Hayduk and Minhas [8]. Correlative models validated for both polar and nonpolar systems have been put forward by Magalhães et al. [9,10], and one may also cite the 2-parameter correlation of Dymond–Hildebrand–Batschinski (DHB) [11,12], based on the free-volume theory, for nonpolar and weakly polar systems at moderate densities. However, these correlations require that data for a given system be available in order to interpolate and extrapolate diffusivities to the desired condition. Hybrid models are also available, such as the predictive Zhu et al. [13] and the predictive Tracer Liu-Silva-Macedo (TLSM) and its 1-parameter correlations (TLSMd and TLSMen) [4,14,15]. These are Lennard-Jones fluid models and comprise two contributions: a free-volume part and an energy component.
With the increase of readily available computing power, artificial intelligence and machine learning (ML) techniques have been increasingly applied to the estimation of physical properties of various compounds. In the chemistry field, machine learning is commonly applied in the scope of quantitative structure-property relationship (QSPR) or quantitative structure-activity relationship (QSAR) studies. These are regression or classification models that relate the structure and physicochemical properties of a molecule with a desired response: a chemical property, in the case of QSPR, or a biological activity, in the case of QSAR. QSPR/QSAR approaches have been applied to predict the diffusivity of pure chemicals [16] and acids in water [17] using a database of 320 chemicals and 65 acids, respectively. In both cases, a genetic algorithm was employed to select the molecular descriptors, while feed-forward and radial basis function neural networks were used to build the diffusion coefficient models. A squared correlation coefficient above 0.98 was obtained for the test set in either case. Beigzadeh and coworkers [18] developed a feed-forward artificial neural network to estimate the Fick diffusion coefficient in binary liquid systems, using a database of 851 points. Results showed superior performance when compared with other commonly used theoretical and empirical correlative models, with a total average relative deviation of 4.75%. Eslamloueyan and Khademi [19] used a database of 336 experimental data points to develop a feed-forward neural network to predict binary diffusivities of gaseous mixtures at atmospheric pressure as a function of temperature and based on the critical temperature, critical volume and molecular weight of each component. This model showed a relative error of 4.47%, lower than other alternative correlations.
A QSPR model by Abbasi and Eslamloueyan [20] applied a multi-layer perceptron (MLP) neural network and an adaptive neuro-fuzzy inference system (ANFIS) to estimate the binary diffusion coefficients of liquid hydrocarbon mixtures. These models were constructed on a database of 345 experimental points and showed very good accuracies, with average absolute relative deviation (AARD) of 7.79% for the test data, when compared with five semi-empirical correlations, such as the Tyn-Calus and Wilke-Chang equations. Another QSPR model with five parameters based on genetic function approximation has been proposed to predict the diffusion coefficient of non-electrolyte organic compounds in air at ambient temperature [21]. It used a dataset of 4579 organic compounds and provided a very low AARD of 0.3%. The authors applied leverage value statistics to define the applicability domain of the final model. A neural network model based on the mega-trend diffusion algorithm was proposed to predict CO2 diffusivity in biodegradable polymers [22]. It showed better precision when compared with free-volume and conventional back-propagation models. More recently, machine learning and neural network models have also been applied to the estimation of the thermal diffusivity of hydrocarbons [23], aromatic compounds insulating material [24], and diffusivity of solutes in supercritical carbon dioxide [25].
In this work we develop models for the prediction of binary diffusion coefficients in polar and nonpolar systems by employing several machine learning algorithms, such as decision tree, nearest neighbors and ensemble methods. A large database of experimental data was collected, divided into polar and nonpolar systems, and used for training the models. The database comprises experimental points for liquids (except water), compressed gases and supercritical fluids (except CO2). Water was excluded because its behavior is usually distinct from that of other polar solvents, and because the large amount of experimental data available for aqueous systems may bias the model. This latter argument also applies to binary diffusivities in supercritical CO2. Results were compared with four methods to estimate diffusivities: two hydrodynamic equations (Wilke-Chang and Tyn-Calus), a 2-parameter correlation (Magalhães et al.), and a hybrid model (Zhu et al.).

2. Theory and Methods

The methodology used in this work to develop machine learning (ML) models for the prediction of diffusivities can be summarized in the following steps: (i) variable selection; (ii) learning algorithms selection; (iii) data splitting into training and testing sets; (iv) data scaling; (v) hyper-parameters optimization by grid search with cross validation; and (vi) final model evaluation. These steps are detailed below. The ML models were compared with the hydrodynamic models of Wilke-Chang [5] and Tyn-Calus [7], the hybrid model of Zhu et al. [13] and one of the correlations proposed by Magalhães et al. [9].

2.1. Database Compilation

The database of binary diffusivities used in this work relied on the recent compilation published by Zêzere et al. [4], in the case of nonpolar solvent systems, and on an updated version of the database reported by Magalhães et al. for polar solvent systems [10]. Overall, the database covers a wide range of temperatures (213.2–567.2 K) and densities (0.30–1.65 g cm$^{-3}$), comprising 244 binary systems and 2560 data points. This includes 90 polar systems (polar solvent/solute) totaling 1431 points and 154 nonpolar systems (nonpolar solvent/solute) totaling 1129 points. Data were collected for the 20 properties shown in Table 1. Whenever not reported by the authors, densities and viscosities were taken from the National Institute of Standards and Technology (NIST) database or calculated by the following set of equations: Yaws [26], Cibulka and Ziková [27], Cibulka et al. [28,29], Cibulka and Takagi [30], Przezdziecki and Sridhar [31], Viswanath et al. [32], the Lucas method [33], Assael et al. [34], Cano-Gómez et al. [35] and Pádua et al. [36]. Solute molar volumes at normal boiling point were estimated by the Tyn–Calus equation [37]. The critical constants, whenever not provided with the diffusion data and not found in the other references [26,31,38,39,40,41,42,43,44], were estimated by the Joback [31,45,46], Somayajulu [47], Klincewicz [31,48], Ambrose [31,49,50,51] and Wen–Qiang [52] methods. For ionic liquids the critical constants were retrieved from Valderrama and Rojas [53]. The acentric factors, when not provided, were estimated by the Lee-Kesler [54] and Pitzer [55] equations or retrieved from [26,31,38,39,40,41,42,43,44]. The Lennard-Jones diameter and energy were taken from Silva et al. [12] and, when not available, were estimated by equations 7 and 8 from Liu et al. [15] and equation 9 from Magalhães et al. [14]. Detailed information on the database used, including pure compound properties, is presented in Table 2.
Polar and nonpolar systems were separated into two databases based on the polarity of the solvent and, for each, data were split randomly 70/30% into training and testing sets. The training set was used for model learning and fitting, and the testing set was used to evaluate the performance of the fitted model after learning. Information from the testing set is never known during learning. In order to guarantee that all models are fed the same data, these data sets were also used for the evaluation of the classic models.
Most learning algorithms benefit from scaling input variables in order to improve model robustness and training speed [61]. The most common scaling strategies are normalization or standardization. Normalization consists in transforming the real range of values into a standard range (e.g., [0, 1] or [−1, 1]). Standardization consists in transforming variables so that they follow a standard normal distribution (mean of zero and standard deviation of one). In this work, variables were normalized to the [0, 1] range before passing them to training.
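The splitting and scaling steps described above can be illustrated with a minimal, standard-library sketch. The row values, the function names, and the choice to compute the scaling bounds on the training set only are assumptions made for illustration; the source does not state how the bounds were obtained.

```python
import random


def split_train_test(rows, test_fraction=0.30, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(train_rows):
    """Per-column (min, max) bounds, computed on the training rows (assumption)."""
    return [(min(col), max(col)) for col in zip(*train_rows)]


def apply_min_max(rows, bounds):
    """Map each column to [0, 1] using the given bounds."""
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


# Hypothetical rows: (temperature / K, solvent viscosity / cP)
rows = [(273.15 + 10 * i, 2.0 / (1 + i)) for i in range(10)]
train, test = split_train_test(rows)
bounds = fit_min_max(train)
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)  # may fall slightly outside [0, 1]
```

With bounds taken from the training rows, test values can land marginally outside the [0, 1] interval; this is the usual price of keeping the test set unseen during fitting.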

2.2. Variable Selection and Hyper-Parameter Optimization

Model variables were selected from the ones shown in Table 1 while removing collinear variables systematically. For each pair of variables with collinearity above a defined threshold of 0.50, the one with lower correlation with $D_{12}$ was removed from the model. The simplicity of obtaining a variable for a given system was also considered if both showed similar correlation with $D_{12}$. This was done to improve the simplicity and ease of use of the final model.
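One possible reading of this selection rule is a greedy filter: variables are visited from most to least correlated with the target, and a variable is kept only if its absolute Pearson correlation with every variable already kept does not exceed the threshold. The sketch below is an illustration under that assumption; the variable names and values are hypothetical, and the "ease of obtaining a variable" criterion is not modelled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_collinear(features, target, threshold=0.50):
    """Return the names of the variables kept by the greedy collinearity filter.

    features: dict mapping a variable name to its list of values.
    """
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    kept = []
    # Most relevant variables are considered first, so that within a
    # collinear pair the one less correlated with the target is dropped.
    for name in sorted(features, key=relevance.get, reverse=True):
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical data: "b" is nearly a multiple of "a"; "c" is unrelated to both.
features = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "c": [1.0, -1.0, 1.0, -1.0, 1.0],
}
target = [2.0, 4.0, 6.0, 8.0, 10.0]
selected = drop_collinear(features, target)
```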
Besides the model parameters discussed thus far, each learning algorithm possesses a set of parameters, which can be seen as configuration options, that specify how the algorithm behaves. These are often called hyper-parameters and are not fitted to data but rather must be set before training. Hyper-parameters were optimized for each learning algorithm using a grid search method with 4-fold cross-validation, implemented using GridSearchCV of scikit-learn (version 0.22.1). This method performs an exhaustive search for the best hyper-parameter values in a predefined grid by evaluating the model performance by 4-fold cross-validation. The k-fold cross-validation approach divides the training set into k subsets and trains the model with data from k − 1 of the folds while testing it on the remaining fold. This process is repeated for every different combination of k − 1 folds, and the best model (best combination of hyper-parameters) emerges as that with the best average performance, while avoiding both overfitting and underfitting of the models. The evaluated hyper-parameters for each learning algorithm are shown in Table S1 of the Supplementary Material.
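As an illustration of this procedure, the sketch below runs a 4-fold grid search over a small hyper-parameter grid for a gradient boosted regressor in scikit-learn. The synthetic data, the grid values, the scoring metric and the use of a pipeline are assumptions chosen for brevity; the grids actually evaluated are those of Table S1.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for five input properties and a diffusivity-like response.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 5))
y = X[:, 0] / (0.1 + X[:, 1]) + 0.05 * rng.normal(size=200)

# 70/30 random split; the test set is never seen during the search.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Scaling lives inside the pipeline so that it is refitted on each CV fold.
pipeline = make_pipeline(MinMaxScaler(), GradientBoostingRegressor(random_state=0))

# Illustrative grid only (hypothetical values).
param_grid = {
    "gradientboostingregressor__n_estimators": [50, 100],
    "gradientboostingregressor__max_depth": [2, 3],
    "gradientboostingregressor__learning_rate": [0.05, 0.1],
}

search = GridSearchCV(pipeline, param_grid, cv=4, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

best_params = search.best_params_
q2 = search.best_estimator_.score(X_test, y_test)  # R^2 on the held-out test set
```

Placing the scaler inside the pipeline is a design choice rather than something stated in the source: it keeps each validation fold out of the scaling bounds.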

2.3. Machine Learning Algorithms

Five ML algorithms were evaluated for the prediction of binary diffusivities: a multilinear regression, a k-nearest neighbors model, a decision tree algorithm, and two ensemble methods (random forest and gradient boosted). Models were implemented using the Python machine learning library scikit-learn version 0.22.1 [62].
A simple ordinary least squares multilinear regression was used as a baseline model for the prediction of binary diffusivities. In a multilinear regression [63], the target value, $y$, is a linear combination of explanatory variables, $x_i$, weighted by coefficients $b_i$. The coefficients are optimized to minimize the residual sum of squares between the observed and the calculated values. It was implemented using the LinearRegression class in scikit-learn.
The k-nearest neighbors (kNN) algorithm [64,65] is one of the simplest machine learning algorithms. Its prediction for a given point is the average of the target values of the k closest neighbors of that point in the input space. Neighbors are selected from a set of examples for which the target property is known. This can be seen as the training set, although unlike other algorithms, kNN does not require an explicit training phase. The nearest neighbors are identified by position vectors in the multidimensional input space, usually in terms of Euclidean distance, although other distance measures could be applied. The kNN algorithm was implemented using the KNeighborsRegressor class in scikit-learn.
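The kNN prediction rule can be written in a few lines. The following is a plain-Python sketch with Euclidean distance and uniform weights, using made-up one-dimensional data; it is meant to show the mechanism and is not the scikit-learn implementation.

```python
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical (already scaled) inputs and targets.
train_X = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_y = [0.0, 1.0, 2.0, 10.0]
prediction = knn_predict(train_X, train_y, query=(1.1,), k=3)
```

Here the three nearest points to 1.1 are 1.0, 2.0 and 0.0, so the distant point at 10.0 does not influence the prediction.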
Decision tree [65,66] models take the training data and create a set of decision rules that are applied to the explanatory variables. Prediction is performed by following these tree-like rule graphs and selecting the paths that return the best metric, usually lowest entropy or largest information gain, until an output node is reached. The decision tree algorithm was implemented using the DecisionTreeRegressor class in scikit-learn.
Finally, ensemble methods are a combination of a large number of simple models, thus improving generalizability and robustness over a single model [63]. They can be divided into averaging ensemble methods, such as the random forest algorithm, and boosting ensemble methods, such as the gradient boosted model, and have proven to be effective for regression learning [67].
Random forests [65,68] are composed of several strong models, such as decision trees, trained independently. For the construction of each tree a random subset of training data is selected, while the remaining subset is used for testing. The final prediction is obtained as an average of the ensemble. Random forests are fast and simple to apply as they have simpler hyper-parameter settings than other methods, can be applied in cases with a large amount of noise and are less prone to overfitting [65]. The random forest model was implemented in scikit-learn using the RandomForestRegressor class.
Gradient boosted [69] models combine several learners, which are not independently trained but combined so that each new learner mitigates the bias of the previous one. The gradient boosted model also uses decision trees, which are fitted to the gradient of a loss function, for instance, the squared error. The gradient is calculated for every sample of the training set but only a random subset of those gradients is used by each learner. Gradient boosted models have been shown to provide very good predictions, at least on par with random forests and usually superior to other methods [70]. The gradient boosted algorithm was implemented using the GradientBoostingRegressor class.
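To make the boosting idea concrete, the sketch below implements a stripped-down version for a single input variable: each stage fits a one-split tree (a stump) to the current residuals, which for the squared-error loss are the negative gradient, and adds a damped copy of it to the ensemble. This is an illustrative simplification, not the scikit-learn algorithm; in particular, the random subsampling of gradients mentioned above is omitted and the data are made up.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on a 1-D input (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_stages=200, learning_rate=0.1):
    """Return a predictor built by boosting stumps on squared-error residuals."""
    base = sum(y) / len(y)  # stage 0: constant model
    stumps = []
    predictions = [base] * len(y)
    for _ in range(n_stages):
        # Residuals equal the negative gradient of 0.5 * (y - prediction)^2.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * stump(xi) for xi, pi in zip(x, predictions)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


# Hypothetical training data: y = x^2 on ten points.
x = [float(i) for i in range(10)]
y = [xi ** 2 for xi in x]
model = gradient_boost(x, y)

mean_y = sum(y) / len(y)
mse_baseline = sum((yi - mean_y) ** 2 for yi in y) / len(y)
mse_boosted = sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)
```

The small learning rate means each stump corrects only a fraction of the remaining error, which is why many stages are needed.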

2.4. Classic Models

Several classic $D_{12}$ models were used as a benchmark for the proposed ML models, including the still extensively used Wilke-Chang equation [5], the Tyn-Calus equation [7], one of the Magalhães et al. correlations [9], and the Zhu et al. hybrid model [13]. Below, these models are briefly presented.
The Wilke-Chang equation [5] is an empirical modification of the Stokes-Einstein relation and is given by:
$$D_{12}\ (\mathrm{cm^2\,s^{-1}}) = 7.4 \times 10^{-8}\, \frac{(\phi M_1)^{0.5}\, T}{\mu_1\, V_{\mathrm{bp},2}^{0.6}}$$
where $\phi$ (dimensionless) is the association factor of the solvent (1.9 for methanol, 1.5 for ethanol and 1.0 if it is unassociated [31]), $T$ is in K, $\mu_1$ in cP, $M_1$ in g mol$^{-1}$, and $V_{\mathrm{bp},2}$ (cm$^{3}$ mol$^{-1}$) is the solute molar volume at normal boiling temperature, which can be estimated using the critical volume ($V_{c,i}$) by the Tyn-Calus relation [31,37]:
$$V_{\mathrm{bp},i} = 0.285 \times V_{c,i}^{1.048}$$
The Tyn-Calus equation [7] is another commonly used hydrodynamic equation, which is described by:
$$D_{12}\ (\mathrm{cm^2\,s^{-1}}) = 8.93 \times 10^{-8}\, \frac{V_{\mathrm{bp},1}^{0.267}}{V_{\mathrm{bp},2}^{0.433}}\, \frac{T}{\mu_1}$$
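The two hydrodynamic equations are simple enough to evaluate directly. The sketch below transcribes them as Python functions, assuming the usual units for these correlations (temperature in K, solvent viscosity in cP, molar mass in g mol$^{-1}$, molar volumes in cm$^{3}$ mol$^{-1}$, result in cm$^{2}$ s$^{-1}$); the function names and the input values are hypothetical and do not correspond to any system in the database.

```python
def molar_volume_bp(v_c):
    """Tyn-Calus relation: molar volume at the normal boiling point from V_c."""
    return 0.285 * v_c ** 1.048


def wilke_chang(T, mu1, M1, v_bp2, phi=1.0):
    """Wilke-Chang estimate of D12 in cm^2/s (phi is the association factor)."""
    return 7.4e-8 * (phi * M1) ** 0.5 * T / (mu1 * v_bp2 ** 0.6)


def tyn_calus(T, mu1, v_bp1, v_bp2):
    """Tyn-Calus estimate of D12 in cm^2/s."""
    return 8.93e-8 * v_bp1 ** 0.267 / v_bp2 ** 0.433 * T / mu1


# Hypothetical inputs, for illustration only.
v_bp_solute = molar_volume_bp(300.0)   # solute critical volume of 300 cm^3/mol
v_bp_solvent = molar_volume_bp(250.0)  # solvent critical volume of 250 cm^3/mol
d_wc = wilke_chang(T=298.15, mu1=0.5, M1=80.0, v_bp2=v_bp_solute)
d_tc = tyn_calus(T=298.15, mu1=0.5, v_bp1=v_bp_solvent, v_bp2=v_bp_solute)
```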
Magalhães et al. [9] proposed nine correlations for $D_{12}$, four of which depend explicitly on solvent viscosity and temperature. Here we adopt the following:
$$D_{12} = a\,\frac{T}{\mu_1} + b$$
where $a$ and $b$ are fitted parameters for each system. This equation consists of a modification of the Stokes–Einstein theory [31].
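Because this correlation is linear in $T/\mu_1$, its two parameters can be obtained for a given system by ordinary least squares. The sketch below shows one way to do this in plain Python; the fitting method, the function names and the data points are assumptions for illustration, since the source does not detail how $a$ and $b$ were optimized.

```python
def fit_magalhaes(T, mu1, D12):
    """Least-squares fit of D12 = a * (T / mu1) + b for one system."""
    x = [t / m for t, m in zip(T, mu1)]
    n = len(x)
    mean_x = sum(x) / n
    mean_d = sum(D12) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxd = sum((xi - mean_x) * (di - mean_d) for xi, di in zip(x, D12))
    a = sxd / sxx
    b = mean_d - a * mean_x
    return a, b


def predict_magalhaes(a, b, T, mu1):
    """Evaluate the fitted correlation at a new temperature and viscosity."""
    return a * T / mu1 + b


# Hypothetical training points for a single system (T in K, mu1 in cP, D12 in cm^2/s).
T_train = [300.0, 300.0, 300.0]
mu_train = [3.0, 1.5, 1.0]
D_train = [1.5e-5, 2.5e-5, 3.5e-5]
a, b = fit_magalhaes(T_train, mu_train, D_train)
d_new = predict_magalhaes(a, b, T=320.0, mu1=1.6)
```

At least two points with distinct $T/\mu_1$ values are needed per system, which is why such a correlation cannot be applied to systems without prior data.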
Zhu et al. [13] developed a hybrid model containing one component related to the free volume and another related to energy. It was devised for the estimation of $D_{12}$ of real nonpolar fluids. It is described by:
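$$D_{12} = \frac{3}{8\sqrt{\pi}} \sqrt{\frac{\sigma_{\mathrm{LJ},12}^{2}\,\varepsilon_{\mathrm{LJ},12}}{m_1}}\; \frac{\sqrt{T_{12}^{*}}}{\rho_{12}^{*}} \left(1 - \frac{\rho_{12}^{*}}{1.029079\,(T_{12}^{*})^{0.165377}}\right) \times \left[1 + (\rho_{12}^{*})^{0.126978} \left(\frac{0.596103\,(\rho_{12}^{*} - 1)}{0.539292\,(\rho_{12}^{*} - 1) + (T_{12}^{*})^{(0.400152 - 0.41054\,\rho_{12}^{*})}} + 0.68856\right)\right] \times \exp\left(-\frac{\rho_{12}^{*}}{2\,T_{12}^{*}}\right)$$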
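where the subscripts 1 and 2 denote the solvent and the solute, respectively, $m_1$ is the mass of the solvent, and $\rho_{12}^{*}$ and $T_{12}^{*}$ are the density and temperature reduced using the binary Lennard-Jones (LJ) parameters $\varepsilon_{\mathrm{LJ},12}$ and $\sigma_{\mathrm{LJ},12}$ as described by:
$$T_{12}^{*} = \frac{T}{\varepsilon_{\mathrm{LJ},12}/k_\mathrm{B}}$$
$$\rho_{12}^{*} = \rho_{n,1}\,\sigma_{\mathrm{LJ},12}^{3}$$
The binary LJ parameters are calculated by the following combining rules:
$$\sigma_{\mathrm{LJ},12} = \left(1 - k_{12}^{d}\right) \frac{\sigma_{\mathrm{LJ},2} + \sigma_{\mathrm{LJ},1}}{2}; \qquad \varepsilon_{\mathrm{LJ},12}/k_\mathrm{B} = \sqrt{\left(\varepsilon_{\mathrm{LJ},1}/k_\mathrm{B}\right)\left(\varepsilon_{\mathrm{LJ},2}/k_\mathrm{B}\right)}$$
and the interaction parameter $k_{12}^{d}$ is estimated through:
$$k_{12}^{d} = 0.7926\, \frac{\sigma_{\mathrm{LJ},2} - \sigma_{\mathrm{LJ},1}}{\sigma_{\mathrm{LJ},2} + \sigma_{\mathrm{LJ},1}}$$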
Finally, the LJ parameters $\varepsilon_{\mathrm{LJ}}/k_\mathrm{B}$ and $\sigma_{\mathrm{LJ}}$ for the solute are calculated by:
$$\varepsilon_{\mathrm{LJ},2}/k_\mathrm{B} = \frac{T_{c,2}}{1.313}; \qquad \sigma_{\mathrm{LJ},2} = \left(\frac{0.13\,\varepsilon_{\mathrm{LJ},2}}{P_{c,2}}\right)^{1/3}$$
and for the solvent:
$$\varepsilon_{\mathrm{LJ},1}/k_\mathrm{B} = \frac{T_{c,1}}{1.313} \left(1 + 0.47527332\,\rho_{r,1} + \left(0.06300484 + 0.12374707\,\rho_{r,1}\right) T_{r,1}\right)$$
$$\sigma_{\mathrm{LJ},1} = \left(\frac{0.31}{\rho_{n,c,1}}\right)^{1/3} \left(1 - 0.0368868\,\rho_{r,1} + \left(0.00006945 + 0.01089228\,\rho_{r,1}\right) T_{r,1}\right)$$
where $\rho_{n,c,1}$ is the critical number density (cm$^{-3}$) and $\rho_{r,1}$ and $T_{r,1}$ are the reduced density and reduced temperature of the solvent, calculated with the corresponding critical constants: $\rho_{r,1} = \rho_1/\rho_{c,1}$ and $T_{r,1} = T_1/T_{c,1}$.

3. Results and Discussion

3.1. Machine Learning Models

The first step towards model development was the choice of relevant variables for the model. Selection was conducted on the basis of the collinearities between the available variables/properties and their level of correlation with the diffusivity. Figure 1 and Figure 2 show the correlation matrix (in the form of a heat map) for the polar and nonpolar data sets, where the values represent the absolute Pearson correlation. When two variables presented collinearities above a defined threshold of 0.50, only one was kept in the model, namely the one providing the best correlation with diffusivity. Following this procedure, six variables were selected for the polar diffusivity model: temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and the Lennard-Jones energy constant of the solvent. For the nonpolar diffusivity model, temperature, solvent viscosity, solute molar mass, solute critical pressure, and solvent molar mass were chosen, totaling five variables. A summary of the variables required for the machine learning models for polar (ML Polar) and nonpolar (ML Nonpolar) systems is presented in Table 3, together with the required inputs for the classic models of Wilke-Chang, Tyn-Calus, Magalhães et al., and Zhu et al. The two hydrodynamic equations require four input variables, the same number as the Magalhães et al. correlation although, in this latter case, two of the four parameters must be fitted to experimental data, thus reducing the model applicability. The Zhu et al. hybrid model requires the largest number of parameters (seven) and is only applicable to nonpolar systems.
The performance of all models was evaluated by calculating the average absolute relative deviation (AARD) of each system:
$$\mathrm{AARD}\ (\%) = \frac{100}{\mathrm{NDP}} \sum_{i=1}^{\mathrm{NDP}} \left| \frac{D_{12,i}^{\mathrm{calc}} - D_{12,i}^{\mathrm{exp}}}{D_{12,i}^{\mathrm{exp}}} \right|$$
where superscripts calc and exp denote calculated and experimental values, and NDP is the number of data points of a system. For the whole database, the global deviation (i.e., weighted AARD) and the arithmetic systems average ($\mathrm{AARD}_{\mathrm{arith}}$) were calculated. The minimum and maximum system AARD are reported as an indication of the performance of the best and worst systems. The root mean square error (RMSE) was also calculated and is defined as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{\mathrm{NDP}} \sum_{i=1}^{\mathrm{NDP}} \left( D_{12,i}^{\mathrm{calc}} - D_{12,i}^{\mathrm{exp}} \right)^{2}}$$
The coefficient of determination, $R^2$, which is calculated for the training set, and the $Q^2$ value, which corresponds to the $R^2$ value obtained when applying the model to the test set, are also reported for all models.
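These metrics are straightforward to compute. The sketch below gives plain-Python versions of AARD, RMSE and the coefficient of determination, together with the two database-level averages. Weighting each system by its number of points is assumed here as the meaning of "weighted AARD", and all numbers are made up.

```python
from math import sqrt


def aard(calc, exp):
    """Average absolute relative deviation, in percent."""
    return 100.0 / len(exp) * sum(abs((c - e) / e) for c, e in zip(calc, exp))


def rmse(calc, exp):
    """Root mean square error, in the units of the data."""
    return sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(exp))


def r_squared(calc, exp):
    """Coefficient of determination (called Q^2 when evaluated on the test set)."""
    mean_exp = sum(exp) / len(exp)
    ss_res = sum((e - c) ** 2 for c, e in zip(calc, exp))
    ss_tot = sum((e - mean_exp) ** 2 for e in exp)
    return 1.0 - ss_res / ss_tot


def database_aard(systems):
    """Return (global AARD weighted by points, arithmetic average over systems).

    systems: list of (calc, exp) pairs, one pair per binary system.
    """
    per_system = [(len(exp), aard(calc, exp)) for calc, exp in systems]
    total_points = sum(n for n, _ in per_system)
    global_aard = sum(n * value for n, value in per_system) / total_points
    arithmetic = sum(value for _, value in per_system) / len(per_system)
    return global_aard, arithmetic


# Made-up values for two hypothetical systems.
exp_values = [1.0, 2.0, 4.0]
calc_values = [1.1, 1.8, 4.0]
global_aard, arith_aard = database_aard([([1.1], [1.0]), ([2.0, 4.0], [2.0, 4.0])])
```

The two averages differ whenever systems have unequal numbers of points: a poorly predicted system with a single point weighs heavily on the arithmetic average but little on the global one.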
A final validation of the best machine learning models was conducted by performing a y-randomization test (also called y-scrambling). This test compares the performance of the original model with that of models built for a scrambled (randomly shuffled) response while still following the original model building procedure. The randomization process eliminates the relation between the independent variables and the target response. If the performance of the models when using scrambled data is much lower than when using original data, one can be confident of the relevance of the original model.
Five algorithms were tested to develop the supervised learning models: a multilinear regression, k-nearest neighbors, decision tree, random forest (an averaging ensemble method), and gradient boosted (a boosting ensemble method). The performance of the several machine learning algorithms when applied to the test set of polar data, covering 79 systems and 430 points, is shown in Table 4. The gradient boosted algorithm presents the best performance for the test set (pure prediction) with an AARD of 5.07%, followed by the random forest, decision tree, k-nearest neighbors, and multilinear regression (from lower to higher AARD). Similar trends are present when analyzing the arithmetic average of the AARDs of the 79 systems, as well as the minimum and the maximum AARD. As expected, the multilinear regression exhibits much worse results than the other four algorithms for all the AARD metrics. The gradient boosted algorithm also presents the lowest RMSE and highest $Q^2$. The $Q^2$ value is also close to $R^2$, indicating that the model works well independently of its training data. Figure 3 plots the diffusivities predicted by the gradient boosted ML model against the respective experimental values for the test set of polar systems, showing a very good distribution along the diagonal. Similar representations are provided for the remaining four algorithms in Figures S1–S4 of the Supplementary Material.
The multilinear regression model presents significant underestimation at higher values of $D_{12}$ and overestimation in the intermediate region. On the other hand, the remaining three algorithms show good dispersion around the diagonal, although with larger deviations than the gradient boosted model.
Table 5 presents the results obtained using each ML algorithm for the test set of nonpolar compounds (130 systems and 342 points). Once again, the gradient boosted algorithm presents the best global AARD for the 130 systems of the test set (5.86%), followed by the random forest, then by the decision tree and k-nearest neighbors with similar results, and lastly by the multilinear regression with significantly worse results. A similar trend is visible when calculating a simple arithmetic average of the system AARDs. The gradient boosted algorithm shows the lowest RMSE and highest $Q^2$. The calculated versus experimental diffusivities for the test set of nonpolar compounds using the gradient boosted model are plotted in Figure 4, showing an unbiased distribution along the diagonal over the whole range of experimental points. Figures S5–S8 of the Supplementary Material provide the calculated against experimental plots for the remaining four algorithms. As in the case of the polar data, the multilinear regression model once again presents significant deviations. The k-nearest neighbors, decision tree, and random forest algorithms provide better scattering around the diagonal. A few outliers may be observed, particularly in the case of the decision tree model.
As a final validation of the gradient boosted models selected for polar and nonpolar systems, a y-randomization test was performed by scrambling the diffusivity vector. This process was repeated 200 times and always returned random models with performances much lower than the original ones, thus confirming the significance of the proposed models. Figure S9 of the Supplementary Material shows the contrast between the $Q^2$ values of our models (0.9919 for polar and 0.9879 for nonpolar) and the lower ones obtained for the permutations. It is worth noting that: (i) the best possible score of $Q^2$ (and $R^2$) is 1.0; (ii) for a constant model that always predicts the expected value of the response, both indicators are zero; (iii) $Q^2$ (and $R^2$) can be negative for arbitrarily worse models.
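The y-randomization procedure can be sketched as follows. To keep the example self-contained, a one-variable least-squares line stands in for the gradient boosted model, the data are synthetic, and each scrambled model is scored against the unscrambled test set; all three choices are assumptions made for illustration.

```python
import random


def fit_line(x, y):
    """Least-squares line; returns a predictor function (stand-in model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda xi: slope * xi + intercept


def q2(model, x_test, y_test):
    """Coefficient of determination of the model on the test set."""
    mean_y = sum(y_test) / len(y_test)
    ss_res = sum((yi - model(xi)) ** 2 for xi, yi in zip(x_test, y_test))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y_test)
    return 1.0 - ss_res / ss_tot


def y_randomization(x_train, y_train, x_test, y_test, n_repeats=200, seed=0):
    """Return the reference Q^2 and the Q^2 values of the scrambled models."""
    rng = random.Random(seed)
    reference = q2(fit_line(x_train, y_train), x_test, y_test)
    scrambled_scores = []
    for _ in range(n_repeats):
        shuffled = list(y_train)
        rng.shuffle(shuffled)  # breaks the link between inputs and response
        scrambled_scores.append(q2(fit_line(x_train, shuffled), x_test, y_test))
    return reference, scrambled_scores


# Synthetic data with an exact linear relation.
x_train = [float(i) for i in range(20)]
y_train = [2.0 * xi + 1.0 for xi in x_train]
x_test = [0.5, 5.5, 10.5, 15.5]
y_test = [2.0 * xi + 1.0 for xi in x_test]
reference, scrambled_scores = y_randomization(x_train, y_train, x_test, y_test)
```

A model is considered meaningful when the reference score clearly exceeds every score obtained from the scrambled responses.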
In summary, the ML Polar Gradient Boosted model showed good performance for the prediction of diffusivities of multiple solutes in polar solvents in the following train and test domain: $T$ = 268–554 K; $\mu_1$ = 0.0241–17.6 cP; $M_2$ = 17–674 g mol$^{-1}$; $P_{c,2}$ = 4.1–221.2 bar; $M_1$ = 20–113 g mol$^{-1}$; and $\varepsilon_{\mathrm{LJ},1}/k_\mathrm{B}$ = 208–2121 K. Likewise, the ML Nonpolar Gradient Boosted model can be applied over: $T$ = 213–567 K; $\mu_1$ = 0.0229–2.92 cP; $M_2$ = 2–461 g mol$^{-1}$; $P_{c,2}$ = 12.5–96.3 bar; and $M_1$ = 30–395 g mol$^{-1}$. Both models showed good interpolation capability, and it is expected that they can also provide reasonable extrapolations.
The ML Polar Gradient Boosted and ML Nonpolar Gradient Boosted models are provided as a command line program in the Supplementary Material.

3.2. Detailed Comparison of ML Gradient Boosted and Classic Models

Four classic models for the calculation of diffusivities were adopted for comparison: two hydrodynamic equations (Wilke-Chang [5] and Tyn-Calus [7]), a correlation by Magalhães et al. [9], and the hybrid model of Zhu et al. [13]. The performance metrics of the classic models are shown in Table 4, for the polar systems, and Table 5, for the nonpolar systems. Overall, the proposed ML models outperform the classic models.
The Wilke-Chang and Tyn-Calus hydrodynamic equations provide similar performance indicators in both data sets, though the former shows much higher maximum AARDs (Table 4: 197.71% vs. 97.11%; Table 5: 172.30% vs. 64.97%). Analyzing Figure 5a,b, where the calculated versus experimental diffusivities are plotted for the polar data set over the entire range and over a low range of values, we see that the Wilke-Chang equation overestimates higher diffusivities and tends to underestimate lower ones. The Tyn-Calus equation for polar solvents provides systematic underestimation, as shown in Figure S10 of the Supplementary Material. In the case of nonpolar systems, both the Wilke-Chang (Figure 6a,b) and Tyn-Calus (Figure S11) models exhibit a dual biased distribution of the calculated $D_{12}$ values.
The correlation of Magalhães et al. delivers the best performance among the classic models, with an unbiased distribution along the diagonal in Figure 5c,d and Figure 6c,d and an AARD only slightly above that provided by the machine learning gradient boosted models proposed in this work (5.19% and 6.19% for the polar and nonpolar sets, respectively). However, the Magalhães et al. correlation can often be difficult to apply since it requires that data on the system of interest be available in order to fit its two parameters. In this work, data in the train sets were used to fit the $a$ and $b$ parameters for each system, which were then applied to the calculation of diffusivities for the test sets. For this reason, fewer points were calculated for the Magalhães et al. model, since systems without enough data in the train sets to optimize parameters $a$ and $b$ had to be left out.
Finally, the Zhu et al. model, which was developed for nonpolar and weakly polar fluids, does not appear to provide any benefit over the much simpler Wilke-Chang and Tyn-Calus equations when applied to the nonpolar data set of this work. It provides higher AARD (Table 5: 37.93%) than both hydrodynamic equations (Table 5: 29.19% and 28.84%, respectively), although it shows a less biased dispersion along the diagonal (Figure S12).
Table 6 details the results of the best machine learning (gradient boosted) and classic diffusivity models for each system of the polar database, as well as the distribution of points among train and test sets. The best results are found for the ethylbenzene/acetone system (AARD of 0.08%) and the worst for the ethylene glycol/ethanol system (76.23%). However, these two systems have only one and two points in the test set, respectively. Considering only cases where at least 10 points are available for train and test sets, the carbon dioxide/n-butanol system shows the best result (1.19%) while ammonia/1-propanol has the worst (5.65%).
Table 7 presents equivalent information for the nonpolar systems. In this case, the n-decane/n-dodecane and tetraethyltin/n-decane systems show the best (0.03%) and worst (25.87%) results, respectively, but, once again, with only one point in the test set. If only systems with at least five points in the train and test sets are considered, the best result appears for 1,3,5-trimethylbenzene/n-hexane (2.98%) and the worst for toluene/n-hexane (4.58%).
The models proposed in this work can be easily retrained as new experimental data are made available, thus increasing their robustness and scope. A program that allows the estimation of diffusivities in polar and nonpolar systems is provided in the Supplementary Material, along with instructions on its use.

4. Conclusions

Two machine learning (ML) models were developed for the estimation of binary diffusivities in polar and nonpolar systems. These models were trained and tested on a database containing 20 properties for polar (90 systems and 1431 points) and nonpolar (154 systems and 1129 points) systems. Several learning algorithms were tested, including multilinear regression, k-nearest neighbors, decision tree, random forest and gradient boosted. The best ML results were obtained for the gradient boosted model, which provided global AARDs of 5.07% and 5.86% for the test sets of polar and nonpolar systems, respectively. The nonpolar model relies on five input variables/properties: temperature, solvent viscosity, solute molar mass, solute critical pressure and solvent molar mass. The polar model takes the Lennard-Jones energy of the solvent as an additional parameter, thus requiring six inputs in total. The classic models of Wilke-Chang, Tyn-Calus, Magalhães et al. and Zhu et al. were adopted for comparison and demonstrated worse performance for the same test sets. The 2-parameter correlation of Magalhães et al. showed results closer to the new gradient boosted models, with AARDs of 5.19% (polar) and 6.19% (nonpolar). However, that equation requires previous data to fit its two parameters, and thus it is impractical to apply to unknown systems. Among the remaining classic models, Wilke-Chang provided the best result for polar systems (40.92%) while Tyn-Calus performed best for nonpolar systems (28.84%). The developed models are provided as an application in the Supplementary Material.

Supplementary Materials

The following are available online at https://www.mdpi.com/1996-1944/14/3/542/s1, Software, Table S1: Tested and best hyper-parameter values for each machine learning algorithm, Figure S1: Predicted versus experimental diffusivities for the test set of polar systems using the Multilinear Regression model, Figure S2: Predicted versus experimental diffusivities for the test set of polar systems using the k-Nearest Neighbors model, Figure S3: Predicted versus experimental diffusivities for the test set of polar systems using the Decision Tree model, Figure S4: Predicted versus experimental diffusivities for the test set of polar systems using the Random Forest model, Figure S5: Predicted versus experimental diffusivities for the test set of nonpolar systems using the Multilinear Regression model, Figure S6: Predicted versus experimental diffusivities for the test set of nonpolar systems using the k-Nearest Neighbors model, Figure S7: Predicted versus experimental diffusivities for the test set of nonpolar systems using the Decision Tree model, Figure S8: Predicted versus experimental diffusivities for the test set of nonpolar systems using the Random Forest model, Figure S9: y-Randomization calculations for the selected ML Gradient Boosted models for (a) polar systems and (b) nonpolar systems (the bars show the Q² values for models based on randomized diffusivity data; the dashed horizontal lines show the Q² values of the actual models), Figure S10: Calculated versus experimental diffusivities for the test set of polar systems for the Tyn-Calus model: (a) full D₁₂ range; (b) zoomed on lower D₁₂ range, Figure S11: Calculated versus experimental diffusivities for the test set of nonpolar systems for the Tyn-Calus model: (a) full D₁₂ range; (b) zoomed on lower D₁₂ range, Figure S12: Calculated versus experimental diffusivities for the test set of nonpolar systems for the Zhu et al. model: (a) full D₁₂ range; (b) zoomed on lower D₁₂ range.

Author Contributions

Conceptualization, J.P.S.A. and C.M.S.; Formal analysis, J.P.S.A. and C.M.S.; Funding acquisition, C.M.S.; Investigation, J.P.S.A. and B.Z.; Methodology, J.P.S.A. and C.M.S.; Project administration, C.M.S.; Resources, C.M.S.; Software, J.P.S.A. and B.Z.; Supervision, C.M.S.; Visualization, J.P.S.A.; Writing—original draft, J.P.S.A.; Writing—review & editing, C.M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was developed within the scope of the project CICECO-Aveiro Institute of Materials, UIDB/50011/2020 & UIDP/50011/2020, financed by national funds through the Foundation for Science and Technology/MCTES, as well as the Multibiorefinery project (POCI-01-0145-FEDER-016403). Bruno Zêzere thanks FCT for PhD grant SFRH/BD/137751/2018.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wankat, P.C. Rate-Controlled Separations; Blackie Academic & Professional: Glasgow, UK, 1994. [Google Scholar]
  2. Oliveira, E.L.G.; Silvestre, A.J.D.; Silva, C.M. Review of kinetic models for supercritical fluid extraction. Chem. Eng. Res. Des. 2011, 89, 1104–1117. [Google Scholar] [CrossRef]
  3. Carberry, J.J. Chemical and Catalytic Reaction Engineering; McGraw-Hill: New York, NY, USA, 1971. [Google Scholar]
  4. Zêzere, B.; Portugal, I.; Gomes, J.R.B.; Silva, C.M. Revisiting Tracer Liu-Silva-Macedo model for binary diffusion coefficient using the largest database of liquid and supercritical systems. J. Supercrit. Fluids 2021, 168, 105073. [Google Scholar] [CrossRef]
  5. Wilke, C.R.; Chang, P. Correlation of diffusion coefficients in dilute solutions. AIChE J. 1955, 1, 264–270. [Google Scholar] [CrossRef]
  6. Scheibel, E.G. Liquid Diffusivities. Ind. Eng. Chem. 1954, 9, 2007–2008. [Google Scholar] [CrossRef]
  7. Tyn, M.T.; Calus, W.F. Diffusion Coefficients in Dilute Binary Liquid Mixtures. J. Chem. Eng. Data 1975, 20, 106–109. [Google Scholar] [CrossRef]
  8. Hayduk, W.; Minhas, B.S. Correlations for prediction of molecular diffusivities in liquids. Can. J. Chem. Eng. 1982, 60, 295–299. [Google Scholar] [CrossRef]
  9. Magalhães, A.L.; Lito, P.F.; Da Silva, F.A.; Silva, C.M. Simple and accurate correlations for diffusion coefficients of solutes in liquids and supercritical fluids over wide ranges of temperature and density. J. Supercrit. Fluids 2013, 76, 94–114. [Google Scholar] [CrossRef]
  10. Magalhães, A.L.; Da Silva, F.A.; Silva, C.M. Tracer diffusion coefficients of polar systems. Chem. Eng. Sci. 2012, 73, 151–168. [Google Scholar] [CrossRef]
  11. Dymond, J.H. Corrected Enskog theory and the transport coefficients of liquids. J. Chem. Phys. 1974, 60, 969–973. [Google Scholar] [CrossRef]
  12. Silva, C.M.; Liu, H. Modelling of Transport Properties of Hard Sphere Fluids and Related Systems, and its Applications. In Theory and Simulation of Hard-Sphere Fluids and Related Systems; Springer: Berlin, Germany, 2008; pp. 383–492. [Google Scholar]
  13. Zhu, Y.; Lu, X.; Zhou, J.; Wang, Y.; Shi, J. Prediction of diffusion coefficients for gas, liquid and supercritical fluid: Application to pure real fluids and infinite dilute binary solutions based on the simulation of Lennard–Jones fluid. Fluid Phase Equilib. 2002, 194–197, 1141–1159. [Google Scholar] [CrossRef]
  14. Magalhães, A.L.; Cardoso, S.P.; Figueiredo, B.R.; Da Silva, F.A.; Silva, C.M. Revisiting the liu-silva-macedo model for tracer diffusion coefficients of supercritical, liquid, and gaseous systems. Ind. Eng. Chem. Res. 2010, 49, 7697–7700. [Google Scholar] [CrossRef]
  15. Liu, H.; Silva, C.M.; Macedo, E.A. New Equations for Tracer Diffusion Coefficients of Solutes in Supercritical and Liquid Solvents Based on the Lennard-Jones Fluid Model. Ind. Eng. Chem. Res. 1997, 36, 246–252. [Google Scholar] [CrossRef]
  16. Gharagheizi, F.; Sattari, M. Estimation of molecular diffusivity of pure chemicals in water: A quantitative structure-property relationship study. SAR QSAR Environ. Res. 2009, 20, 267–285. [Google Scholar] [CrossRef] [PubMed]
  17. Khajeh, A.; Rasaei, M.R. Diffusion coefficient prediction of acids in water at infinite dilution by QSPR method. Struct. Chem. 2011, 23, 399–406. [Google Scholar] [CrossRef]
  18. Beigzadeh, R.; Rahimi, M.; Shabanian, S.R. Developing a feed forward neural network multilayer model for prediction of binary diffusion coefficient in liquids. Fluid Phase Equilib. 2012, 331, 48–57. [Google Scholar] [CrossRef]
  19. Eslamloueyan, R.; Khademi, M.H. A neural network-based method for estimation of binary gas diffusivity. Chemom. Intell. Lab. Syst. 2010, 104, 195–204. [Google Scholar] [CrossRef]
  20. Abbasi, A.; Eslamloueyan, R. Determination of binary diffusion coefficients of hydrocarbon mixtures using MLP and ANFIS networks based on QSPR method. Chemom. Intell. Lab. Syst. 2014, 132, 39–51. [Google Scholar] [CrossRef]
  21. Mirkhani, S.A.; Gharagheizi, F.; Sattari, M. A QSPR model for prediction of diffusion coefficient of non-electrolyte organic compounds in air at ambient condition. Chemosphere 2012, 86, 959–966. [Google Scholar] [CrossRef]
  22. Rahimi, M.R.; Karimi, H.; Yousefi, F. Prediction of carbon dioxide diffusivity in biodegradable polymers using diffusion neural network. Heat Mass Transf. Stoffuebertragung 2012, 48, 1357–1365. [Google Scholar] [CrossRef]
  23. Lashkarbolooki, M.; Hezave, A.Z.; Bayat, M. Thermal diffusivity of hydrocarbons and aromatics: Artificial neural network predicting model. J. Thermophys. Heat Transf. 2017, 31, 621–627. [Google Scholar] [CrossRef]
  24. Chudzik, S. Measurement of thermal diffusivity of insulating material using an artificial neural network. Meas. Sci. Technol. 2012, 23, 065602. [Google Scholar] [CrossRef]
  25. Aniceto, J.P.S.; Zêzere, B.; Silva, C.M. Machine learning models for the prediction of diffusivities in supercritical CO2 systems. J. Mol. Liq. 2021, 115281. [Google Scholar] [CrossRef]
  26. Yaws, C.L. Chemical Properties Handbook: Physical, Thermodynamic, Environmental, Transport, Safety, and Health Related Properties for Organic and Inorganic Chemicals; McGraw-Hill Professional: New York, NY, USA, 1998. [Google Scholar]
  27. Cibulka, I.; Ziková, M. Liquid densities at elevated pressures of 1-alkanols from C1 to C10: A critical evaluation of experimental data. J. Chem. Eng. Data 1994, 39, 876–886. [Google Scholar] [CrossRef]
  28. Cibulka, I.; Hnědkovský, L.; Takagi, T. P−ρ−T data of liquids: Summarization and evaluation. 4. Higher 1-alkanols (C11, C12, C14, C16), secondary, tertiary, and branched alkanols, cycloalkanols, alkanediols, alkanetriols, ether alkanols, and aromatic hydroxy derivatives. J. Chem. Eng. Data 1997, 42, 415–433. [Google Scholar] [CrossRef]
  29. Cibulka, I.; Takagi, T.; Růžička, K. P−ρ−T data of liquids: Summarization and evaluation. 7. Selected halogenated hydrocarbons. J. Chem. Eng. Data 2000, 46, 2–28. [Google Scholar] [CrossRef]
  30. Cibulka, I.; Takagi, T. P−ρ−T data of liquids: Summarization and evaluation. 8. Miscellaneous compounds. J. Chem. Eng. Data 2002, 47, 1037–1070. [Google Scholar] [CrossRef]
  31. Reid, R.C.; Prausnitz, J.M.; Poling, B.E. The Properties of Gases and Liquids, 4th ed.; Company, M.-H.B., Ed.; McGraw-Hill International Editions: New York, NY, USA, 1987. [Google Scholar]
  32. Viswanath, D.S.; Ghosh, T.K.; Prasad, D.H.; Dutt, N.V.K.; Rani, K.Y. Viscosity of Liquids: Theory, Estimation, Experiment, and Data; Springer: Dordrecht, The Netherlands, 2007; ISBN 978-1-4020-5482-2. [Google Scholar]
  33. Lucas, K. Ein einfaches verfahren zur berechnung der viskosität von Gasen und Gasgemischen. Chem. Ing. Tech. 1974, 46, 157–158. [Google Scholar] [CrossRef]
  34. Assael, M.J.; Dymond, J.H.; Polimatidou, S.K. Correlation and prediction of dense fluid transport coefficients. Fluid Phase Equilib. 1994, 15, 189–201. [Google Scholar] [CrossRef]
  35. Cano-Gómez, J.J.; Iglesias-Silva, G.A.; Rico-Ramírez, V.; Ramos-Estrada, M.; Hall, K.R. A new correlation for the prediction of refractive index and liquid densities of 1-alcohols. Fluid Phase Equilib. 2015, 387, 117–120. [Google Scholar] [CrossRef]
  36. Pádua, A.A.H.; Fareleira, J.M.N.A.; Calado, J.C.G.; Wakeham, W.A. Density and viscosity measurements of 2,2,4-trimethylpentane (isooctane) from 198 K to 348 K and up to 100 MPa. J. Chem. Eng. Data 1996, 41, 1488–1494. [Google Scholar] [CrossRef]
  37. Tyn, M.T.; Calus, W.F. Estimating liquid molar volume. Processing 1975, 21, 16–17. [Google Scholar]
  38. ChemSpider—Building Community for Chemists. Available online: http://www.chemspider.com (accessed on 22 August 2020).
  39. Korea Thermophysical Properties Data Bank (KDB). Available online: http://www.cheric.org/research/kdb/hcprop/cmpsrch.php (accessed on 22 August 2020).
  40. Design Institute for Physical Properties (DIPPR). Available online: http://dippr.byu.edu/ (accessed on 22 August 2020).
  41. Yaws, C.L. Thermophysical Properties of Chemicals and Hydrocarbons; William Andrew Inc.: New York, NY, USA, 2008. [Google Scholar]
  42. LookChem.com—Look for Chemicals. Available online: http://www.lookchem.com (accessed on 22 August 2020).
  43. AspenTech. Aspen Physical Property System—Physical Property Methods; AspenTech: Cambridge, MA, USA, 2007. [Google Scholar]
  44. Cordeiro, J. Medição e Modelação de Difusividades em CO2 Supercrítico e Etanol; Universidade de Aveiro: Aveiro, Portugal, 2015. [Google Scholar]
  45. Joback, K.G.; Reid, R.C. A Unified Approach to physical Property Estimation Using Multivariate Statistical Techniques; Massachusetts Institute of Technology: Cambridge, MA, USA, 1984. [Google Scholar]
  46. Joback, K.G.; Reid, R.C. Estimation of pure-component properties from group-contributions. Chem. Eng. Commun. 1987, 57, 233–243. [Google Scholar] [CrossRef]
  47. Somayajulu, G.R. Estimation Procedures for Critical Constants. J. Chem. Eng. Data 1989, 34, 106–120. [Google Scholar] [CrossRef]
  48. Klincewicz, K.M.; Reid, R.C. Estimation of critical properties with group contribution methods. AIChE J. 1984, 30, 137–142. [Google Scholar] [CrossRef]
  49. Ambrose, D. Correlation and estimation of vapour-liquid critical properties. I: Critical temperatures of organic compounds. In NPL Technical Report Chem. 92; National Physical Lab.: London, UK, 1978. [Google Scholar]
  50. Ambrose, D. Correlation and Estimation of Vapour-Liquid Critical Properties. II: Critical Pressure and Critical Volume. In NPL Technical Report. Chem. 92; National Physical Lab.: London, UK, 1979. [Google Scholar]
  51. Green, D.W.; Perry, R.H. Perry’s Chemical Engineers’ Handbook, 8th ed.; McGraw-Hill Professional: New York, NY, USA, 2008. [Google Scholar]
  52. Wen, X.; Qiang, Y. A new group contribution method for estimating critical properties of organic compounds. Ind. Eng. Chem. Res. 2001, 40, 6245–6250. [Google Scholar] [CrossRef]
  53. Valderrama, J.O.; Rojas, R.E. Critical properties of ionic liquids. Revisited. Ind. Eng. Chem. Res. 2009, 48, 6890–6900. [Google Scholar] [CrossRef]
  54. Lee, B.I.; Kesler, M.G. A generalized thermodynamic correlation based on three-parameter corresponding states. AIChE J. 1975, 21, 510–527. [Google Scholar] [CrossRef]
  55. Pitzer, K.S.; Lippmann, D.Z.; Curl, R.F.; Huggins, C.M.; Petersen, D.E. The Volumetric and Thermodynamic Properties of Fluids. II. Compressibility Factor, Vapor Pressure and Entropy of Vaporization. J. Am. Chem. Soc. 1955, 77, 3433–3440. [Google Scholar] [CrossRef]
  56. Zêzere, B.; Magalhães, A.L.; Portugal, I.; Silva, C.M. Diffusion coefficients of eucalyptol at infinite dilution in compressed liquid ethanol and in supercritical CO2/ethanol mixtures. J. Supercrit. Fluids 2018, 133, 297–308. [Google Scholar] [CrossRef]
  57. Leite, J.; Magalhães, A.L.; Valente, A.A.; Silva, C.M. Measurement and modelling of tracer diffusivities of gallic acid in liquid ethanol and in supercritical CO2 modified with ethanol. J. Supercrit. Fluids 2018, 131, 130–139. [Google Scholar] [CrossRef]
  58. Catchpole, O.J.; Von Kamp, J.C. Phase equilibrium for the extraction of squalene from shark liver oil using supercritical carbon dioxide. Ind. Eng. Chem. Res. 1997, 36, 3762–3768. [Google Scholar] [CrossRef]
  59. Liu, H.; Silva, C.M.; Macedo, E.A. Unified approach to the self-diffusion coefficients of dense fluids over wide ranges of temperature and pressure-hard-sphere, square-well, Lennard-Jones and real substances. Chem. Eng. Sci. 1998, 53, 2403–2422. [Google Scholar] [CrossRef]
  60. Cordeiro, J.; Magalhães, A.L.; Valente, A.A.; Silva, C.M. Experimental and theoretical analysis of the diffusion behavior of chromium(III) acetylacetonate in supercritical CO2. J. Supercrit. Fluids 2016, 118, 153–162. [Google Scholar] [CrossRef]
  61. Burkov, A. The Hundred-Page Machine Learning Book; Andriy Burkov: Quebec City, QC, Canada, 2019; ISBN 978-1-99-957950-0. [Google Scholar]
  62. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  63. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009; ISBN 978-0-38-784857-0. [Google Scholar]
  64. Altman, N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 1992, 46, 175–185. [Google Scholar] [CrossRef][Green Version]
  65. Mitchell, J.B.O. Machine learning methods in chemoinformatics. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2014, 4, 468–481. [Google Scholar] [CrossRef][Green Version]
  66. Quinlan, J.R. Simplifying decision trees. Int. J. Man. Mach. Stud. 1987, 27, 221–234. [Google Scholar] [CrossRef][Green Version]
  67. Müller, A.C.; Guido, S. Introduction to Machine Learning with Python: A Guide for Data Scientists; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2016; ISBN 978-1-449-36941-5. [Google Scholar]
  68. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef][Green Version]
  69. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  70. Svetnik, V.; Wang, T.; Tong, C.; Liaw, A.; Sheridan, R.P.; Song, Q. Boosting: An ensemble learning tool for compound classification and QSAR modeling. J. Chem. Inf. Model. 2005, 45, 786–799. [Google Scholar] [CrossRef]
  71. Cooper, E. Diffusion Coefficients at Infinite Dilution in Alcohol Solvents at Temperatures to 348 K and Pressures to 17 MPa; University of Ottawa: Ottawa, ON, Canada, 1992. [Google Scholar]
  72. Pratt, K.C.; Wakeham, W.A. The mutual diffusion coefficient for binary mixtures of water and the isomers of propanol. Proc. R. Soc. Lond. A 1975, 342, 401–419. [Google Scholar] [CrossRef]
  73. Sun, C.K.J.; Chen, S.-H. Tracer diffusion in dense methanol and 2-propanol up to supercritical region: Understanding of solvent molecular association and development of an empirical correlation. Ind. Eng. Chem. Res. 1987, 24, 815–819. [Google Scholar] [CrossRef]
  74. Man, C.W. Limiting Mutual Diffusion of Nonassociated Aromatic Solutes; The Hong Kong Polytechnic University: Hong Kong, China, 2001. [Google Scholar]
  75. Tyn, M.T.; Calus, W.F. Temperature and concentration dependence of mutual diffusion coefficients of some binary liquid systems. J. Chem. Eng. Data 1975, 20, 310–316. [Google Scholar] [CrossRef]
  76. Sarraute, S.; Gomes, M.F.C.; Pádua, A.A.H. Diffusion coefficients of 1-alkyl-3-methylimidazolium ionic liquids in water, methanol, and acetonitrile at infinite dilution. J. Chem. Eng. Data 2009, 54, 2389–2394. [Google Scholar] [CrossRef]
  77. Hurle, R.L.; Woolf, L.A. Tracer diffusion in methanol and acetonitrile under pressure. J. Chem. Soc. Faraday Trans. 1982, 78, 2921–2928. [Google Scholar] [CrossRef]
  78. Wong, C.-F.; Hayduk, W. Molecular diffusivities for propene in 1-butanol, chlorobenzene, ethylene glycol, and n-octane at elevated pressures. J. Chem. Eng. Data 1990, 35, 323–328. [Google Scholar] [CrossRef]
  79. Wong, C.-F. Diffusion Coefficients of Dissolved Gases in Liquids; University of Ottawa: Ottawa, ON, Canada, 1989. [Google Scholar]
  80. Kopner, A.; Hamm, A.; Ellert, J.; Feist, R.; Schneider, G.M. Determination of binary diffusion coefficients in supercritical chlorotrifluoromethane and sulfurhexafluoride with supercritical fluid chromatography (SFC). Chem. Eng. Sci. 1987, 42, 2213–2218. [Google Scholar] [CrossRef]
  81. Han, P.; Bartels, D.M. Temperature dependence of oxygen diffusion in H2O and D2O. J. Phys. Chem. 1996, 100, 5597–5602. [Google Scholar] [CrossRef]
  82. Tominaga, T.; Matsumoto, S. Diffusion of polar and nonpolar molecules in water and ethanol. Bull. Chem. Soc. Jpn. 1990, 63, 533–537. [Google Scholar] [CrossRef][Green Version]
  83. Sun, C.K.J.; Chen, S.H. Tracer diffusion in dense ethanol: A generalized correlation for nonpolar and hydrogen-bonded solvents. AIChE J. 1986, 32, 1367–1371. [Google Scholar] [CrossRef]
  84. Suárez-Iglesias, O.; Medina, I.; Pizarro, C.; Bueno, J.L. Diffusion of benzyl acetate, 2-phenylethyl acetate, 3-phenylpropyl acetate, and dibenzyl ether in mixtures of carbon dioxide and ethanol. Ind. Eng. Chem. Res. 2007, 46, 3810–3819. [Google Scholar] [CrossRef]
  85. Lin, I.-H.; Tan, C.-S. Diffusion of benzonitrile in CO2—Expanded ethanol. J. Chem. Eng. Data 2008, 53, 1886–1891. [Google Scholar] [CrossRef]
  86. Kong, C.Y.; Watanabe, K.; Funazukuri, T. Measurement and correlation of the diffusion coefficients of chromium(III) acetylacetonate at infinite dilution in supercritical carbon dioxide and in liquid ethanol. J. Chem. Thermodyn. 2017, 105, 86–93. [Google Scholar] [CrossRef]
  87. Zêzere, B.; Cordeiro, J.; Leite, J.; Magalhães, A.L.; Portugal, I.; Silva, C.M. Diffusivities of metal acetylacetonates in liquid ethanol and comparison with the transport behavior in supercritical systems. J. Supercrit. Fluids 2019, 143, 259–267. [Google Scholar] [CrossRef]
  88. Funazukuri, T.; Yamasaki, T.; Taguchi, M.; Kong, C.Y. Measurement of binary diffusion coefficient and solubility estimation for dyes in supercritical carbon dioxide by CIR method. Fluid Phase Equilib. 2015, 420, 7–13. [Google Scholar] [CrossRef]
  89. Kong, C.Y.; Sugiura, K.; Natsume, S.; Sakabe, J.; Funazukuri, T.; Miyake, K.; Okajima, I.; Badhulika, S.; Sako, T. Measurements and correlation of diffusion coefficients of ibuprofen in both liquid and supercritical fluids. J. Supercrit. Fluids 2020, 159, 104776. [Google Scholar] [CrossRef]
  90. Snijder, E.D.; te Riele, M.J.M.; Versteeg, G.F.; van Swaaij, W.P.M. Diffusion Coefficients of CO, CO2, N2O, and N2 in ethanol and toluene. J. Chem. Eng. Data 1995, 40, 37–39. [Google Scholar] [CrossRef][Green Version]
  91. Kong, C.Y.; Watanabe, K.; Funazukuri, T. Diffusion coefficients of phenylbutazone in supercritical CO2 and in ethanol. J. Chromatogr. A 2013, 1279, 92–97. [Google Scholar] [CrossRef][Green Version]
  92. Zêzere, B.; Iglésias, J.; Portugal, I.; Gomes, J.R.B.; Silva, C.M. Diffusion of quercetin in compressed liquid ethyl acetate and ethanol. J. Mol. Liq. 2020, 114714. [Google Scholar] [CrossRef]
  93. Pratt, K.C.; Wakeham, W.A. The mutual diffusion coefficient of ethanol-water mixtures: Determination by a rapid, new method. Proc. R. Soc. Lond. A 1974, 336, 393–406. [Google Scholar]
  94. Zêzere, B.; Silva, J.M.; Portugal, I.; Gomes, J.R.B.; Silva, C.M. Measurement of astaxanthin and squalene diffusivities in compressed liquid ethyl acetate by Taylor-Aris dispersion method. Sep. Purif. Technol. 2020, 234, 116046. [Google Scholar] [CrossRef]
  95. Heintz, A.; Ludwig, R.; Schmidt, E. Limiting diffusion coefficients of ionic liquids in water and methanol: A combined experimental and molecular dynamics study. Phys. Chem. Chem. Phys. 2011, 13, 3268–3273. [Google Scholar] [CrossRef] [PubMed]
  96. Liu, Q.; Takemura, F.; Yabe, A. Solubility and diffusivity of carbon monoxide in liquid methanol. J. Chem. Eng. Data 1996, 41, 589–592. [Google Scholar] [CrossRef]
  97. Lin, I.-H.; Tan, C.-S. Measurement of diffusion coefficients of p-chloronitrobenzene in CO2-expanded methanol. J. Supercrit. Fluids 2008, 46, 112–117. [Google Scholar] [CrossRef]
  98. Funazukuri, T.; Sugihara, T.; Yui, K.; Ishii, T.; Taguchi, M. Measurement of infinite dilution diffusion coefficients of vitamin K3 in CO2 expanded methanol. J. Supercrit. Fluids 2016, 108, 19–25. [Google Scholar] [CrossRef]
  99. Lee, Y.E.; Li, F.Y. Binary diffusion coefficients of the methanol water system in the temperature range 30–40 °C. J. Chem. Eng. Data 1991, 36, 240–243. [Google Scholar] [CrossRef]
  100. Fan, Y.Q.; Qian, R.Y.; Shi, M.R.; Shi, J. Infinite dilution diffusion coefficients of several aromatic hydrocarbons in octane and 2,2,4-trimethylpentane. J. Chem. Eng. Data 1995, 40, 1053–1055. [Google Scholar] [CrossRef]
  101. Sun, C.K.J.; Chen, S.H. Diffusion of benzene, toluene, naphthalene, and phenanthrene in supercritical dense 2,3-dimethylbutane. AIChE J. 1985, 31, 1904–1910. [Google Scholar] [CrossRef]
  102. Toriurmi, M.; Katooka, R.; Yui, K.; Funazukuri, T.; Kong, C.Y.; Kagei, S. Measurements of binary diffusion coefficients for metal complexes in organic solvents by the Taylor dispersion method. Fluid Phase Equilib. 2010, 297, 62–66. [Google Scholar] [CrossRef]
  103. Sun, C.K.J.; Chen, S.H. Tracer diffusion of aromatic hydrocarbons in n-hexane up to the supercritical region. Chem. Eng. Sci. 1985, 40, 2217–2224. [Google Scholar]
  104. Funazukuri, T.; Nishimoton, N.; Wakao, N. Binary diffusion coefficients of organic compounds in hexane, dodecane, and cyclohexane at 303.2-333.2 K and 16.0 MPa. J. Chem. Eng. Data 1994, 39, 911–915. [Google Scholar] [CrossRef]
  105. Chen, S.H.; Davis, H.T.; Evans, D.F. Tracer diffusion in polyatomic liquids. II. J. Chem. Phys. 1981, 75, 1422–1426. [Google Scholar] [CrossRef]
  106. Sun, C.K.J.; Chen, S.H. Tracer diffusion of aromatic hydrocarbons in liquid cyclohexane up to its critical temperature. AIChE J. 1985, 31, 1510–1515. [Google Scholar] [CrossRef]
  107. Chen, B.H.C.; Sun, C.K.J.; Chen, S.H. Hard sphere treatment of binary diffusion in liquid at high dilution up to the critical temperature. J. Chem. Phys. 1985, 82, 2052–2055. [Google Scholar] [CrossRef]
  108. Noel, J.M.; Erkey, C.; Bukur, D.B.; Akgerman, A. Infinite dilution mutual diffusion coefficients of 1-octene and 1-tetradecene in near-critical ethane and propane. J. Chem. Eng. Data 1994, 39, 920–921. [Google Scholar] [CrossRef]
  109. Chen, H.C.; Chen, S.H. Tracer diffusion of crown ethers in n-decane and n-tetradecane: An improved correlation for binary systems involving normal alkanes. Ind. Eng. Chem. Fundam. 1985, 24, 187–192. [Google Scholar] [CrossRef]
  110. Chen, S.H.; Davis, H.T.; Evans, D.F. Tracer diffusion in polyatomic liquids. III. J. Chem. Phys. 1982, 77, 2540–2544. [Google Scholar] [CrossRef]
  111. Pollack, G.L.; Kennan, R.P.; Himm, J.F.; Stump, D.R. Diffusion of xenon in liquid alkanes: Temperature dependence measurements with a new method. Stokes–Einstein and hard sphere theories. J. Chem. Phys. 1990, 92, 625–630. [Google Scholar] [CrossRef]
  112. Matthews, M.A.; Rodden, J.B.; Akgerman, A. High-temperature diffusion of hydrogen, carbon monoxide, and carbon dioxide in liquid n-heptane, n-dodecane, and n-hexadecane. J. Chem. Eng. Data 1987, 32, 319–322. [Google Scholar] [CrossRef]
  113. Matthews, M.A.; Akgerman, A. Diffusion coefficients for binary alkane mixtures to 573 K and 3.5 MPa. AIChE J. 1987, 33, 881–885. [Google Scholar] [CrossRef]
  114. Rodden, J.B.; Erkey, C.; Akgerman, A. High-temperature diffusion, viscosity, and density measurements in n-eicosane. J. Chem. Eng. Data 1988, 33, 344–347. [Google Scholar] [CrossRef]
  115. Qian, R.Y.; Fan, Y.Q.; Shi, M.R.; Shi, J. Predictive equation of tracer liquid diffusion coefficient from viscosity. Chin. J. Chem. Eng. 1996, 4, 203–208. [Google Scholar]
  116. Li, S.F.Y.; Wakeham, W.A. Mutual diffusion coefficients for two n-octane isomers in n-heptane. Int. J. Thermophys. 1989, 10, 995–1003. [Google Scholar] [CrossRef]
  117. Grushka, E.; Kikta, E.J. Diffusion in liquids. II. Dependence of diffusion coefficients on molecular weight and on temperature. J. Am. Chem. Soc. 1976, 98, 643–648. [Google Scholar] [CrossRef]
  118. Lo, H.Y. Diffusion coefficients in binary liquid n-alkane systems. J. Chem. Eng. Data 1974, 19, 236–241. [Google Scholar] [CrossRef]
  119. Alizadeh, A.A.; Wakeham, W.A. Mutual diffusion coefficients for binary mixtures of normal alkanes. Int. J. Thermophys. 1982, 3, 307–323. [Google Scholar] [CrossRef]
  120. Padrel de Oliveira, C.M.; Fareleira, J.M.N.A.; Nieto de Castro, C.A. Mutual diffusivity in binary mixtures of n-heptane with n-hexane isomers. Int. J. Thermophys. 1989, 10, 973–982. [Google Scholar] [CrossRef]
  121. Li, S.F.Y.; Yue, L.S. Composition dependence of binary diffusion coefficients in alkane mixtures. Int. J. Thermophys. 1990, 11, 537–554. [Google Scholar] [CrossRef]
  122. Matthews, M.A.; Rodden, J.B.; Akgerman, A. High-temperature diffusion, viscosity, and density measurements in n-hexadecane. J. Chem. Eng. Data 1987, 32, 317–319. [Google Scholar] [CrossRef]
  123. Awan, M.A.; Dymond, J.H. Transport properties of nonelectrolyte liquid mixtures. X. Limiting mutual diffusion coefficients of fluorinated benzenes in n-hexane. Int. J. Thermophys. 1996, 17, 759–769. [Google Scholar] [CrossRef]
  124. Okamoto, M. Diffusion coefficients estimated by dynamic fluorescence quenching at high pressure: Pyrene, 9,10-dimethylanthracene, and oxygen in n-hexane. Int. J. Thermophys. 2002, 23, 421–435. [Google Scholar] [CrossRef]
  125. Dymond, J.H.; Woolf, L.A. Tracer diffusion of organic solutes in n-hexane at pressures up to 400 MPa. J. Chem. Soc. Faraday Trans. 1 1982, 78, 991–1000. [Google Scholar] [CrossRef]
  126. Safi, A.; Nicolas, C.; Neau, E.; Chevalier, J.L. Measurement and correlation of diffusion coefficients of aromatic compounds at infinite dilution in alkane and cycloalkane solvents. J. Chem. Eng. Data 2007, 52, 977–981. [Google Scholar] [CrossRef]
  127. Leffler, J.; Cullinan, H.T. Variation of liquid diffusion coefficients with composition. Dilute ternary systems. Ind. Eng. Chem. Fundam. 1970, 9, 88–93. [Google Scholar] [CrossRef]
  128. Harris, K.R.; Pua, C.K.N.; Dunlop, P.J. Mutual and tracer diffusion coefficients and frictional coefficients for systems benzene-chlorobenzene, benzene-n-hexane, and benzene-n-heptane at 25 °C. J. Phys. Chem. 1970, 74, 3518–3529. [Google Scholar] [CrossRef]
  129. Bidlack, D.L.; Kett, T.K.; Kelly, C.M.; Anderson, D.K. Diffusion in the solvents hexane and carbon tetrachloride. J. Chem. Eng. Data 1969, 14, 342–343. [Google Scholar] [CrossRef]
  130. Grushka, E.; Kikta, E.J. Extension of the chromatographic broadening method of measuring diffusion coefficients to liquid systems. I. Diffusion coefficients of some alkylbenzenes in chloroform. J. Phys. Chem. 1974, 78, 2297–2301. [Google Scholar] [CrossRef]
  131. Holmes, J.T.; Olander, D.R.; Wilke, C.R. Diffusion in mixed Solvents. AIChE J. 1962, 8, 646–649. [Google Scholar] [CrossRef]
  132. Funazukuri, T.; Ishiwata, Y. Diffusion coefficients of linoleic acid methyl ester, Vitamin K3 and indole in mixtures of carbon dioxide and n-hexane at 313.2 K, and 16.0 MPa and 25.0 MPa. Fluid Phase Equilib. 1999, 164, 117–129. [Google Scholar] [CrossRef]
  133. Moore, J.W.; Wellek, R.M. Diffusion coefficients of n-heptane and n-decane in n-alkanes and n-alcohols at several temperatures. J. Chem. Eng. Data 1974, 19, 136–140. [Google Scholar] [CrossRef]
  134. Márquez, N.; Kreutzer, M.T.; Makkee, M.; Moulijn, J.A. Infinite dilution binary diffusion coefficients of hydrotreating compounds in tetradecane in the temperature range from (310 to 475) K. J. Chem. Eng. Data 2008, 53, 439–443. [Google Scholar] [CrossRef]
  135. Debenedetti, P.G.; Reid, R.C. Diffusion and mass transfer in supercritical fluids. AIChE J. 1986, 32, 2034–2046. [Google Scholar] [CrossRef][Green Version]
Figure 1. Correlation heat map for all properties and variables in the database of polar compounds. Colormap shows the absolute value of the Pearson correlation from zero (light green) to one (dark blue).
Figure 2. Correlation heat map for all properties and variables in the database of nonpolar compounds. Colormap shows the absolute value of the Pearson correlation from zero (light green) to one (dark blue).
Figure 3. Predicted versus experimental diffusivities for the test set of polar systems for the best machine learning model (Gradient Boosted): (a) plot including all calculated results; (b) plot zooming on the lower D12 range.
Figure 4. Predicted versus experimental diffusivities for the test set of nonpolar systems for the best machine learning model (Gradient Boosted): (a) plot including all calculated results; (b) plot zooming on the lower D12 range.
Figure 5. Calculated versus experimental diffusivities for the test set of polar systems for: (a) and (b) Wilke-Chang (Equation (1)) [5] and (c) and (d) Magalhães et al. (Equation (4)) [9] models. Note the distinct scale between plots.
Figure 6. Calculated versus experimental diffusivities for the test set of nonpolar systems for: (a) and (b) Wilke-Chang (Equation (1)) [5] and (c) and (d) Magalhães et al. (Equation (4)) [9] models. Note the distinct scale between plots.
Table 1. Properties and variables available for each system in the database.
Property | Units | Description
D12 | cm2 s−1 | Diffusion coefficient
T | K | Temperature
ρ1 | g cm−3 | Solvent density
μ1 | cP | Solvent viscosity
M1 | g mol−1 | Molar mass of solvent
M2 | g mol−1 | Molar mass of solute
Tc,1 | K | Critical temperature of solvent
Tc,2 | K | Critical temperature of solute
Tbp,1 | K | Normal boiling point temperature of solvent
Tbp,2 | K | Normal boiling point temperature of solute
Pc,1 | bar | Critical pressure of solvent
Pc,2 | bar | Critical pressure of solute
Vc,1 | cm3 mol−1 | Critical volume of solvent
Vc,2 | cm3 mol−1 | Critical volume of solute
w1 | - | Acentric factor of solvent
w2 | - | Acentric factor of solute
σLJ,1 | Å | Lennard-Jones diameter of solvent
σLJ,2 | Å | Lennard-Jones diameter of solute
εLJ,1/kB | K | Lennard-Jones energy constant of solvent
εLJ,2/kB | K | Lennard-Jones energy constant of solute
Table 2. Pure compounds properties and respective sources.
Compound | Formula | CAS | M (g mol−1) | Tc (K) | Tb (K) | Pc (bar) | Vc (cm3 mol−1) | w | σLJ (Å) | εLJ/kB (K)
[Bmim][bti]C10H15N3F6S2O4174899-83-3419.401269.90 a862.40 a27.60 a990.10 a0.3004 a7.59636 t982.90 t
[Emim][bti]C8H11N3F6S2O4174899-82-2391.311249.30 a816.70 a32.70 a875.90 a0.2157 a7.23444 t966.96 t
[Hmim][bti]C12H19N3F6S2O4382150-50-7447.421292.80 a908.20 a23.90 a1104.40 a0.3893 a7.90445 t1000.63 t
[Omim][bti]C14H23N3F6S2O4862731-66-6475.501317.80 a954.00 a21.00 a1218.60 a0.4811 a8.17464 t1019.98 t
1,1′-dimethylferroceneC12H14Fe1291-47-0214.09514.45 b353.55 c27.41 b400.64 b0.3453 d5.88660 t398.18 t
1,2,3,5-tetrafluorobenzeneC6H2F42367-82-0150.08555.49 e375.38 f36.40 e351.05 e0.3817 d5.52349 t429.95 t
1,2,4,5-tetrafluorobenzeneC6H2F4327-54-8150.08535.25 g357.61 g37.47 g351.05 e0.3437 d5.41106 t414.28 t
1,2,4-trichlorobenzeneC6H3Cl3120-82-1181.45725.00 h486.15 h37.20 h395.00 h0.3580 h5.95446 t561.15 t
1,2,4-trifluorobenzeneC6H3F3367-23-7132.09558.22 e371.13 f38.98 e335.05 e0.3377 d5.41530 t432.06 t
1,2-butanediolC4H10O2584-03-290.12622.14 h463.46 h50.30 h291.50 h1.1410 h5.17223 t481.54 t
1,3,5-trimethylbenzeneC9H12108-67-8120.20637.30 i437.90 i31.30 i433.00 h0.3990 i6.03392 t493.27 t
1,3-dibromobenzeneC6H4Br2108-36-1235.91761.00 h491.15 h46.60 h372.00 h0.2930 h5.64056 t589.01 t
1,4-butanediolC4H10O2111-63-490.12667.00 h501.15 h48.80 h297.00 h1.1890 h5.33697 t516.26 t
12-crown-4C8H16O4294-93-9176.21780.66 e540.08 f33.59 e444.75 e0.4598 d6.27811 t604.23 t
15-crown-5C10H20O533100-27-5220.27876.80 e625.60 f28.72 e548.75 e0.5562 d6.79750 t678.64 t
18-crown-6C12H24O617455-13-9264.32970.51 e711.12 f24.95 e652.75 e0.6510 d7.26959 t751.17 t
1-butanolC4H10O71-36-374.12563.10 i390.90 i44.20 i275.00 i0.5930 i5.22056 t435.84 t
1-octeneC8H16111-66-0112.22566.70 i394.40 i26.20 i464.00 i0.3860 i6.14478 t438.63 t
1-propanolC3H8O71-23-860.10536.80 i370.30 i51.70 i219.00 i0.6230 i4.49190 u2120.83 u
1-tetradeceneC14H281120-36-1196.37691.00 j524.25 j16.27 j865.00 j0.6503 j7.44105 t534.83 t
2,2,4-trimethylpentaneC8H18540-84-1144.23543.80 h372.39 h25.70 h468.00 h0.3030 h6.10433 t420.90 t
2,3-dimethylbutaneC6H1479-29-886.18500.00 i331.10 i31.30 i358.00 i0.2470 i5.60227 t387.00 t
2-phenylethyl acetateC10H12O2103-45-7164.10712.23 k505.16 f30.12 k524.15 k0.5442 d6.31046 t551.27 t
2-propanolC3H8O67-63-060.10508.30 i355.40 i47.60 i220.00 i0.6650 i4.93749 t393.42 t
3-phenylpropyl acetateC11H14O2122-72-5178.30718.70 k518.16 f27.23 k580.37 k0.5924 d6.51801 t556.27 t
9,10-dimethylanthraceneC16H14781-43-1206.29899.22 e645.06 f26.27 e724.55 e0.5451 d7.01984 t696.00 t
acetoneC3H6O67-64-158.08508.10 i329.20 i47.00 i209.00 i0.3040 i4.67012 u332.97 u
acetonitrileC2H3N75-05-841.05545.50 i354.80 i48.30 i173.00 i0.3270 i4.02424 u652.53 u
acridineC13H9N260-94-6179.22905.00 l619.15 l36.40 l543.00 l0.4381 d6.40475 t700.47 t
ammoniaNH37664-41-717.03405.50 i239.80 i113.30 i72.50 i0.2500 i4.24397 u4.46 u
argonAr7440-37-139.95150.80 i87.30 i48.70 i74.90 i0.0010 i3.40744 u123.55 u
astaxanthinC40H52O4472-61-7596.841148.51 f1047.00 f5.30 f1877.50 f2.8439 d9.98026 t888.95 t
benzeneC6H671-43-278.11562.20 i353.20 i48.90 i259.00 i0.2120 i5.19165 u308.43 u
benzoic acidC7H6O265-85-0122.12752.00 i523.00 i45.60 i341.00 i0.6200 i5.65763 t582.05 t
benzonitrileC7H5N100-47-0103.12699.35 h464.15 h42.15 h339.00 h0.3520 h5.66827 t541.30 t
benzothiopheneC8H6S95-15-8134.20764.00 j494.05 j47.60 j379.00 j0.3071 j5.61049 t591.34 t
benzyl acetateC9H10O2140-11-4150.18699.00 h486.65 h31.80 h449.00 h0.4700 h6.17454 t541.03 t
biphenylC12H1092-52-4154.21789.00 i529.30 i38.50 i502.00 i0.3720 i6.04576 t610.69 t
carbon dioxideCO2124-38-944.01304.10 i194.70 h73.80 i93.90 i0.2390 i3.26192 u500.71 u
carbon disulfideCS275-15-076.13552.00 i319.00 i79.00 i160.00 i0.1090 i4.29901 u376.51 u
carbon monoxideCO630-08-028.01132.90 i81.70 i35.00 i93.20 i0.0660 i3.53562 t102.86 t
carbon tetrabromideCBr4558-13-4331.63724.91 h462.65 h96.31 h328.50 h0.5010 h4.41501 t561.08 t
carbon tetrachlorideCCl456-23-5153.82556.40 i349.90 i45.60 i275.90 i0.1930 i5.29240 u418.84 u
chlorobenzeneC6H5Cl108-90-7112.56632.40 i404.90 i45.20 i308.00 i0.2490 i5.56838 u207.50 u
chlorotrifluoromethaneCClF375-72-9104.46302.00 i193.20 i38.70 i180.40 i0.1980 i4.37636 u410.79 u
chromium(III) acetylacetonateCr(acac)321679-31-2349.32858.85 b613.15 m18.92 b627.04 b0.3631 d5.71650 v845.60 v
cyclohexaneC6H12110-82-784.16553.50 i353.80 i40.70 i308.00 i0.2120 i5.73075 u224.87 u
deuterium oxideD2O7789-20-020.03643.89 i374.55 i216.71 i56.26 i0.3447 d3.26304 t498.37 t
dibenzothiopheneC12H8S132-65-0184.26897.00 j604.61 j38.60 j512.00 j0.3983 j6.27791 t694.28 t
dibenzyl etherC14H14O103-50-4198.27777.00 h561.45 h25.60 h608.00 h0.5910 h6.78621 t601.40 t
dicyclohexano-18-crown-6C20H36O616069-36-6372.501177.47 e906.84 f16.24 e1002.75 e0.7675 d8.41774 t911.36 t
dicyclohexano-24-crown-8C24H44O817455-23-1460.611357.66 e1077.88 f13.48 e1210.75 e0.9120 d8.62250 t1050.83 t
disperse blue 14C16H14N2O22475-44-7266.001137.33 f881.88 f27.18 f765.50 f1.1790 d7.41187 t880.29 t
disperse orange 11C15H11NO282-28-0237.251103.62 f831.19 f31.17 f670.00 f0.9859 d7.08580 t854.20 t
ethaneC2H674-84-030.07305.40 i184.60 i48.80 i148.30 i0.0990 i4.17587 u213.99 u
ethanolC2H6O64-17-546.07513.90 i351.40 i61.40 i167.10 i0.6440 i4.23738 u1291.41 u
ethyl acetateC4H8O2141-78-688.11523.20 i350.30 i38.30 i286.00 i0.3620 i5.33606 t404.96 t
ethylbenzeneC8H10100-41-4106.17617.20 i409.30 i36.00 i374.00 i0.3020 i5.72572 t477.71 t
ethyleneC2H474-85-128.05282.40 i169.30 i50.40 i130.40 i0.0890 i4.04838 u169.08 u
ethylene glycolC2H6O2107-21-162.07645.00 h470.45 h75.30 h191.00 h1.1907 d4.60221 t499.23 t
ethylferroceneC12H14Fe1273-89-8214.08554.21 b381.75 n27.41 b400.64 b0.3556 d6.02127 t428.96 t
eucalyptolC10H18O470-82-6154.25695.50 o449.50 f31.40 o509.50 o0.6490 b6.18868 t538.32 t
ferroceneC10H10Fe102-54-5186.04786.27 b522.15 n32.07 b317.77 b0.2638 d6.37838 t608.57 t
gallic acidC7H6O5149-91-7170.121136.70 p789.90 p34.90 p276.20 p0.4984 d6.92304 t879.81 t
glycerolC3H8O356-81-592.10723.00 h563.15 h40.00 h264.00 h1.4986 d5.81929 t559.60 t
hexafluorobenzeneC6F6392-56-3186.06516.70 i353.40 i33.00 i335.00 i0.3960 i5.56763 t399.93 t
hydrogenH21333-74-02.0233.00 i20.30 i12.90 i64.30 i−0.2160 i5.94111 u0.00 u
IbuprofenC13H18O215687-27-1206.29769.63 e580.45 q22.85 e686.35 e0.8512 d6.98841 t595.69 t
indoleC8H7N204-420-7117.15790.00 h526.15 h43.40 h431.00 h0.4293 y5.83184 t611.46 t
kryptonKr7439-90-983.80209.40 i119.90 i55.00 i91.20 i0.0050 i2.89870 u511.92 u
linoleic acid methyl esterC19H34O2112-63-0294.48870.78 r700.66 f12.54 r1070.95 r0.9952 d8.34769 t673.98 t
methaneCH474-82-816.04190.40 i111.60 i46.00 i99.20 i0.0110 i3.58484 u167.15 u
methanolCH4O67-56-132.04512.60 i337.70 i80.90 i118.00 i0.5560 i3.79957 u685.96 u
m-xyleneC8H10108-38-3106.17617.10 i412.30 i35.40 i376.00 i0.3250 i5.75507 t477.64 t
naphthaleneC10H891-20-3128.17748.40 i491.10 i40.50 i413.00 i0.3020 i5.85874 t579.26 t
n-butanolC4H10O71-36-374.12563.10 i390.90 i44.20 i275.00 i0.5930 i5.22056 t435.84 t
n-decaneC10H22124-18-5142.29617.70 i447.30 i21.20 i603.00 i0.4890 i6.71395 u434.86 u
n-dodecaneC12H26112-40-3170.34658.20 i489.50 i18.20 i713.00 i0.5750 i7.00451 u672.90 u
n-eicosaneC20H42112-95-8282.56767.00 i617.00 i11.10 i1190.00 h0.9070 i8.33954 t593.66 t
n-heptaneC7H16142-82-5100.21540.30 i371.60 i27.40 i432.00 i0.3490 i5.94356 u404.05 u
n-hexadecaneC16H34544-76-3226.45722.00 i560.00 i14.10 i930.00 i0.7420 i7.36480 u1669.19 u
n-hexaneC6H14110-54-386.18507.50 i341.90 i30.10 i370.00 i0.2990 i5.61841 u434.76 u
nitrous oxideN2O10024-97-244.01309.60 i184.70 i72.40 i97.40 i0.1650 i3.67545 t239.63 t
n-octaneC8H18111-65-9114.23568.80 i398.80 i24.90 i492.00 i0.3980 i6.17328 u478.32 u
n-propylbenzeneC9H12103-65-1120.20638.20 i432.40 i32.00 i440.00 i0.3440 i5.99624 t493.97 t
n-tetradecaneC14H30629-59-4198.39693.00 i526.70 i14.40 i830.00 i0.5810 i7.68286 t536.38 t
octafluorotolueneC7F8434-64-0236.06534.47 g377.73 g27.05 g428.00 g0.4758 d5.97931 t413.68 t
o-difluorobenzeneC6H4F2367-11-3114.09554.46 h364.66 h40.67 h299.50 h0.3200 h,b5.33270 t429.15 t
oxygenO27782-44-732.00154.60 i90.20 i50.40 i73.40 i0.0250 i3.29728 t119.66 t
o-xyleneC8H1095-47-6106.17630.30 i417.60 i37.30 i369.00 i0.3100 i5.70029 t487.85 t
palladium(II) acetylacetonateC10H14O4Pd14024-61-4304.64651.12 b573.15 n4.13 b435.41 b1.0014 d4.90200 x994.14 x
p-chloronitrobenzeneC6H4ClNO2100-00-5157.56751.00 h515.15 h39.80 h432.00 h0.4910 h5.89621 t581.27 t
p-difluorobenzeneC6H4F2540-36-3114.09556.00 h362.00 h44.00 h299.50 h0.2990 h5.20720 t430.34 t
pentafluorobenzeneC6HF5363-72-4168.07530.97 g358.89 g35.31 g324.00 g0.3711 d5.49825 t410.97 t
phenanthreneC14H1085-01-8178.23873.00 i613.00 i29.00 h554.00 i0.4950 h6.77034 t675.70 t
phenylbutazoneC19H20N2O250-33-9308.38861.18 e674.85 e18.38 e933.55 e1.0126 d7.63140 t666.55 t
propaneC3H874-98-644.09369.80 i231.10 i42.50 i203.00 i0.1530 i4.50412 u457.99 u
propeneC3H6115-07-142.08364.90 i225.50 i46.00 i181.00 i0.1440 i4.49020 t282.43 t
p-xyleneC8H10106-42-3106.17616.20 i411.50 i35.10 i379.00 i0.3200 i5.76754 t476.94 t
pyreneC16H10129-00-0202.26936.00 h667.95 h26.10 h630.00 h0.5090 h7.11077 t724.46 t
quercetinC15H10O7117-39-5302.241468.74 f1187.59 f66.64 f730.50 f2.4842 d6.17951 t1136.80 t
squaleneC30H50111-02-4410.73716.50 s678.39 q7.03 s1601.00 f0.6380 d9.46409 t554.57 t
s-trioxaneC3H6O3110-88-390.08604.00 h387.65 h58.20 h206.00 h0.3340 h4.89292 t467.50 t
sulfur hexafluorideSF62551-62-4146.05318.70 i209.60 i37.60 i198.80 i0.2860 i4.76629 u271.68 u
tetrabutyltinC16H36Sn1461-25-2347.17767.97 b548.45 c17.25 b760.75 b0.3212 d7.53290 t594.41 t
tetraethyltinC8H20Sn597-64-8234.95655.92 b456.25 c25.75 b429.28 b0.3747 d6.45047 t507.68 t
tetramethyltinC4H12Sn594-27-4178.85511.77 b347.65 c34.18 b263.54 b0.3807 d5.49115 t396.11 t
tetrapropyltinC12H28Sn2176-98-9291.06759.88 b536.35 c20.66 b595.01 b0.3479 d7.16031 t588.15 t
tolueneC7H8108-88-392.14591.80 i383.80 i41.00 i316.00 i0.2630 i5.45450 u350.74 u
vitamin K3C11H8O258-27-5172.18893.85 e638.20 f31.96 e537.20 e0.6105 d6.62867 t691.84 t
waterH2O7732-18-518.02647.30 i373.20 i221.20 i57.10 i0.3440 i3.24681 t501.01 t
xenonXe7440-63-3131.30289.70 i165.00 i58.40 i118.40 i0.0080 i3.85754 t224.23 t
a Taken from Valderrama and Rojas [53]; b Estimated by the Klincewicz method [31,48]; c Taken from ChemSpider [38]; d Estimated by the Lee-Kesler relation [54]; e Average of the values by the Joback [31,45,46] and Ambrose [31,49,50,51] methods; f Estimated by the Joback method [31,45,46]; g Taken from the Korea Thermophysical Properties Data Bank (KDB) [39]; h Taken from Yaws (1998) [26]; i Taken from Reid et al. [31]; j Taken from the DIPPR database [40]; k Average of the values by the Joback [31,45,46] and Wen-Qiang [52] methods; l Taken from Yaws (2008) [41]; m Taken from the Sigma-Aldrich data sheet; n Taken from LookChem [42]; o Taken from Zêzere et al. [56]; p Taken from Leite et al. [57]; q Taken from the ASPEN database [43]; r Average of the values by the Joback [31,45,46] and Somayajulu [47] methods; s Taken from Catchpole et al. [58]; t Estimated by Equations (8) and (9) from reference [14]; u Taken from Silva and Liu (2008) [59]; v Taken from Cordeiro et al. [60]; x Taken from Cordeiro [44]; y Estimated by the Pitzer equation [55].
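Footnote d refers to the Lee-Kesler relation for the acentric factor. The following is a minimal sketch of that estimate, assuming the commonly quoted form of the relation with the critical pressure expressed in atm and the reduced boiling temperature θ = Tb/Tc; the function name and the unit conversion are illustrative choices, not taken from the article. With the Table 2 inputs for 1,2,3,5-tetrafluorobenzene it gives a value close to the tabulated 0.3817.

```python
from math import log

ATM_IN_BAR = 1.01325  # 1 atm expressed in bar


def lee_kesler_acentric(tc_k, tb_k, pc_bar):
    """Acentric factor from Tc (K), normal boiling point Tb (K) and Pc (bar).

    Assumes the commonly quoted Lee-Kesler form, which takes Pc in atm.
    """
    theta = tb_k / tc_k            # reduced normal boiling temperature
    pc_atm = pc_bar / ATM_IN_BAR   # the correlation constants assume atm
    numerator = (-log(pc_atm) - 5.92714 + 6.09648 / theta
                 + 1.28862 * log(theta) - 0.169347 * theta ** 6)
    denominator = (15.2518 - 15.6875 / theta
                   - 13.4721 * log(theta) + 0.43577 * theta ** 6)
    return numerator / denominator


# Inputs from Table 2 for 1,2,3,5-tetrafluorobenzene (tabulated w = 0.3817).
w = lee_kesler_acentric(tc_k=555.49, tb_k=375.38, pc_bar=36.40)
print(round(w, 4))
```

The same call can be used to spot-check other entries marked d, provided their Tc, Tb, and Pc come from the same row.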
Table 3. Required inputs for the new and classic diffusivity models.
Parameters | Proposed Models: ML Polar | ML Nonpolar | Classic Models: Wilke-Chang (Equation (1)) | Tyn-Calus (Equation (3)) | Magalhães et al. [9] (Equation (4)) | Zhu et al. [13] (Equations (5)–(10))
T
ρ 1
μ 1
M 2
T c , 2
T bp , 2
P c , 2
V c , 2
w 2
σ LJ , 2
ε LJ , 2 / k B
M 1
T c , 1
T bp , 1
P c , 1
V c , 1
w 1
σ LJ , 1
ε LJ , 1 / k B
Fitted | - | - | - | - | 2 | -
Count | 6 | 5 | 4 | 4 | 4 | 7
Note: The ● indicates the parameters required in each model.
Table 4. Performance of several machine learning (ML) models for the prediction of diffusivities in polar systems (test set) and comparison with classic predictive and correlation models.
Model | NSys | NDP | Global AARD (%) | AARDarith (%) | AARDmin (%) | AARDmax (%) | RMSE | Q2 (R2) ***
ML Polar Multilinear Regression | 79 | 430 | 84.65 | 80.65 | 4.00 | 899.66 | 3.33 × 10−5 | 0.7215 (0.7504)
ML Polar k-Nearest Neighbors | 79 | 430 | 8.94 | 17.55 | 0.22 | 317.43 | 1.20 × 10−5 | 0.9641 (1.0000)
ML Polar Decision Tree | 79 | 430 | 7.14 | 12.68 | 0.22 | 229.69 | 7.83 × 10−6 | 0.9846 (1.0000)
ML Polar Random Forest | 79 | 430 | 5.67 | 9.44 | 0.04 | 82.92 | 6.67 × 10−6 | 0.9889 (1.0000)
ML Polar Gradient Boosted | 79 | 430 | 5.07 | 8.00 | 0.08 | 76.23 | 5.68 × 10−6 | 0.9919 (0.9998)
Wilke-Chang | 79 | 430 | 40.92 | 41.35 | 1.37 | 197.71 | 3.15 × 10−5 | 0.7519 (0.6790)
Tyn-Calus | 79 | 430 | 46.49 | 38.41 | 2.88 | 97.11 | 2.30 × 10−5 | 0.8672 (0.8399)
Magalhães et al. | 76 * | 419 | 5.19 | 6.23 | 0.15 | 92.77 | 5.81 × 10−6 | 0.9917 (0.9977)
Zhu et al. | ** | ** | ** | ** | ** | ** | ** | **
* The Magalhães et al. correlation cannot be applied to three systems of the database because they have too few points. ** The model of Zhu et al. is not applicable to polar systems. NSys: number of systems; NDP: number of data points; Global AARD: weighted deviation of all systems; AARDarith: arithmetic average of all systems; AARDmin: minimum AARD; and AARDmax: maximum AARD. *** Q2 (R2): R2 is the coefficient of determination for training and Q2 is the corresponding value for testing, in the case of ML models. For the Wilke-Chang, Tyn-Calus and Zhu et al. models all values are predicted.
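The deviation measures named in the footnote can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: AARD is taken as the mean of |Dcalc − Dexp|/Dexp in percent, the Global AARD is read as the point-weighted value (equivalent to pooling all data points), and the function names and numerical values are hypothetical.

```python
from math import sqrt


def aard(d_calc, d_exp):
    """Average absolute relative deviation, in percent."""
    return 100.0 * sum(abs(c - e) / e for c, e in zip(d_calc, d_exp)) / len(d_exp)


def summarize(systems):
    """systems maps a system name to (calculated, experimental) lists of D12."""
    per_system = {name: aard(calc, exp) for name, (calc, exp) in systems.items()}
    all_calc = [c for calc, _ in systems.values() for c in calc]
    all_exp = [e for _, exp in systems.values() for e in exp]
    squared_errors = [(c - e) ** 2 for c, e in zip(all_calc, all_exp)]
    return {
        # Pooling all points weights each system by its number of points.
        "global_aard": aard(all_calc, all_exp),
        # Plain mean of the per-system AARDs, one vote per system.
        "aard_arith": sum(per_system.values()) / len(per_system),
        "aard_min": min(per_system.values()),
        "aard_max": max(per_system.values()),
        "rmse": sqrt(sum(squared_errors) / len(squared_errors)),
    }


# Hypothetical diffusivities in cm2 s-1, not taken from the database.
example = {
    "solvent A / solute X": ([1.1e-5, 1.9e-5, 3.3e-5], [1.0e-5, 2.0e-5, 3.0e-5]),
    "solvent B / solute Y": ([4.0e-5], [5.0e-5]),
}
stats = summarize(example)
print(stats)
```

In this toy case the single-point system has the largest deviation, so the arithmetic average exceeds the global value; the two measures diverge whenever poorly predicted systems hold few points.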
Table 5. Performance of several machine learning (ML) models for the prediction of diffusivities in nonpolar systems (test set) and comparison with classic predictive and correlation models.
Model | NSys | NDP | Global AARD (%) | AARDarith (%) | AARDmin (%) | AARDmax (%) | RMSE | Q2 (R2) **
ML Nonpolar Multilinear Regression | 130 | 342 | 96.65 | 111.95 | 0.91 | 1731.52 | 8.37 × 10−5 | 0.5590 (0.5779)
ML Nonpolar k-Nearest Neighbors | 130 | 342 | 13.64 | 13.86 | 0.00 | 63.05 | 2.93 × 10−5 | 0.9461 (0.9998)
ML Nonpolar Decision Tree | 130 | 342 | 13.29 | 14.08 | 0.00 | 90.96 | 5.08 × 10−5 | 0.8380 (0.9998)
ML Nonpolar Random Forest | 130 | 342 | 9.94 | 10.29 | 0.00 | 62.04 | 1.83 × 10−5 | 0.9789 (0.9998)
ML Nonpolar Gradient Boosted | 130 | 342 | 5.86 | 6.02 | 0.03 | 25.87 | 1.39 × 10−5 | 0.9879 (0.9866)
Wilke-Chang | 130 | 342 | 29.19 | 28.20 | 0.26 | 172.30 | 6.66 × 10−5 | 0.7214 (0.5546)
Tyn-Calus | 130 | 342 | 28.84 | 27.82 | 0.18 | 64.97 | 7.01 × 10−5 | 0.6909 (0.7465)
Magalhães et al. | 125 * | 324 | 6.19 | 6.21 | 0.04 | 128.38 | 1.82 × 10−5 | 0.9801 (0.9890)
Zhu et al. | 130 | 342 | 37.93 | 45.19 | 1.40 | 222.45 | 6.35 × 10−5 | 0.7466 (0.8343)
* The Magalhães et al. correlation cannot be applied to five systems of the database because they have too few points. NSys: number of systems; NDP: number of data points; Global AARD: weighted deviation of all systems; AARDarith: arithmetic average of all systems; AARDmin: minimum AARD; and AARDmax: maximum AARD. ** Q2 (R2): R2 is the coefficient of determination for training and Q2 is the corresponding value for testing, in the case of ML models. For the Wilke-Chang, Tyn-Calus and Zhu et al. models all values are predicted.
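The Q2 (R2) column pairs a test-set value with a training-set value. A minimal sketch of how such a pair could be computed is given below, assuming the usual definition 1 − SSres/SStot evaluated separately on each subset; the exact formula used in the article is not stated in this section, and the function name and numbers are hypothetical.

```python
def coefficient_of_determination(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for one subset (training or test)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical values, only to show the two evaluations side by side.
y_train, pred_train = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
y_test, pred_test = [2.0, 4.0], [2.5, 3.5]

r2 = coefficient_of_determination(y_train, pred_train)  # training subset
q2 = coefficient_of_determination(y_test, pred_test)    # test subset
print(f"Q2 (R2) = {q2:.4f} ({r2:.4f})")
```

A training value near 1 paired with a clearly lower test value, as in the Decision Tree row of Table 5, is the pattern usually associated with overfitting.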
Table 6. Calculated deviations of the individual systems of the polar database (divided into test and train sets) achieved by the best machine learning model of this work (Gradient Boosted) and classic equations adopted for comparison.
Solvent | Solute | NDP (Total, Test, Train) | AARD (%): ML Gradient Boosted (Test, Train) | Wilke-Chang (Test, Train) | Tyn-Calus (Test, Train) | Magalhães et al. (Test, Train) | Data Ref.
1-propanolammonia3114175.650.6033.9331.2519.4921.114.532.23[71]
1-propanolcarbon dioxide2711161.740.6954.3457.1271.2973.033.572.73[71]
1-propanolpropane369274.040.8748.2653.1162.7666.254.844.87[71]
1-propanolpropene3612242.661.2251.8256.3766.0169.223.844.81[71]
1-propanolwater52338.770.16153.58119.3046.1926.4218.770.93[72]
2-propanolbenzene10281.610.1819.828.2635.3726.1628.536.52[73]
2-propanolnaphthalene10370.930.237.6413.0224.7224.059.0610.74[73]
2-propanoln-decane10370.740.2011.6820.4523.0930.723.8115.80[73]
2-propanoln-tetradecane9546.360.7215.4414.8820.8521.6024.452.49[73]
2-propanolphenanthrene93610.060.4623.855.4634.6613.5392.771.72[73]
2-propanoltoluene10197.160.1918.919.8736.9426.7713.698.03[73]
2-propanolwater51441.120.44130.88143.0233.1040.104.570.83[72]
acetone1,2,4-trichlorobenzene6245.850.4810.5311.9527.1028.263.591.08[74]
acetone1,3,5-trimethylbenzene5230.750.0618.8119.1032.7733.010.150.61[74]
acetonebenzene6 6 0.19 13.32 34.40 0.36[74]
acetonebiphenyl6154.350.6318.7918.7930.9930.990.460.40[74]
acetonechlorobenzene6 6 0.14 13.57 32.58 0.85[74]
acetoneethylbenzene6150.080.2318.7619.0734.4434.680.170.43[74]
acetonenaphthalene5 5 0.28 18.33 32.93 0.42[74]
acetonen-propylbenzene5410.980.0021.0921.1434.4734.52 [74]
acetonetoluene5 5 0.12 16.89 34.87 0.38[74]
acetonewater4135.940.0683.5385.646.607.820.800.87[75]
acetonitrile[Bmim][bti]5142.190.5150.2749.1048.6347.430.601.19[76]
acetonitrile[Emim][bti]5141.630.0247.8346.6447.2546.061.101.35[76]
acetonitrile[Hmim][bti]5 5 0.29 48.77 46.06 1.94[76]
acetonitrile[Omim][bti]5141.120.2648.9949.2245.3645.610.181.04[76]
acetonitrilecarbon disulfide53216.393.7622.6428.6441.9146.4210.72 [77]
acetonitrilemethanol206146.490.9620.2815.7943.2540.051.441.78[77]
chlorobenzenepropene329230.950.259.439.8832.7732.491.011.12[78,79]
chlorotrifluoromethane1,3-dibromobenzene12399.311.21147.23148.4875.1876.066.854.14[80]
chlorotrifluoromethaneacetone102816.170.7893.8793.6624.1824.058.003.55[80]
chlorotrifluoromethanep-xylene8177.050.6575.6198.4024.8441.042.313.68[80]
deuterium oxideoxygen187115.430.2720.3316.5738.8735.994.647.55[81]
ethanol1,2-butanediol52337.201.2730.6527.4113.2715.422.610.24[82]
ethanol1,3,5-trimethylbenzene13584.090.5413.2218.9521.1319.421.651.92[83]
ethanol1,4-butanediol44 63.79 48.40 2.88 [82]
ethanol1-butanol43120.443.6417.2517.9522.9622.49 [82]
ethanol2-phenylethyl acetate154112.641.3816.8917.8038.8639.532.981.97[84]
ethanol3-phenylpropyl acetate153122.590.9114.3013.4935.8235.213.931.76[84]
ethanolammonia185133.842.0036.2442.9229.1125.635.323.18[71]
ethanolbenzene218133.421.1627.3524.3725.7435.546.1612.16[82,83]
ethanolbenzonitrile16881.860.9724.9725.3448.8649.110.831.02[85]
ethanolbenzyl acetate155104.430.9817.9713.9341.2738.383.362.89[84]
ethanolcarbon dioxide279184.822.2149.7446.5672.6470.905.083.73[71]
ethanolchromium(III) acetylacetonate9187.170.7720.7916.818.3111.332.992.24[86,87]
ethanoldibenzyl ether155103.001.5222.4725.9041.4744.064.261.37[84]
ethanoldisperse blue 148262.755.2322.1022.7338.7739.265.6110.24[88]
ethanoldisperse orange 11122100.440.1720.4215.1738.8934.866.152.75[88]
ethanolethylene glycol52376.230.0561.0357.904.282.655.061.36[82]
ethanoleucalyptol12487.021.0610.5813.8534.5536.940.480.65[56]
ethanolgallic acid245195.140.61134.06132.7153.9253.041.570.79[57]
ethanolglycerol5 5 1.08 52.51 4.59 3.28[82]
ethanolIbuprofen16794.871.074.975.5119.0518.630.920.81[89]
ethanolnaphthalene132118.860.1621.4314.2530.8820.3611.331.13[83]
ethanolnitrous oxide5 5 0.26 44.94 69.83 0.68[90]
ethanolpalladium(II) acetylacetonate4134.840.0315.5218.8517.7415.360.670.80[87]
ethanolphenanthrene1321111.230.064.2511.3422.5617.302.831.26[83]
ethanolphenylbutazone8177.872.0210.2710.7210.269.892.012.13[91]
ethanolpropane307234.311.9343.0642.5664.5264.217.488.90[71]
ethanolpropene305251.781.5243.3045.7465.3766.867.807.72[71]
ethanolquercetin166107.151.7940.5840.599.609.610.861.11[92]
ethanoltoluene14775.020.1220.4517.5424.1420.868.930.70[83]
ethanolwater1521315.260.90131.04145.2015.3122.374.864.30[75,82,93]
ethyl acetateastaxanthin12571.500.5611.4414.298.8311.611.512.85[94]
ethyl acetatequercetin164122.900.5244.6950.1719.7824.313.181.80[92]
ethyl acetatesqualene122102.010.577.708.8612.3413.441.540.98[94]
ethylene glycolpropene319221.360.8648.9448.8164.2464.141.411.70[78,79]
methanol[Bmim][bti]11563.461.6942.5841.6554.5653.825.002.15[76,95]
methanol[Emim][bti]11475.070.2340.2541.8553.7254.965.241.64[76,95]
methanol[Hmim][bti]5234.250.5436.5739.0448.8350.823.910.74[76]
methanol[Omim][bti]5 5 0.61 39.02 49.96 1.33[76]
methanol1,3,5-trimethylbenzene4 4 1.15 15.85 42.38 3.25[73]
methanolacetonitrile269172.941.3027.8826.5057.9457.132.191.63[77]
methanolammonia246180.931.78106.11114.443.677.784.253.93[71]
methanolbenzene4132.790.551.8812.4938.5945.233.284.28[73]
methanolcarbon dioxide2510153.800.7930.7230.8663.7063.774.053.84[71]
methanolcarbon monoxide8174.880.5223.3614.7859.8955.028.753.60[96]
methanoldisperse blue 148263.700.5957.6951.9767.9963.668.221.11[88]
methanoldisperse orange 11165112.650.2651.2552.7163.9765.053.011.96[88]
methanolnaphthalene4227.590.0817.6015.9844.0442.9419.90 [73]
methanolp-chloronitrobenzene187111.470.6022.4622.6646.9347.061.081.07[97]
methanolphenanthrene41313.710.4812.1521.2937.1943.734.443.14[73]
methanolpropane2711162.141.5024.0827.0054.4756.222.432.45[71]
methanoltoluene4 4 0.25 14.12 44.35 3.66[73]
methanolvitamin K34 4 0.45 25.59 47.09 0.45[98]
methanolwater52328.860.74310.36281.3597.1183.1811.090.18[99]
n-butanolammonia6417472.631.7538.1238.4120.7320.565.365.81[71]
n-butanolcarbon dioxide6619471.191.0647.2645.5168.3367.285.986.27[71]
n-butanolpropane9833651.861.5149.7049.5265.4365.312.583.15[71]
n-butanolpropene13545902.831.6650.4848.5366.6465.335.153.90[71]
Table 7. Calculated deviations of the individual systems of the nonpolar database (divided into test and train sets) achieved by the best machine learning model of this work (Gradient Boosted) and classic equations adopted for comparison.
Solvent | Solute | NDP (Total, Test, Train) | AARD (%): ML Gradient Boosted (Test, Train) | Wilke-Chang (Test, Train) | Tyn-Calus (Test, Train) | Magalhães et al. (Test, Train) | Zhu et al. (Test, Train) | Data Ref.
2,2,4-trimethylpentane1,3,5-trimethylbenzene4 4 2.11 21.44 17.70 0.64 171.90[100]
2,2,4-trimethylpentanebenzene4132.491.3611.3114.9831.0528.780.042.33128.60119.69[100]
2,2,4-trimethylpentaneethylbenzene4 4 3.68 19.42 21.11 1.79 157.43[100]
2,2,4-trimethylpentaneo-xylene4 4 1.96 16.19 23.43 2.78 147.48[100]
2,2,4-trimethylpentanep-xylene4136.046.7615.575.1123.4833.844.272.74116.04126.93[100]
2,2,4-trimethylpentanetoluene4 4 2.21 10.10 29.38 2.07 126.50[100]
2,3-dimethylbutanebenzene11293.223.1014.7413.2940.8539.841.781.749.457.59[101]
2,3-dimethylbutanenaphthalene9271.281.6818.3519.0238.5339.040.612.181.802.59[101]
2,3-dimethylbutanephenanthrene11290.650.6320.7520.5137.1936.992.441.632.395.87[101]
2,3-dimethylbutanetoluene10282.523.3615.8917.5339.5840.752.842.174.974.77[101]
cyclohexane1,1′-dimethylferrocene5231.071.649.738.3017.4018.482.410.26192.52197.96[102]
cyclohexane1,3,5-trimethylbenzene121119.043.826.7314.1328.8314.338.288.3216.0759.79[103,104]
cyclohexaneacetone4222.310.0120.9619.7746.9146.100.96 106.9992.31[104]
cyclohexaneargon7349.784.636.892.5443.3244.855.542.0540.3366.48[105]
cyclohexanebenzene1221012.002.9624.5517.5713.1318.7812.408.0592.0561.13[104,106]
cyclohexanecarbon tetrachloride7250.501.0215.0423.3518.8813.023.280.9653.23103.63[105]
cyclohexaneethane51413.531.233.432.2234.5737.180.291.23183.3386.88[107]
cyclohexaneethylene5141.931.060.261.7437.9937.801.601.0866.73110.83[107]
cyclohexaneethylferrocene6150.680.495.538.1820.5618.561.180.75178.04169.05[102]
cyclohexaneferrocene5322.840.0815.2413.6216.7017.871.370.2049.7958.60[102]
cyclohexanekrypton6339.012.6016.3215.1632.4233.093.071.2754.8578.43[105]
cyclohexanemethane64213.800.419.749.0846.7846.397.63 49.3022.59[105]
cyclohexanem-xylene4 4 1.01 21.96 41.90 1.29 94.56[104]
cyclohexanenaphthalene124810.333.6414.6410.8714.9118.189.986.9041.9439.98[104,106]
cyclohexanephenanthrene8355.641.434.824.2719.0223.034.822.494.347.53[106]
cyclohexanep-xylene8 8 2.31 4.13 28.00 3.63 28.67[106]
cyclohexanetetrabutyltin72510.031.3820.8725.567.519.583.791.6411.6414.39[105]
cyclohexanetetraethyltin7250.611.7824.3724.297.918.241.432.1757.4357.01[105]
cyclohexanetetramethyltin7254.770.4829.9033.829.137.472.311.0690.3795.59[105]
cyclohexanetetrapropyltin6423.961.0121.8930.997.238.082.03 21.9921.49[105]
cyclohexanetoluene122108.493.0616.2210.8418.7520.2111.657.3151.7656.57[104,106]
cyclohexanexenon7615.420.0225.1714.3223.8830.48 83.09150.96[105]
ethane1-octene6246.961.273.275.541.865.435.881.9617.169.15[108]
ethane1-tetradecene9456.060.2820.1020.8413.6714.483.843.7121.7813.41[108]
n-decane12-crown-44138.734.4420.7723.6417.5215.560.664.9942.6740.09[109]
n-decane15-crown-54138.771.2441.3922.030.1813.5422.170.6928.3121.11[109]
n-decane18-crown-64132.172.0530.5825.534.638.3114.504.293.544.36[109]
n-decaneargon32111.890.1112.7910.5644.3255.28 26.3591.32[110]
n-decanecarbon tetrachloride3 3 5.74 17.09 26.45 1.24 71.87[110]
n-decanedicyclohexano-18-crown-64 4 0.68 25.60 2.44 1.27 83.59[109]
n-decanedicyclohexano-24-crown-84317.950.1325.8233.283.158.46 119.04192.36[109]
n-decanekrypton3 3 2.49 23.85 35.90 3.27 69.93[110]
n-decanes-trioxane4 4 2.24 24.60 25.63 0.91 50.71[109]
n-decanetetrabutyltin4133.530.7129.4129.092.913.221.570.9622.5420.34[110]
n-decanetetraethyltin41325.8724.861.636.6633.2330.980.591.9819.1513.44[110]
n-decanetetramethyltin4224.387.9037.5936.5614.2514.902.61 75.1068.10[110]
n-decanetetrapropyltin4130.831.6526.8129.978.876.600.682.0026.7224.34[110]
n-decanexenon81715.122.571.4618.7646.6135.665.993.19137.7182.50[110,111]
n-dodecane1,3,5-trimethylbenzene4224.640.316.991.7039.2335.473.02 130.38107.10[104]
n-dodecaneacetone5146.180.825.134.6445.4445.150.901.3798.06103.24[104]
n-dodecanebenzene4223.420.694.973.8843.2542.601.570.00122.78121.55[104]
n-dodecanecarbon dioxide9365.852.8661.8388.2519.089.1611.391.5630.1422.67[112]
n-dodecanecarbon monoxide93615.152.8773.1352.0613.5524.077.287.6924.5729.78[112]
n-dodecanehydrogen9547.786.8425.1321.1764.9763.1110.129.6747.7249.66[112]
n-dodecanelinoleic acid methyl ester4 4 1.10 13.54 13.08 0.37 42.50[104]
n-dodecanem-xylene4 4 1.39 10.17 42.74 0.62 108.09[104]
n-dodecanenaphthalene5234.990.755.6410.1138.8641.753.550.9379.4081.82[104]
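| n-dodecane | n-decane | 5 | 1 | 4 | 0.03 | 2.21 | 56.61 | 45.00 | 8.43 | 4.59 | 1.98 | 3.71 | 11.30 | 34.63 | [113] |
| n-dodecane | n-hexadecane | 5 | 1 | 4 | 10.77 | 1.08 | 65.59 | 57.28 | 23.68 | 17.47 | 5.92 | 0.83 | 19.21 | 19.32 | [113] |
| n-dodecane | n-octane | 9 | 6 | 3 | 2.31 | 0.16 | 47.87 | 50.94 | 6.18 | 3.73 | 10.34 | 1.17 | 33.86 | 8.63 | [113] |
| n-dodecane | n-tetradecane | 5 | 1 | 4 | 2.94 | 1.23 | 39.89 | 59.70 | 2.42 | 16.92 | 16.34 | 1.53 | 20.35 | 16.40 | [113] |
| n-dodecane | toluene | 4 | 2 | 2 | 5.84 | 0.79 | 7.90 | 11.57 | 43.05 | 45.33 | 2.72 | | 95.30 | 125.51 | [104] |
| n-dodecane | vitamin K3 | 4 | 1 | 3 | 0.19 | 0.22 | 10.31 | 11.59 | 39.14 | 40.01 | 0.22 | 0.98 | 34.63 | 38.39 | [104] |
| n-eicosane | carbon dioxide | 5 | 2 | 3 | 16.17 | 0.01 | 172.30 | 147.95 | 21.79 | 12.93 | 0.71 | 1.13 | 8.06 | 29.92 | [114] |
| n-eicosane | carbon monoxide | 5 | 2 | 3 | 10.88 | 4.29 | 114.69 | 136.89 | 8.76 | 7.81 | 0.55 | 0.54 | 50.15 | 19.59 | [114] |
| n-eicosane | hydrogen | 5 | 1 | 4 | 3.25 | 110.78 | 8.54 | 252.32 | 61.72 | 129.43 | 128.38 | 73.30 | 4.12 | 99.16 | [114] |
| n-eicosane | n-dodecane | 5 | 2 | 3 | 13.72 | 1.97 | 138.63 | 134.49 | 52.19 | 49.55 | 1.74 | 0.91 | 67.82 | 55.95 | [114] |
| n-eicosane | n-hexadecane | 5 | 4 | 1 | 16.16 | 1.99 | 141.26 | 144.68 | 61.19 | 63.48 | | | 61.96 | 30.75 | [114] |
| n-eicosane | n-octane | 5 | 2 | 3 | 7.53 | 0.63 | 134.16 | 124.83 | 39.95 | 34.37 | 2.54 | 1.67 | 54.76 | 57.07 | [114] |
| n-heptane | 1,3,5-trimethylbenzene | 4 | 2 | 2 | 0.87 | 0.75 | 4.11 | 5.33 | 23.55 | 22.65 | 1.43 | | 7.92 | 9.31 | [115] |
| n-heptane | 2,2,4-trimethylpentane | 4 | 2 | 2 | 4.54 | 0.52 | 1.39 | 2.85 | 24.53 | 23.44 | 0.58 | 0.10 | 23.40 | 21.87 | [116] |
| n-heptane | benzene | 11 | 4 | 7 | 3.62 | 2.15 | 4.50 | 6.14 | 29.86 | 28.76 | 1.91 | 3.07 | 8.71 | 12.97 | [115,117] |
| n-heptane | ethylbenzene | 4 | | 4 | | 5.10 | | 8.27 | | 22.51 | | 0.23 | | 14.85 | [115] |
| n-heptane | n-decane | 6 | 1 | 5 | 4.29 | 2.79 | 15.13 | 6.99 | 33.96 | 24.47 | 8.69 | 2.42 | 10.94 | 5.52 | [113,118] |
| n-heptane | n-dodecane | 6 | 3 | 3 | 5.28 | 0.14 | 4.41 | 14.00 | 19.20 | 31.09 | 59.49 | 2.21 | 6.13 | 24.60 | [113,118] |
| n-heptane | n-hexadecane | 9 | 3 | 6 | 6.51 | 0.65 | 5.88 | 5.39 | 17.14 | 16.55 | 1.00 | 1.38 | 26.09 | 25.64 | [119,120,121] |
| n-heptane | n-hexane | 11 | 3 | 8 | 5.03 | 0.79 | 8.45 | 10.02 | 34.59 | 35.72 | 2.67 | 0.77 | 16.44 | 10.75 | [113,119,121] |
| n-heptane | n-octane | 13 | 3 | 10 | 7.32 | 1.72 | 7.36 | 9.29 | 27.53 | 31.88 | 2.94 | 1.28 | 4.50 | 2.94 | [113,118] |
| n-heptane | n-tetradecane | 6 | 3 | 3 | 2.73 | 1.12 | 7.65 | 9.45 | 21.46 | 22.01 | 2.56 | 1.51 | 28.62 | 33.71 | [113,118] |
| n-heptane | o-xylene | 4 | 2 | 2 | 4.35 | 2.02 | 7.08 | 2.49 | 29.03 | 29.16 | 0.62 | | 3.70 | 3.50 | [115] |
| n-heptane | p-xylene | 4 | 1 | 3 | 6.86 | 2.70 | 5.35 | 7.22 | 32.10 | 33.44 | 0.70 | 0.66 | 1.40 | 1.36 | [115] |
| n-heptane | toluene | 4 | 3 | 1 | 5.71 | 1.04 | 3.63 | 5.00 | 33.00 | 27.03 | | | 4.30 | 5.86 | [115] |
| n-hexadecane | carbon dioxide | 10 | 4 | 6 | 2.02 | 1.81 | 92.63 | 112.99 | 13.75 | 15.19 | 7.11 | 4.53 | 37.48 | 32.03 | [112] |
| n-hexadecane | carbon monoxide | 10 | 3 | 7 | 3.49 | 3.59 | 80.63 | 91.32 | 16.76 | 13.82 | 3.77 | 4.83 | 52.93 | 49.96 | [112] |
| n-hexadecane | hydrogen | 10 | 7 | 3 | 6.89 | 1.04 | 24.77 | 18.43 | 59.19 | 54.00 | 12.88 | 0.99 | 38.66 | 34.17 | [112] |
| n-hexadecane | n-decane | 5 | 1 | 4 | 10.89 | 1.40 | 62.63 | 79.23 | 5.76 | 16.55 | 5.16 | 1.59 | 152.48 | 39.94 | [122] |
| n-hexadecane | n-dodecane | 5 | | 5 | | 1.16 | | 75.79 | | 17.71 | | 2.72 | | 55.76 | [122] |
| n-hexadecane | n-octane | 10 | 1 | 9 | 6.82 | 0.57 | 88.97 | 76.42 | 18.59 | 10.71 | 1.45 | 3.00 | 22.13 | 68.70 | [122] |
| n-hexadecane | n-tetradecane | 5 | 2 | 3 | 1.59 | 1.09 | 70.97 | 78.33 | 17.57 | 22.63 | 2.13 | 2.39 | 50.42 | 36.41 | [122] |
| n-hexane | 1,1′-dimethylferrocene | 4 | 1 | 3 | 0.96 | 0.27 | 15.92 | 16.74 | 13.20 | 12.58 | 1.01 | 0.08 | 45.28 | 46.51 | [102] |
| n-hexane | 1,2,3,5-tetrafluorobenzene | 7 | 2 | 5 | 4.31 | 3.24 | 20.21 | 17.52 | 41.62 | 39.64 | 1.22 | 5.13 | 7.14 | 10.04 | [123] |
| n-hexane | 1,2,4,5-tetrafluorobenzene | 7 | 2 | 5 | 1.98 | 1.25 | 20.44 | 16.25 | 41.78 | 38.72 | 3.09 | 4.44 | 13.93 | 16.22 | [123] |
| n-hexane | 1,2,4-trifluorobenzene | 7 | 2 | 5 | 4.61 | 0.87 | 24.28 | 14.88 | 45.04 | 38.22 | 5.76 | 1.46 | 12.96 | 6.40 | [123] |
| n-hexane | 1,3,5-trimethylbenzene | 20 | 7 | 13 | 2.98 | 1.59 | 10.09 | 8.34 | 31.66 | 30.42 | 5.24 | 5.79 | 8.52 | 4.45 | [103,104] |
| n-hexane | 9,10-dimethylanthracene | 8 | 4 | 4 | 13.89 | 3.00 | 12.79 | 19.02 | 27.56 | 32.73 | 6.22 | 0.32 | 116.02 | 83.34 | [124] |
| n-hexane | acetone | 5 | 2 | 3 | 2.70 | 1.08 | 5.05 | 3.73 | 36.55 | 34.67 | 5.60 | 1.05 | 10.75 | 4.24 | [104] |
| n-hexane | acetonitrile | 7 | | 7 | | 2.40 | | 5.79 | | 39.09 | | 2.70 | | 22.16 | [125] |
| n-hexane | benzene | 48 | 18 | 30 | 3.48 | 2.39 | 6.16 | 7.86 | 31.07 | 31.34 | 9.04 | 6.60 | 15.66 | 25.65 | [103,104,107,123,125,126,127,128] |
| n-hexane | carbon disulfide | 10 | 4 | 6 | 4.49 | 3.58 | 2.32 | 10.16 | 35.24 | 29.75 | 7.20 | 3.52 | 44.88 | 76.81 | [125] |
| n-hexane | carbon tetrabromide | 8 | 1 | 7 | 7.97 | 1.02 | 30.34 | 19.55 | 5.72 | 16.14 | 1.95 | 8.28 | 168.32 | 115.24 | [124] |
| n-hexane | ethylferrocene | 4 | | 4 | | 0.61 | | 18.11 | | 11.55 | | 0.12 | | 35.49 | [102] |
| n-hexane | ferrocene | 4 | 1 | 3 | 3.84 | 0.40 | 31.11 | 22.97 | 5.72 | 11.57 | 0.41 | 0.15 | 17.28 | 16.84 | [123] |
| n-hexane | hexafluorobenzene | 7 | 2 | 5 | 2.23 | 1.89 | 7.46 | 10.50 | 31.30 | 34.96 | 2.19 | 3.76 | 15.66 | 21.90 | [123] |
| n-hexane | indole | 2 | | 2 | | 0.62 | | 10.64 | | 32.22 | | | | 13.24 | [104] |
| n-hexane | linoleic acid methyl ester | 2 | | 2 | | 2.02 | | 2.08 | | 12.90 | | | | 95.99 | [104] |
| n-hexane | m-xylene | 5 | 2 | 3 | 1.77 | 0.04 | 9.32 | 8.01 | 32.84 | 31.87 | 1.82 | 2.56 | 5.26 | 4.57 | [104] |
| n-hexane | naphthalene | 21 | 5 | 16 | 3.43 | 2.44 | 12.23 | 11.95 | 33.92 | 33.71 | 4.95 | 4.32 | 8.19 | 10.88 | [103,104,125,126] |
| n-hexane | n-heptane | 11 | 5 | 6 | 4.88 | 1.16 | 13.00 | 12.35 | 29.03 | 33.49 | 7.30 | 0.93 | 13.25 | 2.53 | [119,120,121,129] |
| n-hexane | n-octane | 7 | 2 | 5 | 2.01 | 1.14 | 12.76 | 12.64 | 32.28 | 32.19 | 1.05 | 0.30 | 2.05 | 1.68 | [119,129] |
| n-hexane | octafluorotoluene | 7 | 1 | 6 | 0.23 | 0.26 | 21.39 | 8.53 | 40.45 | 30.16 | 4.30 | 2.92 | 13.40 | 15.97 | [123] |
| n-hexane | o-difluorobenzene | 7 | 2 | 5 | 2.25 | 0.86 | 9.72 | 12.64 | 35.75 | 37.83 | 4.29 | 2.44 | 3.57 | 16.35 | [123] |
| n-hexane | p-difluorobenzene | 7 | 3 | 4 | 2.62 | 0.44 | 19.67 | 9.73 | 42.83 | 35.76 | 24.35 | 0.79 | 27.69 | 2.93 | [123] |
| n-hexane | pentafluorobenzene | 7 | 1 | 6 | 2.79 | 0.39 | 1.84 | 12.11 | 26.52 | 36.58 | 1.96 | 3.78 | 6.06 | 17.91 | [123] |
| n-hexane | phenanthrene | 15 | 6 | 9 | 3.33 | 1.60 | 14.07 | 14.25 | 31.89 | 32.04 | 4.37 | 5.72 | 14.18 | 11.93 | [103] |
| n-hexane | p-xylene | 17 | 4 | 13 | 6.35 | 2.44 | 15.89 | 10.62 | 37.62 | 33.72 | 4.15 | 4.56 | 9.32 | 8.04 | [103,104] |
| n-hexane | pyrene | 8 | 2 | 6 | 10.54 | 10.51 | 62.03 | 50.27 | 31.35 | 21.81 | 8.62 | 4.72 | 153.51 | 103.03 | [124,126] |
| n-hexane | toluene | 32 | 14 | 18 | 4.58 | 2.65 | 8.46 | 8.14 | 32.19 | 30.74 | 4.98 | 3.72 | 12.56 | 19.33 | [103,104,130,131] |
| n-hexane | vitamin K3 | 5 | 1 | 4 | 3.32 | 1.01 | 11.31 | 16.36 | 30.09 | 34.07 | 0.78 | 0.73 | 34.19 | 37.24 | [104,132] |
| n-octane | 1,3,5-trimethylbenzene | 8 | 3 | 5 | 2.31 | 1.74 | 7.21 | 6.99 | 23.51 | 23.67 | 0.47 | 0.62 | 23.97 | 22.64 | [103,104] |
| n-octane | argon | 4 | 1 | 3 | 1.76 | 5.82 | 6.69 | 14.09 | 44.01 | 41.09 | 1.93 | 1.30 | 23.81 | 18.74 | [110] |
| n-octane | benzene | 8 | 2 | 6 | 1.90 | 0.79 | 2.80 | 2.87 | 34.64 | 35.64 | 0.29 | 0.20 | 15.45 | 17.40 | [100,115] |
| n-octane | carbon tetrachloride | 4 | | 4 | | 1.24 | | 15.76 | | 23.67 | | 1.05 | | 34.90 | [110] |
| n-octane | ethylbenzene | 8 | 4 | 4 | 6.65 | 6.50 | 3.65 | 7.21 | 28.14 | 25.44 | 3.22 | 1.19 | 24.94 | 22.80 | [100,115] |
| n-octane | krypton | 4 | 1 | 3 | 14.06 | 1.75 | 22.31 | 30.11 | 33.56 | 29.32 | 3.72 | 0.39 | 40.36 | 36.12 | [110] |
| n-octane | methane | 4 | 1 | 3 | 9.77 | 2.46 | 10.46 | 3.34 | 50.64 | 45.68 | 6.32 | 0.40 | 45.79 | 10.22 | [110] |
| n-octane | n-heptane | 7 | 4 | 3 | 6.34 | 0.55 | 11.43 | 11.60 | 36.84 | 36.96 | 1.42 | 0.12 | 20.87 | 17.70 | [119,133] |
| n-octane | n-hexane | 6 | 4 | 2 | 4.66 | 0.29 | 5.47 | 6.95 | 34.39 | 35.42 | 3.46 | | 34.50 | 22.19 | [119] |
| n-octane | o-xylene | 8 | | 8 | | 1.39 | | 1.30 | | 31.53 | | 0.73 | | 14.39 | [100,115] |
| n-octane | p-xylene | 8 | 1 | 7 | 5.49 | 3.85 | 9.50 | 8.80 | 36.92 | 36.43 | 0.99 | 0.83 | 4.21 | 7.47 | [100,115] |
| n-octane | tetrabutyltin | 4 | 1 | 3 | 0.14 | 1.85 | 21.42 | 33.04 | 4.40 | 10.56 | 14.34 | 3.93 | 5.22 | 14.77 | [110] |
| n-octane | tetraethyltin | 5 | | 5 | | 4.29 | | 34.09 | | 14.16 | | 3.79 | | 17.89 | [110] |
| n-octane | tetramethyltin | 4 | | 4 | | 1.78 | | 44.76 | | 15.82 | | 8.09 | | 35.98 | [110] |
| n-octane | tetrapropyltin | 4 | 1 | 3 | 2.18 | 0.32 | 22.11 | 35.53 | 7.90 | 12.77 | 0.10 | 10.73 | 6.73 | 6.87 | [110] |
| n-octane | toluene | 8 | 1 | 7 | 0.28 | 0.53 | 1.25 | 3.08 | 31.64 | 33.55 | 1.92 | 1.28 | 12.43 | 17.20 | [100,115] |
| n-octane | xenon | 8 | 3 | 5 | 7.17 | 2.42 | 14.65 | 18.84 | 34.81 | 33.03 | 3.50 | 5.95 | 43.59 | 48.17 | [110,111] |
| n-tetradecane | acridine | 8 | 4 | 4 | 6.32 | 0.99 | 25.62 | 19.86 | 18.67 | 21.21 | 5.12 | 7.85 | 50.28 | 48.90 | [134] |
| n-tetradecane | argon | 4 | 1 | 3 | 13.85 | 3.81 | 4.21 | 24.13 | 55.48 | 60.17 | 3.71 | 4.35 | 44.66 | 76.49 | [110] |
| n-tetradecane | benzothiophene | 7 | 3 | 4 | 9.79 | 2.27 | 37.15 | 40.83 | 15.35 | 13.08 | 2.67 | 3.25 | 112.41 | 81.45 | [134] |
| n-tetradecane | carbon tetrachloride | 4 | | 4 | | 2.36 | | 16.38 | | 32.05 | | 2.54 | | 181.74 | [110] |
| n-tetradecane | dibenzothiophene | 8 | 3 | 5 | 12.20 | 2.93 | 31.28 | 40.52 | 14.59 | 8.58 | 7.34 | 2.43 | 58.10 | 73.29 | [134] |
| n-tetradecane | krypton | 4 | | 4 | | 4.58 | | 17.50 | | 49.11 | | 6.70 | | 102.06 | [110] |
| n-tetradecane | methane | 4 | 2 | 2 | 17.29 | 1.80 | 17.88 | 41.86 | 59.61 | 71.62 | 58.01 | | 34.84 | 92.68 | [110] |
| n-tetradecane | naphthalene | 7 | | 7 | | 2.83 | | 14.51 | | 28.99 | | 2.67 | | 74.98 | [134] |
| n-tetradecane | tetrabutyltin | 4 | 2 | 2 | 17.37 | 1.95 | 40.27 | 36.01 | 11.38 | 5.16 | 4.94 | | 116.75 | 115.45 | [110] |
| n-tetradecane | tetraethyltin | 4 | | 4 | | 3.09 | | 29.87 | | 18.07 | | 5.56 | | 143.59 | [110] |
| n-tetradecane | tetramethyltin | 4 | 2 | 2 | 13.08 | 0.06 | 29.05 | 40.68 | 25.25 | 18.52 | 6.64 | | 202.70 | 152.93 | [110] |
| n-tetradecane | tetrapropyltin | 4 | 1 | 3 | 13.99 | 0.21 | 53.61 | 25.95 | 2.60 | 15.87 | 6.51 | 1.69 | 67.58 | 126.86 | [110] |
| n-tetradecane | xenon | 8 | 1 | 7 | 0.64 | 2.42 | 7.11 | 16.22 | 53.23 | 47.68 | 1.98 | 5.78 | 222.45 | 179.79 | [110,111] |
| propane | 1-octene | 8 | 1 | 7 | 0.07 | 0.88 | 18.41 | 19.52 | 27.34 | 28.33 | 0.06 | 1.68 | 7.42 | 9.54 | [108] |
| propane | 1-tetradecene | 8 | 3 | 5 | 3.54 | 0.38 | 36.40 | 30.97 | 36.84 | 31.45 | 3.52 | 0.98 | 48.48 | 31.59 | [108] |
| sulfur hexafluoride | 1,3,5-trimethylbenzene | 10 | | 10 | | 0.86 | | 90.68 | | 28.87 | | 4.43 | | 14.17 | [80] |
| sulfur hexafluoride | benzene | 9 | 2 | 7 | 1.08 | 3.65 | 85.93 | 86.27 | 14.85 | 17.62 | 10.25 | 6.77 | 5.62 | 7.82 | [80] |
| sulfur hexafluoride | benzoic acid | 6 | 3 | 3 | 22.48 | 4.26 | 150.51 | 144.36 | 62.38 | 58.39 | 3.11 | 0.11 | 22.70 | 11.88 | [135] |
| sulfur hexafluoride | carbon tetrachloride | 7 | 2 | 5 | 2.81 | 1.69 | 95.35 | 134.58 | 22.01 | 46.52 | 2.71 | 1.86 | 33.23 | 12.98 | [80] |
| sulfur hexafluoride | naphthalene | 5 | 2 | 3 | 4.51 | 1.54 | 62.53 | 74.74 | 8.94 | 17.12 | 9.70 | 0.38 | 16.10 | 7.74 | [135] |
| sulfur hexafluoride | p-xylene | 52 | 14 | 38 | 4.09 | 2.16 | 88.28 | 88.44 | 24.32 | 24.42 | 2.51 | 4.62 | 5.61 | 8.54 | [80] |
| sulfur hexafluoride | toluene | 11 | 4 | 7 | 4.37 | 1.95 | 88.43 | 83.35 | 20.52 | 17.27 | 4.95 | 3.50 | 4.66 | 8.66 | [80] |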
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Aniceto, J.P.S.; Zêzere, B.; Silva, C.M. Predictive Models for the Binary Diffusion Coefficient at Infinite Dilution in Polar and Nonpolar Fluids. Materials 2021, 14, 542. https://doi.org/10.3390/ma14030542
