Predictive Models for the Binary Diffusion Coefficient at Infinite Dilution in Polar and Nonpolar Fluids

Experimental diffusivities are scarcely available, though their knowledge is essential to model rate-controlled processes. In this work, various machine learning models to estimate diffusivities in polar and nonpolar solvents (except water and supercritical CO2) were developed. Such models were trained on a database of 90 polar systems (1431 points) and 154 nonpolar systems (1129 points) with data on 20 properties. Five machine learning algorithms were evaluated: multilinear regression, k-nearest neighbors, decision tree, and two ensemble methods (random forest and gradient boosted). For both polar and nonpolar data, the best results were found using the gradient boosted algorithm. The model for polar systems contains six variables/parameters (temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and solvent Lennard-Jones energy constant) and showed an average absolute relative deviation (AARD) of 5.07%. The nonpolar model requires five variables/parameters (the same as the polar model except the Lennard-Jones constant) and presents AARD = 5.86%. These results were compared with four classic models, including the 2-parameter correlation of Magalhães et al. (AARD = 5.19/6.19% for polar/nonpolar) and the predictive Wilke-Chang equation (AARD = 40.92/29.19%). Note, however, that the Magalhães et al. correlation requires two parameters per system that must be previously fitted to data. The developed models are coded and provided as a command line program.


Introduction
Diffusivities are important properties for the proper design, simulation, and scale-up of rate-controlled separations and chemical reactions, where they are required for the estimation of dispersion coefficients, convective mass transfer coefficients, and catalyst effectiveness factors [1][2][3]. However, diffusivity data are scarce both in terms of compounds and operating conditions, creating the need for accurate models capable of predicting diffusivities when no experimental data are available [4].
Currently, the Wilke-Chang model [5], proposed in 1955, remains the most widely used equation to estimate binary diffusivities, mainly due to its simplicity. It requires only knowledge of the solvent viscosity, the solute molar mass, the solute molar volume at the normal boiling point, and operating conditions such as temperature. Other hydrodynamic equations have been proposed, such as those of Scheibel [6], Tyn-Calus [7], and Hayduk and Minhas [8]. Correlative models validated for both polar and nonpolar systems have been put forward by Magalhães et al. [9,10], and one may also cite the 2-parameter correlation of Dymond-Hildebrand-Batschinski (DHB) [11,12], based on the free-volume theory, for nonpolar and weakly polar systems at moderate densities. However, these correlations require that data for a given system be available in order to interpolate and extrapolate diffusivities to the desired conditions. Hybrid models are also available, such as the predictive Zhu et al. model [13] and the predictive Tracer Liu-Silva-Macedo (TLSM) model and its 1-parameter correlations (TLSMd and TLSMen) [4,14,15]. These are Lennard-Jones fluid models and comprise two contributions: a free-volume part and an energy component.

Polar and nonpolar systems were separated into two databases based on the polarity of the solvent and, for each, the data were split randomly 70/30% into training and testing sets. The training set was used for model learning and fitting, while the testing set was used to evaluate the performance of the fitted model after learning; information from the testing set is never used during learning. To guarantee that all models are fed the same data, these data sets were also used for the evaluation of the classic models.
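The 70/30 split described above can be sketched with scikit-learn; the arrays below are random stand-ins for the actual diffusivity database:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for one of the databases: 10 samples, 6 input variables.
rng = np.random.default_rng(0)
X = rng.random((10, 6))
y = rng.random(10)

# 70/30 random split into training and testing sets, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)
```

The testing partition is set aside and only used to score the fitted models.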
Most learning algorithms benefit from scaling input variables in order to improve model robustness and training speed [61]. The most common scaling strategies are normalization or standardization. Normalization consists in transforming the real range of values into a standard range (e.g., [0, 1] or [−1, 1]). Standardization consists in transforming variables so that they follow a standard normal distribution (mean of zero and standard deviation of one). In this work, variables were normalized to the [0, 1] range before passing them to training.
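As a minimal sketch of this preprocessing step, assuming scikit-learn and a toy two-variable matrix:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy input matrix: two variables on very different scales
# (e.g., temperature in K and viscosity in cP).
X = np.array([[300.0, 0.5],
              [400.0, 1.0],
              [500.0, 2.5]])

# Fit the [0, 1] normalization on the training data only,
# then reuse the same transform for any test data.
scaler = MinMaxScaler(feature_range=(0, 1)).fit(X)
X_scaled = scaler.transform(X)
```

Fitting the scaler on the training set and reusing it on the test set prevents information from the test set leaking into training.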

Variable Selection and Hyper-Parameter Optimization
Model variables were selected from the ones shown in Table 1 while removing collinear variables systematically. For each pair of variables with collinearity above a defined threshold of 0.50, the one with the lower correlation with D12 was removed from the model. When both variables showed similar correlations with D12, the ease of obtaining each variable for a given system was also taken into account. This was done to improve the simplicity and ease of use of the final model.
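The selection rule can be sketched as below; the function and the toy data frame are hypothetical illustrations of the procedure, not the authors' code:

```python
import pandas as pd

def drop_collinear(df, target, threshold=0.50):
    """From each variable pair with |Pearson r| above `threshold`,
    drop the variable less correlated with the target column."""
    corr = df.corr().abs()
    cols = [c for c in df.columns if c != target]
    kept = set(cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in kept and b in kept and corr.loc[a, b] > threshold:
                # Keep the member of the pair with the stronger
                # correlation to the target (D12 in the paper).
                kept.discard(a if corr.loc[a, target] < corr.loc[b, target] else b)
    return [c for c in cols if c in kept]

# Hypothetical toy frame: x1 and x2 are strongly collinear, and x1
# correlates better with the target, so x2 is removed.
df = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [1.2, 1.8, 3.3, 3.7],
    "x3": [1.0, -1.0, 1.0, -1.0],
    "y":  [1.0, 2.1, 2.9, 4.0],
})
selected = drop_collinear(df, "y")
```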
Besides the model variables discussed thus far, each learning algorithm possesses a set of parameters, which can be seen as configuration options, that specify how the algorithm behaves. These are often called hyper-parameters and are not fitted to data but rather must be set before training. Hyper-parameters were optimized for each learning algorithm using a grid search with 4-fold cross-validation, implemented using GridSearchCV of scikit-learn (version 0.22.1). This method performs an exhaustive search for the best hyper-parameter values in a predefined grid, evaluating model performance by 4-fold cross-validation. The k-fold cross-validation approach divides the training set into k subsets (folds) and trains the model on k − 1 of the folds while testing it on the remaining fold. This process is repeated for every combination of k − 1 folds, and the best model (best combination of hyper-parameters) emerges as the one with the best average performance, which helps avoid both overfitting and underfitting. The evaluated hyper-parameters for each learning algorithm are shown in Table S1 of the Supplementary Material.
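A hedged sketch of such a grid search, with a made-up grid and synthetic data (the actual grids are listed in Table S1):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training set.
X, y = make_regression(n_samples=80, n_features=5, noise=0.1, random_state=0)

# Hypothetical hyper-parameter grid (2 x 2 = 4 candidate combinations).
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}

# Exhaustive search over the grid, scored by 4-fold cross-validation.
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=4,  # 4-fold cross-validation, as in the text
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
best_params = search.best_params_
```

After fitting, `search.best_estimator_` is the model retrained on the full training set with the winning combination.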

Machine Learning Algorithms
Five ML algorithms were evaluated for the prediction of binary diffusivities: A multilinear regression, a k-nearest neighbors model, a decision tree algorithm, and two ensemble methods (random forest and gradient boosted). Models were implemented using the Python machine learning library scikit-learn version 0.22.1 [62].
A simple ordinary least squares multilinear regression was used as a baseline model for the prediction of binary diffusivities. In a multilinear regression [63], the target value, y, is a linear combination of explanatory variables, x i , weighted by coefficients b i . The coefficients are optimized to minimize the residual sum of squares between the observed and the calculated values. It was implemented using the LinearRegression class in scikit-learn.
The k-nearest neighbors (kNN) algorithm [64,65] is one of the simplest machine learning algorithms. Its prediction is the average of the target values of the k closest neighbors in the input space. Neighbors are selected from a set of examples for which the target property is known, which can be seen as the training set, although, unlike other algorithms, kNN does not require an explicit training phase. The nearest neighbors are identified from position vectors in the multidimensional input space, usually in terms of Euclidean distance, although other distance measures can be applied. The kNN algorithm was implemented using the KNeighborsRegressor class in scikit-learn.
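For instance, with k = 2 the prediction is simply the mean target of the two nearest training points:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Tiny illustrative set: one input variable, known targets.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])

# The prediction averages the k = 2 nearest neighbors
# (Euclidean distance by default).
knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)

# Query at 1.6: nearest neighbors are 2.0 and 1.0, so the
# prediction is the mean of their targets.
pred = knn.predict([[1.6]])[0]
```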
Decision tree [65,66] models take the training data and build a set of decision rules over the explanatory variables, with each split chosen to optimize a criterion such as entropy, information gain, or variance reduction. Prediction is performed by following this tree-like rule graph until an output (leaf) node is reached. The decision tree algorithm was implemented using the DecisionTreeRegressor class in scikit-learn.
Finally, ensemble methods are a combination of a large number of simple models, thus improving generalizability and robustness over a single model [63]. They can be divided into averaging ensemble methods, as the random forest algorithm, and boosting ensemble methods, such as the gradient boosted model, and have proven to be effective for regression learning [67].
Random forests [65,68] comprise several strong models, such as decision trees, trained independently. For the construction of each tree, a random subset of the training data is selected, while the remaining subset is used for testing. The final prediction is obtained as an average over the ensemble. Random forests are fast and simple to apply, as they have simpler hyper-parameter settings than other methods, can handle large amounts of noise, and are less prone to overfitting [65]. The random forest model was implemented in scikit-learn using the RandomForestRegressor class.
Gradient boosted [69] models also combine several learners, but these are not trained independently: each new learner is added so as to mitigate the bias of the previous ones. The gradient boosted model likewise uses decision trees, which are fitted to the gradient of a loss function, for instance, the squared error. The gradient is calculated for every sample of the training set, but only a random subset of those gradients is used by each learner. Gradient boosting has been shown to provide very good predictions, at least on par with random forests and usually superior to other methods [70]. The gradient boosted algorithm was implemented using the GradientBoostingRegressor class.
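Both ensemble flavors can be sketched on synthetic regression data; the settings below are illustrative, not the tuned hyper-parameters of the paper:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 5 input variables, one continuous target.
X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=1)

# Averaging ensemble: independently trained trees, predictions averaged.
rf = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_tr, y_tr)

# Boosting ensemble: each tree fits the gradient of the squared-error
# loss left by the previous trees.
gb = GradientBoostingRegressor(n_estimators=100, random_state=1).fit(X_tr, y_tr)

# Q2 corresponds to R2 evaluated on the held-out test set.
q2_rf = r2_score(y_te, rf.predict(X_te))
q2_gb = r2_score(y_te, gb.predict(X_te))
```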

Classic Models
Several classic D12 models were used as benchmarks for the proposed ML models, including the still extensively used Wilke-Chang equation [5], the Tyn-Calus equation [7], one of the Magalhães et al. correlations [9], and the Zhu et al. hybrid model [13]. Below, these models are briefly presented.
The Wilke-Chang equation [5] is an empirical modification of the Stokes-Einstein relation and is given by:

D12 = 7.4 × 10⁻⁸ (φ M1)^0.5 T / (μ1 V_bp,2^0.6)  (1)

where D12 (cm² s⁻¹) is the diffusivity, T (K) is the temperature, μ1 (cP) is the solvent viscosity, M1 (g mol⁻¹) is the solvent molar mass, φ (dimensionless) is the association factor of the solvent (1.9 for methanol, 1.5 for ethanol, and 1.0 if it is unassociated [31]), and V_bp,2 (cm³ mol⁻¹) is the solute molar volume at the normal boiling temperature, which can be estimated from the critical volume (V_c,i) by the Tyn-Calus relation [31,37]:

V_bp,i = 0.285 V_c,i^1.048  (2)

The Tyn-Calus equation [7] is another commonly used hydrodynamic equation, described by:

D12 = 8.93 × 10⁻⁸ (V_bp,1^0.267 / V_bp,2^0.433) T / μ1  (3)

Magalhães et al. [9] proposed nine correlations for D12, four of which depend explicitly on solvent viscosity and temperature. Here we adopt the following:

D12/T = a μ1^b  (4)

where a and b are parameters fitted for each system. This equation is a modification of the Stokes-Einstein theory [31].

Zhu et al. [13] developed a hybrid model containing a component related to the free volume and another related to energy, devised for the estimation of D12 of real nonpolar fluids (Equations (5)-(10)). In this model, the subscripts 1 and 2 denote solvent and solute, respectively, m1 is the mass of the solvent, and ρ*12 and T*12 are the density and temperature reduced using the binary Lennard-Jones (LJ) parameters ε_LJ,12 and σ_LJ,12. The binary LJ parameters are calculated by combining rules involving an interaction parameter k_d,12, which is estimated through a correlation given in the original work [13]. Finally, the LJ parameters ε_LJ/kB and σ_LJ are obtained from critical constants: for the solute, directly from its critical properties, and for the solvent, from expressions involving the number critical density ρ_n,c,1 (cm⁻³) and the reduced density and reduced temperature of the solvent, ρ_r,1 = ρ1/ρ_c,1 and T_r,1 = T1/T_c,1.
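The Wilke-Chang estimate and the Tyn-Calus volume relation translate into a few lines of code; a minimal sketch, with units as in the text (the function names are ours):

```python
def wilke_chang(T, mu1, M1, V_bp2, phi=1.0):
    """Wilke-Chang estimate of D12 (cm^2 s^-1).

    T      : temperature (K)
    mu1    : solvent viscosity (cP)
    M1     : solvent molar mass (g mol^-1)
    V_bp2  : solute molar volume at the normal boiling point (cm^3 mol^-1)
    phi    : solvent association factor (1.0 if unassociated)
    """
    return 7.4e-8 * (phi * M1) ** 0.5 * T / (mu1 * V_bp2 ** 0.6)

def tyn_calus_volume(V_c):
    """Molar volume at the normal boiling point (cm^3 mol^-1)
    estimated from the critical volume via the Tyn-Calus relation."""
    return 0.285 * V_c ** 1.048
```

As expected from the hydrodynamic form, the estimated diffusivity falls as the solvent viscosity rises.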

Machine Learning Models
The first step towards model development was the choice of relevant variables. Selection was conducted on the basis of the collinearities between the available variables/properties and their level of correlation with the diffusivity. Figures 1 and 2 show the correlation matrices (in the form of heat maps) for the polar and nonpolar data sets, where the values represent the absolute Pearson correlation. When two variables presented a collinearity above a defined threshold of 0.50, only one was kept in the model, namely the one with the better correlation with diffusivity. Following this procedure, six variables were selected for the polar diffusivity model: temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and the Lennard-Jones energy constant of the solvent. For the nonpolar diffusivity model, temperature, solvent viscosity, solute molar mass, solute critical pressure, and solvent molar mass were chosen, totaling five variables. A summary of the variables required by the machine learning models for polar (ML Polar) and nonpolar (ML Nonpolar) systems is presented in Table 3, together with the required inputs for the classic models of Wilke-Chang, Tyn-Calus, Magalhães et al., and Zhu et al. The two hydrodynamic equations require four input variables, the same number as the Magalhães et al. correlation, although, in this latter case, two of the four parameters must be fitted to experimental data, thus reducing the model's applicability. The Zhu et al. hybrid model requires the largest number of parameters (seven) and is only applicable to nonpolar systems.
The performance of all models was evaluated by calculating the average absolute relative deviation (AARD) of each system:

AARD(%) = (100/NDP) Σi |D12,i^calc − D12,i^exp| / D12,i^exp

where superscripts calc and exp denote calculated and experimental values, and NDP is the number of data points of a system. For the whole database, the global deviation (i.e., weighted AARD) and the arithmetic systems average (AARDarith) were calculated. The minimum and maximum system AARDs are reported as an indication of the performance for the best and worst systems. The root mean square error (RMSE) was also calculated, defined as:

RMSE = [ (1/NDP) Σi (D12,i^calc − D12,i^exp)² ]^1/2

The coefficient of determination, R2, which is calculated for the training set, and the Q2 value, which corresponds to the R2 value obtained when applying the model to the test set, are also reported for all models. Table 3. Required inputs for the new and classic diffusivity models.
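These two metrics translate directly into code; a small sketch:

```python
import numpy as np

def aard(d_calc, d_exp):
    """Average absolute relative deviation (%) over a system's points."""
    d_calc, d_exp = np.asarray(d_calc), np.asarray(d_exp)
    return 100.0 / d_exp.size * np.sum(np.abs(d_calc - d_exp) / d_exp)

def rmse(d_calc, d_exp):
    """Root mean square error over a system's points."""
    d_calc, d_exp = np.asarray(d_calc), np.asarray(d_exp)
    return np.sqrt(np.mean((d_calc - d_exp) ** 2))
```

For example, predictions of 1.1 and 0.9 against experimental values of 1.0 give an AARD of 10% and an RMSE of 0.1.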

Note: in Table 3, the • indicates the parameters required by each model.

A final validation of the best machine learning models was conducted by performing a y-randomization test (also called y-scrambling). This test compares the performance of the original model with that of models built on a scrambled (randomly shuffled) response, while still following the original model-building procedure. The randomization eliminates the relation between the independent variables and the target response, so if the models trained on scrambled data perform much worse than the one trained on the original data, one can be confident of the relevance of the original model.

Five algorithms were tested to develop the supervised learning models: multilinear regression, k-nearest neighbors, decision tree, random forest (an averaging ensemble method), and gradient boosted (a boosting ensemble method). The performance of the several machine learning algorithms when applied to the test set of polar data, covering 79 systems and 430 points, is shown in Table 4. The gradient boosted algorithm presents the best performance for the test set (pure prediction), with an AARD of 5.07%, followed by the random forest, decision tree, k-nearest neighbors, and multilinear regression (from lower to higher AARD). Similar trends are observed for the arithmetic average of the 79 system AARDs, as well as for the minimum and maximum AARDs. As expected, the multilinear regression exhibits much worse results than the other four algorithms for all AARD metrics. The gradient boosted algorithm also presents the lowest RMSE and the highest Q2. The Q2 value is close to R2, indicating that the model performs well independently of its training data.

Table 5 presents the results obtained with each ML algorithm for the test set of nonpolar compounds (130 systems and 342 points).
Once again, the gradient boosted algorithm presents the best global AARD for the 130 systems of the test set (5.86%), followed by the random forest, then by the decision tree and k-nearest neighbors with similar results, and lastly by the multilinear regression with significantly worse results. A similar trend is visible in the arithmetic average of the system AARDs. The gradient boosted algorithm also shows the lowest RMSE and the highest Q2. The calculated versus experimental diffusivities for the test set of nonpolar compounds using the gradient boosted model are plotted in Figure 4, showing an unbiased distribution along the diagonal over the whole range of experimental points. Figures S5-S8 of the Supplementary Material provide the calculated versus experimental plots for the remaining four algorithms. As in the case of the polar data, the multilinear regression model presents significant deviations.
The k-nearest neighbors, decision tree, and random forest algorithms provide better scattering around the diagonal. A few outliers may be observed, particularly for the decision tree model.

As a final validation of the gradient boosted models selected for polar and nonpolar systems, a y-randomization test was performed by scrambling the diffusivity vector. This process was repeated 200 times and always returned random models with performances much lower than the original ones, thus confirming the significance of the proposed models. Figure S9 of the Supplementary Material shows the contrast between the Q2 values of our models (0.9919 for polar and 0.9879 for nonpolar) and the lower values obtained for the permutations. It is worth noting that: (i) the best possible score of Q2 (and R2) is 1.0; (ii) for a constant model that always predicts the expected value of the response, both indicators are zero; (iii) Q2 (and R2) can be negative for arbitrarily worse models.
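The y-randomization procedure can be sketched as follows, with synthetic data and only a few permutations standing in for the 200 used in this work:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diffusivity data.
X, y = make_regression(n_samples=150, n_features=5, noise=2.0, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=3)

# Q2 of the original model on the held-out test set.
model = GradientBoostingRegressor(random_state=3).fit(X_tr, y_tr)
q2_original = model.score(X_te, y_te)

# y-randomization: refit after shuffling the response, repeat several times.
rng = np.random.default_rng(3)
q2_scrambled = []
for _ in range(5):
    y_shuffled = rng.permutation(y_tr)
    scrambled = GradientBoostingRegressor(random_state=3).fit(X_tr, y_shuffled)
    q2_scrambled.append(scrambled.score(X_te, y_te))
```

A real relation between inputs and response shows up as `q2_original` far above every scrambled score.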
In summary, the ML Polar Gradient Boosted model showed good performance for the prediction of diffusivities of multiple solutes in polar solvents in the following train and test domain: T = 268-554 K; µ1 = 0.0241-17.6 cP; M2 = 17-674 g mol−1; Pc,2 = 4.1-221.2 bar; M1 = 20-113 g mol−1; and εLJ,1/kB = 208-2121 K. Likewise, the ML Nonpolar Gradient Boosted model can be applied over: T = 213-567 K; µ1 = 0.0229-2.92 cP; M2 = 2-461 g mol−1; Pc,2 = 12.5-96.3 bar; and M1 = 30-395 g mol−1. Both models showed good interpolation capability, and it is expected that they can also provide reasonable extrapolations.
The ML Polar Gradient Boosted and ML Nonpolar Gradient Boosted models are provided as a command line program in the Supplementary Material.

Detailed Comparison of ML Gradient Boosted and Classic Models
Four classic models for the calculation of diffusivities were adopted for comparison: two hydrodynamic equations (Wilke-Chang [5] and Tyn-Calus [7]), a correlation by Magalhães et al. [9], and the hybrid model of Zhu et al. [13]. The performance metrics of the classic models are shown in Table 4, for the polar systems, and Table 5, for the nonpolar systems. Overall, the proposed ML models outperform the classic models.
The Wilke-Chang and Tyn-Calus hydrodynamic equations provide similar performance indicators in both data sets, though the former shows much higher maximum AARDs (Table 4: 197.71% vs. 97.11%; Table 5: 172.30% vs. 64.97%). Analyzing Figure 5a,b, where the calculated versus experimental diffusivities are plotted for the polar data set over the entire range and over a low range of values, we see that the Wilke-Chang equation overestimates higher diffusivities and tends to underestimate lower ones. The Tyn-Calus equation for polar solvents provides systematic underestimation as shown in Figure S10 of the Supplementary Material. In the case of nonpolar systems, both Wilke-Chang (Figure 6a,b) and Tyn-Calus ( Figure S11) models exhibit a dual biased distribution of the calculated D 12 values.
The correlation of Magalhães et al. delivers the best performance among the classic models, with an unbiased distribution along the diagonal in Figures 5c,d and 6c,d and an AARD only slightly above that of the gradient boosted models proposed in this work (5.19% and 6.19% for the polar and nonpolar sets, respectively). However, the Magalhães et al. correlation can often be difficult to apply, since it requires that data on the system of interest be available in order to fit its two parameters. In this work, data in the train sets were used to fit the a and b parameters for each system, which were then applied to the calculation of diffusivities for the test sets. For this reason, fewer points were calculated for the Magalhães et al. model, as some systems did not have enough data in the train sets to optimize parameters a and b.
Finally, the Zhu et al. model, which was developed for nonpolar and weakly polar fluids, does not appear to provide any benefit over the much simpler Wilke-Chang and Tyn-Calus equations when applied to the nonpolar data set of this work. It provides a higher AARD (Table 5: 37.93%) than both hydrodynamic equations (Table 5: 29.19% and 28.84%, respectively), although it shows a less biased dispersion along the diagonal (Figure S12).
Table 6 details the results of the best machine learning (gradient boosted) and classic diffusivity models for each system of the polar database, as well as the distribution of points among train and test sets. The best result is found for the ethylbenzene/acetone system (AARD of 0.08%) and the worst for the ethylene glycol/ethanol system (76.23%). However, these two systems have only one and two points in the test set, respectively. Considering only cases where at least 10 points are available for the train and test sets, carbon dioxide/n-butanol shows the best result (1.19%) while ammonia/1-propanol has the worst (5.65%). Table 7 presents equivalent information for the nonpolar systems. In this case, the n-decane/n-dodecane and tetraethyltin/n-decane systems show the best (0.03%) and worst (25.87%) results, respectively, but, once again, with only one point in the test set. If only systems with at least five points in the train and test sets are considered, the best result appears for 1,3,5-trimethylbenzene/n-hexane (2.98%) and the worst for toluene/n-hexane (4.58%).
The models proposed in this work can be easily retrained as new experimental data become available, thus increasing their robustness and scope. A program that allows the estimation of diffusivities in polar and nonpolar systems is provided in the Supplementary Material, along with instructions on its use.

Conclusions
Two machine learning (ML) models were developed for the estimation of binary diffusivities in polar and nonpolar systems. These models were trained and tested on a database containing 20 properties for polar (90 systems and 1431 points) and nonpolar (154 systems and 1129 points) systems. Several learning algorithms were tested, including multilinear regression, k-nearest neighbors, decision tree, random forest, and gradient boosted. The best ML results were obtained for the gradient boosted model, which provided global AARDs of 5.07% and 5.86% for the test sets of polar and nonpolar systems, respectively. The nonpolar model relies on five input variables/properties: temperature, solvent viscosity, solute molar mass, solute critical pressure, and solvent molar mass. The polar model takes the Lennard-Jones energy of the solvent as an additional input, thus requiring six inputs in total. The classic models of Wilke-Chang, Tyn-Calus, Magalhães et al., and Zhu et al. were adopted for comparison and demonstrated worse performance on the same test sets. The 2-parameter correlation of Magalhães et al. showed results closest to the new gradient boosted models, with AARDs of 5.19% (polar) and 6.19% (nonpolar); however, that equation requires previous data to fit its two parameters, and thus it is impractical to apply to unknown systems. Among the remaining classic models, Wilke-Chang provided the best result for polar systems (40.92%) while Tyn-Calus performed best for nonpolar systems (28.84%). The developed models are provided as an application in the Supplementary Material.

Supplementary Materials:
The following are available online at https://www.mdpi.com/1996-1944/14/3/542/s1, Software, Table S1: Tested and best hyper-parameter values for each machine learning algorithm, Figure S1: Predicted versus experimental diffusivities for the test set of polar systems using the Multilinear Regression model, Figure S2: Predicted versus experimental diffusivities for the test set of polar systems using the k-Nearest Neighbors model, Figure S3: Predicted versus experimental diffusivities for the test set of polar systems using the Decision Tree model, Figure S4: Predicted versus experimental diffusivities for the test set of polar systems using the Random Forest model, Figure S5: Predicted versus experimental diffusivities for the test set of nonpolar systems using the Multilinear Regression model, Figure S6: Predicted versus experimental diffusivities for the test set of nonpolar systems using the k-Nearest Neighbors model, Figure S7: Predicted versus experimental diffusivities for the test set of nonpolar systems using the Decision Tree model, Figure S8: Predicted versus experimental diffusivities for the test set of nonpolar systems using the Random Forest model, Figure S9: Q2 values obtained in the y-randomization tests of the polar and nonpolar gradient boosted models.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.