3.1. Model Calibration and Evaluation
Figure 1 depicts the workflow followed in the present QSPR modeling, which was mostly carried out using our recently developed tool, QSAR-Mx. It shows all the steps and methods employed to reach the major goal of this work, i.e., to build reliable predictive QSPR regression models from the compiled data that can be used to estimate the surface tension of DESs.
In total, 258 models were set up by varying the data splitting schemes and the descriptor calculation methods (Method-1 or Method-2) and applying SFS-MLR modeling. Among these, 136 models pertained to the MO-based division scheme, whereas the remaining 112 models were generated with the CO-based division scheme. The overall predictive quality of each regression model was judged by the average value of the statistical parameters Q2LOO, Q2LCO and R2Pred. Essentially, Q2LOO and R2Pred account for the internal and external predictivity of the QSPR models, respectively, whereas Q2LCO was included to ensure that the most predictive models do not suffer from overfitting due to bias towards specific components of the binary DES mixtures. The higher the average of these three parameters, the more predictive the model. On this basis, we selected the top 15 unique models for further processing. A summary of the statistical results of these models is given in Table 1. Interestingly, 14 of these 15 models were derived from MO-based data distributions, and only one arose from CO-based distributions. This indicates that MO-based data distributions are more likely to produce highly predictive models than CO-based ones, which is to be expected because the latter impose a more rigorous validation strategy.
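The ranking criterion described above (average of Q2LOO, Q2LCO and R2Pred, then selection of the top models) can be illustrated with a minimal sketch. The model names and metric values below are hypothetical placeholders, not entries of Table 1, and the helper names `average_score` and `rank_models` are our own and are not part of QSAR-Mx.

```python
from statistics import mean

# Hypothetical metric values for illustration only (NOT taken from Table 1).
models = {
    "A": {"Q2LOO": 0.90, "Q2LCO": 0.80, "R2Pred": 0.85},
    "B": {"Q2LOO": 0.92, "Q2LCO": 0.60, "R2Pred": 0.88},
    "C": {"Q2LOO": 0.88, "Q2LCO": 0.84, "R2Pred": 0.86},
}


def average_score(metrics):
    """Average of the three predictivity parameters used for ranking."""
    return mean(metrics[key] for key in ("Q2LOO", "Q2LCO", "R2Pred"))


def rank_models(models, top_n):
    """Return the names of the top_n models, highest average first."""
    ranked = sorted(models, key=lambda name: average_score(models[name]), reverse=True)
    return ranked[:top_n]


top_models = rank_models(models, top_n=2)
print(top_models)
```

In this toy example, model B has the highest Q2LOO but is penalized by its low Q2LCO, which is the kind of component-related bias the third parameter is meant to expose.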
As mentioned before, the data distributions were varied throughout the model-building process in order to select the most predictive models. Therefore, the generated test sets serve as validation sets to estimate the external predictivity of the models but, at the same time, act as calibration sets for the selection of the best models. In contrast, the external validation set (containing 84 data points) was treated as the ‘true validation set’ for assessing the external predictivity of the models. This set was built with the CO-based data-distribution scheme and thus poses a significant challenge to the generated models as far as their external predictivity is concerned. A comparison of the predictivity of the top 15 models is shown in Table 2.
As can be observed from Table 2, only a few models show satisfactory predictions against the external validation set. Nevertheless, six of these models had R2Pred values greater than 0.50, as well as average %AARD values lower than 12. Moreover, three models, namely M09, M10 and M12, afforded the most satisfactory predictivity towards the external validation set, with R2Pred > 0.65 and %AARD < 10. Therefore, these three models were considered the best models obtained for predicting the surface tension of DESs. Remarkably, M10, the only CO-based model included in the top 15, emerged as one of the most predictive models. Still, on the basis of overall predictivity, M12 was selected as the best individual QSPR model, despite its slightly lower internal predictivity compared to M10 and its slightly lower external predictivity towards the test set compared to M09. M12 afforded a balanced prediction against all three sets, with an average %AARD value of 7.126, which is lower than that obtained for the other two models. Model M12 also provides the best solution if the average value of Q2LOO, Q2LCO and R2Pred (against the two validation sets) is considered: this average was 0.859 for M12, versus 0.820 and 0.831 for M09 and M10, respectively.
In summary, the best predictive model found for the DESs’ surface tension (a six-variable equation, model M12) can be expressed as detailed below, while the meaning of the selected WM descriptors is given in
Table 3.
In this equation, Xpmix and Xnmix stand for WM descriptors of the type Dpmix, in line with Equation (1), and Dnmix, following Equation (2), respectively; T is the temperature (in K) at which the surface tension was measured, and σ is the surface tension (in mN/m).
A summary of the extended statistical results for model M12 is given in Table 4. The determination coefficients (R2 = 0.916 and R2Adj = 0.915), the sample size (Ntr = 360), the Fisher ratio (F = 642.4) and, especially, the high ratio of data points to adjustable variables (ρ = 60) [59] are indicative of the model’s statistical significance and goodness of fit. Model M12 also provides satisfactory internal and external predictivity, as follows from the cross-validation, rm2 and R2Pred metrics (see Table 4). Moreover, built with only six descriptors, it led to %AARD values of 5.805, 11.155 and 4.418 against the training, test and validation sets, respectively. The model’s prediction ability was further checked by analyzing the relative deviations (%RD = 100 × (σPred − σexp)/σexp) between the predicted and experimental DES surface tension values for all three sets. As Figure 2 shows, model M12 performs more accurately on the training and external validation sets than on the test set. Yet, the latter also behaves normally, considering the shape of the RD distribution also displayed in Figure 2. This histogram shows that most of the RD values are within ±20% and that they are normally distributed, suggesting that the model estimations are not biased.
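The statistics quoted above can be cross-checked with a short sketch. It assumes the textbook definitions of the adjusted R2 and the Fisher ratio for a linear model with an intercept, and it takes %AARD as the mean of the absolute %RD values; these definitions are our assumptions and are not spelled out in this section. The function names and the three surface tension values are hypothetical.

```python
def fit_summary(r2, n_train, n_descriptors):
    """Return (rho, adjusted R2, F) from R2, sample size and descriptor count."""
    rho = n_train / n_descriptors                      # data points per adjustable variable
    dof = n_train - n_descriptors - 1                  # residual degrees of freedom
    r2_adj = 1.0 - (1.0 - r2) * (n_train - 1) / dof
    f_ratio = (r2 / n_descriptors) / ((1.0 - r2) / dof)
    return rho, r2_adj, f_ratio


def relative_deviations(predicted, experimental):
    """%RD = 100 * (pred - exp) / exp for each data point."""
    return [100.0 * (p - e) / e for p, e in zip(predicted, experimental)]


def aard(predicted, experimental):
    """%AARD, assumed here to be the mean absolute %RD."""
    rds = relative_deviations(predicted, experimental)
    return sum(abs(rd) for rd in rds) / len(rds)


# Values reported for model M12 (R2 is rounded, so F is only approximate).
rho, r2_adj, f_ratio = fit_summary(r2=0.916, n_train=360, n_descriptors=6)

# Average of the three reported %AARD values (training, test, validation).
mean_aard = round((5.805 + 11.155 + 4.418) / 3, 3)

# Hypothetical surface tensions in mN/m, for illustration only.
sigma_exp = [50.0, 40.0, 60.0]
sigma_pred = [52.0, 38.0, 60.0]
example_aard = aard(sigma_pred, sigma_exp)
```

With the rounded R2 of 0.916, this sketch gives ρ = 60, an adjusted R2 that rounds to 0.915 and F of roughly 642. The small difference from the reported 642.4 is presumably due to rounding of R2. The mean of the three reported %AARD values reproduces the average of 7.126 quoted for M12.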
Figure 3 shows the plot of the surface tensions predicted by model M12 vs. the experimental ones. As seen, the majority of the data points lie close to the diagonal line, denoting the model’s reliability and the soundness of its predictions. Indeed, the model’s performance is even better than that of the previously developed thermodynamic model for the DES surface tension [14], which, despite having fewer data points (a total of 530 data points, considering only the 99 unique binary DES), led to %AARD values of 8.87 and 14.81 for the training and test sets, respectively. However, the purpose and outcomes of the current QSPR modeling differ from those of any thermodynamic model, as the former demands that several conditions be satisfied, apart from validation, to establish the statistical robustness of the model. For example, so far we have demonstrated the reliability of the QSPR model M12, but it is also important to inspect the intercollinearity between any two of its descriptors. The highest intercorrelation found between any pair of descriptors was 0.238, indicating that the variables included in the model are not interrelated. Furthermore, the model was checked for its uniqueness by the Y-based randomization technique, which was performed by scrambling the endpoint responses of the training set. The high value obtained for cRP2 (=0.908) implies that the model was not obtained by chance correlation. Another crucial aspect is the applicability domain of the model, which was assessed here by analyzing the Williams plot (plot of standardized residuals vs. leverages). As seen in Figure 3, eight data points from the training set and thirteen from the test set can be considered structural outliers of the model, but no structural outliers were found in the external validation set. Interestingly, most of these structural outliers were well predicted by the model and were thus retained, as previously suggested by Gramatica et al. [49]. In addition, only twelve data points of the entire dataset were found to be response outliers, which also supports the high predictive accuracy of the model [60].
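The quantities behind a Williams plot can be sketched as follows. The cut-offs are assumptions on our part: the warning leverage h* = 3(p + 1)/n and a standardized residual limit of ±3 are common conventions, but the thresholds actually used for model M12 are not stated in this section. The function names and the synthetic descriptor matrix are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def leverages(X_train, X_query):
    """Leverage of each query row with respect to the training design matrix."""
    A = np.column_stack([np.ones(len(X_train)), X_train])   # add intercept column
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    xtx_inv = np.linalg.pinv(A.T @ A)
    # h_i = q_i^T (A^T A)^-1 q_i, evaluated row by row
    return np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)


def warning_leverage(n_train, n_descriptors):
    """Assumed cut-off for structural outliers: h* = 3(p + 1)/n."""
    return 3.0 * (n_descriptors + 1) / n_train


def standardized_residuals(residuals, n_descriptors):
    """Residuals divided by the residual standard deviation of the fit."""
    residuals = np.asarray(residuals, dtype=float)
    dof = len(residuals) - n_descriptors - 1
    return residuals / np.sqrt(np.sum(residuals**2) / dof)


# Synthetic two-descriptor training set plus one far-away query point.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
h_train = leverages(X_train, X_train)
h_query = leverages(X_train, np.array([[10.0, 10.0]]))
h_star = warning_leverage(n_train=50, n_descriptors=2)
is_structural_outlier = h_query > h_star
```

Under the same assumed convention, a six-descriptor model trained on 360 data points would have a warning leverage of 3 × 7/360 ≈ 0.058. A point is usually treated as a response outlier when its standardized residual falls outside the chosen limit, regardless of its leverage.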
3.2. Model Interpretation
In our previous investigation on density [28], we observed that, in spite of providing less mechanistic interpretability, graph-based topological descriptors often help in characterizing physicochemical properties. In the present work, a number of topological descriptors also proved to be significant for describing the surface tension of DESs. Figure 4 shows the relative importance of each descriptor of model M12, estimated on the basis of the absolute value of its regression coefficient.
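One simple way to turn regression coefficients into relative importances is sketched below. It assumes that the coefficients refer to descriptors on a comparable (e.g., standardized) scale and that the importances are normalized to sum to one; both points are our assumptions, since the exact procedure behind Figure 4 is not detailed here. The descriptor names and coefficient values are hypothetical placeholders, not those of model M12.

```python
def relative_importance(coefficients):
    """Share of each descriptor in the sum of absolute coefficients, largest first."""
    total = sum(abs(b) for b in coefficients.values())
    shares = {name: abs(b) / total for name, b in coefficients.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Hypothetical coefficients for illustration only.
coeffs = {"descriptor_a": -0.5, "descriptor_b": 0.3, "descriptor_c": -0.2}
importance = relative_importance(coeffs)
for name, share in importance.items():
    print(f"{name}: {share:.2f}")
```

The sign of a coefficient is deliberately discarded here: it indicates the direction of the effect on the surface tension, whereas the importance only reflects its magnitude.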
As can be observed, the WM descriptor MATS5s
nmix was found to have the highest importance and, besides, it is the only
Dnmix type descriptor in the model. Being derived from graph-based topological descriptors, MATS5s
nmix suggests that differences in the topology of the DESs’ components may play a significant role in the surface tension of these solvents. The
Dpmix type WM descriptor CATS2D_02_AN
pmix is the second most influential descriptor of the model. Chemically Advanced Template Search (CATS) descriptors are a useful group of descriptors that account for the topological distance among scaffold features in the molecules [
58]. CATS2D_02_AN, in particular, encodes acceptor and negatively charged groups separated by a small topological distance (=2). Higher values of this descriptor are associated with lower surface tension. Descriptor BLTF96
pmix appears as the third most important descriptor in the model. Unlike the first two descriptors of topological nature, this descriptor is based on an important molecular property—lipophilicity [
55,
56]. Since this descriptor belongs to the
Dpmix type, it may be inferred that higher lipophilicity of the components leads to a higher surface tension of the DESs. Apart from lipophilicity, another well-known physicochemical property, the dipole moment, was also found to contribute importantly to the DES surface tension. Its importance follows from the presence of descriptor Eig02_EA(dm)
pmix. The fifth most important descriptor belongs to the class of P_VSA descriptors, which represent the amount of van der Waals surface area (VSA) having a property (P) in a certain range [
56]. In the case of the descriptor P_VSA_MR_6
pmix, the property is the molar refractivity (MR) at a larger range (bin size 6). The positive relation of P_VSA_MR_6
pmix with the dependent property is highly significant as it suggests that increased MR (i.e., polarizability) within the van der Waals surface of each component contributes towards a higher surface tension for the respective DESs. Finally, the last descriptor of the model is the temperature of surface tension measurements,
T. As expected, with increasing temperature the surface tension is found to decrease, which fits well with the experimental findings. Still, to further check how model M12 actually addresses the influence of temperature, we randomly selected six DESs with a range of surface tension values. From
Figure 5, it can be clearly seen that both experimental and predicted properties followed the same trend, i.e., the surface tension gradually decreases as the temperature is increased.
3.4. Consensus Modeling
Finally, we applied intelligent consensus modeling [54] to see whether the surface tension predictions for the external validation set could be improved. To do so, the three most predictive linear models (M09, M10 and M12) were subjected to consensus predictions in different combinations, namely: (a) C1, using models M09, M10 and M12; (b) C2, using models M10 and M12; (c) C3, using models M09 and M10; and (d) C4, using models M09 and M12. In each case, the modeling dataset containing 535 data points was treated as the training set, whereas the external validation set was used to check the external predictivity of the consensus model. The results of all consensus modeling attempts are presented in Table 6.
Interestingly, the resulting models C1 and C3 lead to similar predictivities. Yet, none of the consensus models displays an external predictivity considerably better than that of the best individual model, M12. The R2Pred and %AARD values obtained for consensus model C1 are 0.864 and 4.459, respectively, and those for C3 are similar (0.854 and 4.393). Both C1 and C3 may therefore be regarded as alternative models to M12. However, let us focus mainly on C3, since it reveals that M09 and M10 may indeed work as complementary models towards improving the external predictivity. Details about the M09 and M10 models are provided in Table S3 of the SI.
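The idea of complementary models can be illustrated with the simplest form of consensus, i.e., averaging the predictions of the individual models for each data point. This is only a sketch: it is not the intelligent consensus algorithm of ref. [54], which selects and weights models in a more elaborate way, and all numbers below are hypothetical.

```python
def consensus_prediction(*model_predictions):
    """Plain average of the predictions of several models, point by point."""
    return [sum(values) / len(values) for values in zip(*model_predictions)]


def aard(predicted, experimental):
    """%AARD, assumed here to be the mean absolute relative deviation in percent."""
    deviations = [abs(100.0 * (p - e) / e) for p, e in zip(predicted, experimental)]
    return sum(deviations) / len(deviations)


# Hypothetical surface tensions in mN/m, for illustration only.
sigma_exp = [50.0, 40.0]
pred_model_a = [52.0, 38.0]   # overestimates the first point
pred_model_b = [48.0, 41.0]   # underestimates the first point
pred_consensus = consensus_prediction(pred_model_a, pred_model_b)

print(aard(pred_model_a, sigma_exp), aard(pred_model_b, sigma_exp),
      aard(pred_consensus, sigma_exp))
```

When two models err in opposite directions, as in this toy case, their average is closer to the experimental value than either model alone. Two models built on largely the same descriptors tend to err in the same direction, so averaging them brings little benefit.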
Since M09 was developed with the same data distribution as M12, these two models have four descriptors in common, namely: CATS2D_02_AN
pmix, P_VSA_MR_6
pmix, BLTF96
pmix, and
T. These four descriptors are evidently highly significant for describing the surface tension of DESs. Notably, CATS2D_02_AN
pmix, which was found to be the second most important descriptor of M12, appears to be the most influential descriptor of M09. This suggests that it may be considered the most crucial descriptor for predicting the surface tension of DESs. Presumably, due to the similarity between M12 and M09, consensus modeling with these two models failed to provide any better solution. Model M10, in contrast, stands out as a unique model because, save for
T, none of its descriptors is found either in M09 or in M12. Most likely for this reason, its combination with the other two models produces good consensus models. Unlike models M09 and M12, model M10 shows a slightly higher, but still acceptable, intercollinearity between descriptors, with a maximum
R2 value of 0.713. The selected descriptors for models M09 and M10 are described in detail in
Table S4 of the SI.
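The intercollinearity check referred to above can be sketched as the search for the largest squared Pearson correlation among all pairs of descriptor columns. The helper names and the descriptor values are hypothetical, and we assume that the reported maximum R2 values were obtained from pairwise correlations of this kind.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def max_pairwise_r2(columns):
    """Return the most intercorrelated descriptor pair and its R2."""
    best_pair, best_r2 = None, -1.0
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r2 = pearson_r(a, b) ** 2
        if r2 > best_r2:
            best_pair, best_r2 = (name_a, name_b), r2
    return best_pair, best_r2


# Hypothetical descriptor columns for illustration only.
descriptors = {
    "d1": [1.0, 2.0, 3.0, 4.0],
    "d2": [2.0, 4.0, 6.0, 8.0],     # exact multiple of d1, i.e., fully collinear
    "d3": [1.0, -1.0, 1.0, -1.0],
}
pair, r2_max = max_pairwise_r2(descriptors)
```

A maximum R2 close to one, as between d1 and d2 in this toy example, would signal redundant descriptors, whereas lower values such as those reported for M12 and M10 indicate that each descriptor carries largely independent information.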