Estimating Equivalent Alkane Carbon Number Using Abraham Solute Parameters
Round 1
Reviewer 1 Report
In the present manuscript, the authors study the Equivalent Alkane Carbon Number (EACN) for surfactant/oil/water (SOW) systems using a simple linear model based on Abraham solute parameters. Using a dataset of 115 compounds with measured EACN values and Abraham solute parameters collected from the literature, the authors propose a four-parameter equation to estimate EACN. The idea of predicting the EACN of oils using Abraham parameters is interesting, although many parameters are not available for molecules with measured EACN. The text is well written and understandable, and the data used are readily available in a supplementary file. However, it is unfortunate that no figures or tables of results are provided to give an overall view of the effectiveness of the proposed model. Similarly, no solid comparison is made with the models cited in the introduction.
p2: why do the authors include simple alkanes in the dataset when enough other molecules are available? The dataset is therefore unbalanced, since half of the alkanes have very large EACN values.
p4: there is a problem with the Abraham solute parameters of decylcyclohexane and dodecylcyclohexane, since V is much too small compared to those of the homologous series methyl-, ethyl-, propyl- and butylcyclohexane. Contrary to what the authors state, a quick search of the referenced UFZ-LSER database returned no experimental values of the parameters for decylcyclohexane and dodecylcyclohexane. Using the same database to obtain the predicted parameters for these two molecules gave higher values for V (2.25 and 2.54, respectively). Therefore, unless more realistic values can be found in the literature for decylcyclohexane and dodecylcyclohexane, they cannot be included in the dataset.
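The higher V values can be checked by hand. In the Abraham equations, V is the McGowan characteristic volume: the sum of atomic increments (16.35 for C and 8.71 for H, in cm³/mol) minus 6.56 for every bond, divided by 100. The Python sketch below assumes a saturated monocyclic hydrocarbon C(6+m)H(2(6+m)) with one ring; the function names are illustrative and do not come from the manuscript.

```python
"""Sketch: McGowan characteristic volume V for n-alkylcyclohexanes."""

# Atomic increments in cm^3/mol and the per-bond deduction (McGowan scheme).
ATOM_INCREMENT = {"C": 16.35, "H": 8.71}
BOND_DEDUCTION = 6.56


def mcgowan_v(atom_counts: dict, rings: int) -> float:
    """Return V in units of (cm^3/mol)/100, as used in the Abraham equations."""
    n_atoms = sum(atom_counts.values())
    # Every bond counts once regardless of bond order: bonds = atoms - 1 + rings.
    n_bonds = n_atoms - 1 + rings
    volume = sum(ATOM_INCREMENT[el] * n for el, n in atom_counts.items())
    return (volume - BOND_DEDUCTION * n_bonds) / 100.0


def alkylcyclohexane(chain_length: int) -> dict:
    """Atom counts of an n-alkylcyclohexane with a chain of m carbons."""
    n_carbon = 6 + chain_length
    return {"C": n_carbon, "H": 2 * n_carbon}


if __name__ == "__main__":
    names = {1: "methyl", 2: "ethyl", 3: "propyl", 4: "butyl", 10: "decyl", 12: "dodecyl"}
    for m, name in names.items():
        v = mcgowan_v(alkylcyclohexane(m), rings=1)
        print(f"{name}cyclohexane: V = {v:.4f}")
```

Run as a script, it gives V = 0.9863 for methylcyclohexane, an increment of 0.1409 per CH2 group, and 2.2544 and 2.5362 for decyl- and dodecylcyclohexane, consistent with the predicted values of 2.25 and 2.54 quoted above.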
p7: I do not understand why the authors do not use descriptor A; its omission is detrimental to the estimation of the EACN of alkynes (see below).
p7: a detailed explanation would help readers understand how the authors use the cross-validation results to draw conclusions about the accuracy of the prediction. In my opinion, a leave-one-out validation (one compound held out at a time) would be just as effective and would also identify outliers.
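A leave-one-out check of this kind can be sketched in a few lines of Python with NumPy. Each compound is predicted by a model refitted on the other N − 1 compounds, so the residuals are genuine prediction errors, and the compounds with the largest absolute residuals are the outlier candidates. X (one row per compound, one column per descriptor) and y (measured EACN) are placeholders; the function names are hypothetical and no manuscript data are used.

```python
import numpy as np


def loo_residuals(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Leave-one-out residuals of an OLS model with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    residuals = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # every compound except compound i
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        residuals[i] = y[i] - design[i] @ coef  # error on the held-out compound
    return residuals


def loo_rmse(X: np.ndarray, y: np.ndarray) -> float:
    """Root-mean-square of the leave-one-out prediction errors."""
    r = loo_residuals(X, y)
    return float(np.sqrt(np.mean(r ** 2)))


def worst_compounds(names, X, y, count=5):
    """Names and LOO residuals of the compounds that are predicted worst."""
    r = loo_residuals(X, y)
    order = np.argsort(-np.abs(r))[:count]
    return [(names[i], float(r[i])) for i in order]
```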
p7: unless I am mistaken, the EACN values given by equation (5) are in fact estimated rather than predicted values, since all 115 data points are used to determine the coefficients by linear regression.
p7: no prediction is given for molecules outside the 115-compound dataset. So, at this point, the title of the paper should be "Estimation of equivalent alkane carbon number...".
p7: showing a scatter plot (predicted vs measured EACNs) would be more indicative than giving the RMSE for the 5 different categories.
p7: using the given equation (5), it is easy to see that: 1. the EACN estimates of alkynes are not correct; 2. the EACN values of decylcyclohexane and dodecylcyclohexane are underestimated, as expected from the errors in their V values; 3. three other molecules, namely bis(2-ethylhexyl) adipate, tristearin and methyl dihydrojasmonate, are outliers responsible for the model's only average performance.
p7: again, I do not understand why the authors include the 31 alkanes in their model, since they have enough data to derive their fitting equation without them. Indeed, when the alkanes are removed from the dataset, a better RMSE is obtained (2.37 vs 2.52) for the 84 remaining compounds, indicating better estimates for those molecules than when the alkanes are included in the set. In addition, the EACN values of the alkanes are underestimated, and the authors themselves state that the model should not be used to estimate the EACN of alkanes. It is true that a higher R2 is obtained when the alkanes are included, but this does not mean that the results are better, as the lower RMSE without them shows.
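The disagreement between R2 and RMSE follows from their definitions: RMSE measures the absolute error, whereas R2 compares the residuals with the spread of the measured values, so adding compounds with a wide EACN range raises R2 even if they are estimated no better. A minimal illustration in Python with hypothetical numbers (not the manuscript's data), in which every compound is off by exactly one unit:

```python
import numpy as np


def rmse(measured, estimated):
    """Root-mean-square error, in the units of the measured values."""
    measured, estimated = np.asarray(measured, float), np.asarray(estimated, float)
    return float(np.sqrt(np.mean((measured - estimated) ** 2)))


def r_squared(measured, estimated):
    """Coefficient of determination relative to the spread of the measured values."""
    measured, estimated = np.asarray(measured, float), np.asarray(estimated, float)
    ss_res = np.sum((measured - estimated) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical values: every estimate is off by exactly 1 unit.
narrow_measured = [0, 1, 2, 3]
narrow_estimated = [1, 0, 3, 2]
# Two extra points far from the others, still off by exactly 1 unit.
wide_measured = narrow_measured + [20, 21]
wide_estimated = narrow_estimated + [21, 20]

print(rmse(narrow_measured, narrow_estimated), r_squared(narrow_measured, narrow_estimated))
print(rmse(wide_measured, wide_estimated), r_squared(wide_measured, wide_estimated))
```

RMSE is 1.0 in both cases, while R2 rises from about 0.20 to about 0.99 once the two high-value points are added.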
p7: why do the authors choose to add BP as an additional descriptor? I also noticed a significant error in the boiling point of tristearin, which is reported as 813 °C in the supplementary file.
p7: I would then recommend studying a model using a dataset without the alkanes and the 5 above-mentioned molecules (79 remaining molecules), with the A descriptor as a fifth input.
p7-8: consequently, adding a small paragraph dealing with the 3 outliers in the discussion section would be much appreciated.
p7-8: again, since no results are shown for any cross-validation or external test set, the authors cannot claim a demonstration of prediction.
p9: I suggest adding the 2015 reference from Lukowicz et al. (10.1021/acs.langmuir.5b02545)
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The paper presents a new and simple way to predict Equivalent Alkane Carbon Numbers (EACN) from liquid structures using Abraham’s solute parameters. It represents an innovative use of Abraham’s solute parameters and is suitable for publication after the minor revisions described below:
Some minor typos were detected:
Line 140: “refraction” instead of “re-fraction”.
Line 142: “and B the overall” instead of “and are the overall”.
Line 149: There’s an extra space at the end of the sentence.
Additionally, some methodological aspects should/could be improved:
Equations 5, 6 and 7 should include the associated errors.
In addition to equation 5, the authors could provide a model which: i) excludes the simple alkanes, since the EACN of those compounds is simply equal to their carbon number (as the authors rightly point out); ii) follows simple procedures that should precede model building, such as an orthogonality check; iii) includes only significant parameters (significance level above 95%); and iv) excludes outliers.
Collinearity between the parameters used is usually checked beforehand. For each pair, R2 should be lower than 0.50, and the coefficient of determination of one parameter against all the others should be lower than 0.80. Accordingly, after removal of the simple alkanes, the S vs B pair gives R2 = 0.61. Removing tristearin, methyl dihydrojasmonate and bis(2-ethylhexyl) adipate lowers R2 to an acceptable 0.46. If the remaining 81 liquids are used in the correlation (at least with Excel's regression tools), 2 compounds (decylcyclohexane and dodecylcyclohexane) are clear outliers, and the E parameter shows a low significance level. Removing both compounds and the E parameter should give a suitable three-parameter model (EACN = f(S, B, V), N = 78), which is quite consistent with the model presented in equation 2 and would increase the value of the publication.
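The two screening rules described in this report can be sketched in Python with NumPy, assuming an array X with one column per descriptor (for example E, S, B, V) and the thresholds quoted by the reviewer. For a pair of descriptors, the R2 of a simple linear fit equals the squared Pearson correlation; the "one against all others" R2 comes from regressing each descriptor on the rest, and its 0.80 cutoff corresponds to a variance inflation factor of 5. The function names are hypothetical.

```python
import numpy as np


def r2_of_fit(target: np.ndarray, predictors: np.ndarray) -> float:
    """R^2 of an OLS fit (with intercept) of one descriptor on other descriptors."""
    design = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    ss_tot = np.sum((target - target.mean()) ** 2)
    return float(1.0 - resid @ resid / ss_tot)


def collinearity_report(X: np.ndarray, names: list, pair_max=0.50, multi_max=0.80):
    """List descriptor pairs with R^2 >= pair_max and descriptors whose R^2
    against all the other descriptors is >= multi_max."""
    flagged = []
    k = X.shape[1]
    # Pairwise check: squared Pearson correlation of each descriptor pair.
    for i in range(k):
        for j in range(i + 1, k):
            r2 = float(np.corrcoef(X[:, i], X[:, j])[0, 1] ** 2)
            if r2 >= pair_max:
                flagged.append((f"{names[i]} vs {names[j]}", round(r2, 3)))
    # One-against-all check: regress each descriptor on the remaining ones.
    for i in range(k):
        r2 = r2_of_fit(X[:, i], np.delete(X, i, axis=1))
        if r2 >= multi_max:
            flagged.append((f"{names[i]} vs all others", round(r2, 3)))
    return flagged
```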
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 3 Report
This work presents a useful, convenient empirical relationship that can predict EACN values from easily obtainable Abraham solute parameters. The work shows that the proposed relationship can be applied to different types of chemical compounds and analyzes the results for each compound type. The authors also provide an open dataset that others can use in future studies. For all these reasons, I recommend this work for publication in Liquids once the following issues are clarified:
Major:
1. How does the proposed model compare to the existing models (the model by Bouton et al. and the one by Lukowicz et al.)? What are the reported prediction errors of these existing models? Are parameters used in other models not as readily available as the Abraham solute parameters? A bit more comparison with the existing models would be helpful.
2. Parts of the modeling method are unclear and need a more detailed explanation. How is the dataset split into test (or validation) and training sets (e.g., randomly, and with what split ratio)? Are the coefficients in Eq. 5 the averages of those fitted across all repeats and cross-validation folds? If so, what are the standard deviations of the averaged coefficients?
3. How are the errors computed? Are they residual errors from the linear regression, or are they computed on separate test data withheld from the training set used to fit the coefficients? The authors should specify whether the reported errors are residual fitting errors or actual test errors evaluated on a withheld test set. The latter is a more demanding test than the former.
4. It is difficult to tell whether the reported errors should be considered good or bad. The R2 metric is more straightforward to interpret, but it is not clear what values are considered “good” for other metrics such as MAE and RMSE. Can the authors state the target prediction errors for this work?
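Questions 2 and 3 can be made concrete with a short Python sketch of repeated k-fold cross-validation for an ordinary least-squares model. It shuffles the compounds at random, reports the mean and standard deviation of each coefficient across all fits, and contrasts the residual error of the full fit with the average error on held-out folds. The choices k = 5, 10 repeats and the seed are illustrative assumptions, not the manuscript's settings, and the function name is hypothetical.

```python
import numpy as np


def repeated_kfold(X, y, k=5, repeats=10, seed=0):
    """Repeated k-fold cross-validation of an OLS model with intercept.

    Returns (mean coefficients, standard deviation of the coefficients across
    all fits, RMSE of the full fit, mean RMSE on the held-out folds).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coefs, test_rmse = [], []
    for _ in range(repeats):
        # Random split of the compounds into k folds of nearly equal size.
        folds = np.array_split(rng.permutation(n), k)
        for test_idx in folds:
            train = np.setdiff1d(np.arange(n), test_idx)
            coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
            coefs.append(coef)
            resid = y[test_idx] - design[test_idx] @ coef  # held-out errors
            test_rmse.append(np.sqrt(np.mean(resid ** 2)))
    # Residual (fitting) error of a model fitted to all compounds.
    full_coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fit_rmse = np.sqrt(np.mean((y - design @ full_coef) ** 2))
    coefs = np.array(coefs)
    return coefs.mean(axis=0), coefs.std(axis=0, ddof=1), float(fit_rmse), float(np.mean(test_rmse))
```

The held-out RMSE is the quantity that supports a claim of prediction; the full-fit RMSE only describes how well the equation reproduces the data it was fitted to.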
Minor:
1. On line 67, “This works develops” should be “This work develops”
2. On line 131, the RMSE is missing for the alkenes, terpenes, alkynes, and aromatics.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 4 Report
Ms.; liquids 189789
General comment
- In this work, the authors propose a multilinear model for estimating the equivalent alkane carbon number (EACN), using Abraham's solubility descriptors as variables. Compared with other methods available in the literature, this proposal relies on experimental values rather than molecular simulation tools to establish the descriptors. This gives the model greater reliability, but also limits its application to chemical compounds for which the Abraham solubility descriptors are known or can be determined.
- The authors build the model by checking the independence of the different proposed descriptors. The "pairwise" correlation coefficients confirm that independence, although to achieve it the authors had to eliminate from their database some compounds whose "S" values are considered anomalous. In particular, compounds with S > 1 are discarded. What is this criterion based on? Are these values incorrect, or should they be considered a limit on the applicability of the model? Please explain.
- The final form of the model produces an acceptable correlation, with R2 = 0.92, which is adequate given the diverse nature of the database. However, to achieve this correlation, the authors again remove some compounds from the database. In this case, the authors merely state that they are “clearly anomalous values”. What criteria did the authors follow to remove those compounds? Are they really anomalous values, or could they lie outside the range of validity of the model's linear behavior? Please explain.
- The presentation of the manuscript is clear and well organized, facilitating the reading and understanding of the content.
- Minor comments
- P. 1, line 41. The descriptor S should be defined here, not later in the text.
- P. 2, lines 88-89. Are the data for dodecylcyclohexane new? In such a case, the authors should indicate how these values have been obtained.
- Table 1. The number of significant figures for each variable should always be the same.
- Figure 1. To clarify the figure, it would be better if both axes had the same scale, since this gives a clear picture of the correlation between the calculated and experimental values.
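Reviewer 4's questions on removal criteria could be answered with an explicit, reproducible rule. One common choice, sketched below in Python with NumPy for an OLS fit with intercept, flags compounds whose internally studentized residual exceeds 3 in magnitude or whose Cook's distance exceeds 4/N. These cutoffs are conventions, not the criteria used in the manuscript, the function names are hypothetical, and a flagged compound still has to be examined to decide whether its data are wrong or it lies outside the model's domain of validity.

```python
import numpy as np


def outlier_diagnostics(X, y):
    """Leverage, internally studentized residuals and Cook's distance for an
    OLS fit with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]
    # Hat matrix H = D (D^T D)^-1 D^T; its diagonal holds the leverages.
    hat = design @ np.linalg.pinv(design.T @ design) @ design.T
    leverage = np.diag(hat)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    s2 = resid @ resid / (n - p)  # residual variance estimate
    student = resid / np.sqrt(s2 * (1.0 - leverage))
    cooks = student ** 2 * leverage / (p * (1.0 - leverage))
    return leverage, student, cooks


def flag_outliers(X, y, cook_cut=None, resid_cut=3.0):
    """Indices of compounds exceeding either cutoff (Cook's default: 4/N)."""
    cook_cut = 4.0 / len(y) if cook_cut is None else cook_cut
    _, student, cooks = outlier_diagnostics(X, y)
    return np.where((cooks > cook_cut) | (np.abs(student) > resid_cut))[0]
```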
Conclusion
The authors have developed a model for estimating the EACN based on experimental values of the Abraham descriptors, with acceptable regression indices. The contribution is interesting, so the article should be accepted for publication. However, it would be advisable for the authors to consider the suggestions made in this report to improve the content of the manuscript.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
The authors did a very good job of improving their paper, and figure 1 is very informative. I am less convinced by the addition of table 2 (see below).
I have a few more corrections:
- In equation (2), M0 and M2 should be written with subscripts (M₀ and M₂); same thing in line 66
- l75 change model to model:
- l100 change regression. to regression:
- l105 adipate must be spelled correctly; same thing in table 1, l147 and in table 2, and also in the figshare file.
- l37 and l39 (in the equations): the minus sign should be – and not -, or at least not a mixture of both. Changes also need to be made in the 2 tables.
- in figure 1 change Beta to beta (like alpha), or use the Greek letters.
- Despite my efforts, I could not find any data for bis(2-ethylhexyl) adipate in the UFZ-LSER database, even with its CAS number or SMILES code. The predicted value for S is not 1.1 but rather 0.92, which is also rather large. The authors must provide a reference for the bis(2-ethylhexyl) adipate values (unless it is the paper by Chung et al.).
- I am not sure that a table (table 2) giving the estimated EACN values of the outliers is of much interest to readers, even if these values should be reported somewhere. In my opinion, it would be more useful to present a few predictions (not estimations), either for molecules with newly measured EACNs and measured Abraham solute parameters (for example isohexadecane (13.9), 3-methylpentane (5.2), 2,3-dimethylbutane (4.8), 2,2,4-trimethylpentane (8.3) and dipropyl ether (0.4)) or for molecules with well-predicted Abraham solute parameters (diheptyl ether, hexyl dodecanoate, myristyl propionate, rose oxide and decyl propionate), with a short discussion.
- The estimated EACN values with equation (5) for all 86 molecules could then be given as an additional column in the figshare file. This would also address the previous point concerning the outlier estimations.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 4 Report
The authors have responded appropriately to the suggestions made by this reviewer.
Round 3
Reviewer 1 Report
The authors have responded to all of the reviewers' requests and the document is now much more impactful and engaging. Bravo.