Multi-Output Machine-Learning Prediction of Volatile Organic Compounds (VOCs): Learning from Co-Emitted VOCs
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper addresses the highly relevant and current topic of predicting volatile organic compound (VOC) concentrations using multi-output machine-learning models. The authors analyzed data on VOCs at a specific workplace that pose a risk to human health. The work is methodologically well structured and the results are solidly interpreted, but certain aspects need additional clarification and methodological refinement, and there is no analysis of the extent to which the models provide socially useful insights (e.g. identification of high-risk groups).
The manuscript has potential for publication and makes a scientific contribution, but it requires expanded methodological information, a clearer statement of the limitations, and a more comprehensive evaluation of the model.
The current version of the draft needs major revision before it can be considered for publication. Also consider the following specific comments.
A more detailed description of the respondent population and the sampling is missing: what is the number of respondents (by gender and age), how was the sampling conducted, and how many samples were taken? What is the dimensionality of the data set?
The number of observations is not specified (except that 38 VOCs are mentioned). It is not clear how many samples were used to test the model. Which 33 VOCs were identified?
Recommendation: detailed presentation of results for each target VOC individually (e.g. graphs).
Revise Figure 2 for clarity.
Line 384: is the reference to Figure 1 or Figure 2?
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
I believe the work has potential and interest, but there are some aspects that need to be reviewed.
The article focuses mainly on mathematical work, which is outside my area of expertise. I will focus my comments only on the part about exposure to VOCs.
Below I present some comments, which I hope will help improve the interest and clarity of the article.
The introduction is somewhat confusing and needs improvement. In lines 48 to 71, the authors mention different types of exposure monitoring, but the ideas should be better organized: first state that active sampling is the method generally used, then present the alternative methods, passive sampling and biomonitoring (hair, urine, blood, etc.), citing the advantages and difficulties of each option.
In lines 172-174, the authors wrote: “The data set includes VOC measurements taken from people located in or around a carpentry workshop, an area recognized for high indoor air pollution levels due to the regular use of paint-based materials.” Do the data come from exposure studies of people based on blood levels or on air levels? Please explain.
Table 1 – I suggest the authors add a column with the CAS number for all the compounds, as it is a unique identifier.
In line 211 it is written: “The chemical structures and systematic names of the 33 VOCs are provided in Appendix 1.” The table in Appendix 1 presents more information than chemical structures and systematic names. Please explain what information is presented in that table. Another comment concerns the source of this information: the table was copied from an earlier article by the authors, and I think this information should be added.
In Table 2 the MAE values are presented, but we cannot evaluate whether a value is low or high because we do not know the magnitude of the concentrations. I think the authors could prepare and insert a table or a graph with the real and predicted values of the 5 VOCs obtained with the 3 models.
In lines 399-402, the authors wrote that “VOCs such as Trichloroethene, o-Xylene, 1,2-Dichlorobenzene, and 2,5-Dimethylfuran appeared among the top influential factors in all models, which shows their strong association with the target VOC concentrations. Further, age and smoking status also showed importance.” But in Figure 3 we see that age is the most important factor. What is the explanation for this? I suppose that blood levels have some influence on those results; is that so? Please add more specific information.
Again, in lines 431-433, the authors wrote: “While demographic factors such as age and smoking status moderately influenced VOC concentrations, the overriding signal came from the environmental exposure to paint emissions.” Figure 3 shows that age is the most important factor for all 3 models. Please correct.
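The comment on Table 2 above, that an MAE is hard to judge without knowing the scale of the concentrations, can be illustrated with a minimal sketch. It computes the MAE and then divides it by the mean observed concentration to obtain a scale-free ratio. The function names and the sample values are hypothetical and are not taken from the manuscript; normalizing by the mean is only one of several possible choices.

```python
def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def normalized_mae(observed, predicted):
    """MAE divided by the mean observed value (assumed normalization)."""
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero; ratio undefined")
    return mean_absolute_error(observed, predicted) / mean_observed


# Hypothetical concentrations for one VOC, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

mae = mean_absolute_error(observed, predicted)
nmae = normalized_mae(observed, predicted)
print(f"MAE = {mae:.2f}, MAE relative to mean = {nmae:.1%}")
```

Reporting the ratio alongside the raw MAE lets a reader compare errors across VOCs whose typical concentrations differ by orders of magnitude.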
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The revised manuscript is acceptable for publication.
Author Response
We would like to sincerely thank Reviewer 1 for the time and valuable feedback given throughout the review process, and for recommending our revised manuscript for publication.
We truly appreciate the support and constructive comments, which helped us improve the quality and clarity of our work.
Reviewer 2 Report
Comments and Suggestions for Authors
I believe that the authors have responded to the comments and the text is clearer. In my opinion, the article can be accepted without restrictions. I have only one doubt, which I present below and leave to the editor to correct, if necessary.
In Table 3 it is written: “Comparison of mean and standard deviation of predicted concentrations (ng/L)”. Are these values concentrations in air or in blood? If in air, the units usually used are ug/m3 (which is equivalent to ng/L). Please change if applicable.
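The unit equivalence mentioned in this comment can be checked with a short dimensional sketch. It relies only on the standard relations 1 m3 = 1000 L and 1 ug = 1000 ng; the function and constant names are hypothetical and chosen for illustration.

```python
LITRES_PER_CUBIC_METRE = 1000.0  # 1 m3 = 1000 L
NANOGRAMS_PER_MICROGRAM = 1000.0  # 1 ug = 1000 ng


def ng_per_l_to_ug_per_m3(value_ng_per_l):
    """Convert a mass concentration from ng/L to ug/m3."""
    # ng/L -> ng/m3: multiply by the number of litres in a cubic metre.
    value_ng_per_m3 = value_ng_per_l * LITRES_PER_CUBIC_METRE
    # ng/m3 -> ug/m3: divide by the number of nanograms in a microgram.
    return value_ng_per_m3 / NANOGRAMS_PER_MICROGRAM


# The two factors cancel, so the numerical value is unchanged.
print(ng_per_l_to_ug_per_m3(1.0))
```

Because the factors cancel, a value in ng/L and the same value in ug/m3 describe the same mass per unit volume; the choice between them is a matter of the convention used for the sampled medium.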
Author Response
We would like to sincerely thank Reviewer 2 for the supportive recommendation for acceptance, and for the time and effort devoted to evaluating our work.
Regarding the following comment:
Comment 1: In Table 3 it is written: “Comparison of mean and standard deviation of predicted concentrations (ng/L)”. Are these values concentrations in air or in blood? If in air, the units usually used are ug/m3 (which is equivalent to ng/L). Please change if applicable.
Response 1:
Thank you very much for your careful reading.
We confirm that the concentrations reported in Table 3 represent VOC levels measured in blood, not in air. The appropriate unit for such biological measurements is ng/L, which is widely used in human biomonitoring studies, and we have followed this standard convention throughout our analysis and reporting.
As correctly noted, 1 ng/L is equivalent to 1 part per trillion (ppt) in aqueous biological samples like blood. This equivalence is well accepted in environmental health and toxicology research. If the reviewer recommended changing the unit to ug/m3, the numerical values would stay the same, because 1 ng/L equals 1 ug/m3 (1 m3 = 1000 L and 1 ug = 1000 ng). We prefer to leave the unit as it is, since most researchers and previous related studies report blood concentrations in ng/L.
We sincerely appreciate the reviewer’s close attention to this detail, and we hope this clarification resolves any remaining uncertainty.