Extensible Database of Validated Biomass Smoke Events for Health Research
Reviewer 1 Report
Hanigan et al. present a database they compiled of biomass smoke events. They also describe the varying levels of criteria used to identify biomass smoke events for inclusion in the database. In general, the paper is well written and should be accepted for publication after a few minor revisions.
I have three suggestions to improve the manuscript. The first is that the paper should be proofread, as there were a handful of typos (e.g. "Re.sults") and minor grammatical errors. The second is that the authors should make it clearer why they are describing the different protocols. It wasn't until after I had read through each protocol that I inferred the reason they were described: so that studies entered into the database could use a protocol as part of the inclusion. Finally, the paper would be enhanced by a specific example of a study that could make use of the database. The authors describe some general types of studies, but are they aware of specific studies that would have benefited from this database?
1. Thank you, we have proofread the manuscript and corrected the typos and grammatical errors.
2. We agree and have added text to the abstract ("..., to ensure standardization across datasets") and to the beginning of the methods section to make it clearer that the different protocols were described so that a) studies entered into the database could use a standardised protocol as part of their review and b) future users can assess different events on the basis of the amount and type of information used to validate them.
3. We agree and have added specific examples of two studies: one that made use of the database and one that would benefit from it in future. The first is a recently published paper (Horsley et al. 2018) that extended the validated events history and estimated the burden of premature deaths attributable to bushfire smoke. The second is a recent report of the Australian Countdown on health and climate change (Zhang et al. 2018) that noted a limitation in the data available to calculate the indicator of ‘lethal weather-related disasters’, especially for bushfires. The database we describe has the potential to resolve that problem by providing high quality validated historical information alongside the most recent events for tracking any trends in bushfire impacts due to climate change.
Reviewer 2 Report
This paper explains that the Biomass Smoke Validated Events Database, designed to collect data on air pollution in Australian states specifically from vegetation fires, is now available for contributions from other researchers using a variety of protocols requiring varying numbers of reference documents to validate events. Contributors can also create their own protocols. The authors note possible issues with differential sampling intensity, but argue the strength of alternate protocols in theoretically allowing for more contributions of more biomass smoke events, which will make the database more useful for more researchers.
What do the QC checks by the DM involve? This seems important because without knowing this, it is unclear how much the outside contributor is really doing beyond essentially emailing the DM and letting them know that there is a smoke event they might want to check out to see if it meets criteria. Does the QC involve adding more layers of data to confirm the event? (In other words, if someone contributes using Bare Minimum, does it get Salimi 2017 or the like as a QC check?)
Is it clear from the database which protocol was used for identifying events?
If contributors create their own protocol, do they then share it? How do others know what protocol was followed?
Discussion section- The authors note potential issues with differential sampling intensity and exposure misclassification, but argue that these issues are unlikely to influence results and that further discussion is outside the scope of the paper, as it is a brief database description. This seems to me to be an unsatisfactory explanation for what seem (especially the sampling) to be potentially quite problematic issues. I recommend re-wording some of this section to include a bit more detail about why these issues should not be red flags for the efficacy of these data without adding too much to the word count.
Conclusion section- Consider re-iterating why these data matter to public health researchers—it is clear up front, but the point of the database gets lost at the end of the piece. (In other words, why should we care if these data exist? Who can and should contribute, and who might benefit?)
Figure 1 seems unnecessary, and is not particularly easy to follow compared to the in-text description (and I am a fairly visual person who likes diagrams and figures).
Line 125- period in the middle of the ‘Results’ heading
Line 166- “miss-classification”
Q1: What do the QC checks by the DM involve? … Does the QC involve adding more layers of data
A1: Thank you for this question. No, there is no additional review of data by the DM. The QC by the DM is only to ensure that the minimal adequate amount of information is provided by the outside contributor. This includes a check that the protocol is described and that the new events data are easy to merge with the old database (for example, that place names are spelt the same).
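The two QC checks described in A1 could be sketched roughly as follows. This is only an illustration, not the DM's actual procedure: the function name `qc_check`, the dictionary keys, the place names and the dates are all hypothetical, and the real database may store contributions quite differently.

```python
def qc_check(contribution, known_places):
    """Return a list of problems; an empty list means the contribution passes.

    `contribution` is a hypothetical dict with a 'protocol_description'
    string and a list of 'events', each with a 'place' name.
    """
    problems = []

    # Check 1: the validation protocol must be described.
    description = contribution.get("protocol_description") or ""
    if not description.strip():
        problems.append("protocol is not described")

    # Check 2: place names must be spelt as in the existing database,
    # so that the new events merge cleanly with the old ones.
    for event in contribution.get("events", []):
        place = event.get("place", "")
        if place not in known_places:
            problems.append(f"unknown place name: {place!r}")

    return problems


# Hypothetical place names already present in the database.
known_places = {"Sydney", "Launceston", "Perth"}

good = {
    "protocol_description": "Bare Minimum",
    "events": [{"place": "Sydney", "date": "2002-12-05"}],
}
bad = {
    "protocol_description": "",
    "events": [{"place": "Sydny", "date": "2002-12-05"}],
}

print(qc_check(good, known_places))
print(qc_check(bad, known_places))
```

Note that the sketch adds no further evidence about the event itself, which matches the point made in A1: the check concerns completeness and mergeability only.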
Q2: Is it clear from the database which protocol was used for identifying events? How do others know what protocol was followed?
A2: Yes, every event in the database has a link to the protocol used to identify it. An extraction from the database will show both the event and protocol used. We added this to the start of the methods section.
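As a rough illustration of what A2 describes, an extraction that shows each event together with its protocol might look like the sketch below. The record layout, the `protocol_id` key, the ids and the example events are assumptions made for illustration only; the protocol labels are simply the names mentioned in the reviewer's question.

```python
# Hypothetical protocol table: id -> protocol name.
protocols = {1: "Bare Minimum", 2: "Salimi 2017"}

# Hypothetical event records, each linked to a protocol by id.
events = [
    {"place": "Sydney", "date": "2002-12-05", "protocol_id": 2},
    {"place": "Perth", "date": "2005-01-17", "protocol_id": 1},
]


def extract(events, protocols):
    """Return one row per event with the protocol name resolved from its id."""
    return [
        {
            "place": e["place"],
            "date": e["date"],
            "protocol": protocols[e["protocol_id"]],
        }
        for e in events
    ]


rows = extract(events, protocols)
for row in rows:
    print(row["place"], row["date"], row["protocol"])
```

Keeping the protocol as a link rather than free text in each event means that a contributor's own protocol only needs to be recorded once and every event validated with it can point to the same entry.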
Q3: If contributors create their own protocol, do they then share it?
A3: Yes. If they create their own protocol then this must be shared along with the events data contributed for inclusion back into the master copy of the database. We now include this at the end of the methods section.
Q4: I recommend re-wording some of this section to include a bit more detail about why these issues should not be red flags for the efficacy of these data without adding too much to the word count.
A4: We agree that a bit more discussion will help readers to understand this caveat. We have added text to the end of the discussion section explaining that the database clearly identifies the amount and type of information sources used for each event and the validation protocol used in each review. This serves as a flag that communicates to the user how much certainty there is about each event.
Q5: Conclusion should re-iterate the public health benefits.
A5: We have added to the end of the conclusion more text to explain how this database will support public health research on this important topic.
Q6: Fig1 seems unnecessary.
A6: Thank you for this suggestion. We considered removing the figure; however, we feel that the visual representation assists understanding of the workflow and have decided to retain Fig1.
A7: Thank you, we have corrected the typos.
Reviewer 3 Report
Review of Manuscript ID: fire-391843
Title: Extensible database of validated biomass smoke events for health research
This paper presents to the air quality research community a unique database that provides information on bushfires in Australia in one convenient location and can incorporate new bushfire events from other researchers/observers in the field using freely provided software and protocols. The information contained in the database would be useful to scientists developing air quality models and also to health care researchers determining the contribution of bushfire events to overall public health.
Issues need addressing:
Overall the paper was well written for the subject area and details were provided for accessing the database by readers through Github. The following typos were found:
Line 125: change Re.sults to Results
Line 144: change imputed to inputed
Finally, it was unclear what the phrase “not ‘missing at random’” meant in lines 118 and 165, so you might want to rephrase that sentence.
Q1: Several typographical errors were found and a lack of clarity around the phrase "missing at random".
A1: Thank you, we have corrected the typos. The word imputed was not incorrect, but we have changed it for clarity to “missing data gaps were filled using imputation”. We have also edited the phrase and changed it to “… and may not be 'missing at random' and therefore contain systematic biases, which is a problem for statistical analysis”.
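For readers unfamiliar with the term, a small numerical sketch may help show why data that are not 'missing at random' introduce systematic bias. The values and the missingness rule below are entirely hypothetical and are not taken from the database; they only illustrate the general statistical point.

```python
from statistics import mean

# Hypothetical daily values (for example, a pollution measure).
daily_values = [5, 8, 12, 20, 45, 90]

# Hypothetical missingness rule: records are lost whenever the value
# exceeds 40, so the chance of being missing depends on the value itself.
observed = [x for x in daily_values if x <= 40]

true_mean = mean(daily_values)   # uses all six values
observed_mean = mean(observed)   # uses only the four surviving values

print(true_mean, observed_mean)
```

Because the highest values are the ones that go missing, the mean of the observed values falls well below the mean of the full series, and collecting more data under the same rule would not remove the gap.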