4.1. Examples of Studies That Will Benefit from This Database
The availability of this database makes it easier to conduct new research that builds on the existing database, incorporates new events, and adds to knowledge of the health impacts of biomass fires. This is illustrated by a recent paper on the health burden associated with fire smoke in Sydney, New South Wales (NSW), Australia’s largest city [18]. In that study the new authors extended our events database beyond its original end-point in 2007 by six more years of data, so that the time series for that city now ends on 1 January 2014. These new data have been added to the database. Using the extended dataset, the authors estimated that around 200 premature deaths were attributable to fire smoke over the 13 years studied, compared with just 77 deaths in NSW previously estimated to have been directly attributable to bushfires during the entire 110 years between 1901 and 2011 [19]. This indicates a previously underappreciated burden of disease from biomass fire smoke.
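To put the two figures on a common footing, they can be annualized. This is only a rough illustration: the first figure refers to premature deaths attributable to fire smoke in Sydney [18], while the second refers to deaths directly attributable to bushfires across NSW [19], so the two rates are not strictly comparable.

```latex
\[
\frac{200\ \text{deaths}}{13\ \text{years}} \approx 15.4\ \text{deaths per year}
\qquad\text{versus}\qquad
\frac{77\ \text{deaths}}{110\ \text{years}} = 0.7\ \text{deaths per year}
\]
```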
In a related example, the recently published Australian ‘Countdown on health and climate change’ [20] was limited in its indicator of lethal weather-related disasters because estimates of the deaths attributable to bushfires and biomass smoke in Australia that are based on historical data are known to be too low. This was noted as a problem with the international ‘EM-DAT’ database of disasters, in which the estimate of deaths from bushfires was found to have large discrepancies with estimates from other data sources. For example, Blanchi et al. [19] identified 825 deaths from direct exposure to bushfires in Australia between 1901 and 2011, compared to just 501 between 1900 and 2017 from EM-DAT, and so the results in the Countdown report are likely underestimated. As the Countdown report aims to release annual updates that track these indicators of the health impacts of weather-related disasters such as bushfires due to climate change, future work will be able to combine multiple data sources, including our biomass smoke events database, to mitigate the major limitations of the EM-DAT database in identifying bushfire events in Australia. Such a data integration task would have been much more difficult without the development of this extensible, validated events database.
4.2. Differential Sampling Intensity and Potential Exposure Misclassification Bias
Providing a number of Event Validation Protocol options for contributors to follow is important. In addition to the database’s open format, the flexibility to follow a protocol that matches the resources available to a contributor increases the likelihood that data will be contributed. This will add to the utility of the database for those researching the health effects of ambient outdoor air pollution from biomass burning smoke.
The Event Validation Protocols described in this paper are all conceptually appealing because they allow events to be collected for any time and place for which evidence is available from the sources. Unfortunately, combining these data into a single database means that the derived dataset is made up of components that have had unequal amounts of research effort expended on finding evidence (i.e., differential sampling intensity) and that used different search criteria to find the references supporting events. Events absent from the database may therefore not be ‘missing at random’, and the dataset may contain systematic biases, which is a problem for statistical analysis.
This raises the potential for bias from exposure misclassification, which would occur by classifying actual fire smoke/dust days as non-fire smoke/dust days, or by classifying non-fire smoke/dust days as actual fire smoke/dust days. The impact of exposure misclassification will of course depend on the particular study design implemented with the fire smoke database. For time series studies the issue is discussed briefly in Morgan et al. [15]. They explain that missing some bushfire days would reduce the power of the analysis to find an effect (if one is present), but would be unlikely to bias the result. Because fire smoke/dust incidents are rare and PM is usually relatively low in Sydney (and in most other Australian cities), it is possible to categorize any day as having either “Biomass Smoke Event” PM or “background” PM.
Morgan et al. [15] included this background PM explicitly in their model to capture differences with the Biomass Smoke Event days. It is possible that such an approach will include a small number of unidentified bushfire days among the days categorized as background days.
Morgan et al. argue that any such inclusions would be unlikely to influence the background PM results because of the large number of non-bushfire days in a multi-year study period. In a sensitivity analysis that did not split daily PM into bushfire PM and background PM, they found results similar to those reported for background PM. This suggests that including additional bushfire days with non-bushfire days in the background PM analysis would not bias their PM results.
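The split of daily PM into an event component and a background component can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the coding used by Morgan et al. [15]: the function name `split_pm`, the data layout, and the example values are all hypothetical.

```python
from datetime import date


def split_pm(daily_pm, event_days):
    """Split daily PM into event and background components.

    daily_pm:   dict mapping date -> PM concentration (hypothetical values)
    event_days: set of dates flagged as Biomass Smoke Events in the database

    Returns a list of (day, event_pm, background_pm) tuples. On each day
    exactly one of the two components carries the observed value and the
    other is zero.
    """
    rows = []
    for day in sorted(daily_pm):
        pm = daily_pm[day]
        if day in event_days:
            rows.append((day, pm, 0.0))
        else:
            # A smoke day missing from event_days would end up here,
            # i.e. it would be misclassified as a background day.
            rows.append((day, 0.0, pm))
    return rows


# Hypothetical three-day series with one validated event day.
daily_pm = {
    date(2002, 1, 1): 18.0,
    date(2002, 1, 2): 95.0,
    date(2002, 1, 3): 20.0,
}
event_days = {date(2002, 1, 2)}
rows = split_pm(daily_pm, event_days)
```

With this layout, an event day that the database failed to capture simply contributes its PM value to the background series, which is the form of misclassification discussed above.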
As part of our database design we aimed to minimize this risk: the database clearly identifies the number and type of information sources used for each event and the validation protocol used in each review. This serves as a flag that communicates to the user how much certainty there is that each event was caused by biomass smoke (and whether from wildfires, prescribed burns, woodheaters or dust, or some combination). Future users can then assess whether the set of events they extract from the database meets the level of certainty required for their study, which will depend on their research questions and the inferences they aim to make from their results. A more detailed discussion of these issues is outside the scope of this brief database description paper.
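As an illustration of how a user might apply such flags, the sketch below keeps only events that meet a chosen evidence threshold. The field names (`protocol`, `n_sources`), the protocol labels "A" and "B", and the example records are hypothetical and do not reflect the actual database schema.

```python
from dataclasses import dataclass


@dataclass
class Event:
    day: str        # ISO date of the event
    place: str      # city or region
    protocol: str   # hypothetical label for the validation protocol used
    n_sources: int  # hypothetical count of supporting information sources


def select_events(events, accepted_protocols, min_sources=1):
    """Keep events validated under an accepted protocol with enough sources."""
    return [
        e for e in events
        if e.protocol in accepted_protocols and e.n_sources >= min_sources
    ]


# Hypothetical records, for illustration only.
events = [
    Event("2002-01-02", "Sydney", "A", 3),
    Event("2003-10-15", "Sydney", "B", 1),
    Event("2006-12-09", "Sydney", "A", 1),
]

# A study needing high certainty might accept only protocol "A"
# with at least two independent sources.
strict = select_events(events, accepted_protocols={"A"}, min_sources=2)

# A more exploratory study might accept every validated event.
lenient = select_events(events, accepted_protocols={"A", "B"})
```

The threshold is a study-specific choice: a stricter filter reduces the chance of counting non-smoke days as events, at the cost of leaving more true smoke days among the background days.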