1. Introduction
Fine particulate matter (PM2.5) is a leading contributor to the global disease burden examined by the Global Burden of Disease 2023
program [
1]. Estimating this burden requires, in part, characterization of both the magnitude and shape of the association between exposure and a health response. For the case of mortality, this association is determined using meta-data (slope estimate from a survival model using cohort data, its standard error and its cohort-specific exposure distribution) [
1,
2,
8]. Cohort studies of outdoor PM2.5
observed exposures well below
, while policy analyses can involve exposure scenarios more than 10-fold higher than that figure [
4].
The first attempt at constructing a relative risk model for the association between PM2.5
and mortality over the global exposure range was part of the first
program in 2004 [
5]. At the time, only two cohort studies existed, both conducted in the United States [
6,
7], with the highest observed exposure concentration of
[
6]. A linear-in-concentration relative risk model was assumed up to
with no increase in risk above this level. This extreme form of bounding was used based on concerns that this linear-in-concentration model would yield unacceptably high risks at the concentrations observed globally [
5].
The next attempt was the
, where risks associated with sources of much higher PM2.5 exposure (second-hand smoke, household pollution, and active smoking)
were integrated with outdoor sources by transforming total inhaled dose of PM2.5
to the equivalent ambient concentration [
8]. Several assumptions, however, had to be made, including that the toxicity of PM2.5
is the same for each source at the same inhaled dose, and that toxicity is not a function of the timing of exposure (i.e., smoking a cigarette versus living with a smoker) [
8]. An algebraic function was used to model relative risk predictions. Since the equivalent PM2.5
exposure from active smoking yielded concentrations much larger than any possible outdoor concentration
, predicting the risk at even very high outdoor levels was a matter of interpolation.
dropped active smoking and second-hand smoking from the
, leaving household pollution as the highest PM2.5
exposure source [
1]. They also replaced the algebraic function with a monotonically increasing concave smoothing spline [
1]. They extrapolated the risk beyond the observed exposure data with a linear function [
1]. Although this method offered a mathematical framework for risk extrapolation and enabled simultaneous assessment of the mortality burden from multiple sources [
1], its reliability rested on strong assumptions that have not been empirically tested or verified. In addition, since the exposure distributions of the cohorts for the different sources can overlap, a common fitted model will not necessarily provide an adequate fit to any single source, as it is highly influenced by differences in the magnitude of risk between sources [
3]. This is a limitation of the
approach if one is interested in modeling exposure specifically from outdoor sources.
The third attempt, the
, focused on the case where only risk assessment from outdoor sources of PM2.5
was of interest [
2], and as such, only information from cohort studies of outdoor-sourced PM2.5
was used. The product of a generalized logarithmic function and a sigmoidal function was used to characterize the shape of the concentration–response relationship. The algebraic form of the function formed the basis for predictions beyond the cohort exposure range.
The fourth approach, the
model [
3], used a parametric spline with two knots composed as a fusion of three algebraic forms of the derivative of the logarithm of the relative risk model. The first form was a constant up to a concentration, the first knot, and then a function that declines with concentration up to the upper limits of the observed cohort exposure distribution, the second knot. A simplistic algebraic form was suggested as a means to extrapolate risk beyond the observable exposure range, where the derivative declines as the inverse of concentration [
3]. This assumption on the derivative implies that the relative risk increases as the logarithm of concentration beyond the observed cohort exposure range. Although this assumption was conservative in the sense that the derivative of the model is always greater over the cohort exposure range than a logarithmic model predicts beyond that range, relative risk predictions are unbounded, and at concentrations much higher than observed, the relative risk estimates could be so extreme that they are reasonably questioned [
9].
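The step from an inverse-concentration derivative to a logarithmic relative risk can be written out as follows. The symbols are assumed here for illustration only: z is concentration, x_u is the upper end of the observed cohort exposure range, and c > 0 is a constant fixed by the fitted model at x_u.

```latex
\begin{align*}
\frac{d}{dz}\log RR(z) &= \frac{c}{z}, \qquad z > x_u, \\
\log RR(z) &= \log RR(x_u) + \int_{x_u}^{z} \frac{c}{s}\,ds
            = \log RR(x_u) + c\,\log\!\left(\frac{z}{x_u}\right).
\end{align*}
```

Because log(z/x_u) grows without limit as z increases, so does the relative risk under this form, which is the unboundedness noted above.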
In this paper, we propose a specific criterion to guide extrapolation of the relative risk beyond concentrations observed in health studies of outdoor PM2.5.
We suggest that the range in the population attributable fraction (proportion of total deaths due to PM2.5
) [
5] based on concentrations above the cohort exposure range should not be greater than that observed over the cohort exposure range itself. In addition, we require a relative risk estimate at the end of the cohort exposure range to be insensitive to the shape and magnitude of the relative risk predictions above this range. Our approach does not require additional assumptions on the extrapolated risk except that it is bounded from above. We illustrated our model using meta-data from 44 cohort mortality studies of outdoor PM2.5
conducted globally.
3. Results
We illustrated our method with an analysis of the association between non-accidental mortality and PM2.5
using meta-data on 28 cohorts that were previously examined [
3], with the addition of 16 cohorts published since 2021 (one in Canada [
12]; one in China [
13]; one in England [
14]; seven in Europe [
14]; and six in Asia [
15]), totaling 44 cohorts.
We defined two exposure ranges based on observed outdoor-monitored PM2.5
concentrations [
16] and specific high-exposure situations [
4]. For the purposes of this example, we define the global outdoor exposure range as
based on the highest global annual concentration in cities in 2024 (Byrnihat, India, at 128
and Delhi, India, at 108
) [
16]. We also define an extended exposure range as
in order to more clearly illustrate the differences in model predictions over high exposures that have been observed [
4].
The meta-data are displayed in
Figure 1, when the logarithm of the hazard ratio is used
(Panel (A)) or when the hazard ratio is used directly
(Panel (B)) as red dots, along with the fifth and 95th percentiles of the cohort-specific exposure distributions (red horizontal lines). A great deal of scatter was observed for the meta-data, for either
or
, and there was no apparent pattern with concentration, suggesting that a constant over concentration may be an appropriate derivative model.
If and , then , defining the model. Our estimate of yields a relative risk of 3.0 at and 9897 at . Estimates as large as 9897 appear biologically implausible.
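To see how quickly a constant-derivative model escalates, assume that the logarithm of the relative risk is linear in concentration, log RR(z) = θz. This is an assumed reading of the model above, and θ and the function name are illustrative. Under that assumption, the two relative risks quoted fix the approximate ratio of the two concentrations without needing θ itself:

```python
import math


def concentration_ratio(rr_low: float, rr_high: float) -> float:
    """Ratio z_high / z_low implied by log RR(z) = theta * z (assumed form)."""
    return math.log(rr_high) / math.log(rr_low)


# Relative risks quoted in the text (3.0 is a rounded value, so the ratio is approximate).
ratio = concentration_ratio(3.0, 9897.0)
print(round(ratio, 1))  # prints 8.4
```

Under this assumption, a roughly eightfold increase in concentration is enough to turn a relative risk of 3.0 into one near 10,000.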
We thus require a model that is flexible within the cohort exposure range, yet does not yield values as extreme as those of the
model above the cohort exposure range. One such model is the
[
2], as it was specifically designed to model a variety of shapes (near linear, sub-linear, supra-linear, and sigmoidal) within the cohort exposure range, yet approximate a logarithmic model for concentrations above the range. This is accomplished by limiting the
parameters
to lie within the cohort exposure range [
2].
We fit the
to our meta-data and displayed the relative risk predictions (blue line) and uncertainty interval (grey shaded area) over the global (
Figure 2, Panel (A)) and extended (
Figure 2, Panel (B)) exposure ranges. In addition, the cohort meta-distribution is displayed in Panel (A) (green line), together with our estimate of
(see
Appendix A) and the
model predictions (black line). The
and
predictions are similar when concentrations are less than
. The model predictions diverge somewhat between
and 120
(
Figure 2, Panel (A)), with the
displaying more curvature than the
model. Thus, the
is performing as it was designed. Concerns are raised, however, when the
predictions are extrapolated well beyond the global concentration range (Panel (B)) with
and an upper uncertainty limit of 22. The
is thus limited by the use of a single algebraic function over both the cohort and extended exposure ranges.
Burnett and colleagues [
3] approached this problem by specifying two separate functions: one over the cohort exposure range (see
Appendix A) and another, a logarithmic function, used to extrapolate the risk beyond the cohort exposure range, denoted as the
model.
The relative risk predictions of all three models, the
(blue line:
Figure 3),
(black line:
Figure 3), and
(brown line:
Figure 3), were similar over the cohort exposure range (Panel (A)), but differed over the extended exposure range (Panel (B)), with the
predictions being greater than the
predictions, which were in turn greater than the
model predictions. We noted that the
uncertainty interval was clearly wider than the
model’s uncertainty interval (Panel (B)). However, the
model prediction at
, 4.7, was still very large, implying that 79% of all deaths are attributable to PM2.5
exposure. In addition,
is an unbounded function.
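The 79% figure is consistent with the standard attributable-fraction formula PAF = 1 − 1/RR, applied as if the whole population were exposed at that concentration. A minimal sketch, assuming that formula is the one intended (the function name is illustrative):

```python
def attributable_fraction(relative_risk: float) -> float:
    """Fraction of deaths attributable to exposure, assuming PAF = 1 - 1/RR."""
    if relative_risk < 1.0:
        raise ValueError("relative risk must be at least 1")
    return 1.0 - 1.0 / relative_risk


# Relative risk of 4.7 quoted in the text for the highest concentration considered.
paf = attributable_fraction(4.7)
print(f"{paf:.0%}")  # prints 79%
```

Because 1 − 1/RR approaches 1 as RR grows, an unbounded relative risk function eventually attributes nearly all deaths to the exposure.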
We next considered applying our extrapolation criterion with the implication that
. The
model [
3] specified
, with the derivative declining as the inverse of the concentration above the cohort exposure range. We propose an alternative derivative model of the form
and
with the property
being bounded from above for all concentrations. We propose using the same algebraic form of the derivative for concentrations within the cohort exposure range as the
model. The restriction that the derivative of
be continuous at
to ensure a smooth transition between the algebraic forms below and above
implies that
, when
, must be a function of the decay parameter
(see
Appendix A). We term this new model the
model; a comparison with the
model is presented in
Figure 4. Methods for selecting the value of
and characterizing the magnitude and shape of
are provided in
Appendix A.
The
and Bounded Fusion relative risk predictions were similar over the cohort exposure range (
Figure 4, Panel (A)), with the
being slightly less than the
model predictions between the cohort and global concentration ranges (Panel (A)). However, the
model predictions were clearly less than those of the
model over the extended concentration range, with narrower uncertainty intervals (Panel (B)). The property of similar risk predictions over the cohort exposure range between the
and
models indicates that our method meets our fourth assumption that the extrapolation model does not influence risk predictions within the cohort exposure range.
We next compared the
and
model predictions between the two specifications of the link function
in
Figure 5. The predictions were larger when the
form was used compared to the
form due to taking exponentials. The difference between the two link specifications increased with the relative risk estimate at
, with a larger difference observed for the
model compared to the
model specification.
Setting an Upper Bound on Risk
Suppose one assumes a biological upper bound on risk
. The closer the value of
is to
, the larger the value of
, resulting in more curvature in the model predictions for higher concentrations (see
Appendix A). If
is only slightly greater than
,
can be very large. In the extreme case,
and
. If
is much larger than
,
would be only slightly greater than unity and the extrapolated relative risk would be approximately logarithmic. In this case, the
and
relative risk predictions would be similar.
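The relationship between the bound and the decay parameter can be illustrated with a short sketch. It assumes a log link and a hypothetical power-law decay of the derivative above the cohort range, d log RR/dz = s_u (x_u/z)^ρ with ρ > 1; the form actually used is the one given in Appendix A and may differ. All names and numerical inputs below are illustrative, apart from 2.78, the smoking relative risk quoted in this section.

```python
import math


def decay_for_bound(bound, rr_u, slope_u, x_u):
    """Decay parameter rho for which RR(z) tends to `bound` as z grows.

    Assumes d/dz log RR = slope_u * (x_u / z) ** rho for z > x_u, which
    integrates to a limit of log(rr_u) + slope_u * x_u / (rho - 1).
    """
    if bound <= rr_u:
        raise ValueError("bound must exceed the relative risk at x_u")
    return 1.0 + slope_u * x_u / math.log(bound / rr_u)


def extrapolated_rr(z, rho, rr_u, slope_u, x_u):
    """Relative risk at concentration z >= x_u under the assumed decay."""
    gain = slope_u * x_u / (rho - 1.0)
    return rr_u * math.exp(gain * (1.0 - (x_u / z) ** (rho - 1.0)))


# Hypothetical inputs: relative risk 1.5 at x_u = 100, with slope 0.002 per unit there.
rr_u, slope_u, x_u = 1.5, 0.002, 100.0
for bound in (1.6, 2.78, 50.0):
    rho = decay_for_bound(bound, rr_u, slope_u, x_u)
    rr_1000 = extrapolated_rr(1000.0, rho, rr_u, slope_u, x_u)
    print(bound, round(rho, 2), round(rr_1000, 2))
```

In this sketch, a bound only slightly above the relative risk at x_u forces a large decay parameter, while a very large bound gives a decay parameter close to 1, the logarithmic case. This matches the qualitative behavior described above.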
It is not clear, however, how to select such a bound in practice. Relative risks due to active smoking were used to bound the risk in the original
[
8]. Using the active smoking risk as a guide, we noted that the relative risk of a current smoker to a never smoker for all causes of death was 2.78 [
17]. Our upper bounds of
and
are less than the active-smoking-related relative risk. The resulting relative risk estimates for concentrations above
would be larger than those displayed in
Figure 4 if this smoking-based upper bound were used.
4. Discussion
Exposure to fine particulate matter is a leading contributor to mortality and morbidity globally [
1]. As such, there is great interest in characterizing the association between exposure and health over the global exposure range [
1,
2,
3]. Although there have been several cohort studies examining this association, only a few have been conducted in situations where subjects' exposure is similar to the highest global outdoor concentrations. There are cases of interest, however, where people are exposed to much higher levels than in any of the cohorts [
4]. Even if new cohorts are established in regions with high exposures, they are not likely to completely cover all cases of policy interest. Relative risk models are thus required to cover not only the global range of outdoor exposure of the general population, but also much more extreme exposure scenarios that are observed today, as well as potential cases in which changing future atmospheric conditions lead to even higher levels [
18]. Methods are thus required to extrapolate the risk potentially well beyond the risks observed in epidemiological studies, yet meet the requirements of biological plausibility when the extrapolated risks may be potentially so large that they are viewed with skepticism [
9].
We proposed a method of risk extrapolation for which the
over the extrapolated concentration range [
is equal to that over the observed exposure range
. This approach does not require strong additional assumptions like those needed for the
[
8]. We do not specifically bound the magnitude of the relative risk in absolute terms, but in relative terms with
. The larger the value of
, the larger the magnitude of the bound
.
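One way to make the criterion concrete is sketched below. The notation is assumed for illustration and is not taken from the text: the population attributable fraction at relative risk RR is taken to be 1 − 1/RR (consistent with the 79% quoted for a relative risk of 4.7), the relative risk is taken to be 1 at the lower end of the cohort exposure range, R_u denotes the relative risk at the upper end of that range, and B denotes the limiting relative risk at very high concentrations. The formulation in Appendix A may differ in detail.

```latex
\begin{align*}
\mathrm{PAF}(RR) &= 1 - \frac{1}{RR}, \\
\underbrace{\Bigl(1 - \tfrac{1}{B}\Bigr) - \Bigl(1 - \tfrac{1}{R_u}\Bigr)}_{\text{range above the cohort range}}
  &= \underbrace{\Bigl(1 - \tfrac{1}{R_u}\Bigr) - 0}_{\text{range over the cohort range}}, \\
\frac{1}{B} &= \frac{2}{R_u} - 1
  \quad\Longrightarrow\quad
  B = \frac{R_u}{2 - R_u}, \qquad 1 \le R_u < 2 .
\end{align*}
```

Under these assumptions the bound increases with R_u; for a hypothetical R_u = 1.5, for example, B = 3.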
We suggest that the link function be used when predictions of risk at very high concentrations are of interest, as it yields lower relative risk predictions and narrower uncertainty intervals compared to using the link function. The difference between the two forms increases as risk predictions increase. We suggest that it is unnecessary to inflate these predictions at high concentrations, given that we have no data to provide guidance at concentrations well beyond the cohort data exposure concentrations.
The cohort studies typically use a multi-year average of estimates of ambient concentration at a subject’s home address [
1,
2,
3]. However, cases of interest may involve situations where a subject is exposed to potentially very high concentrations for short periods of time, such as riding a motorcycle in Delhi, India [
4]. In order to use our relative risk function for burden analyses, one would have to convert their estimated time-integrated exposure to a multi-year average equivalent.
We also suggest an alternative approach to bounding relative risk predictions by fixing the value of an upper bound that is not related to the prediction of risk over the cohort exposure range. This approach allows the policy analyst to decide what constitutes an acceptable upper bound on risk.
The motivation for developing our model is also its major limitation, in that we have no observed evidence of risk at these extremely high concentrations. Ideally, one should conduct studies in these cases. In the absence of such evidence, ancillary information can provide some guidance as to the reliability of our model predictions. These include information on risk from other particulate sources at very high concentrations or a comparison of our high concentration model predictions with those of the 88 mortality risk factors reported by the
[
1].