# What to Do When Accumulated Exposure Affects Health but Only Its Duration Was Measured? A Case of Linear Regression


## Abstract


## 1. Introduction

## 2. Theoretical Analysis of Impact on Estimate of Effect of Cumulative Exposure

For the outcome $Y_i$ on the $i$-th of $n$ persons, the outcome model is assumed to be:

$$Y_i = \beta_0 + \beta_1 \log C_i + e_i, \qquad (1)$$

where $C_i$ is the cumulative exposure, $e_i$ is the error term distributed as $N(0, \sigma^2)$, and $\sigma^2$, $\beta_0$ and $\beta_1$ are the parameters. The cumulative exposure of the $i$-th person is defined as the product of duration of exposure ($D_i$) and intensity ($I_i$), such that the outcome model can be re-written as: $(Y|D, I) \sim N(\beta_0 + \beta_1(\log D_i + \log I_i), \sigma^2)$. There is theoretical and empirical evidence that many occupational exposures are well-described by the lognormal distribution [24,25] and emerging evidence that age up to an event, such as either development of illness or selection into an epidemiologic study, can follow the lognormal distribution [25,26]. Consequently, we focus on the situation where $(\log I_i, \log D_i)$ follows a bivariate normal distribution $N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, with means $\mu_I$ and $\mu_D$, variances $\sigma_I^2$ and $\sigma_D^2$, respectively, and a correlation $\rho$. This assumption is not necessary for linear regression in general, so we are considering a special case where such an assumption is defensible. Mathematical details pertinent to the rest of this section are in Appendix A, while the R [27] code needed to reproduce Figure 1, Figure 2 and Figure 3 is provided in Supplemental Material 1.

## 3. Naïve Analysis

Under these assumptions, $(Y|D) \sim N(\alpha_0 + \alpha_1 \log(D), \lambda^2)$, where expressions for $(\alpha_0, \alpha_1, \lambda^2)$ in terms of the original parameters are given in Appendix A. When investigators have no information about intensity of exposure and naïvely regress the outcome on log(D) to estimate $\beta_1$ with $\widehat{\alpha}_1$, we show that they incur bias:

$$\alpha_1 - \beta_1 = \rho k \beta_1, \qquad (2)$$

where $k = \sigma_I/\sigma_D$. (In such an analysis, when the model in Equation (1) is assumed to be true, any interpretation of $\widehat{\alpha}_1$ must be a reflection of the true causal association mediated by non-zero intensity of exposure.) Outside of some uncommon settings (particular combinations of parameter values paired with a very small sample size), this estimator has a root-mean-squared error (RMSE) greater than that obtained in the complete-data case by regressing the outcome on log(C) to obtain $\widehat{\beta}_1$ (the estimate of the slope with complete data). In the special case where $\rho = 0$, bias is not incurred but the variance of the estimator is inflated: $\mathrm{Var}(\widehat{\alpha}_1) = n^{-1}(\sigma^2 + \beta_1^2\sigma_I^2)/\sigma_D^2 > \mathrm{Var}(\widehat{\beta}_1) = n^{-1}\sigma^2/(\sigma_D^2 + \sigma_I^2)$ (general expressions for estimator variances are in Appendix A). This is the same as Berkson-type error when log(D) is used as a surrogate of log(C) with error term $\log(I) \sim N(\mu_I, \sigma_I^2)$ [28]. When $\rho k < -1$, the naïve analysis will estimate a target (tend to yield an estimate) that is in the opposite direction from the true effect (Figure 1). In other words, this situation can only occur when (a) intensity and duration are inversely related with sufficiently high correlation and (b) intensity is more variable than duration to a large enough degree to produce $\rho k < -1$, leading to the case highlighted by Johnson [1]. Clearly, in such circumstances, as well as when bias is expected to be substantial, there is a motivation either to collect data on exposure intensity or to use knowledge about the joint distribution of intensity and duration to account for it in data analysis. Furthermore, when the RMSE of a naïve analysis is much worse than that obtainable with cumulative exposure, either further data collection or adjustment is motivated, such as when duration and intensity are noticeably correlated (e.g., Figure 2). We develop intuition as to whether the adjustment can achieve worthwhile improvements in the next section; it is important to consider this because, where possible, the resources involved in additional statistical analyses and validation studies are less than the cost of full-scale assessment of intensity of exposure.
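The bias relationship and the sign reversal for $\rho k < -1$ can be checked by simulation. The following is a minimal sketch under stated assumptions, not the code from Supplemental Material 1: the parameter values are illustrative, and the helper names `simulate` and `ols_slope` are hypothetical. It draws $(\log I, \log D)$ from a bivariate normal distribution, generates the outcome from the complete-data model, and compares the naïve slope (outcome on log D) with the complete-data slope (outcome on log C).

```python
import math
import random


def simulate(n, beta0, beta1, sigma, mu_i, mu_d, sd_i, sd_d, rho, seed=1):
    """Draw (log D, log I, Y) under the assumed outcome model."""
    rng = random.Random(seed)
    log_d, log_i, y = [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ld = mu_d + sd_d * z1
        # Mixing z1 and z2 this way gives corr(log I, log D) = rho.
        li = mu_i + sd_i * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        log_d.append(ld)
        log_i.append(li)
        y.append(beta0 + beta1 * (ld + li) + rng.gauss(0.0, sigma))
    return log_d, log_i, y


def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Illustrative values only: rho * k = -1.3, so a sign reversal is expected.
beta1, rho, sd_i, sd_d = 0.5, -0.5, 2.6, 1.0
k = sd_i / sd_d
log_d, log_i, y = simulate(20000, 1.0, beta1, 0.1, 0.0, 0.0, sd_i, sd_d, rho)

naive = ols_slope(log_d, y)  # targets alpha_1
complete = ols_slope([d + i for d, i in zip(log_d, log_i)], y)  # targets beta_1
target = (1.0 + rho * k) * beta1  # alpha_1 = (1 + rho * k) * beta_1

print(f"naive slope    = {naive:.3f} (target {target:.3f})")
print(f"complete slope = {complete:.3f} (true beta_1 = {beta1})")
```

With these inputs the naïve slope should settle near $(1 + \rho k)\beta_1 = -0.15$, the opposite sign from the true $\beta_1 = 0.5$, while the complete-data slope recovers $\beta_1$.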

## 4. Adjusted Analysis: The Limit of What We can Learn when Only D is Available, but ρ and k are Known

**µ** and **Ʃ**), then it is possible to remove bias but not possible to recover all the precision achievable with complete data. We remove the bias via the relationship implied by Equation (2), so the adjusted estimator is:

$$\widehat{\beta}_{1,A} = \frac{\widehat{\alpha}_1}{1 + \rho k}.$$

Consider the case $\beta_1 = 0$ and note that when $-2 < \rho k < 0$, the RMSE of the adjusted estimate is worse than that of the naïve one: although there is no bias, precision deteriorates. This arises when the intensity and duration are inversely related. This is illustrated in Figure 3, which compares the RMSE of adjusted and naïve estimators for $\rho k < 0$: the red line indicates where the RMSEs are equal, such that values above the line indicate a situation where adjusted estimators outperform naïve ones. As the strength of the association with cumulative exposure increases (denoted by solid lines in Figure 3, each associated with a different $\beta_1$), the range of $\rho k$ values that result in worse RMSE in adjusted analysis declines. However, it is noteworthy that the degree to which the naïve estimator can outperform the adjusted estimator is small relative to the advantage of the adjustment under most conditions. The exact shape of the solid lines in Figure 3 depends on the parameters for which the figure is generated, but Figure 3 depicts the expected general pattern of interdependence of the ratio of RMSE, $\beta_1$, and $\rho k$. Furthermore, the relative magnitude of RMSE grows less favorable for the adjusted estimate for small sample sizes, because the variance contributes disproportionately to the RMSE and dwarfs the contribution of the bias that plagues the naïve estimator. Conversely, for large sample sizes, variances make little contribution to the RMSE whereas bias remains constant, leading to a smaller RMSE for the unbiased adjusted estimator relative to the biased naïve estimator.
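The trade-off between bias and precision can be explored numerically. The sketch below is an illustration under assumptions rather than the authors' code: it uses the large-sample variance $\lambda^2/(n\sigma_D^2)$ for the naïve slope, with $\lambda^2 = \sigma^2 + \beta_1^2\sigma_I^2(1-\rho^2)$ from Appendix A, treats $\rho$ and $k$ as known, and uses hypothetical function names and input values.

```python
import math


def naive_variance(n, sigma2, beta1, sd_i, sd_d, rho):
    """Large-sample variance of the naive slope: lambda^2 / (n * sd_d^2)."""
    lam2 = sigma2 + beta1 ** 2 * sd_i ** 2 * (1.0 - rho ** 2)
    return lam2 / (n * sd_d ** 2)


def rmse_naive(n, sigma2, beta1, sd_i, sd_d, rho):
    """RMSE of the naive slope as an estimator of beta1 (bias = rho*k*beta1)."""
    bias = rho * (sd_i / sd_d) * beta1
    return math.sqrt(bias ** 2 + naive_variance(n, sigma2, beta1, sd_i, sd_d, rho))


def rmse_adjusted(n, sigma2, beta1, sd_i, sd_d, rho):
    """RMSE of the adjusted slope; unbiased, variance scaled by (1 + rho*k)^-2."""
    scale = abs(1.0 + rho * sd_i / sd_d)
    if scale == 0.0:
        return math.inf  # the adjusted estimator does not exist when rho*k = -1
    return math.sqrt(naive_variance(n, sigma2, beta1, sd_i, sd_d, rho)) / scale


# Hypothetical inputs: n = 5000, sigma^2 = 0.01, k = 1, rho = -0.5 (rho*k = -0.5).
null_effect = dict(n=5000, sigma2=0.01, beta1=0.0, sd_i=1.0, sd_d=1.0, rho=-0.5)
real_effect = dict(n=5000, sigma2=0.01, beta1=0.5, sd_i=1.0, sd_d=1.0, rho=-0.5)

ratio_null = rmse_adjusted(**null_effect) / rmse_naive(**null_effect)
ratio_real = rmse_adjusted(**real_effect) / rmse_naive(**real_effect)
print(f"adjusted/naive RMSE ratio, beta1 = 0:   {ratio_null:.2f}")
print(f"adjusted/naive RMSE ratio, beta1 = 0.5: {ratio_real:.3f}")
```

With no true effect the ratio exceeds one because adjustment only inflates the variance, whereas with $\beta_1 = 0.5$ the bias of the naïve estimator dominates and the ratio falls well below one.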

## 5. Bayesian Analysis when Information of Exposure Duration and Intensity is Disjointed

#### 5.1. Models

$\beta_1$. We use a common default prior for the regression parameters (the g-prior [29]; see Hoff [30] for an accessible description). We presume that the investigator uses a scaled beta distribution on [−1, 1] to set the prior on $\rho$, and a log-normal distribution to set the prior on $k$. As described in Appendix A, posterior computation is straightforward since the posterior distribution can be shown to be a truncated version of a distribution itself composed of standard distributions. Thus, simple Monte Carlo samples can be drawn from the posterior distribution and Markov chain Monte Carlo methods are not required. The general flavor of this analysis is in keeping with probabilistic bias analysis [19], including the need to discard some samples that violate a constraint imposed on $\beta_1$ by the residual variance of the naïve analysis ($\lambda^2$); the proportion of samples that violate the constraint grows as $\rho k$ nears −1 (details are in Appendix A).
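The draw-and-discard idea can be sketched as follows. This is a simplified illustration in the spirit of probabilistic bias analysis and is not the authors' posterior computation: draws of $\alpha_1$ come from a normal approximation rather than the g-prior posterior, $\lambda^2$ and $\sigma_D^2$ are held fixed at plug-in values, and every numeric input and the function name `adjusted_draws` are hypothetical. A draw is kept only if it satisfies the constraint $\beta_1^2 < \lambda^2/\{\sigma_I^2(1-\rho^2)\}$ with $\beta_1 = \alpha_1/(1+\rho k)$ and $\sigma_I = k\sigma_D$.

```python
import math
import random


def adjusted_draws(alpha1_hat, se_alpha1, lam2, var_d, rho_shape,
                   k_logmean, k_logsd, n_draws=20000, seed=7):
    """Monte Carlo draws of beta1 = alpha1 / (1 + rho*k), discarding draws
    that violate the constraint implied by the naive residual variance."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        alpha1 = rng.gauss(alpha1_hat, se_alpha1)      # normal approximation (assumption)
        rho = 2.0 * rng.betavariate(*rho_shape) - 1.0  # scaled beta on [-1, 1]
        k = rng.lognormvariate(k_logmean, k_logsd)     # log-normal prior on k
        # Constraint rewritten without division, so rho*k = -1 is simply rejected.
        if alpha1 ** 2 * k ** 2 * var_d * (1.0 - rho ** 2) < lam2 * (1.0 + rho * k) ** 2:
            kept.append(alpha1 / (1.0 + rho * k))
    return kept, len(kept) / n_draws


# Hypothetical inputs: naive slope 1.15 (SE 0.02), lambda^2 = 2, Var(log D) = 1,
# prior on rho centred near +0.5, prior on k centred near 2.6.
kept, rate = adjusted_draws(alpha1_hat=1.15, se_alpha1=0.02, lam2=2.0, var_d=1.0,
                            rho_shape=(30.0, 10.0), k_logmean=math.log(2.6),
                            k_logsd=0.1)
kept.sort()
median = kept[len(kept) // 2]
lower, upper = kept[int(0.025 * len(kept))], kept[int(0.975 * len(kept))]
print(f"kept {rate:.0%} of draws; beta1 median {median:.2f} "
      f"(95% interval {lower:.2f} to {upper:.2f})")
```

Because the draws are independent, the retained fraction directly shows how informative the constraint is for a given dataset and prior.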

#### 5.2. Synthetic Example

$\beta_0$ and $\beta_1$ to be weaker yet consistent with the original work (see Supplemental Material 2 for details, including R code for implementation of all analyses). The value of $k$ consistent with the original paper is on the order of 2.6, implying that bias in a duration-only analysis can be substantial according to Equation (2). We imagined two plausible values of $\rho$: −0.5 (e.g., assuming selection of highly exposed workers out of the sample available for study due to their deteriorating health) and +0.5 (e.g., assuming a stable workforce with higher exposures in the past); this leads to $\rho k$ values of about −1.3 and 1.3, respectively. Both situations are common in occupational and environmental epidemiology and cannot be discounted a priori, but these situations are not meant to be all-encompassing of possible correlations. Having generated synthetic datasets using these parameters, we analyzed them via

- the naïve approach (duration only);
- four wide priors on ρ (two of which admit uncertainty about the sign of the correlation, when the prior mean is one standard deviation below) and k (Priors 1);
- four narrow priors on ρ and k (Priors 2);
- assuming known ρ and k; and
- complete data.
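To make the size of the bias in the two scenarios concrete, the relationship $\alpha_1 = (1 + \rho k)\beta_1$ from Appendix A can be evaluated at $k = 2.6$; this is a worked illustration only, using the rounded values quoted in the text:

$$
\begin{aligned}
\rho = -0.5:&\quad \alpha_1 = (1 - 0.5 \times 2.6)\,\beta_1 = -0.3\,\beta_1,\\
\rho = +0.5:&\quad \alpha_1 = (1 + 0.5 \times 2.6)\,\beta_1 = 2.3\,\beta_1.
\end{aligned}
$$

Under negative correlation the duration-only analysis therefore targets an attenuated slope of the opposite sign, whereas under positive correlation it targets a slope more than twice the true one.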

#### 5.3. Real-World Application

## 6. Discussion

## 7. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Theory

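We have $(Y|D, I) \sim N(\beta_0 + \beta_1(\log D + \log I), \sigma^2)$ and $(\log I, \log D) \sim N_2(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, where $\boldsymbol{\mu} = (\mu_I, \mu_D)'$ and

$$\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_I^2 & \rho\sigma_I\sigma_D \\ \rho\sigma_I\sigma_D & \sigma_D^2 \end{pmatrix}.$$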

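With complete data, $\beta_1$ is estimated by $\widehat{\beta}_1$ from regression of Y on log C, with estimator variance given as $n\mathrm{Var}(\widehat{\beta}_1) = \sigma^2/\mathrm{Var}(\log C)$. For the sake of comparison with later expressions, using $k = \sigma_I/\sigma_D$, this can be re-expressed as:

$$n\mathrm{Var}(\widehat{\beta}_1) = \frac{\sigma^2}{\sigma_D^2\,(1 + 2\rho k + k^2)}.$$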

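It follows that $(Y|D) \sim N(\alpha_0 + \alpha_1 \log D, \lambda^2)$, where $\alpha_0 = \beta_0 + \beta_1(\mu_I - \rho k \mu_D)$, $\alpha_1 = (1 + \rho k)\beta_1$, and $\lambda^2 = \sigma^2 + \beta_1^2\sigma_I^2(1 - \rho^2)$. Thus the naïve estimator can be viewed as $\widehat{\alpha}_1$ obtained from regressing Y on log D, which targets $\alpha_1$ rather than $\beta_1$. The bias incurred is then $\rho k \beta_1$, while the estimator variance is:

$$n\mathrm{Var}(\widehat{\alpha}_1) = \frac{\lambda^2}{\sigma_D^2} = \frac{\sigma^2 + \beta_1^2\sigma_I^2(1 - \rho^2)}{\sigma_D^2}.$$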
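The expressions for $\alpha_0$, $\alpha_1$ and $\lambda^2$ follow from the conditional distribution of $\log I$ given $\log D$ under bivariate normality. The steps below are a brief sketch of that standard derivation, written with $k = \sigma_I/\sigma_D$:

$$
\begin{aligned}
E(\log I \mid \log D) &= \mu_I + \rho\frac{\sigma_I}{\sigma_D}(\log D - \mu_D) = (\mu_I - \rho k \mu_D) + \rho k \log D,\\
\mathrm{Var}(\log I \mid \log D) &= \sigma_I^2(1 - \rho^2),\\
E(Y \mid D) &= \beta_0 + \beta_1\{\log D + E(\log I \mid \log D)\} = \beta_0 + \beta_1(\mu_I - \rho k \mu_D) + (1 + \rho k)\beta_1 \log D,\\
\mathrm{Var}(Y \mid D) &= \sigma^2 + \beta_1^2\,\mathrm{Var}(\log I \mid \log D) = \sigma^2 + \beta_1^2\sigma_I^2(1 - \rho^2).
\end{aligned}
$$

Matching terms gives the intercept $\alpha_0$, the slope $\alpha_1$ and the residual variance $\lambda^2$ of the duration-only model.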

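The adjusted estimator is $\widehat{\beta}_{1,A} = (1 + \rho k)^{-1}\widehat{\alpha}_1$. The estimator variance is $\mathrm{Var}(\widehat{\beta}_{1,A}) = (1 + \rho k)^{-2}\mathrm{Var}(\widehat{\alpha}_1)$, which in fact can be written as

$$n\mathrm{Var}(\widehat{\beta}_{1,A}) = \frac{\sigma^2 + \beta_1^2\sigma_I^2(1 - \rho^2)}{\sigma_D^2\,(1 + \rho k)^2}. \qquad (\mathrm{A.2})$$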

The expression for $\lambda^2$ induces a constraint in the parameters governing (Y|D) and (D), namely that $\beta_1^2 < \lambda^2/\{\sigma_I^2(1 - \rho^2)\}$ (to see this more clearly, consider that $\beta_1^2 = \lambda^2/\{\sigma_I^2(1 - \rho^2)\}$ would imply the impossible condition of $\sigma^2 = 0$). This is relevant to the special case in which the known $(\rho, k)$ values satisfy $\rho k = -1$. Clearly $\widehat{\beta}_{1,A}$ does not exist in this case, and indeed $\beta_1$ is not point-identified by (Y, D) data. However, $\beta_1$ would be interval-identified, in that all quantities in the upper bound for $\beta_1^2$ are either known or estimable.

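A further consequence of the constraint involving $\lambda^2$ is that, in the case that $(\rho, k)$ are unknown and described by prior distributions, we must a priori rule out parameter values that violate the constraint. Expressed purely in the Y|D and D parameterization, the inequality takes the form:

$$\alpha_1^2\,k^2\,\sigma_D^2\,(1 - \rho^2) < \lambda^2\,(1 + \rho k)^2.$$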

We take $g_1()$ to be the g-prior with default hyper-parameters $g = n$, $\nu_0 = 1$, $\sigma_0 = 1$ (as parameterized, for instance, in Hoff [30]). Similarly, $g_2()$ is specified as inverse gamma with shape and scale parameters both set to 0.5. As a convenient form for the investigator to specify prior information about $\rho$, $g_3()$ is specified as the scaled-beta distribution on [−1, 1], which can be simply parameterized via mean and standard deviation. Further, given the definition of $k$ as a ratio of standard deviations, we take $g_4()$ to be a log-normal distribution.
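Specifying the scaled-beta prior through its mean and standard deviation amounts to moment matching. The sketch below shows one way to do this, assuming the prior is a Beta(a, b) variable mapped linearly from [0, 1] to [−1, 1]; the function name `scaled_beta_shapes` and the example values are hypothetical.

```python
import random


def scaled_beta_shapes(mean, sd):
    """Shape parameters (a, b) such that 2 * Beta(a, b) - 1 has the given
    mean and standard deviation on [-1, 1]."""
    if sd <= 0.0:
        raise ValueError("sd must be positive")
    m = (mean + 1.0) / 2.0       # mean on the [0, 1] scale
    v = (sd / 2.0) ** 2          # variance on the [0, 1] scale
    if not 0.0 < m < 1.0:
        raise ValueError("mean must lie strictly inside (-1, 1)")
    total = m * (1.0 - m) / v - 1.0   # a + b, from the beta variance formula
    if total <= 0.0:
        raise ValueError("sd is too large for this mean on [-1, 1]")
    return m * total, (1.0 - m) * total


# Example: prior on rho with mean -0.5 and standard deviation 0.25.
a, b = scaled_beta_shapes(-0.5, 0.25)
rng = random.Random(11)
draws = [2.0 * rng.betavariate(a, b) - 1.0 for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(f"a = {a}, b = {b}, sample mean of rho = {sample_mean:.3f}")
```

A prior whose mean lies within about one standard deviation of zero places appreciable mass on both signs of the correlation, which is how uncertainty about the direction of $\rho$ can be expressed.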

Without the constraint, the joint posterior factors into standard posterior distributions for $(\alpha, \lambda^2)$ and $\sigma_D^2$ along with the independent prior distributions for $\rho$ and $k$ (since neither $\rho$ nor $k$ appears in the likelihood function). Consequently, independent Monte Carlo draws from the joint posterior without the constraint are easily taken. The constraint can then be enforced simply by discarding those sampled $(\alpha, \lambda^2, \sigma_D^2, \rho, k)$ draws that violate it. Markov chain Monte Carlo methods are not required.

By taking $g_3()$ and $g_4()$ to be point-mass priors, we obtain a Bayesian version of the known $(\rho, k)$ adjustment procedure. In doing so, if the dataset is such that there is little to no posterior truncation, then the resulting posterior mean and standard deviation of $\beta_1$ will closely approximate $\widehat{\beta}_{1,A}$ and $SE[\widehat{\beta}_{1,A}]$, as arises from Bayesian linear regression with a default prior. However, for datasets leading to considerable truncation, this approximate equivalence is no longer guaranteed. In particular, the Bayesian version should be more trustworthy when $\rho k$ is close to −1, with the possibility of achieving more precision than stated in (A.2).

## References

1. Johnson, E.S. Duration of exposure as a surrogate for dose in the examination of dose response relations. Br. J. Ind. Med. **1986**, 43, 427–429.
2. Blair, A.; Thomas, K.; Coble, J.; Sandler, D.P.; Hines, C.J.; Lynch, C.F.; Knott, C.; Purdue, M.P.; Zahm, S.H.; Alavanja, M.C.; et al. Impact of pesticide exposure misclassification on estimates of relative risks in the Agricultural Health Study. Occup. Environ. Med. **2011**, 68, 537–541.
3. Westberg, H.B.; Hardell, L.O.; Malmqvist, N.; Ohlson, C.G.; Axelson, O. On the use of different measures of exposure-experiences from a case-control study on testicular cancer and PVC exposure. J. Occup. Environ. Hyg. **2005**, 2, 351–356.
4. de Vocht, F.; Burstyn, I.; Sanguanchaiyakrit, N. Rethinking cumulative exposure in epidemiology, again. J. Expo. Sci. Environ. Epidemiol. **2015**, 25, 467.
5. Preller, L.; Burstyn, I.; De, P.N.; Kromhout, H. Characteristics of peaks of inhalation exposure to organic solvents. Ann. Occup. Hyg. **2004**, 48, 643–652.
6. Nieuwenhuijsen, M.J.; Lowson, D.; Venables, K.M.; Newman-Taylor, A.J. Correlation between different measures of exposure in a cohort of bakery workers and flour millers. Ann. Occup. Hyg. **1995**, 39, 291–298.
7. McDonald, J.C.; McDonald, A.D.; Hughes, J.M.; Rando, R.J.; Weill, H. Mortality from lung and kidney disease in a cohort of North American industrial sand workers: An update. Ann. Occup. Hyg. **2005**, 49, 367–373.
8. Lipworth, L.; Sonderman, J.S.; Mumma, M.T.; Tarone, R.E.; Marano, D.E.; Boice, J.D., Jr.; McLaughlin, J.K. Cancer mortality among aircraft manufacturing workers: An extended follow-up. J. Occup. Environ. Med. **2011**, 53, 992–1007.
9. Purdue, M.P.; Bakke, B.; Stewart, P.; De Roos, A.J.; Schenk, M.; Lynch, C.F.; Bernstein, L.; Morton, L.M.; Cerhan, J.R.; Severson, R.K. A case-control study of occupational exposure to trichloroethylene and non-Hodgkin lymphoma. Environ. Health Perspect. **2011**, 119, 232–238.
10. Burstyn, I.; Yang, Y.; Schnatter, A.R. Effects of non-differential exposure misclassification on false conclusions in hypothesis-generating studies. Int. J. Environ. Res. Public Health **2014**, 11, 10951–10966.
11. Loken, E.; Gelman, A. Measurement error and the replication crisis. Science **2017**, 355, 584–585.
12. Hoar, S. Job exposure matrix methodology. J. Toxicol. Clin. Toxicol. **1983**, 21, 9–26.
13. Peters, S.; Vermeulen, R.; Portengen, L.; Olsson, A.; Kendzia, B.; Vincent, R.; Savary, B.; Lavoue, J.; Cavallo, D.; Cattaneo, A.; et al. SYN-JEM: A Quantitative Job-Exposure Matrix for Five Lung Carcinogens. Ann. Occup. Hyg. **2016**, 60, 795–811.
14. Kim, H.M.; Richardson, D.; Loomis, D.; Van Tongeren, M.; Burstyn, I. Bias in the estimation of exposure effects with individual-or group-based exposure assessment. J. Expo. Sci. Environ. Epidemiol. **2011**, 21, 212–221.
15. Tielemans, E.; Kupper, L.L.; Kromhout, H.; Heederik, D.; Houba, R. Individual-based and group-based occupational exposure assessment: Some equations to evaluate different strategies. Ann. Occup. Hyg. **1998**, 42, 115–119.
16. Xing, L.; Burstyn, I.; Richardson, D.B.; Gustafson, P. A comparison of Bayesian hierarchical modeling with group-based exposure assessment in occupational epidemiology. Stat. Med. **2013**, 32, 3686–3699.
17. Poole, C. Low P-values or narrow confidence intervals: Which are more durable? Epidemiology **2001**, 12, 291–294.
18. Lash, T.L. The Harm Done to Reproducibility by the Culture of Null Hypothesis Significance Testing. Am. J. Epidemiol. **2017**, 186, 627–635.
19. Lash, T.L.; Fox, M.P.; Fink, A.K. Applying Quantitative Bias Analysis to Epidemiologic Data; Springer Science+Business Media: Berlin, Germany, 2009.
20. Talbott, E.O.; Gibson, L.B.; Burks, A.; Engberg, R.; McHugh, K.P. Evidence for a dose-response relationship between occupational noise and blood pressure. Arch. Environ. Health **1999**, 54, 71–78.
21. Seixas, N.S.; Neitzel, R.; Stover, B.; Sheppard, L.; Feeney, P.; Mills, D.; Kujawa, S. 10-Year prospective study of noise exposure and hearing damage among construction workers. Occup. Environ. Med. **2012**, 69, 643–650.
22. Kennedy, S.M.; Chan-Yeung, M.; Marion, S.; Lea, J.; Teschke, K. Maintenance of stellite and tungsten carbide saw tips: Respiratory health and exposure-response evaluations. Occup. Environ. Med. **1995**, 52, 185–191.
23. Gustafson, P.; Burstyn, I. Bayesian inference of gene-environment interaction from incomplete data: What happens when information on environment is disjoint from data on gene and disease? Stat. Med. **2011**, 30, 877–889.
24. Koch, A.L. The logarithm in biology 1. Mechanisms generating the log-normal distribution exactly. J. Theor. Biol. **1966**, 12, 276–290.
25. Limpert, E.; Stahel, W.A.; Abbt, M. Log-normal distributions across the sciences: Keys and clues. BioScience **2001**, 51, 341–352.
26. Gualandi, S.; Toscani, G. Human Behavior And Lognormal Distribution. A Kinetic Description. arXiv **2018**, arXiv:1809.01365.
27. The R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2006; ISBN 3-900051-07-0.
28. Berkson, J. Are there two regressions? Am. Stat. Assoc. J. **1950**, 45, 164–180.
29. Zellner, A. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. Bayesian Inference Decis. Techn. **1986**, 28, 253–305.
30. Hoff, P.D. Linear regression. In A First Course in Bayesian Statistical Methods, 1st ed.; Springer: New York, NY, USA, 2009; pp. 149–170.
31. Reeves, G.K.; Cox, D.R.; Darby, S.C.; Whitley, E. Some aspects of measurement error in explanatory variables for continuous and binary regression models. Stat. Med. **1998**, 17, 2157–2177.
32. Prentice, R. Covariate measurement errors and parametric estimation in a failure time regression model. Biometrika **1982**, 69, 331–341.
33. Kim, H.M.; Yasui, Y.; Burstyn, I. Attenuation in risk estimates in logistic and Cox proportional-hazards models due to group-based exposure assessment strategy. Ann. Occup. Hyg. **2006**, 50, 623–635.
34. Gustafson, P. Measurement Error and Misclassification in Statistics and Epidemiology; Chapman & Hall/CRC Press: Boca Raton, FL, USA, 2004.
35. Carroll, R.J.; Ruppert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Models, 2nd ed.; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006.
36. Lin, N.X.; Logan, S.; Henley, W.E. Bias and sensitivity analysis when estimating treatment effects from the cox model with omitted covariates. Biometrics **2013**, 69, 850–860.
37. Gail, M.H.; Wieand, S.; Piantadosi, S. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika **1984**, 71, 431–444.
38. Lin, D.Y.; Psaty, B.M.; Kronmal, R.A. Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics **1998**, 54, 948–963.
39. McCandless, L.C.; Gustafson, P.; Levy, A. Bayesian sensitivity analysis for unmeasured confounding in observational studies. Stat. Med. **2007**, 26, 2331–2347.
40. Seixas, N.S.; Robins, T.G.; Becker, M. A novel approach to the characterization of cumulative exposure for the study of chronic occupational disease. Am. J. Epidemiol. **1993**, 137, 463–471.
41. Lubin, J.H.; Caporaso, N.E. Cigarette smoking and lung cancer: Modeling total exposure and intensity. Cancer Epidemiol. Biomarkers Prev. **2006**, 15, 517–523.
42. Smith, T.J.; Kriebel, D. A Biologic Approach to Environmental Assessment and Epidemiology; Oxford University Press: New York, NY, USA, 2010.
43. Wang, D.; Shen, T.; Gustafson, P. Partial Identification arising from Nondifferential Exposure Misclassification: How Informative are Data on the Unlikely, Maybe, and Likely Exposed? Int. J. Biostat. **2012**, 8, 1557–4679.
44. Gustafson, P.; Le, N.D. Comparing the effects of continuous and discrete covariate mismeasurement, with emphasis on the dichotomization of mismeasured predictors. Biometrics **2002**, 58, 878–887.
45. Heavner, K.K.; Phillips, C.V.; Burstyn, I.; Hare, W. Dichotomization: 2 × 2 (×2 × 2 × 2...) categories: Infinite possibilities. BMC Med. Res. Methodol. **2010**, 10, 59.

**Figure 1.** The expected direction of the apparent association with duration of exposure, as a function of correlation of intensity and duration ($\rho$), ratio of standard deviations of intensity and duration ($k$), and strength of causal effect ($\beta_1$).

**Figure 2.** The root mean squared error (RMSE) as a function of sample size in analysis ($n$) with duration of exposure (black), duration of exposure adjusted for distribution of intensity (grey), and cumulative exposure (light grey); dotted lines indicate that 95% confidence interval coverage is less than 50%. NB: correlation of intensity and duration ($\rho$) varies by panel; ratio of standard deviations of intensity and duration is $k = 1$, and strength of causal effect is $\beta_1 = 0.5$.

**Figure 3.** Circumstances when infusion of analysis with additional information on exposure intensity is expected to degrade root mean squared error (RMSE), as a function of correlation of intensity and duration ($\rho = -0.5$), ratio of standard deviations of intensity and duration ($k$), and strength of causal effect ($\beta_1$) for $n = 5000$, $\sigma^2 = 0.01$, Var(log C) = 1; red line indicates where RMSEs are equal; blue line indicates where adjusted RMSE is undefined.

**Figure 4.** Adjusted estimates of $\beta_1$ with different degrees of knowledge about the joint distribution of duration and intensity of exposure when $\rho = -0.5$ and $k = 2.6$ in four simulations of the synthetic example; naïve estimate (NV) is contrasted with adjusted estimates obtained under “well-calibrated” priors on ($\rho$, $k$) that are “wide” (PR1) and “narrow” (PR2), estimates obtained with $\rho$ and $k$ known (KNW; the best one can do without complete data), and complete data on intensity and duration (CMP); true value is denoted by dotted line, solid lines represent 95% credible intervals; see text for details.

**Figure 5.** Adjusted estimates of $\beta_1$ with different degrees of knowledge about the joint distribution of duration and intensity of exposure when $\rho = +0.5$ and $k = 2.6$ in four simulations of the synthetic example; naïve estimate (NV) is contrasted with adjusted estimates obtained under “well-calibrated” priors on ($\rho$, $k$) that are “wide” (PR1) and “narrow” (PR2), estimates obtained with $\rho$ and $k$ known (KNW; the best one can do without complete data), and complete data on intensity and duration (CMP); true value is denoted by dotted line, solid lines represent 95% credible intervals; see text for details.

**Figure 6.** Estimated change in log(FVC, ml) among 570 male current smokers in NHANES 2011–2012 under different priors; naïve analysis is the association with log(years of smoking), complete analysis is the association with log(pack-years); see text for description of the different priors (Prior 1, Prior 2, Fixed) that use information on the correlation of logarithms of duration and pack-years ($\rho$) and the ratio of standard deviations of logarithms of packs/day and duration ($k$); circles represent the 50th percentile of posterior distributions and lines span the 95% credible intervals; dashed line represents the lower bound of the 95% credible interval with complete data.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Burstyn, I.; Barone-Adesi, F.; de Vocht, F.; Gustafson, P. What to Do When Accumulated Exposure Affects Health but Only Its Duration Was Measured? A Case of Linear Regression. *Int. J. Environ. Res. Public Health* **2019**, *16*, 1896.
https://doi.org/10.3390/ijerph16111896
