Model Averaging for Improving Inference from Causal Diagrams
Abstract
1. Introduction
1.1. Uncertainty in Causal Modeling

1.2. Averaging Models to Avoid Investigator Bias
2. Methods
2.1. Example from the PIN Study

2.2. Three Approaches for Model Averaging
3. Results
| Adjustment Set | Covariates | Overweight vs. Normal: Risk Ratio | Overweight vs. Normal: 95% CI | Obese vs. Normal: Risk Ratio | Obese vs. Normal: 95% CI | AIC | Weight |
|---|---|---|---|---|---|---|---|
| 1 | Chronic hypertension, gestational weight gain, maternal age, maternal education, maternal race | 1.38 | 0.92, 2.09 | 1.75 | 1.23, 2.50 | 552.56 | 0.43 |
| 2 | Chronic hypertension, maternal age, maternal education, maternal race, maternal height | 1.27 | 0.84, 1.92 | 1.46 | 1.04, 2.08 | 552.74 | 0.39 |
| 3 | Gestational weight gain, maternal age, maternal education, maternal race, pre-eclampsia/eclampsia | 1.38 | 0.91, 2.09 | 1.74 | 1.22, 2.48 | 555.12 | 0.12 |
| 4 | Maternal age, maternal education, maternal height, maternal race, pre-eclampsia/eclampsia | 1.29 | 0.85, 1.95 | 1.48 | 1.05, 2.10 | 556.43 | 0.06 |
| AIC averaged values | | 1.33 | 0.86, 2.03 | 1.62 | 1.09, 2.39 | | |

| Adjustment Set | Overweight vs. Normal: Risk Ratio | Overweight vs. Normal: 95% CI | Obese vs. Normal: Risk Ratio | Obese vs. Normal: 95% CI |
|---|---|---|---|---|
| 1 | 1.41 | 0.93, 2.11 | 1.86 | 1.32, 2.62 |
| 2 | 1.35 | 0.92, 2.00 | 1.48 | 1.06, 2.06 |
| 3 | 1.38 | 0.91, 2.09 | 1.74 | 1.22, 2.48 |
| 4 | 1.33 | 0.89, 1.97 | 1.43 | 1.02, 2.01 |
| Average | 1.37 | 0.92, 2.04 | 1.61 | 1.15, 2.27 |

| Adjustment Set | Overweight vs. Normal: Risk Ratio (Mean) | Overweight vs. Normal: Risk Ratio (Median) | Overweight vs. Normal: 95% Interval † | Obese vs. Normal: Risk Ratio (Mean) | Obese vs. Normal: Risk Ratio (Median) | Obese vs. Normal: 95% Interval † |
|---|---|---|---|---|---|---|
| 1 | 1.39 | 1.36 | 0.92, 2.02 | 1.78 | 1.76 | 1.28, 2.46 |
| 2 | 1.34 | 1.31 | 0.90, 1.93 | 1.45 | 1.42 | 1.07, 1.98 |
| 3 | 1.40 | 1.38 | 0.87, 2.08 | 1.76 | 1.73 | 1.18, 2.50 |
| 4 | 1.34 | 1.32 | 0.87, 1.96 | 1.43 | 1.41 | 1.01, 1.99 |
| Average | 1.37 | 1.34 | 0.89, 2.01 | 1.60 | 1.57 | 1.07, 2.35 |

| Averaging Approach | Overweight | Obese |
|---|---|---|
| Akaike’s Information | 2.36 | 2.19 |
| Inverse Variance | 2.22 | 1.97 |
| Bootstrap resampling | 2.26 | 2.20 |
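
The values in the last table above are consistent with a confidence limit ratio, that is, the upper limit of each averaged 95% interval divided by its lower limit. This reading is an inference from the reported numbers, not something the table itself states. A minimal R sketch of that arithmetic, using the averaged limits transcribed from the three preceding tables (the object names `limits` and `clr` are illustrative):

```r
# Averaged 95% limits (lower, upper) transcribed from the results tables.
limits <- data.frame(
  approach = c("AIC", "Inverse variance", "Bootstrap"),
  over_lo  = c(0.86, 0.92, 0.89),
  over_hi  = c(2.03, 2.04, 2.01),
  obese_lo = c(1.09, 1.15, 1.07),
  obese_hi = c(2.39, 2.27, 2.35)
)

# Assumed definition: ratio of upper to lower limit, a scale-free
# measure of interval width for ratio estimates.
clr <- data.frame(
  approach   = limits$approach,
  overweight = round(limits$over_hi / limits$over_lo, 2),
  obese      = round(limits$obese_hi / limits$obese_lo, 2)
)
print(clr)
```

A smaller ratio indicates a narrower interval on the ratio scale, so under this reading the inverse variance average is the most precise of the three for both contrasts.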
4. Discussion
5. Conclusions
Supplementary Material
1. AIC Model Averaging
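
As a hedged illustration of AIC-based averaging, the sketch below computes standard Akaike weights, w_i = exp(-Δ_i / 2) / Σ_j exp(-Δ_j / 2) with Δ_i = AIC_i - min(AIC), from the AIC values reported in the first results table, then forms a weighted average of the log risk ratios. Averaging on the log scale is an assumption made here, and the variable names are illustrative. Only point estimates are reproduced; the averaged interval requires a variance that is not attempted in this sketch.

```r
# AIC values and risk ratios transcribed from the first results table.
aic      <- c(552.56, 552.74, 555.12, 556.43)
rr_over  <- c(1.38, 1.27, 1.38, 1.29)   # overweight vs. normal
rr_obese <- c(1.75, 1.46, 1.74, 1.48)   # obese vs. normal

# Akaike weights: relative support for each adjustment set.
delta <- aic - min(aic)
w     <- exp(-delta / 2) / sum(exp(-delta / 2))

# Weighted average on the log scale (assumption), back-transformed.
avg_over  <- exp(sum(w * log(rr_over)))
avg_obese <- exp(sum(w * log(rr_obese)))

print(round(w, 2))
print(round(c(overweight = avg_over, obese = avg_obese), 2))
```

The weights computed this way round to the reported 0.43, 0.39, 0.12 and 0.06. The averaged risk ratios land close to the reported 1.33 and 1.62, with small differences expected because the inputs are rounded to two decimals.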
2. Inverse Variance Weighting
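
A minimal sketch of inverse variance weighting, assuming the usual form in which each log risk ratio is weighted by the reciprocal of its squared standard error. Standard errors are back-calculated here from the reported 95% limits, which is an approximation; the function name `ivw_average` is hypothetical. The sketch covers the point estimate only. Because all four models are fitted to the same data, the naive pooled variance 1 / Σ w_i would overstate precision and is not used here.

```r
# Inverse-variance-weighted average of risk ratios on the log scale.
# se is recovered from the 95% limits: (log(upper) - log(lower)) / (2 * 1.96).
ivw_average <- function(rr, lower, upper) {
  se <- (log(upper) - log(lower)) / (2 * qnorm(0.975))
  w  <- 1 / se^2
  exp(sum(w * log(rr)) / sum(w))
}

# Values transcribed from the second results table.
over  <- ivw_average(rr    = c(1.41, 1.35, 1.38, 1.33),
                     lower = c(0.93, 0.92, 0.91, 0.89),
                     upper = c(2.11, 2.00, 2.09, 1.97))
obese <- ivw_average(rr    = c(1.86, 1.48, 1.74, 1.43),
                     lower = c(1.32, 1.06, 1.22, 1.02),
                     upper = c(2.62, 2.06, 2.48, 2.01))

print(round(c(overweight = over, obese = obese), 2))
```

Under these assumptions the results fall close to the reported averages of 1.37 and 1.61.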
3. Software code for recreating model averaged results for bootstrap and AIC techniques
     SAS code
      *********************************************************************************
      ***** Model averaging via bootstrapping
      ***** hamrag@fellows.iarc.fr
      ***** Example from PIN study (UNC, Chapel Hill)
      ***** To request PIN data, please visit:
      ***** http://www.cpc.unc.edu/projects/pin/datause
      ********************************************************************************;
	  ** Import data;
      proc import out=one datafile='YOUR Directory'
      dbms=csv replace; getnames=yes; run;
	  ** Recode variables from data so there are 0 references;
      data one;
            set one;
            bmi = C_BMIIOM - 2;
            medu = edu - 1;
            height = C_INCHES - 65.04; *Center height at mean;
            if ind = 0 then induction = 0;
                  else induction = 1;
            run;
        *********************************************
        ***** Bootstrap data, 1000 replications
        *********************************************;
        proc surveyselect data=one out=pinboot
            seed = 280420141
            method = urs
            samprate = 100
            outhits
            rep = 1000;
        run;
        *****************************************************************
        ***** Fit 4 minimally sufficient models, output data from each
        *****************************************************************;
        * Model 1: adjust: hypertension, gestational weight gain, maternal age/edu/race;
        ods output ParameterEstimates = m1out;
        proc genmod data=pinboot desc;
            by Replicate;
            class bmi(ref=first);
            model cesarean = bmi hyper C_WTGAIN mom_age medu race /dist=binomial link=log;
        run;
        * Model 2: adjust: hypertension, maternal age/edu/race/height;
        ods output ParameterEstimates = m2out;
        proc genmod data=pinboot desc;
            by Replicate;
            class bmi(ref=first);
            model cesarean = bmi hyper mom_age medu race height/dist=binomial link=log;
        run;
        * Model 3: adjust: gestational weight gain, maternal age/edu/race, eclampsia;
        ods output ParameterEstimates = m3out;
        proc genmod data=pinboot desc;
            by Replicate;
            class bmi(ref=first);
            model cesarean = bmi C_WTGAIN mom_age medu race eclamp/dist=binomial link=log;
        run;
        * Model 4: adjust: maternal age/edu/race/height, eclampsia;
        ods output ParameterEstimates = m4out;
        proc genmod data=pinboot desc;
            by Replicate;
            class bmi(ref=first);
            model cesarean = bmi mom_age medu race height eclamp/dist=binomial link=log;
        run;
        **********************************************
        ****Extract BMI values from each dataset
        ****First, for overweight vs normal
        **********************************************;
        data m1over (keep = Estimate Replicate model);
            set m1out;
            if Parameter = 'bmi' AND Level1 = 1;
            model = 1;
            run;
        data m2over (keep = Estimate Replicate model);
            set m2out;
            if Parameter = 'bmi' AND Level1 = 1;
            model = 2;
            run;
        data m3over (keep = Estimate Replicate model);
            set m3out;
            if Parameter = 'bmi' AND Level1 = 1;
            model = 3;
            run;
        data m4over (keep = Estimate Replicate model);
            set m4out;
            if Parameter = 'bmi' AND Level1 = 1;
            model = 4;
            run;
        **** Pool datasets and summarize estimate;
        data over;
            set m1over m2over m3over m4over;
            exp = exp(Estimate);
            run;
        *summary of bootstrap estimates by adjustment set;
        proc univariate data=over;
            by model;
            var exp;
            output out= over1a mean=mean pctlpts = 2.5, 50, 97.5 pctlpre=ci;
        run;
        *Model average of overweight versus normal weight;
        proc univariate data=over;
            var exp;
            output out= over1b mean=mean pctlpts = 2.5, 50, 97.5 pctlpre=ci;
        run;
        **********************************************
        ** Repeat above for obese versus normal
        *********************************************;
        data m1obese (keep = Estimate Replicate model);
            set m1out;
            if Parameter = 'bmi' AND Level1 = 2;
            model = 1;
            run;
        data m2obese (keep = Estimate Replicate model);
            set m2out;
            if Parameter = 'bmi' AND Level1 = 2;
            model = 2;
            run;
        data m3obese (keep = Estimate Replicate model);
            set m3out;
            if Parameter = 'bmi' AND Level1 = 2;
            model = 3;
            run;
        data m4obese (keep = Estimate Replicate model);
            set m4out;
            if Parameter = 'bmi' AND Level1 = 2;
            model = 4;
            run;
        **** Pool datasets and summarize estimate;
        data obese;
            set m1obese m2obese m3obese m4obese;
            exp = exp(Estimate);
            run;
        *summary of bootstrap estimates by adjustment set;
        proc univariate data=obese;
            by model;
            var exp;
            output out= obese1a mean=mean pctlpts = 2.5, 50, 97.5 pctlpre=ci;
        run;
        *Model average of obese versus normal weight;
        proc univariate data=obese;
            var exp;
            output out= obese1b mean=mean pctlpts = 2.5, 50, 97.5 pctlpre=ci;
        run;
        *end of file;
        R code
        ###########################################################################
        ##### Multi-model inference with AIC weighting
        ###########################################################################
        ## Load relevant libraries and set working directory
        library(Epi)
        library(foreign)
        library(MuMIn)
        library(boot)
        setwd('Your directory')
        ## load and summarize data
        PIN <- read.csv('PIN.csv',header=T)
        str(PIN)
        ## Re-code BMI and education so there is a zero referent;
        ## also center height and recode induction
        PIN$BMI <- PIN$C_BMIIOM - 2
        PIN$m_edu <- PIN$edu - 1
        PIN$height <- PIN$C_INCHES - 65.04 # center height
        PIN$induction <- as.numeric(ifelse(PIN$ind==0,0,1)) # categorize induction into 0,1
        ####################################################
        ## Average over all minimally sufficient adjustment sets
        ####################################################
        ## restrict data to exclude missings. Necessary for averaging with AIC!
        keep <- c('cesarean','BMI','hyper','C_WTGAIN','mom_age','m_edu',
         'race','height','eclamp')
        PIN1 <- na.omit(PIN[keep])
        attach(PIN1)
        #################################################################
        ## NOTE: some models run with reduced adjustment sets to obtain
        ## starting values that help with model convergence
        #################################################################
        ## model 1: hypertension, gestational weight gain, maternal age/edu/race
        m1a <- glm(cesarean ~ factor(BMI) + hyper + C_WTGAIN +
         mom_age + m_edu + race,
         family=binomial(link='log'))
        m1 <- glm(cesarean ~ factor(BMI) + hyper + C_WTGAIN +
         mom_age + m_edu + race,
         family=binomial(link='log'))
        summary(m1)
        ci.exp(m1)
        ## model 2: hypertension, maternal age/edu/race/height
        m2a <- glm(cesarean ~ factor(BMI) + hyper + mom_age + m_edu +
         race,
         family=binomial(link='log'))
        m2 <- glm(cesarean ~ factor(BMI) + hyper + mom_age + m_edu +
         race + height,
         family=binomial(link='log'), start=c(coef(m2a),0))
        summary(m2)
        ci.exp(m2)
        ## model 3: gestational weight gain, maternal age/edu/race, eclampsia
        m3 <- glm(cesarean ~ factor(BMI) + C_WTGAIN + mom_age + m_edu +
         race + eclamp,
         family=binomial(link='log'))
        summary(m3)
        ci.exp(m3)
        ## model 4: maternal age/edu/race/height, eclampsia
        m4a <- glm(cesarean ~ factor(BMI) + mom_age + m_edu + race +
         eclamp,
         family=binomial(link='log'))
        m4 <- glm(cesarean ~ factor(BMI) + mom_age + m_edu + race +
         eclamp + height,
         family=binomial(link='log'), start=c(coef(m4a),0))
        summary(m4)
        ci.exp(m4)
        ## Average the results of the four models above
        models <- list(m1,m2,m3,m4)
        AIC_avg <- model.avg(models, rank=AIC, cumsum(weight)<=0.95)
        summary(AIC_avg)
        confint(AIC_avg)
        #end of file
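
The pooling step of the SAS bootstrap program above (stacking the per-model bootstrap estimates, then summarizing them by model and overall) can be sketched in R as follows. This is only an illustration: simulated draws stand in for the fitted log risk ratios, and the means, standard deviation and object names are hypothetical rather than taken from the PIN data.

```r
set.seed(1)
n_rep <- 1000  # number of bootstrap replicates, as in the SAS program

# Hypothetical log risk ratio estimates: one column per adjustment set.
# In a real analysis each column would hold the BMI coefficient from
# refitting that model to each bootstrap sample.
boot_log_rr <- sapply(c(0.33, 0.29, 0.34, 0.29),
                      function(m) rnorm(n_rep, mean = m, sd = 0.2))
colnames(boot_log_rr) <- paste0("model", 1:4)
rr <- exp(boot_log_rr)

# Mean, median and percentile limits of a vector of bootstrap estimates.
summarize <- function(x) {
  c(mean = mean(x), quantile(x, probs = c(0.025, 0.5, 0.975)))
}

# Summary by adjustment set (rows are models).
by_model <- t(apply(rr, 2, summarize))

# Model-averaged summary: pool all replicates across the four models.
pooled <- summarize(as.vector(rr))

print(round(by_model, 2))
print(round(pooled, 2))
```

Pooling the replicates gives each adjustment set equal weight, and the percentile limits of the pooled distribution reflect both sampling variability and the spread between models.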
		
Acknowledgments
Author Contributions
Conflicts of Interest
© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
Hamra, G.B.; Kaufman, J.S.; Vahratian, A. Model Averaging for Improving Inference from Causal Diagrams. Int. J. Environ. Res. Public Health 2015, 12, 9391-9407. https://doi.org/10.3390/ijerph120809391