A Bootstrap Method for a Multiple-Imputation Variance Estimator in Survey Sampling
Abstract
1. Introduction
2. Methods
2.1. Rubin’s Method
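For orientation, the standard multiple-imputation combining rules that go by the name of Rubin's method can be sketched as follows. This is a minimal illustration of the textbook rules, not the authors' code: the function name `rubin_combine` and the example inputs are hypothetical, and the between-imputation variance is deliberately not called B, because the tables in this article use B as a column label.

```python
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Pool M completed-data analyses with Rubin's combining rules.

    estimates: point estimates, one per imputed data set
    variances: complete-data variance estimates, one per imputed data set
    (Function name and argument names are illustrative assumptions.)
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need M >= 2 estimates and one variance per estimate")

    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor M - 1)
    total = within + (1 + 1 / m) * between

    # Rubin's degrees of freedom for the t reference distribution.
    if between > 0:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    else:
        df = float("inf")
    return q_bar, total, df


# Made-up inputs for M = 3 imputations, purely for illustration.
q_bar, total, df = rubin_combine([10.0, 12.0, 11.0], [1.0, 1.2, 0.8])
print(q_bar, total, df)
```

The total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/M), which accounts for using a finite number of imputations.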
2.2. The Bootstrap Estimation Method
2.2.1. Assumptions
2.2.2. Variance Decomposition
2.2.3. Bootstrap Variance Estimator and Its Properties
3. Examples and Results
3.1. Simulation 1: Domain Mean Estimation
3.2. Simulation 2: Linear Regression
n | B | M | d | | Rbias (%) | Rbias (%) | Rbias (%) | Rbias (%) | Mwidth | Mwidth | Mwidth | Mwidth | 95cov | 95cov | 95cov | 95cov |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
500 | 500 | 10 | 0.8 | 0.2 | 0.6 | 146.2 | −12.5 | 0.5 | 65 | 102 | 58 | 65 | 95 | 99 | 89 | 95 |
500 | 500 | 10 | 0.8 | 0.6 | 1.6 | 21.6 | −4.9 | 1.5 | 48 | 53 | 46 | 48 | 95 | 97 | 95 | 95 |
500 | 500 | 10 | 0.5 | 0.2 | −1.3 | 237.7 | −41.1 | 0.7 | 61 | 113 | 36 | 61 | 95 | 99 | 58 | 96 |
500 | 500 | 10 | 0.5 | 0.6 | −0.7 | 18.1 | −40.5 | −0.6 | 55 | 61 | 42 | 56 | 95 | 96 | 85 | 95 |
500 | 500 | 30 | 0.8 | 0.2 | 1.0 | 156.8 | −12.6 | 0.9 | 63 | 100 | 57 | 62 | 95 | 99 | 92 | 95 |
500 | 500 | 30 | 0.8 | 0.6 | 1.3 | 21.6 | −5.3 | 1.3 | 47 | 52 | 46 | 47 | 95 | 97 | 95 | 95 |
500 | 500 | 30 | 0.5 | 0.2 | −2.2 | 261.4 | −54.3 | 0.6 | 58 | 109 | 33 | 57 | 95 | 100 | 63 | 95 |
500 | 500 | 30 | 0.5 | 0.6 | −1.4 | 17.8 | −42.2 | −1.4 | 54 | 59 | 41 | 54 | 95 | 97 | 86 | 95 |
500 | 200 | 10 | 0.8 | 0.2 | 0.5 | 146.2 | −12.5 | 0.2 | 65 | 102 | 58 | 65 | 95 | 99 | 89 | 95 |
500 | 200 | 10 | 0.8 | 0.6 | 1.6 | 21.6 | −4.9 | 1.5 | 48 | 53 | 46 | 48 | 95 | 97 | 95 | 95 |
500 | 200 | 10 | 0.5 | 0.2 | −1.4 | 237.7 | −41.1 | 0.7 | 62 | 113 | 36 | 61 | 95 | 99 | 58 | 96 |
500 | 200 | 10 | 0.5 | 0.6 | −0.8 | 18.1 | −40.5 | −0.7 | 55 | 61 | 42 | 56 | 95 | 96 | 85 | 95 |
500 | 200 | 30 | 0.8 | 0.2 | 1.1 | 156.8 | −12.6 | 1.0 | 63 | 100 | 57 | 62 | 95 | 99 | 92 | 95 |
500 | 200 | 30 | 0.8 | 0.6 | 1.3 | 21.6 | −5.3 | 1.3 | 47 | 52 | 46 | 47 | 95 | 97 | 95 | 95 |
500 | 200 | 30 | 0.5 | 0.2 | −2.2 | 261.4 | −54.3 | 0.6 | 61 | 108 | 33 | 57 | 96 | 100 | 63 | 95 |
500 | 200 | 30 | 0.5 | 0.6 | −1.3 | −42.2 | 17.8 | −1.3 | 54 | 59 | 41 | 54 | 95 | 97 | 86 | 95 |
1000 | 500 | 10 | 0.8 | 0.2 | −1.1 | 141.4 | 13.1 | −1.1 | 46 | 72 | 42 | 46 | 95 | 99 | 89 | 95 |
1000 | 500 | 10 | 0.8 | 0.6 | −4.0 | 53.4 | −9.7 | −4.0 | 34 | 37 | 33 | 34 | 95 | 97 | 94 | 95 |
1000 | 500 | 10 | 0.5 | 0.2 | −3.7 | 255.4 | −44.3 | −3.2 | 43 | 80 | 25 | 44 | 95 | 99 | 57 | 95 |
1000 | 500 | 10 | 0.5 | 0.6 | −0.3 | 19.5 | −39.2 | −0.3 | 39 | 44 | 30 | 40 | 95 | 97 | 85 | 95 |
1000 | 500 | 30 | 0.8 | 0.2 | −1.5 | 148.6 | −14.6 | −1.6 | 44 | 71 | 41 | 44 | 95 | 99 | 92 | 95 |
1000 | 500 | 30 | 0.8 | 0.6 | −3.7 | 15.7 | −9.8 | −3.6 | 33 | 37 | 32 | 34 | 95 | 96 | 94 | 95 |
1000 | 500 | 30 | 0.5 | 0.2 | −3.9 | 251.2 | −55.3 | −3.3 | 41 | 77 | 23 | 40 | 94 | 99 | 63 | 94 |
1000 | 500 | 30 | 0.5 | 0.6 | −0.1 | 20.0 | −40.8 | −0.1 | 38 | 42 | 29 | 39 | 95 | 97 | 85 | 95 |
1000 | 200 | 10 | 0.8 | 0.2 | −1.0 | 141.4 | −13.1 | −1.0 | 46 | 72 | 42 | 46 | 95 | 99 | 89 | 95 |
1000 | 200 | 10 | 0.8 | 0.6 | −3.9 | 15.3 | −9.7 | −4.0 | 34 | 37 | 33 | 34 | 94 | 97 | 94 | 95 |
1000 | 200 | 10 | 0.5 | 0.2 | −3.8 | 225.4 | −44.3 | −3.3 | 44 | 80 | 25 | 44 | 96 | 99 | 57 | 95 |
1000 | 200 | 10 | 0.5 | 0.6 | −0.3 | 19.5 | −39.2 | −0.6 | 39 | 44 | 30 | 40 | 95 | 97 | 85 | 95 |
1000 | 200 | 30 | 0.8 | 0.2 | −1.6 | 148.6 | 14.6 | −1.5 | 45 | 71 | 41 | 44 | 95 | 99 | 92 | 95 |
1000 | 200 | 30 | 0.8 | 0.6 | −3.7 | 15.7 | −9.8 | −3.6 | 33 | 37 | 32 | 34 | 95 | 96 | 94 | 94 |
1000 | 200 | 30 | 0.5 | 0.2 | −4.0 | 251.2 | −55.3 | −3.3 | 42 | 77 | 23 | 40 | 95 | 99 | 63 | 94 |
1000 | 200 | 30 | 0.5 | 0.6 | −0.1 | 20.0 | −40.8 | −0.1 | 38 | 42 | 29 | 39 | 95 | 97 | 85 | 95 |
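
The three column groups in the table above (Rbias (%), Mwidth, 95cov) are the kind of summaries usually computed across Monte Carlo replicates. The sketch below shows one common way to obtain such summaries. It rests on assumptions that this section does not confirm: that Rbias is the relative bias of a variance estimator against a reference variance, that Mwidth is the mean confidence-interval width, that 95cov is the empirical coverage in percent, and that intervals are normal-based. The function name `summarize_replicates` and all inputs are hypothetical, and any rescaling of widths used in the table is not modeled.

```python
import math
from statistics import mean


def summarize_replicates(point_estimates, variance_estimates,
                         true_value, true_variance, z=1.959964):
    """Summarize a variance estimator over simulation replicates.

    Returns (relative bias in %, mean interval width, coverage in %).
    Assumes normal-based intervals: estimate +/- z * sqrt(variance).
    """
    if len(point_estimates) != len(variance_estimates):
        raise ValueError("one variance estimate is needed per point estimate")

    # Relative bias of the variance estimator, in percent.
    rbias = 100 * (mean(variance_estimates) - true_variance) / true_variance

    # Half-width of each replicate's interval.
    half_widths = [z * math.sqrt(v) for v in variance_estimates]
    mwidth = mean(2 * h for h in half_widths)

    # Share of replicates whose interval contains the true value.
    covered = [abs(q - true_value) <= h
               for q, h in zip(point_estimates, half_widths)]
    coverage = 100 * sum(covered) / len(covered)
    return rbias, mwidth, coverage


# Tiny made-up example with four replicates.
print(summarize_replicates([1.0, 2.0, 3.0, 2.0], [0.25] * 4, 2.0, 0.2))
```

In a real study the reference variance would typically be the Monte Carlo variance of the point estimator over many replicates, and a t-based interval could replace the normal one.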
3.3. A Real Data Analysis
4. Discussion and Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Scen | n | B | M | Rbias (%) | Rbias (%) | Rbias (%) | Rbias (%) | Mwidth | Mwidth | Mwidth | Mwidth | 95cov | 95cov | 95cov | 95cov |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 500 | 500 | 10 | 0.7 | 0.4 | 0.4 | 0.1 | 21 | 22 | 22 | 22 | 95 | 95 | 95 | 96 |
1 | 500 | 500 | 30 | 0.8 | 0.2 | 0.3 | 0.1 | 20 | 20 | 20 | 20 | 95 | 95 | 95 | 95 |
1 | 500 | 200 | 10 | 0.6 | 0.4 | 0.4 | −0.1 | 21 | 22 | 22 | 22 | 95 | 95 | 95 | 96 |
1 | 500 | 200 | 30 | 0.7 | 0.2 | 0.3 | 0.0 | 20 | 20 | 20 | 20 | 95 | 95 | 95 | 95 |
1 | 1000 | 500 | 10 | −1.8 | −2.4 | −2.3 | −2.3 | 14 | 15 | 15 | 16 | 94 | 95 | 95 | 96 |
1 | 1000 | 500 | 30 | −2.4 | −2.9 | −2.9 | −2.9 | 14 | 14 | 14 | 14 | 94 | 95 | 95 | 95 |
1 | 1000 | 200 | 10 | −2.0 | −2.4 | −2.4 | −2.4 | 14 | 15 | 15 | 16 | 94 | 95 | 95 | 96 |
1 | 1000 | 200 | 30 | −2.4 | −2.9 | −2.9 | −2.8 | 14 | 14 | 14 | 14 | 94 | 95 | 95 | 95 |
2 | 500 | 500 | 10 | −6.0 | −5.0 | −5.0 | −5.0 | 13 | 14 | 14 | 14 | 94 | 94 | 94 | 95 |
2 | 500 | 500 | 30 | −5.0 | −4.0 | −4.0 | −4.1 | 13 | 13 | 13 | 13 | 94 | 95 | 95 | 95 |
2 | 500 | 200 | 10 | −6.0 | −5.0 | −5.0 | −5.0 | 13 | 14 | 14 | 14 | 94 | 94 | 94 | 95 |
2 | 500 | 200 | 30 | −5.0 | −4.0 | −4.0 | −4.0 | 13 | 13 | 13 | 13 | 94 | 95 | 95 | 95 |
2 | 1000 | 500 | 10 | −1.0 | 0.4 | 0.4 | 0.0 | 9 | 10 | 10 | 10 | 95 | 95 | 95 | 96 |
2 | 1000 | 500 | 30 | −0.6 | 0.8 | 0.8 | 0.6 | 9 | 9 | 9 | 9 | 95 | 95 | 95 | 95 |
2 | 1000 | 200 | 10 | −1.1 | 0.4 | 0.4 | −0.1 | 9 | 10 | 10 | 10 | 95 | 95 | 95 | 96 |
2 | 1000 | 200 | 30 | −0.8 | 0.8 | 0.8 | 0.7 | 9 | 9 | 9 | 9 | 95 | 95 | 95 | 95 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, L.; Zhao, Y. A Bootstrap Method for a Multiple-Imputation Variance Estimator in Survey Sampling. Stats 2022, 5, 1231-1241. https://doi.org/10.3390/stats5040074