A Comparison of Existing Bootstrap Algorithms for Multi-Stage Sampling Designs
Abstract
1. Introduction
2. The Setup
- (i)
- A sample $S_I$ of psus, of size $n$, is selected according to a given sampling design with first-order inclusion probabilities $\pi_i$ and second-order inclusion probabilities $\pi_{ij}$. Finally, let $w_i = \pi_i^{-1}$ denote the design weight attached to the $i$th sampled psu.
- (ii)
- In the $i$th psu sampled at the first stage, $i \in S_I$, a subsample $S_i$ of the elements of $U_i$, of size $m_i$, is selected according to a given sampling design with first-order inclusion probabilities $\pi_{k \mid i}$ and second-order inclusion probabilities $\pi_{k\ell \mid i}$. Subsampling in a given psu is carried out independently of subsampling in any other psu; the corresponding expansion estimator is shown below.
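Under this setup, a population total $Y = \sum_{i=1}^{N} \sum_{k \in U_i} y_{ik}$ is typically estimated by an expansion estimator of the following form, where $w_{ik}$ denotes the usual double-expansion weight:

```latex
\hat{Y} \;=\; \sum_{i \in S_I} \sum_{k \in S_i} w_{ik}\, y_{ik},
\qquad
w_{ik} \;=\; \frac{1}{\pi_i \, \pi_{k \mid i}} .
```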
3. Bootstrap Procedures for Simple Random Sampling without Replacement at Both Stages
3.1. The Rescaling Bootstrap Algorithm
- Step 1.
- Draw a sample of $n$ psus from $S_I$, according to simple random sampling with replacement.
- Step 2.
- From each psu selected in Step 1, select a sample of $m_i$ elements from $S_i$ according to simple random sampling with replacement. For a psu selected more than once in Step 1, perform independent subsampling.
- Step 3.
- Let $y_{ik}^{*}$ be the $y$-value of the $k$th bootstrap element in the $i$th bootstrap psu, let $M_i^{*}$ be the $M_i$-value of the $i$th bootstrap psu, and let $m_i^{*}$ be defined similarly. The rescaled values $\tilde{y}_{ik}$ are then obtained by applying the Rao–Wu rescaling [2] to the $y_{ik}^{*}$, using the first- and second-stage sampling fractions.
- Step 4.
- Compute $\hat{\theta}^{*}$ using the same formulae that were used to obtain the original point estimator, applied to the rescaled values $\tilde{y}_{ik}$.
- Step 5.
- Repeat Steps 1–4 a large number of times, B, to obtain $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$.
- Step 6.
- The bootstrap variance estimator is $v_{\mathrm{B}}(\hat{\theta}) = \mathbb{V}_{*}(\hat{\theta}^{*})$, where $\mathbb{V}_{*}$ denotes the variance with respect to the resampling mechanism, conditional on the original sample. In practice, the Monte Carlo approximation of $\mathbb{V}_{*}(\hat{\theta}^{*})$ based on $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$ is applied, as illustrated below.
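The Monte Carlo approximation referred to in Step 6 (and in the closing step of each algorithm that follows) can be taken to be of the following standard form, where the divisor $B-1$ could equally be replaced by $B$:

```latex
v_{\mathrm{MC}}(\hat{\theta})
  \;=\; \frac{1}{B-1} \sum_{b=1}^{B}
        \left( \hat{\theta}^{*(b)} - \bar{\theta}^{*} \right)^{2},
\qquad
\bar{\theta}^{*} \;=\; \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{*(b)} .
```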
3.2. The Mirror-Match Bootstrap Algorithm
- Step 1.
- Choose $n'$ ($1 \le n' < n$) and draw a sample of $n'$ psus from $S_I$, according to simple random sampling without replacement.
- Step 2.
- Repeat Step 1 $k$ times independently to obtain a bootstrap sample of psus of size $n^{*} = k n'$, where $k$ is chosen so that the bootstrap mimics the first-stage sampling fraction; for simple random sampling without replacement this gives $k = \{n(1 - f')\}/\{n'(1 - f)\}$ with $f = n/N$ and $f' = n'/n$. A sketch of this resampling structure is given after the algorithm.
- Step 3.
- Choose $m_i'$ ($1 \le m_i' < m_i$) and draw, according to simple random sampling without replacement, $m_i'$ units within the $i$th psu obtained in Steps 1 and 2.
- Step 4.
- Repeat Step 3 $k_i$ times independently to obtain a bootstrap sample of size $m_i^{*} = k_i m_i'$ from the $i$th psu drawn in Steps 1 and 2, where $k_i$ is defined analogously to $k$ in Step 2, with $m_i$, $m_i'$ and $f_{2i} = m_i/M_i$ in place of $n$, $n'$ and $f$.
- Step 5.
- Compute $\hat{\theta}^{*}$ using the same formulae that were used to obtain the original point estimator.
- Step 6.
- Repeat Steps 1–5 a large number of times, B, to obtain $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$.
- Step 7.
- The bootstrap variance estimator is $v_{\mathrm{B}}(\hat{\theta}) = \mathbb{V}_{*}(\hat{\theta}^{*})$. In practice, the Monte Carlo approximation of $\mathbb{V}_{*}(\hat{\theta}^{*})$ based on $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$ is applied.
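The psu-level resampling of Steps 1 and 2 can be sketched as follows; the function name is illustrative, and the inputs `n_prime` and `k` must be chosen as described above (Sitter [3] ties them to the first-stage sampling fraction).

```python
import numpy as np

def mirror_match_psu_resample(psu_ids, n_prime, k, rng=None):
    """Draw k independent SRSWOR subsamples of size n_prime from the sampled
    psus and concatenate them (Steps 1-2 of the mirror-match bootstrap)."""
    rng = np.random.default_rng(rng)
    draws = [rng.choice(psu_ids, size=n_prime, replace=False) for _ in range(k)]
    return np.concatenate(draws)          # bootstrap psu sample of size k * n_prime

# Example: n = 10 sampled psus, n' = 4, repeated k = 3 times.
boot_psus = mirror_match_psu_resample(np.arange(10), n_prime=4, k=3, rng=1)
```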
3.3. The Without-Replacement Bootstrap Algorithm
- Step 1:
- Create a pseudo-population by replicating each psu in $S_I$ $N/n$ times and each unit within the $i$th psu $M_i/m_i$ times. Let $U^{*}$ be the resulting pseudo-population consisting of $N^{*}$ psus, $U_j^{*}$, of size $M_j^{*}$, where for each $j$ there exists $i \in S_I$ such that $U_j^{*}$ is a copy of $U_i$. Let $M^{*} = \sum_{j=1}^{N^{*}} M_j^{*}$ be the total number of elements in the pseudo-population.
- Step 2:
- From the pseudo-population $U^{*}$, select a sample of psus, $S_I^{*}$, of size $n$ according to simple random sampling without replacement. In each selected psu, select a sample, $S_i^{*}$, of size $m_i$ according to simple random sampling without replacement. A sketch of Steps 1 and 2 is given after the algorithm.
- Step 3:
- Compute $\hat{\theta}^{*}$ using the formulae that were used to obtain the original point estimator.
- Step 4:
- Repeat Steps 2 and 3 a large number of times, B, to obtain $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$.
- Step 5:
- The bootstrap variance estimator is $v_{\mathrm{B}}(\hat{\theta}) = \mathbb{V}_{*}(\hat{\theta}^{*})$. In practice, the Monte Carlo approximation of $\mathbb{V}_{*}(\hat{\theta}^{*})$ based on $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$ is applied.
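A minimal sketch of Steps 1 and 2 for the simplest case in which $N/n$ and $M_i/m_i$ are integers; the function name, data layout and this integer assumption are illustrative rather than part of the original algorithm.

```python
import numpy as np

def bwo_resample(sample, N, M, rng=None):
    """One without-replacement bootstrap replicate for two-stage SRSWOR.

    sample : dict {psu_id: 1-d array of the m_i sampled y-values in psu i}
    N      : number of psus in the population
    M      : dict {psu_id: M_i, the number of elements in psu i}
    Assumes N/n and M_i/m_i are integers (the replication factors of Step 1).
    """
    rng = np.random.default_rng(rng)
    n = len(sample)
    # Step 1: pseudo-population of psus; each pseudo-psu keeps its own m_i.
    pseudo = []
    for i, y in sample.items():
        m_i = len(y)
        psu_copy = np.repeat(y, M[i] // m_i)        # replicate elements M_i/m_i times
        pseudo += [(psu_copy, m_i)] * (N // n)      # replicate the psu N/n times
    # Step 2: mimic the original design on the pseudo-population.
    idx = rng.choice(len(pseudo), size=n, replace=False)
    return [rng.choice(pseudo[j][0], size=pseudo[j][1], replace=False) for j in idx]

# Toy example: n = 2 sampled psus out of N = 4; M_i = 6 and m_i = 3 in each psu.
boot = bwo_resample({0: np.array([1., 2., 3.]), 1: np.array([4., 5., 6.])},
                    N=4, M={0: 6, 1: 6}, rng=0)
```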
3.4. The Bernoulli Bootstrap Algorithm
- Step 1.
- Draw a sample, $S_I^{*}$, of size $n$ from the original sample of clusters, $S_I$, according to simple random sampling with replacement. Generate $n$ Bernoulli random variables, $\delta_1, \ldots, \delta_n$, with the success probability given in Funaoka et al. [17]. For each $i$, keep the $i$th cluster in the bootstrap sample and go to Step 2 if $\delta_i = 1$, and replace the $i$th cluster with one randomly selected cluster from $S_I$ if $\delta_i = 0$. The keep-or-replace mechanism is sketched after the algorithm.
- Step 2.
- For each cluster $i$ kept in Step 1, draw a sample, $S_i^{*}$, of size $m_i$ from the original sample $S_i$ according to simple random sampling with replacement. Generate $m_i$ Bernoulli random variables, $\delta_{i1}, \ldots, \delta_{i m_i}$, with the success probability given in Funaoka et al. [17]. For each $k$, keep the $k$th element in the bootstrap sample, $S_i^{*}$, if $\delta_{ik} = 1$, and replace it with one randomly selected element from $S_i$ if $\delta_{ik} = 0$.
- Step 3.
- Compute $\hat{\theta}^{*}$ using the formulae that were used to obtain the original point estimator.
- Step 4.
- Repeat Steps 1–3 a large number of times, B, to obtain $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$.
- Step 5.
- The bootstrap variance estimator is $v_{\mathrm{B}}(\hat{\theta}) = \mathbb{V}_{*}(\hat{\theta}^{*})$. In practice, the Monte Carlo approximation of $\mathbb{V}_{*}(\hat{\theta}^{*})$ based on $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$ is applied.
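The keep-or-replace mechanism of Step 1 can be sketched as follows; the keep probability `p1` is left as an input (its value is specified in Funaoka et al. [17]), and all names are illustrative.

```python
import numpy as np

def bernoulli_psu_resample(psu_ids, p1, rng=None):
    """Step 1 of the Bernoulli bootstrap: draw n clusters with replacement,
    keep each drawn cluster with probability p1, and otherwise replace it
    by a cluster drawn at random from the original sample."""
    rng = np.random.default_rng(rng)
    n = len(psu_ids)
    draw = rng.choice(psu_ids, size=n, replace=True)       # SRSWR draw
    keep = rng.random(n) < p1                               # Bernoulli(p1) indicators
    replacement = rng.choice(psu_ids, size=n, replace=True) # random replacements
    return np.where(keep, draw, replacement)

boot_psus = bernoulli_psu_resample(np.arange(8), p1=0.9, rng=2)
```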
3.5. The Preston Bootstrap Weights Algorithm
- Step 1.
- Draw a sample of $n'$ psus from $S_I$, according to simple random sampling without replacement. Let $\delta_i = 1$ if the $i$th psu is selected and $\delta_i = 0$ otherwise.
- Step 2.
- Define the psu bootstrap weights by rescaling the original psu weights $w_i$ using the selection indicators $\delta_i$ and the first-stage sampling fraction, as in Preston [18]; a sketch of this weight construction follows the algorithm.
- Step 3.
- Within each of the $n'$ psus selected in Step 1, draw a simple random sample without replacement of size $m_i'$. Let $\delta_{k \mid i} = 1$ if the $k$th element in the $i$th psu is selected and $\delta_{k \mid i} = 0$ otherwise. The conditional element bootstrap weights are then defined by rescaling the original conditional weights $w_{k \mid i}$ using these indicators and the sampling fractions at both stages, as in Preston [18].
- Step 4.
- Compute $\hat{\theta}^{*}$ using the formulae that were used to obtain the original point estimator with the original weights $w_{ik}$ replaced by the bootstrap weights $w_{ik}^{*}$.
- Step 5.
- Repeat Steps 1–4 a large number of times, B, to obtain $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$.
- Step 6.
- The bootstrap variance estimator is $v_{\mathrm{B}}(\hat{\theta}) = \mathbb{V}_{*}(\hat{\theta}^{*})$. In practice, the Monte Carlo approximation of $\mathbb{V}_{*}(\hat{\theta}^{*})$ based on $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$ is applied.
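A sketch of the first-stage weight construction in Steps 1 and 2, using the generic rescaling factor for a without-replacement bootstrap and the half-sample choice $n' = n/2$; the exact factors prescribed by Preston [18] at each stage may differ, and all names in the code are illustrative.

```python
import numpy as np

def rescaled_psu_bootstrap_weights(w, f1, rng=None):
    """First-stage bootstrap weights for a rescaled without-replacement
    bootstrap: select n' = n/2 psus by SRSWOR and rescale the weights.
    lam is the generic rescaling factor for this resampling scheme."""
    rng = np.random.default_rng(rng)
    n = len(w)
    n_prime = n // 2
    delta = np.zeros(n)
    delta[rng.choice(n, size=n_prime, replace=False)] = 1.0   # selected psus
    lam = np.sqrt(n_prime * (1.0 - f1) / (n - n_prime))
    return w * (1.0 - lam + lam * (n / n_prime) * delta)

# Step 4 then recomputes the point estimator with these weights, e.g. a total
# sum(w_star * psu_totals) instead of sum(w * psu_totals).
w = np.full(10, 20.0)                  # original psu weights (N/n = 20, say)
w_star = rescaled_psu_bootstrap_weights(w, f1=0.05, rng=3)
```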
4. Bootstrap Procedures for Unequal Probability Sampling Designs
4.1. The Rao-Wu-Yue Bootstrap Weights Algorithm
- Step 1.
- Select $n - 1$ psus according to simple random sampling with replacement from $S_I$.
- Step 2.
- Define the bootstrap weight as $w_{ik}^{*} = w_{ik} \, \dfrac{n}{n-1} \, m_i^{*}$, where $m_i^{*}$ denotes the number of times the $i$th psu is selected in Step 1 (a sketch follows the algorithm).
- Step 3.
- Compute $\hat{\theta}^{*}$ using the formulae that were used to obtain the original point estimator with the original weights $w_{ik}$ replaced by the bootstrap weights $w_{ik}^{*}$.
- Step 4.
- Repeat Steps 1–3 B times to obtain $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$.
- Step 5.
- The bootstrap variance estimator is $v_{\mathrm{B}}(\hat{\theta}) = \mathbb{V}_{*}(\hat{\theta}^{*})$. In practice, the Monte Carlo approximation of $\mathbb{V}_{*}(\hat{\theta}^{*})$ based on $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$ is applied.
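A short sketch of Steps 1–3 for a weighted total; the $n-1$ with-replacement draws and the $n/(n-1)$ adjustment follow the version of the Rao–Wu–Yue weights described above, and the function and variable names are illustrative.

```python
import numpy as np

def rwy_bootstrap_totals(y, w, psu, B=1000, rng=None):
    """Rao-Wu-Yue bootstrap replicates of an estimated total sum(w * y).
    y, w : element-level values and weights; psu : psu label of each element.
    Each replicate draws n-1 psus with replacement and rescales the weights
    by (n / (n-1)) times the number of times each psu was drawn."""
    rng = np.random.default_rng(rng)
    labels = np.unique(psu)
    n = len(labels)
    totals = np.empty(B)
    for b in range(B):
        draw = rng.choice(n, size=n - 1, replace=True)           # psu indices drawn
        mult = np.bincount(draw, minlength=n)                     # multiplicities
        adj = (n / (n - 1)) * mult[np.searchsorted(labels, psu)]  # per-element factor
        totals[b] = np.sum(w * adj * y)
    return totals

# Monte Carlo bootstrap variance (Step 5):
# np.var(rwy_bootstrap_totals(y, w, psu), ddof=1)
```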
4.2. The Pseudo-Population Bootstrap Algorithm
- Step 1.
- Each unit $k \in S_i$ is duplicated $[\pi_{k \mid i}^{-1}]$ times to create a second-stage pseudo-population denoted by $U_i^{*}$, where $[\cdot]$ denotes the closest integer.
- Step 2.
- Each pair $(i, U_i^{*})$, $i \in S_I$, is duplicated $\lfloor \pi_i^{-1} \rfloor$ times. The population of pairs is completed by selecting a sample in the set $\{(i, U_i^{*}) : i \in S_I\}$ by means of a sampling design with first-order inclusion probabilities $\pi_i^{-1} - \lfloor \pi_i^{-1} \rfloor$. This leads to the pseudo-population $U^{*}$.
- Step 3.
- Select a first-stage bootstrap sample $S_I^{*}$ from $U^{*}$ using the original first-stage sampling design, with first-order inclusion probabilities computed on the pseudo-population.
- Step 4.
- Select a second-stage bootstrap sample $S_i^{*}$ from $U_i^{*}$ using the original second-stage sampling design, where a randomization between two values with the appropriate probabilities handles the rounding induced by Step 1. This procedure is applied to each pair $(i, U_i^{*})$ with $i \in S_I^{*}$. The union of the $S_i^{*}$'s leads to the bootstrap sample $S^{*}$.
- Step 5.
- Compute $\hat{\theta}^{*}$ using the formulae that were used to obtain the original point estimator.
- Step 6.
- Steps 3–5 are repeated $B$ times to obtain the bootstrap statistics $\hat{\theta}^{*(1)}, \ldots, \hat{\theta}^{*(B)}$. Let $v^{(c)}$ denote the Monte Carlo variance of these $B$ bootstrap statistics.
- Step 7.
- Steps 2–6 are repeated $C$ times to obtain $v^{(1)}, \ldots, v^{(C)}$. The variance of $\hat{\theta}$ is estimated by the average $C^{-1} \sum_{c=1}^{C} v^{(c)}$; the structure of this double Monte Carlo loop is sketched below.
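The nested loop of Steps 6 and 7 has the structure sketched below; `build_pseudo_population` and `bootstrap_statistic` are placeholder functions standing for Steps 1–2 and Steps 3–5, respectively, and are assumptions of this sketch rather than part of the algorithm.

```python
import numpy as np

def pseudo_population_variance(sample, C, B, build_pseudo_population,
                               bootstrap_statistic, rng=None):
    """Nested Monte Carlo estimate of the bootstrap variance:
    C pseudo-populations (Steps 1-2), B resamples within each (Steps 3-5),
    and the C within-pseudo-population variances are averaged (Step 7)."""
    rng = np.random.default_rng(rng)
    variances = np.empty(C)
    for c in range(C):
        pseudo = build_pseudo_population(sample, rng)             # Steps 1-2
        stats = np.array([bootstrap_statistic(pseudo, rng) for _ in range(B)])
        variances[c] = stats.var(ddof=1)                          # Step 6
    return variances.mean()                                       # Step 7
```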
5. Simulation Study
- (i)
- At the first stage, we selected $n$ psus according to two sampling designs: simple random sampling without replacement and inclusion probability-proportional-to-size randomized systematic sampling (a sketch of the latter design is given after this list). The value of $n$ was set to two different values, corresponding to two first-stage sampling fractions.
- (ii)
- At the second stage, elements within each psu selected at the first stage were selected according to simple random sampling without replacement.
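One standard way to implement the randomized systematic probability-proportional-to-size design used at the first stage is sketched below, assuming all inclusion probabilities $n x_i / \sum_j x_j$ are below one; the size measure and all names are illustrative.

```python
import numpy as np

def randomized_systematic_pps(size_measure, n, rng=None):
    """Randomized systematic PPS sampling: randomly order the psus, then
    apply systematic sampling to the cumulated inclusion probabilities
    pi_i = n * x_i / sum(x)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(size_measure, dtype=float)
    pi = n * x / x.sum()                             # assumes all pi_i < 1
    order = rng.permutation(len(x))                  # random ordering of the psus
    cum = np.cumsum(pi[order])                       # cumulated probabilities (sum = n)
    points = rng.uniform(0.0, 1.0) + np.arange(n)    # systematic selection points
    idx = np.minimum(np.searchsorted(cum, points), len(x) - 1)  # round-off guard
    return np.sort(order[idx])

# Example: 200 psus with size measures 1, ..., 200; select n = 10 of them.
selected = randomized_systematic_pps(np.arange(1, 201), n=10, rng=4)
```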
6. Final Remarks
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
References
- Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. 1979, 7, 1–26.
- Rao, J.N.K.; Wu, C.F.J. Resampling inference with complex survey data. J. Am. Stat. Assoc. 1988, 83, 231–241.
- Sitter, R.R. A resampling procedure for complex survey data. J. Am. Stat. Assoc. 1992, 87, 755–765.
- Rao, J.N.K.; Wu, C.F.J.; Yue, K. Some recent work on resampling methods for complex surveys. Surv. Methodol. 1992, 18, 209–217.
- Gross, S. Median estimation in sample surveys. Proc. Sect. Surv. Res. Methods 1980, 181–184.
- Bickel, P.J.; Freedman, D.A. Asymptotic normality and the bootstrap in stratified sampling. Ann. Stat. 1984, 12, 470–482.
- Booth, J.G.; Butler, R.W.; Hall, P. Bootstrap methods for finite populations. J. Am. Stat. Assoc. 1994, 89, 1282–1289.
- Chauvet, G. Méthodes de Bootstrap en Population Finie. Ph.D. Thesis, École Nationale de Statistique et Analyse de l’Information, Bruz, France, 2007.
- Antal, E.; Tillé, Y. A direct bootstrap method for complex sampling designs from a finite population. J. Am. Stat. Assoc. 2011, 106, 534–543.
- Beaumont, J.F.; Patak, Z. On the generalized bootstrap for sample surveys with special attention to Poisson sampling. Int. Stat. Rev. 2012, 80, 127–148.
- Mashreghi, Z.; Haziza, D.; Léger, C. A survey of bootstrap methods in finite population sampling. Stat. Surv. 2016, 10, 1–52.
- Särndal, C.E.; Swensson, B.; Wretman, J. Model-Assisted Survey Sampling; Springer: New York, NY, USA, 1992.
- Beaumont, J.F.; Béliveau, A.; Haziza, D. Clarifying some aspects of variance estimation in two-phase sampling. J. Surv. Stat. Methodol. 2015, 3, 524–542.
- Wolter, K.M. Introduction to Variance Estimation; Springer Series in Statistics: New York, NY, USA, 2007.
- Sitter, R.R. Resampling Procedures for Complex Survey Data. Ph.D. Thesis, University of Waterloo, Waterloo, ON, Canada, 1989.
- Sitter, R.R. Comparing three bootstrap methods for survey data. Can. J. Stat. 1992, 20, 135–154.
- Funaoka, F.; Saigo, H.; Sitter, R.R.; Toida, T. Bernoulli bootstrap for stratified multistage sampling. Surv. Methodol. 2006, 32, 151–156.
- Preston, J. Rescaled bootstrap for stratified multistage sampling. Surv. Methodol. 2009, 35, 227–234.
- Hájek, J. Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann. Math. Stat. 1964, 35, 1491–1523.
- Berger, Y.G. Rate of convergence for asymptotic variance of the Horvitz–Thompson estimator. J. Stat. Plan. Inference 1998, 74, 149–168.
- Matei, A.; Tillé, Y. Evaluation of variance approximations and estimators in maximum entropy sampling with unequal probability and fixed sample size. J. Off. Stat. 2005, 21, 543–570.
- Haziza, D.; Mecatti, F.; Rao, J.N.K. Evaluation of some approximate variance estimators under the Rao–Sampford unequal probability sampling design. Metron 2008, 66, 91–108.
- Saigo, H. Comparing four bootstrap methods for stratified three-stage sampling. J. Off. Stat. 2010, 26, 193–207.
- Shao, J.; Sitter, R.R. Bootstrap for imputed survey data. J. Am. Stat. Assoc. 1996, 91, 1278–1288.
- Beaumont, J.F.; Émond, N. A bootstrap variance estimation method for multistage sampling and two-phase sampling when Poisson sampling is used at the second phase. Stats 2022, 5, 339–357.
Sampling Design | Population Total | Population Median |
---|---|---|
SRSWOR/SRSWOR | Textbook variance estimator; Rao and Wu [2]; Rao et al. [4]; Modified Sitter; Funaoka et al. [17]; Chauvet [8]; Preston [18] | Linearization variance estimator; Rao and Wu [2]; Rao et al. [4]; Modified Sitter; Funaoka et al. [17]; Chauvet [8]; Preston [18] |
IPPSWOR/SRSWOR | Textbook variance estimator; Rao et al. [4]; Chauvet [8] | Linearization variance estimator; Rao et al. [4]; Chauvet [8] |
Bootstrap Method | | | | |
---|---|---|---|---|
Textbook | 1.99 | 5.10 | 1.28 | 5.05 | |
Chauvet [8] | −7.79 | 2.49 | −8.39 | 2.54 | |
Rao et al. [4] | 6.99 | 29.07 | 6.27 | 29.38 | |
Rao and Wu [2] | 8.66 | 10.00 | 6.96 | 9.20 | |
Preston [18] | 1.88 | 5.20 | 1.13 | 5.11 | |
Modified Sitter | 6.97 | 9.93 | 5.84 | 9.49 | |
Funaoka et al. [17] | 1.80 | 4.00 | 1.08 | 4.15 | |
Textbook | 11.71 | 7.42 | 19.31 | 10.96 | |
Chauvet [8] | −0.02 | 7.56 | 0.03 | 8.67 | |
Rao et al. [4] | 11.19 | 20.39 | 13.05 | 27.92 | |
Rao and Wu [2] | 930.28 | 729.81 | 502.50 | 367.14 | |
Preston [18] | 10.97 | 7.58 | 16.58 | 11.00 | |
Modified Sitter | 11.81 | 12.98 | 10.74 | 13.20 | |
Funaoka et al. [17] | 7.38 | 2.60 | 9.21 | 6.76 |
Bootstrap Method | | | | |
---|---|---|---|---|
Textbook | 43.4 | 18.9 | 45.3 | 19.8 | |
Chauvet [8] | 43.9 | 19.9 | 45.9 | 20.9 | |
Rao et al. [4] | 44.0 | 20.1 | 45.8 | 21.0 | |
Rao and Wu [2] | 41.2 | 19.3 | 43.3 | 20.2 | |
Preston [18] | 43.8 | 20.2 | 45.6 | 21.0 | |
Modified Sitter | 43.6 | 19.9 | 45.0 | 20.6 | |
Funaoka et al. [17] | 44.1 | 20.2 | 46.0 | 21.0 | |
Textbook | 62.0 | 37.2 | 62.4 | 36.4 | |
Chauvet [8] | 61.0 | 41.1 | 60.5 | 38.9 | |
Rao et al. [4] | 59.9 | 41.1 | 59.3 | 38.0 | |
Rao and Wu [2] | 64.3 | 37.4 | 69.1 | 39.7 | |
Preston [18] | 63.8 | 38.0 | 64.0 | 37.3 | |
Modified Sitter | 59.4 | 40.4 | 59.0 | 38.3 | |
Funaoka et al. [17] | 59.8 | 42.3 | 59.3 | 39.5 |
Bootstrap Method | | | | |
---|---|---|---|---|
Textbook | 95.33 | 95.07 | 95.40 | 94.73 | |
Chauvet [8] | 94.50 | 94.87 | 94.70 | 94.67 | |
Rao et al. [4] | 95.50 | 96.63 | 95.77 | 96.73 | |
Rao and Wu [2] | 96.10 | 95.33 | 96.00 | 95.47 | |
Preston [18] | 95.37 | 95.07 | 95.30 | 94.77 | |
Modified Sitter | 95.80 | 95.47 | 95.73 | 95.30 | |
Funaoka et al. [17] | 95.27 | 94.97 | 95.33 | 94.73 | |
Textbook | 94.63 | 95.23 | 94.27 | 95.20 | |
Chauvet [8] | 93.90 | 94.67 | 93.17 | 94.57 | |
Rao et al. [4] | 95.10 | 95.97 | 94.70 | 96.03 | |
Rao and Wu [2] | 100.00 | 99.97 | 99.97 | 99.93 | |
Preston [18] | 94.50 | 95.23 | 93.70 | 94.80 | |
Modified Sitter | 95.07 | 95.13 | 94.67 | 94.93 | |
Funaoka et al. [17] | 94.73 | 94.27 | 94.20 | 94.13 |
Bootstrap Method | | | | |
---|---|---|---|---|
Textbook | 21,470.7 | 8725.5 | 23,157.5 | 9721.9 | |
Chauvet [8] | 20,404.4 | 8612.6 | 22,012.1 | 9599.7 | |
Rao et al. [4] | 21,978.5 | 9663.6 | 23,707.2 | 10,782.5 | |
Rao and Wu [2] | 22,218.5 | 8925.4 | 23,857.5 | 9910.3 | |
Preston [18] | 21,450.9 | 8724.4 | 23,134.4 | 9718.9 | |
Modified Sitter | 21,988.4 | 8919.4 | 23,684.7 | 9921.4 | |
Funaoka et al. [17] | 21,435.4 | 8674.5 | 23,119.5 | 9674.2 | |
Textbook | 0.9796 | 0.4081 | 1.3847 | 0.5705 | |
Chauvet [8] | 0.9284 | 0.4071 | 1.2717 | 0.5634 | |
Rao et al. [4] | 0.9798 | 0.4306 | 1.3537 | 0.6117 | |
Rao and Wu [2] | 2.9770 | 1.1343 | 3.0965 | 1.1681 | |
Preston [18] | 0.9736 | 0.4081 | 1.3654 | 0.5701 | |
Modified Sitter | 0.9795 | 0.4175 | 1.3407 | 0.5753 | |
Funaoka et al. [17] | 0.9634 | 0.3972 | 1.3306 | 0.5582 |
Bootstrap Method | | | | |
---|---|---|---|---|
Textbook | −1.26 | −0.36 | −1.98 | −0.05 | |
Chauvet [8] | −13.65 | −5.74 | −11.86 | −3.83 | |
Rao et al. [4] | 1.24 | 9.93 | 2.06 | 18.03 | |
Textbook | 18.37 | 3.92 | 22.58 | 5.14 | |
Chauvet [8] | 0.16 | −3.24 | 2.65 | 2.09 | |
Rao et al. [4] | 16.45 | 15.71 | 17.72 | 21.56 |
Bootstrap Method | | | | |
---|---|---|---|---|
Textbook | 44.9 | 18.9 | 44.6 | 19.0 | |
Chauvet [8] | 46.2 | 21.0 | 46.7 | 21.3 | |
Rao et al. [4] | 46.9 | 22.4 | 46.1 | 21.3 | |
Textbook | 62.0 | 37.7 | 59.3 | 36.1 | |
Chauvet [8] | 61.3 | 40.5 | 61.0 | 40.1 | |
Rao et al. [4] | 60.3 | 41.2 | 58.4 | 37.5 |
Bootstrap Method | | | | |
---|---|---|---|---|
Textbook | 95.27 | 95.50 | 94.73 | 94.83 | |
Chauvet [8] | 93.23 | 94.83 | 93.63 | 94.70 | |
Rao et al. [4] | 95.40 | 96.07 | 95.03 | 96.43 | |
Textbook | 94.53 | 94.63 | 95.70 | 93.63 | |
Chauvet [8] | 93.77 | 93.80 | 93.27 | 93.93 | |
Rao et al. [4] | 95.17 | 94.93 | 95.27 | 95.43 |