A Comparison of Existing Bootstrap Algorithms for Multi-Stage Sampling Designs

Multi-stage sampling designs are often used in household surveys because a sampling frame of elements may not be available or for cost considerations when data collection involves face-to-face interviews. In this context, variance estimation is a complex task as it relies on the availability of second-order inclusion probabilities at each stage. To cope with this issue, several bootstrap algorithms have been proposed in the literature in the context of a two-stage sampling design. In this paper, we describe some of these algorithms and compare them empirically in terms of bias, stability, and coverage probability.


Introduction
Many surveys conducted by national statistical offices use stratified multi-stage sampling designs for selecting a sample.Reasons for using multi-stage sampling rather than direct element sampling include the lack of element-level sampling frames and cost considerations when data collection involves face-to-face interviews.Stratified multi-stage sampling designs include some form of stratification, selection of primary sampling units (psu), and subsampling within selected psus.This is especially common in social and health surveys.For general multi-stage sampling designs, unbiased variance estimation is a complex task as it relies on the availability of the second-order inclusion probabilities at each stage.If the first-stage sampling fraction is small, a common variance estimation strategy is to pretend that the psus were selected with replacement and use the customary with replacement variance estimator.The resulting estimator is generally conservative in the sense that it may suffer from a small positive bias.
Another approach to variance estimation for survey data is bootstrap variance estimation originally proposed by Efron [1] in the context of independent and identically distributed observations.In a finite population sampling, bootstrap procedures can be classified into two broad groups.In the first, bootstrap samples are selected from the original sample; e.g., [2,3] among others.Rao and Wu [2] applied a scale adjustment directly to the survey data values so as to recover the usual variance formulae.Rao et al. [4] presented a modification of the method of Rao and Wu [2], where the scale adjustment is applied to the survey weights rather than to the data values.The second group of procedures consists of first creating a pseudo-population from the original sample.Bootstrap samples are then selected from the pseudo-population using the same sampling design utilized to select the original samples; see [5][6][7][8], among others.Many of the aforementioned bootstrap procedures may be implemented by randomly generating bootstrap weights so that the first two (or more) design moments of the sampling error are tracked by the corresponding bootstrap moments; see [9,10].These procedures are often referred to as bootstrap weight procedures.For a comprehensive review of bootstrap procedures for survey data, the reader is referred to Mashreghi et al. [11].
The goal of this paper is to empirically compare several existing bootstrap algorithms that have been proposed in the literature for two-stage sampling designs.The bootstrap procedures are compared with respect to bias, stability, and coverage probability of confidence intervals.In Section 2 we present the basic setup and discuss some classical variance estimation procedures for two-stage sampling designs.In Section 3, we present some bootstrap algorithms proposed in the case of simple random sampling without replacement in both stages.Bootstrap algorithms for unequal probability sampling designs are described in Section 4. In Section 5, we present the results from a simulation study.We make some final remarks in Section 6.

The Setup
Consider a finite population U consisting of N primary sampling units (psu), be the total number of elements in the population.We are interested in estimating a population total t y of a survey variable y: where t i = ∑ k ∈ U i y ik denotes the ith psu total, i = 1, …, N, and y ik denotes the y-value for the kth element in the ith psu.To that end, we select a sample according to a two-stage sampling design:

i.
A sample S 1st of psus, of size n, is selected according to a given sampling design P S 1st with first-order inclusion probabilities, π i = P i ∈ S 1st , and with secondorder inclusion probabilities, π ij = P i ∈ S 1st &j ∈ S 1st .Finally, let Δ ij = π ij − π i π j . ii.
In the ith psu sampled at the first stage, i ∈ S 1st , a subsample of the elements, S i , of size m i is selected according to a given sampling design P S i | S 1st with firstorder inclusion probabilities π k | i = P k ∈ S i | i ∈ S 1st and second-order inclusion probabilities . Subsampling in a given psu is carried out independently of subsampling in any other psu.
A design-unbiased estimator of t y is the Horvitz-Thompson estimator given by where t i = ∑ k ∈ S i y ik /π k | i .The estimator (1) can be written as , and S denotes the sample of elements of size ∑ i ∈ S 1st m i .
The design variance of t y , denoted by V p t y , can be unbiasedly estimated by where , where E 1 ⋅ denotes the expectation with respect to the first-stage sampling design, and E 2 ⋅ | S 1st denotes the expectation with respect to the second-stage sampling design conditionally on S 1st .In the case of simple random sampling without replacement at both stages, the estimator (2) reduces to where and For general two-stage sampling designs, the computation of ( 2) is cumbersome as it requires the availability of the second-order inclusion probabilities at each stage.A simplified variance estimator is given by That is, only the first term of ( 2) is kept.The bias of V sim t y , which is always negative, is expected to be small provided that the first-stage sampling fraction, f 1 = n/N, is small; see [12,13].
An alternative simplified variance estimator can be obtained by pretending that the psus are selected with replacement.It is given by where t y, wr = ∑ i ∈ S t i np i , with p i denoting the probability of selection of the ith psu at any given draw.If the first-stage sampling fraction, f 1 , is small, we expect (5) to suffer from a small positive bias.Unlike (4), the estimator does not require the availability of the second-order inclusion probabilities π ij .
So far, we have considered the case of a population total t y .In practice, it may be of interest to estimate more complex parameters such as distribution functions and quantiles.Let θ N be defined as the solution of the following census estimating equation: where u y ik ; θ can be either a smooth (i.e., a function that is differentiable and whose derivatives are continuous) or a non-smooth function of θ.When u ⋅ is smooth, the solution of ( 6) is called a smooth parameter; otherwise, it is called a non-smooth parameter.Common parameters include: (i) the population mean obtained with u y ik ; θ = y ik − θ; (ii) the finite population distribution function obtained with u y ik ; θ = I y ik ≤ t − θ, t ∈ ℝ; (iii) the τ-th population percentile obtained with u y ik ; θ = I y ik ≤ θ − τ.The population mean is an example of a smooth parameter, whereas distribution functions and quantiles are examples of non-smooth parameters.
An estimator θ of θ N can be obtained by solving the following sample estimating equation: The variance of θ may be obtained using a first-order Taylor expansion or by using a resampling method such as balanced repeated replication, jackknife, and bootstrap; see [14] for a discussion of resampling methods.In the remainder of this paper, we confine to bootstrap.

Replacement at Both Stages
In this section, we describe several bootstrap algorithms for a two-stage sampling design with simple random sampling without replacement at both stages.

The Algorithm of Rescaled Bootstrap
Rao and Wu [2] proposed a rescaled bootstrap algorithm for both uni-stage and two-stage sampling designs.Because the rescaling factor is applied to the y-values, this method is applicable to smooth statistics but not to the case of non-smooth statistics such as quantiles.The algorithm can be described as follows: Step 1. Draw a sample of size n psus from S 1st , according to simple random sampling with replacement.
Step 2. From each psu selected in Step 1, select a sample of elements, of size m i according to simple random sampling with replacement.For a psu selected more than once in Step 1, perform independent subsampling.
Step 3. Let y ik * be the y-value of the kth bootstrap element in the ith bootstrap psu and Step 4. Compute θ * using the same formulae that were used to obtain the original point estimator.
Step 6.The bootstrap variance estimator is var* θ * .In practice, the Monte Carlo approximation of var* θ * is applied where Rao and Wu [2] showed that in the case of a population total, the above algorithm matches the standard variance estimator (3).Rao et al. [4] proposed a weighted version of the Rao-Wu method, whereby the rescaling is applied to the sampling weights rather than the y-values; see also [10].The method of Rao et al. [4] is described in Section 4.

The Algorithm of Mirror-Match Bootstrap
Sitter [3] proposed an extension of his mirror-match bootstrap to the case of a two-stage sampling design.In [3], the algorithm assumed that the number of repetetions k 1 and k 2i (see below) are integers.It can be described as follows: Step 1. Choose 1 ≤ n′ < n and draw a sample of size n′ psus from S 1st , according to simple random sampling without replacement.
Step 3. Choose 1 ≤ m i ′ < m i and draw according to simple random sampling without replacement m i ′ units within the ith psu obtained in Steps 1 and 2.
Step 4. Repeat Step 3 Step 5. Compute θ * using the same formulae that were used to obtain the original point estimator.
Step 7. The bootstrap variance estimator is var* θ * .In practice, the Monte Carlo approximation of var* θ * is applied where θ * Sitter [3] showed that in the case of a population total, the above algorithm matches the standard variance estimator (3).If k 1 and k 2i are not integers, a randomization between bracketing integer values is proposed in [15].

The Algorithm of Pseudo-Population Bootstrap
Sitter [16] proposed a pseudo-population bootstrap algorithm, referred to as the withoutreplacement bootstrap (BWO) method, in the case of uni-stage and two-stage sampling designs.We focus on the latter.In [16], the algorithm assumed that the quantities k 1 , n′, k 2i , and m i ′ (see below) are integers.It can be described as follows: Step 1: Create a pseudo-population by replicating each psu in S 1st k 1 times and each unit within the ith psu k 2i times.Let U′ be the resulting pseudo-population consisting , where there exists j ∈ S 1st such ′ be the total number of elements in the pseudopopulation.
Step 2: From the pseudo-population U′, select a sample of psus, S 1st * , of size n′, according to simple random sampling without replacement.In each selected psu, select a sample, S i * , of size m i ′ according to simple random sampling without replacement.
Step 3: Compute θ * using the formulae that were used to obtain the original point estimator.
Step 5: The bootstrap variance estimator is var* θ * .In practice, the Monte Carlo approximation of var* θ * is applied where θ * In the case of the population total (or the population mean), Sitter [16] showed that the bootstrap variance estimator reduces to the standard variance estimator provided that n′ and k 1 satisfy and m i ′ and k 2i satisfy where , and m i ′ are not integers, a randomization between bracketing integer values was proposed in [15].In Appendix A, we show that, if we define k 1 , n′, k 2i , and m i ′ as in ( 8) and ( 9), the bootstrap variance estimator does not reduce to the standard variance estimator in (3).We suggest a modification to ( 8) and ( 9) so that the bootstrap variance estimators reduces to the standard variance estimator (3).In the simulation study (see Section 5), we show that the modified version of Sitter [16] works well in terms of bias and coverage probability of confidence intervals.

The Algorithm of Bernoulli Bootstrap
Funaoka et al. [17] proposed two bootstrap procedures, referred to as Bernoulli bootstrap, for stratified three-stage sampling.Here, we consider the special case of two-stage sampling.Funaoka et al. [17] proposed a short-cut algorithm and a general algorithm.The general algorithm can handle any combination of sample sizes but is more computationally intensive.The general algorithm can be described as follows: Step 1. Draw a sample, S 1 , of size n′ = n − 1, from the original sample of clusters, S 1st , according to simple random sampling with replacement.Generate n Bernoulli random variables, I 1i , with probability For each i ∈ S 1st , keep the ith cluster in the bootstrap sample S* and go to Step 2, if I 1i = 1, and replace the ith cluster with one randomly selected cluster from S 1st , if Step 2. For cluster i kept in Step 1, draw a sample, S 2i , of size m i ′ = m i − 1, from the original sample S 2i according to simple random sampling with replacement.Generate m i Bernoulli random variable, I 2ij , with probability For each ij ∈ S 2i , keep the ij th element in the bootstrap sample, S*, if I 2ij = 1, and replace it with one randomly selected element from S 2i , if I 2ij = 0.
Step 3. Compute θ * using the formulae that were used to obtain the original point estimator.
Step 5.The bootstrap variance estimator is var* θ * .In practice, the Monte Carlo approximation of var* θ * is applied where Funaoka et al. [17] argued that the resulting bootstrap variance estimator is consistent for both smooth and non-smooth parameters.

The Algorithm of Bootstrap Weight Approach
Preston [18] proposed a bootstrap weight approach for stratified three-stage sampling.Here, we focus on the special case of two-stage sampling.The algorithm can be described as follows: Step 1. Draw a sample of size n′ psus from S 1st , according to simple random sampling without replacement.Let δ i = 1 if the ith psu is selected and δ i = 0, otherwise.
Step 2. Define the psu bootstrap weights: where Step 3. Within each of the sample of psus selected in Step 1, draw a simple random sample without replacement, of size m i ′ .Let δ ik = 1 if the kth element in the ith psu is selected and δ ik = 0, otherwise.We define the conditional element bootstrap weights: where Step 4. Compute θ * using the formulae that were used to obtain the original point estimator with the original weights replaced by the bootstrap weights w ik * .
Step 6.The bootstrap variance estimator is var* θ * .In practice, the Monte Carlo approximation of var* θ * is applied where θ * In the case of the population total, Preston [18] showed that the bootstrap variance estimator reduces to the textbook variance estimator (3).He suggested that the choice of n′ = n/2 and m i ′ = m i /2 will be optimal and lead to non-negative bootstrap weights, where ⋅ denotes the integer part.

The Procedure of Bootstrap Weight Approach
Rao et al. [4] proposed a bootstrap weight approach for stratified multi-stage sampling designs.Unlike the method of Rao and Wu [2], it can be applied to estimate the variance of smooth and non-smooth parameters (e.g., quantiles).
Step 1. Select n′ psus according to simple random sampling with replacement from S 1st .
Step 2. Define the bootstrap weight as where n i * denotes the number of times the ith psu is selected in the bootstrap sample.
Step 3. Compute θ * using the formulae that were used to obtain the original point estimator with the original weights replaced by the bootstrap weights w ik * .
Step 5.The bootstrap variance estimator is var* θ * .In practice, the Monte Carlo approximation of var* θ * is applied where θ * The algorithm of Rao et al. [4] leads to consistent variance estimators provided that the first-stage sampling fraction is negligible.The choice 0 < n′ ≤ n − 1 leads to non-negative bootstrap weights.

The Procedure of Pseudo-Population Bootstrap Approach
Chauvet [8] proposed a pseudo-population bootstrap approach (PPB) in the case of unequal two-stage sampling designs.It can be described as follows: Step 1.Each unit k ∈ S i is duplicated π k | i −1 times to create a second-stage pseudopopulation denoted by U i * , i ∈ S 1st , where ⋅ denotes the closet integer.
. Let Chauvet [8] showed that in the case of high entropy sampling design (e.g., [19][20][21][22]) at both stages, the above algorithm leads to a consistent estimator in the context of a population total.In the case of a fixed-size sampling design, Chauvet [8] suggested completing the pseudo-population in Step 2, by applying Poisson sampling design with inclusion probabilities π i −1 − π i −1 .

Simulation Study
We conducted a simulation study to assess the performance of the bootstrap procedures described in Sections 3 and 4 in terms of bias, stability, and coverage rate of confidence intervals based on the t-distribution with n − 1 degrees of freedom.The simulation study was carried out using the R software.We created two finite populations consisting of N = 200 primary sampling units.The cluster sizes M i were generated according to a Poisson distribution with a mean equal to 50.In each population, we generated a survey variable y according to

Author Manuscript
Author Manuscript

Author Manuscript
Author Manuscript where The parameter ρ in (10) was set to 0.1 for Population 1 and 0.3 for Population 2. We were interested in estimating the population total of the y-variable, t y , as well as the finite population median.
From each population, we selected K = 3000 samples according to a two-stage sampling design: i.
At the first stage, we selected n psus according to two sampling designs: simple random sampling without replacement and inclusion probability-proportionalto-size randomized systematic sampling.The value of n was set to n = 10 which corresponds to a first-stage sampling fraction, f 1 = 5%, and n = 40, which corresponds to f 1 = 20%. ii.
At the second stage, m i = 5 elements within each psu selected at the first stage were selected according to simple random sampling without replacement.
In each sample, we computed the estimator t y given by (1).Its variance was estimated using the variance estimation procedures listed in Table 1.Except of the procedure of Chauvet [8], we used B = 500 bootstrap samples for all the other bootstrap procedures.For the procedure of Chauvet [8], we used A P P B = 10 and B P P B = 50 B = A P P B × B P P B .
As a measure of bias of a variance estimator var, we computed its Monte Carlo percent relative bias where E MC var denotes the Monte Carlo expectation of var and V MC t y denotes the Monte Carlo variance estimator of t y .As a measure of stability of a variance estimator var, we computed its Monte Carlo percent coefficient of variation given by where V MC var denotes the Monte Carlo variance estimator of var.Finally, we computed the Monte Carlo coverage rate of t-based confidence intervals and their corresponding Monte Carlo average length.
Tables 2-5 show the results for the bootstrap methods in the case of SRSWOR/SRSWOR.Table 2 shows the Monte Carlo percent relative bias of the bootstrap variance estimators.For the population total, all the procedures led to small biases for f 1 = 5% with the value of absolute relative bias ranging from 1.08% to 8.66%.For the population median, except for the method of Rao and Wu [2], the other procedures led to good results with an absolute relative bias ranging from 0.02% to 16.58%.As expected, the method of Rao and Wu [2] led to substantial bias because it cannot be applied to non-smooth statistics.For f 1 = 20%, the absolute relative bias varied between 2.49% and 10.00% for all bootstrap methods except for the method of Rao et al. [4] who suffered from a significant positive bias.This can be explained by the fact that the sampling fraction was not small.For f 1 = 20% the absolute relative bias varied between 2.60% and 13.20% for all bootstrap methods except for the methods of Rao and Wu [2] and Rao et al. [4].
Table 3 shows the percent CV.All the bootstrap methods led to similar Monte Carlo coefficients of variation (CV).For f 1 = 5%, the CV varied between 41.2% and 46.0% for the population total, and between 59.0% and 64.3% (except for the method of Rao and Wu [2] that led to a CV of 69.1% for ρ = 0.3) for the population median.For f 1 = 20%, the CV varied between 18.9% and 21.0% for the population total, and between 36.4% and 42.3% for the population median.
Tables 4 and 5 show the coverage probability and the average length of 95% confidence intervals based on the t-distribution, respectively.All the bootstrap methods led to good coverage and similar average length except the method of Rao and Wu [2] for the population median.The coverage rate in all cases, except the method of Rao and Wu [2] for the population median, varied between 93.17% and 96.73%.
Tables 6-9 show the results for the bootstrap methods in the case of randomized IPPS systematic/SRSWOR.Table 6 shows the percent relative bias of the bootstrap variance estimators.The method of Chauvet [8] worked generally better than the method of Rao et al. [4] in terms of relative bias, especially in the case of the population median.The percent CVs presented in Table 7 were very similar for both methods.
Tables 8 and 9 respectively show the coverage probability and the average length of the 95 percent confidence intervals based on the t-distribution for both methods.Both bootstrap methods led to good coverage and similar average length.The coverage rate in all cases varied from 93.23% to 96.43%.

Final Remarks
The results of the simulation studies suggest that most bootstrap procedures work well in terms of bias and coverage rate of confidence intervals for estimating smooth or non-smooth parameters.An exception is the method of Rao and Wu [2] for quantiles and the method of Rao et al. [4] for appreciable first-stage sampling fractions.In terms of stability of the variance estimators, there is little difference between the bootstrap procedures.Our results are aligned with those of Saigo [23] who empirically compared four bootstrap procedures in the context of stratified three-stage sampling with simple random sampling without replacement at each stage: the procedure of Funaoka et al. [17], the mirror match bootstrap of Sitter [3], the method of Rao et al. [4], and the BWO method of Sitter [16].
In this paper, we have compared several bootstrap algorithms in the context of a two-stage sampling design.The algorithms were described in an ideal setup.In practice, the design weights π k −1 undergo a weighting process that involves a nonresponse adjustment followed by some form of calibration, whose goal is to ensure consistency between survey estimates and known population quantities.When the first-stage sampling fraction is small, the method of Rao et al. [4] is typically used in surveys conducted by national statistical offices.To account for unit nonresponse and calibration, the bootstrap weights need to undergo the same weighting process (non-response adjustment and calibration) that was used in the original sample were.
Bootstrap may be used to estimate the variance of imputed estimators in the context of imputation for item nonresponse.If the first-stage sampling fraction is small, one can reimpute the missing values in each bootstrap sample using the same imputation procedure that was utilized in the original sample, see [24].The case of non-negligible first-stage sampling fractions requires further research.
We end this paper by pointing out a very recent paper by Beaumont and Émond [25], who proposed a bootstrap weight approach for multi-stage sampling designs.It would be interesting to compare this method to the procedures considered in this paper.
where the subscripts 1* and 2* respectively denote the expectation and the variance with respect to the first-stage and second-stage resampling design in Step 2. Let t i ′ = ∑ k ∈ U y ik be the total of y-values in U i ′ in Step 1.The first and the second component of the bootstrap variance estimator var* θ * in (A1) respectively are and where y ‾i = ∑ i ∈ S i /m i .In [16], it is claimed that the first and the second components of the bootstrap variance estimator respectively are and see Equation (3.5) in Section 3.2 in [16].This is true only when N′ = k 1 n = N, and k 2i m i = M i for all i ∈ S 1st .

(A2)
In other words, we have to define k 1 = N /n and k 2i = M i /m i which is contradictory to the way Sitter [16] defined k 1 , n′, k 2i , and m i ′ using Equations ( 8) and ( 9).
In the following, we suggest how the method of Sitter [16] can be modified.We first define To have the equality between the first and the second component of the bootstrap variance estimator and the first and the second component of the standard variance estimator in (3), respectively, we need to have and Defining k 1 , n′, k 2i , and m i ′ as in (A3), (A4) and (A5), the bootstrap variance estimator reduces to the usual variance estimator in (3).In the simulation study, the method "Modified Sitter" refers to the method of Sitter [16] while applying the modified k 1 , n′, k 2i , and m i ′ defined in (A3), (A4) and (A5).When k 1 , n′, k 2i , or m i ′ is not integer, we simply rounded it to the closest integer value.Chauvet [8] Chauvet [8] Preston [18] Preston [18] IPPSWOR/SRSWOR Textbook variance estimator Linearization variance estimator Rao et al. [4] Rao et al. [4] Chauvet [8] Chauvet [8] Stats (Basel).Author manuscript; available in PMC 2024 May 10.

Table 2.
Monte Carlo percent RB of the bootstrap variance estimators to estimate the variance of the point estimator based on 3000 samples selected according to SRSWOR/SRSWOR.

7 . 1 *
Steps 2-6 are repeated A P P B times to obtain v

Table 1 .
Variance estimation procedures for each population parameter/sampling design.

Table 3 .
Monte Carlo percent CV based on 3000 samples selected according to SRSWOR/SRSWOR.

Table 4 .
Coverage rate (CR) of the 95% t-distribution based confidence intervals constructed using bootstrap standard error estimators based on 3000 samples selected according to SRSWOR/SRSWOR.

Table 5 .
Average length (AL) of the 95% t-distribution based confidence intervals constructed using bootstrap standard error estimators based on 3000 samples selected according to SRSWOR/SRSWOR.

Table 7 .
Monte Carlo percent CV based on 3000 samples selected according to IPPS/SRSWOR.

Table 8 .
Coverage rate (CR) of the 95% t-distribution based confidence intervals constructed using bootstrap standard error estimators based on 3000 samples selected according to IPPS/SRSWOR.