1. Introduction
Data are often drawn from dissimilar environments, which renders the independent and identically distributed (iid) assumption that underlies many results on the bootstrap suspect.1 This paper extends results concerning the consistency of the pairs and wild OLS bootstraps, which have mostly been derived for iid data, to general regression frameworks with independently but not necessarily identically distributed (inid) data. Instead of considering the sampling distribution of the bootstraps, the usual approach, it notes that any permutation of the pairs bootstrap vector of sampling frequencies, or of the realization of the external variable used by the wild bootstrap to transform residuals, is equally likely. Using results on the asymptotic distribution of permutation statistics by Wald and Wolfowitz (1944), Noether (1949), and Hoeffding (1951), these equally likely permutations can be used to characterize the bootstrap distributions conditional on the data as normal, given restrictions on sample moments of the data.
White’s (1980a) conditions for the asymptotic normality of OLS coefficients with clustered/heteroskedastic residuals and inid data guarantee these restrictions almost surely, ensuring that the asymptotic distribution of pairs and wild bootstrapped coefficients and Wald statistics conditional on the data matches the unconditional distribution of the original OLS estimates.
While proofs of bootstrap consistency typically require the existence of at least fourth moments of the regressors with iid data, the permutation distribution allows this paper to prove consistency with no more than second regressor moments and inid data. For iid data, Mammen (1993) proved consistency of the wild OLS bootstrap coefficient and homoskedasticity-based Wald test distributions given bounded expectations of the product of the fourth power of the regressors with the squared errors and an additional Lindeberg condition. Similarly, for the pairs OLS bootstrap with iid data, Freedman (1981) showed that bounded fourth moments of both regressors and errors are sufficient for consistency of the pairs bootstrap coefficient distribution2 and that of the Wald statistic based upon the (potentially incorrect) assumption of homoskedastic errors.
Stute (1990) tightened the result for the coefficient distribution alone, showing that it suffices for the squared regressors and the product of the squared regressors with the squared errors to have finite expectation. This paper proves consistency of both the coefficient and the clustered/heteroskedasticity robust Wald statistic distribution in a broader inid environment, for both the pairs and wild bootstraps, with finite expectations of only slightly more than second powers of the regressors and of the product of the second powers of the regressors with the second power of the errors. These assumptions are much less demanding than those used by Freedman and Mammen, requiring only slightly higher moments than Stute used to prove consistency of only the pairs bootstrap coefficient distribution in a narrower iid environment. Moreover, when residuals are heteroskedastic or interrelated within clusters, the homoskedasticity-based Wald test is not guaranteed to be asymptotically accurate, as recognized by Freedman (1981) and Mammen (1993). In such cases, practitioners are likely to prefer clustered/heteroskedasticity robust covariance estimates and Wald statistics, as these are asymptotically accurate and pivotal, respectively, ensuring the asymptotic accuracy of the conventional test and the higher order accuracy and faster convergence of rejection probabilities to the nominal value in the bootstrap (Singh, 1981; Hall, 1992).
For OLS models with inid data, the salient contribution is Liu (1988), who showed that the wild bootstrap provides consistent estimates of the second central moment of a linear combination of coefficients in an OLS regression model with bounded regressors, provided the first and second moments of the wild bootstrap external variable are 0 and 1, respectively. Liu’s result regarding the second central moment is easily extended to the multivariate second central moments of coefficients with unbounded inid regressors, without any additional restrictions on the moments of the external variable, as shown below. Our interest here, however, is in the full distribution of wild bootstrap coefficient and Wald statistic estimates, where our proof requires the existence of higher moments of the wild bootstrap external variable to ensure the convergence of higher moments to those of the normal. As the external variable is selected by the practitioner, and is not an exogenous characteristic of the data, these additional moment conditions pose no obstacle. The two-point distribution proposed by Mammen (1993) and the Rademacher distribution, both often used in practical applications (e.g., Davidson & Flachaire, 2008), have moments of all orders.
Liu’s consideration of inid data has largely not been extended, as the OLS bootstrap literature has since focused on time series dependent data, where the absence of random sampling of independent observations raises different statistical issues and calls for different bootstrap methods (see the review in Hardle et al., 2003). A notable exception is Djogbenou et al. (2019), who prove consistency of the wild bootstrap t-statistic distribution for independently distributed cluster groupings of data. With the moment assumptions used here, plus the additional requirement of bounded slightly higher than fourth moments of the regressors, their proof allows for heterogeneity in the distribution of data across clusters. However, they limit that heterogeneity by requiring that the cross product of the regressors and the covariance matrix of coefficient estimates converge to matrices of constants, a condition that in other papers is typically motivated by an iid assumption.3 The data-generating process examined in this paper is more fully inid in that there is no restriction that such matrices converge to anything, and the proof requires only slightly higher than second moments of the regressors. In sum, by emphasizing the permutation distribution, this paper lowers the typical fourth moment restrictions on regressors to second moments, allows a fully inid data process in which average moments do not converge, and highlights the conceptual similarity between the wild and pairs bootstraps, proving results for both in a unified framework.
The paper proceeds as follows:
Section 2 reviews the OLS model, White’s assumptions and results regarding OLS with inid data, and pairs and wild bootstrap methods for clustered/heteroskedastic data.
Section 3 presents the foundational theorems regarding the asymptotic normality of permutation distributions that motivate the results.
Section 4 then combines these with White’s (1980a) result to derive sufficient conditions for pairs and wild OLS bootstrap consistency with inid data and potentially cluster-interdependent heteroskedastic residuals.
Section 5 more fully contrasts the assumptions and results herein with those found in the papers cited above.
Section 6 provides Monte Carlo evidence of the consistency of the bootstrap in a challenging environment with an inid data process where average moments do not converge, regressors have barely second moments, and residuals are bounded, varyingly skewed, sometimes bi-modal, and otherwise generally highly non-normal. The Appendix provides proofs of the main theorems, while the on-line Appendix extends the pair results to sub-sampling and provides lengthy technical proofs of otherwise minor lemmas and extensions of the theorems.
2. Framework and Notation
Our interest is in inference for the linear model where, with i = 1 … N observations,

y = Xβ + ε,    (1)

where y represents the N × 1 matrix of observations on the dependent (outcome) variable, X the N × K matrix of observations of independent variables, β the K × 1 vector of unobserved parameters of interest, and ε the N × 1 matrix of unobserved disturbances. The ordinary least squares (OLS) estimates β̂ of β minimize the sum of squared estimated residuals ε̂′ε̂, where ε̂ = y − Xβ̂, producing the estimates

β̂ = (X′X)^(−1) X′y.    (2)
If the disturbances εi are homoskedastic with common variance σ², one can use the homoskedastic variance estimate of β̂, σ̂²(X′X)^(−1), but we focus on more general inference where the εi are heteroskedastic and possibly interdependent within C ≤ N “cluster” groupings of observations, using the clustered/heteroskedasticity robust covariance estimate

V̂ = (X′X)^(−1) [Σ_c X_c′ε̂_c ε̂_c′X_c] (X′X)^(−1),    (3)

where we use the subscript c to denote the rows of matrices and vectors associated with the observations in cluster grouping c. As will be seen later, we assume that the regressors and disturbances (X_c, ε_c) are independent across cluster groupings. When observations themselves are independent, each grouping equals an individual observation i, C = N, and (3) is White’s (1980a) heteroskedasticity robust covariance estimate. The clustered extension, however, is often used to allow for unspecified grouped dependence, and so we present the results within this more general framework. In describing limits, we use the subscript C, as in β̂_C and V̂_C, to emphasize that the estimated coefficients and covariance estimates are functions of C realized observation groupings.
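As a concrete illustration, the clustered/heteroskedasticity robust covariance estimate in (3) can be sketched in a few lines of code. This is an illustrative sketch, not the paper's own implementation; the function and variable names are my own.

```python
import numpy as np

def cluster_robust_cov(X, y, cluster_ids):
    """OLS coefficients with the clustered/heteroskedasticity robust
    covariance estimate (3): (X'X)^{-1} [sum_c X_c' e_c e_c' X_c] (X'X)^{-1}.
    Illustrative sketch only."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    K = X.shape[1]
    meat = np.zeros((K, K))
    for c in np.unique(cluster_ids):
        m = cluster_ids == c
        s = X[m].T @ resid[m]          # K-vector score for cluster c
        meat += np.outer(s, s)
    return beta, XtX_inv @ meat @ XtX_inv
```

When each cluster contains a single observation (C = N), this reduces to White's (1980a) heteroskedasticity robust estimate.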
White (1980a) provided conditions for valid OLS inference when the row vector of random variables associated with each observation is independently but not necessarily identically distributed (inid). With x_j denoting the jth column of X, we extend these to allow for grouped dependence:
Theorem I (extending White, 1980a). If there exist strictly positive finite constants γ, Δ, and η, such that

(Ia) {(X_c, ε_c)} is a sequence of independent but not necessarily identically distributed random matrices, such that E(X_c′ε_c) = 0_K;

(Ib) E|x_cj′x_ck|^(1+γ) < Δ for all c and all j, k = 1 … K, and for all C sufficiently large M_C = Σ_c E(X_c′X_c)/C is non-singular with det(M_C) > η;

(Ic) E|x_cj′ε_c ε_c′x_ck|^(1+γ) < Δ for all c and all j, k = 1 … K, and for all C sufficiently large V_C = Σ_c E(X_c′ε_c ε_c′X_c)/C is non-singular with det(V_C) > η;
then
- (i) β̂_C − β →a.s. 0_K;
- (ii) X′X/C − M_C →a.s. 0_{K×K};
- (iii) Σ_c X_c′ε̂_c ε̂_c′X_c /C − V_C →a.s. 0_{K×K};
- (iv) C^(1/2) V_C^(−1/2) M_C (β̂_C − β) →d n_K;
- (v) (β̂_C − β)′ V̂_C^(−1) (β̂_C − β) →d χ²_K;
where →a.s. and →d denote convergence almost surely and in distribution across (X, ε), respectively, A½ the “square root” of symmetric positive definite matrix A,4 n_K the K dimensional standard normal, χ²_K the central chi-squared with K degrees of freedom, and 0_K and 0_{K×K} vectors and matrices of zeros of the indicated dimensions.

Remark 1. White’s covariance estimate often motivates inference with heteroskedasticity or clustering in an otherwise iid setting where each observation or cluster grouping is a draw from a fixed distribution. However, (3) allows for asymptotically accurate inference in the much more general inid setting given the above, where M_C, V_C, and the covariance matrix of coefficient estimates do not necessarily converge to matrices of constants, as illustrated in the Monte Carlos further below.

Remark 2. White (1980a) used (Ia)–(Ic) to prove (i), (ii), and parts of (iii), and added a fourth moment assumption on the regressors to prove (iv), (v), and other results. As reviewed below, a similar fourth moment condition on the regressors is also used in prior proofs of bootstrap consistency. However, (Ia)–(Ic), with only slightly higher than second regressor moments, suffice to prove (i)–(v) and ensure bootstrap consistency, as shown in the proofs and Monte Carlos below.

In this paper, we examine two bootstrap techniques commonly used for OLS inference with heteroskedastic and clustered disturbances and prove the asymptotic consistency of their distributions for general inid data.
Wu’s (1986) external bootstrap, now commonly known as the wild bootstrap, holds the design matrix X constant and generates new realizations of the outcome vector y by multiplying the estimated residuals of each cluster grouping by an independently and identically distributed external random variable δ_c^w, so that the dependent variable for grouping c is now given by y_c^w = X_c β̂ + δ_c^w ε̂_c. Selecting β̂^w so as to minimize the sum of squared residuals for these new data yields coefficient estimates expressed in terms of the original data and its estimates as

β̂^w_C = β̂_C + (X′X)^(−1) Σ_c δ_c^w X_c′ε̂_c,    (4)

with the clustered/heteroskedasticity robust covariance estimate V̂^w_C computed from the bootstrap data as in (3). Repeated draws of the C × 1 vector δ^w of iid variables are made, and the resulting distribution of coefficients β̂^w_C and Wald statistics is used to evaluate the statistical significance of the corresponding measures for tests of the null hypothesis in the original sample, i.e., β̂_C and its Wald statistic. All permutations of any given realization of δ^w are equally likely, a fact that plays a prominent role in the results of this paper.
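The wild bootstrap procedure just described can be sketched as follows, assuming a Rademacher external variable. This is a hedged illustration with names of my own; one external draw per cluster grouping is the essential feature.

```python
import numpy as np

def wild_bootstrap_coefs(X, y, cluster_ids, B=999, rng=None):
    """Sketch of the wild bootstrap: hold X fixed, form
    y* = X @ beta_hat + delta * resid with one Rademacher draw per
    cluster grouping, and re-estimate OLS each replication."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    clusters = np.unique(cluster_ids)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        delta = rng.choice([-1.0, 1.0], size=clusters.size)    # Rademacher
        d_obs = delta[np.searchsorted(clusters, cluster_ids)]  # per observation
        draws[b] = XtX_inv @ (X.T @ (X @ beta + d_obs * resid))
    return beta, draws
```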
The pairs bootstrap samples with replacement C cluster groupings of “pairs” of dependent and independent variables (y_c, X_c) from the rows of the original data, producing a new data set composed of h = 1 … C cluster groups of observations (y_h^p, X_h^p), with each h corresponding to one of the original c groupings.5 Selecting β̂^p so as to minimize the sum of squared residuals for these new data, the resulting coefficient estimates can be expressed in terms of the original data, its estimates, and its indices c = 1 … C as

β̂^p_C = (Σ_c δ_c^p X_c′X_c)^(−1) Σ_c δ_c^p X_c′y_c,    (5)

where δ_c^p denotes the number of times (possibly 0) cluster grouping c was drawn. Repeated bootstrap samples are made and the resulting distribution of coefficients and Wald statistics is once again used to evaluate the statistical significance of corresponding measures for tests of the null hypothesis in the original sample. As in the case of the wild bootstrap, all permutations of any given realization of the C × 1 sampling frequency vector δ^p are equally likely. Consequently, we use the common notation δ, distinguished by the superscripts p or w, for seemingly dissimilar objects, because these operate identically in the theorems and proofs below.
Our interest is in deriving sufficient conditions for the conditional consistency of the bootstrap distributions in an inid framework. Specifically, we show that White’s (1980a) assumptions are sufficient to ensure that, for the bootstrapped coefficient and clustered/heteroskedasticity robust covariance estimates, with b (both) denoting p (pairs) or w (wild), the bootstrap analogues of results (iv) and (v) of Theorem I hold, where →d|δ denotes convergence in distribution across δ almost surely across realizations of (X, ε). These results show that the asymptotic conditional distribution given the data (X, ε) of the bootstrap equals the asymptotic distribution of the OLS estimates across (X, ε), allowing for valid inference using the percentiles of bootstrapped coefficient estimates or Wald statistics.
The key characteristic exploited in the proofs below is that all of the row permutations of the vectors δ are equally likely. Consequently, the distribution of the bootstraps can be thought of as the distribution across permutations of δ integrated across the ordered realizations of δ. Permutation theorems characterize this permutation distribution as asymptotically normal with the appropriate covariance matrix, provided (X, ε) and δ have certain moment properties. White’s (1980a) assumptions ensure that these properties hold almost surely for (X, ε), while the properties of the multinomial sampling frequencies δ^p and moment assumptions on the iid elements of δ^w ensure that the requisite conditions on δ also hold almost surely. Consequently, almost surely conditional on the data (X, ε), the distributions of the bootstraps across the draws δ that determine their coefficient estimates and Wald statistics converge to the distribution of their OLS counterparts for the original sample (X, ε) across its data-generating process.
3. Foundational Permutation Theorems
The proofs in this paper rely on a theorem first proven by Wald and Wolfowitz (1944), and later refined by Noether (1949) and Hoeffding (1951), concerning the asymptotic distribution of root-C times the correlation of a permuted sequence with another sequence:

Theorem II: Let z′ = (z_1, …, z_C) and δ′ = (δ_1, …, δ_C) denote sequences of real numbers, not all equal, and d′ = (d_1, …, d_C) denote any of the C! equally likely permutations of δ. Then, as C → ∞, the distribution across the realizations of d of the random variable

v_C = C^(1/2) Σ_i (d_i − d̄)(z_i − z̄) / [Σ_i (d_i − d̄)² · Σ_i (z_i − z̄)²]^(1/2)

converges to that of the standard normal if for all integer τ > 2

(IIb) lim_{C→∞} C^((τ/2)−1) [Σ_i (z_i − z̄)^τ · Σ_i (δ_i − δ̄)^τ] / [Σ_i (z_i − z̄)² · Σ_i (δ_i − δ̄)²]^(τ/2) = 0.
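A quick numerical check of Theorem II: for a fixed pair of sequences, root-C times the correlation of z with random permutations of δ should be approximately standard normal when C is large. The exponential z and normal δ below are illustrative assumptions, not the paper's choices.

```python
import numpy as np

# Sketch: across random permutations d of a fixed delta,
# v_C = sqrt(C) * corr(d, z) should be approximately standard normal.
rng = np.random.default_rng(3)
C = 1000
z = rng.exponential(size=C)      # fixed sequence, not all equal
delta = rng.normal(size=C)       # fixed realization of delta
stats = np.array([
    np.sqrt(C) * np.corrcoef(z, rng.permutation(delta))[0, 1]
    for _ in range(2000)
])
# stats.mean() should be near 0 and stats.std() near 1.
```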
The proof is based on showing that the moments of v_C converge to those of the standard normal. A simple multivariate extension, proven in the on-line Appendix, is
Theorem IIm: Let H = I_C − ιι′/C denote the centering matrix,7 {z_c} a sequence of K × 1 vectors, stacked in the C × K matrix Z, such that Z′HZ/C is positive definite, δ′ = (δ_1, …, δ_C) a sequence of real numbers not all equal, and d′ = (d_1, …, d_C) any of the C! equally likely permutations of δ. Then, as C → ∞, the distribution across the realizations of d of the random variable

(IIc) v_C = C^(1/2) (Z′HZ)^(−1/2) Z′Hd / (d′Hd)^(1/2)

converges to that of the multivariate iid standard normal if (IIb) holds for each element in the vector sequence {z_c}.
Theorem II is easily extended to a probabilistic environment by noting the following result, due to Ghosh (1950), that translates the almost-sure or in-probability character of an infinite number of moment conditions into similar statements regarding a distribution:

Theorem III: If all the moments of the cumulative distribution function F_C(x) converge almost surely (in probability) to those of F(x), which possesses a density function, and for which, with μ_{k+1} denoting the absolute moment of order k + 1,

(IIIa) lim sup_{k→∞} μ_{k+1}^(1/(k+1)) / (k + 1) < ∞,

then F_C(x) converges almost surely (in probability) to F(x). Condition (IIIa) is of course true for the normal distribution.
Hoeffding (1952) generalized the result by showing that condition (IIIa) is not even needed for convergence in probability at all points of continuity of any F(x) that is uniquely determined by its moments. By virtue of the Cramér–Wold device, Theorem III covers the multivariate case given in (IIc) above, as for all λ such that λ′λ = 1, all moments of λ′v_C converge to those of the standard normal. In light of Theorem III, in applying Theorem II below we use the notation →d a.s., i.e., almost surely across the realizations of (δ, X, ε), the distribution of v_C across permutations d of δ converges to the multivariate standard normal. Theorems II and III are used to characterize the asymptotic distribution of v_C, which appears in the expressions for the bootstrapped coefficient estimates in (4) and (5) above.
A less demanding form of Theorem II, proven in Appendix B below, provides a weaker condition under which the mean of products converges in probability across permutations to the product of means:

Theorem IV: Let z′ = (z_1, …, z_C) and δ′ = (δ_1, …, δ_C) denote sequences of real numbers, possibly all equal, and d′ = (d_1, …, d_C) any of the C! equally likely permutations of δ. Then, as C → ∞, across permutations d of δ,

(1/C) Σ_i d_i z_i − (Σ_i δ_i / C)(Σ_i z_i / C) →p 0,

if condition (IVb) holds.
Theorem IV is used in the proofs to make statements regarding the convergence in probability of terms, such as the δ-weighted cross products of the data, which appear in (4) and (5) above. As the satisfaction of (IVb) depends on the realized sample moments of (X, ε) and δ, we use the notation →p a.s., i.e., almost surely across the realizations of (δ, X, ε), the statistic converges in probability across the permutations d of δ to its limit.
5. Comparison of Bootstrap Consistency Results
This section contrasts the assumptions and results of this paper with those of other papers on bootstrap consistency. These usually assume independent observations, with moment conditions given at that level. To simplify the comparison of moment conditions, where possible I use the i = 1 … N notation, taking each cluster as composed of one observation and using the implied observation-level assumptions in the theorems given above.
Table 1 below summarizes key elements of the discussion that follows.
For an OLS model with iid data and potentially heteroskedastic residuals, Mammen (1993) showed that for a fixed number of regressors the wild bootstrap distributions of linear combinations of the coefficients and of Wald statistics based upon the homoskedastic covariance estimate are consistent in probability, given finite expectations of the product of the fourth power of the regressors with the squared errors and a Lindeberg type condition for every fixed γ > 0. For the same model, Freedman (1981) proved almost-sure consistency of pairs bootstrap coefficients and homoskedasticity-based Wald tests if the row vectors of the data are independently and identically distributed with bounded fourth moments of both regressors and errors.
Stute (1990) tightened part of the result, showing that almost-sure convergence of the pairs bootstrap coefficients alone for iid data only requires the squared regressors and the product of the squared regressors with the squared errors to have finite expectation. By adopting a permutation approach, this paper proves almost-sure consistency of both coefficients and Wald statistics based upon the clustered/heteroskedasticity robust covariance estimate with inid observations, for both the pairs and wild bootstrap, given the existence of only slightly higher moments than required by Stute (1990), i.e., finite E|x_ij x_ik|^(1+γ) and E|x_ij x_ik ε_i²|^(1+γ) for some γ > 0. It should be noted, however, that Mammen’s result was part of a broader framework that allowed for a growing number of regressors.
For inid data, Liu (1988) proved consistency in probability of the second central moment of the wild OLS bootstrap coefficient distribution with bounded regressors (which have all moments) and finite second moments of εi. This paper proves almost-sure consistency of the full wild bootstrap distribution for inid data with unbounded regressors using the additional moment conditions described above.
Djogbenou et al. (2019) prove consistency in probability of the distribution of the wild bootstrap t-statistic for within-cluster correlated but cross-cluster independent and not necessarily identically distributed data. Their assumptions on the existence of moments are those used in this paper, plus an additional restriction on slightly higher than fourth moments of the regressors for some γ > 0. They also impose asymptotic homogeneity of the data-generating process, assuming that the regressor cross product converges to a matrix of constants, while for any vector α such that α′α = 1, there exists a finite scalar v_α > 0 and a non-random sequence μ_N → ∞ such that the suitably scaled variance of the linear combination of coefficient estimates converges to v_α. Thus, while papers usually use the iid assumption to motivate the convergence of key matrices to matrices of constants, Djogbenou et al. (2019) avoid the iid assumption but assume that the data nevertheless converge to such matrices. This paper, using clustered versions of White’s (1980a) assumptions, requires no such convergence of the asymptotic regressor cross product and covariance matrix of coefficient estimates, and as such covers more fundamentally inid data without the additional fourth moment condition.
This paper makes no explicit assumptions regarding maximum cluster size, but in practice the assumption that the expectations of vector products of the regressors are uniformly bounded for all cluster groupings implies that either the maximum cluster size is bounded or, as seems less likely, the expectation of individual observations shrinks with cluster size. In contrast, Djogbenou et al.’s (2019) proof of consistency allows the maximum cluster size to increase with the sample size in an unbounded fashion, at a rate determined by the (albeit unknown) form of dependency within clusters. All proofs of consistency necessarily require that asymptotically individual observations or clusters exert a negligible influence on coefficient and variance estimates, although ironically it is often the strong influence of outlier observations or groupings in finite samples that makes conventional tests less accurate relative to the bootstrap (c.f. Davidson and Flachaire, 2008; Young, 2019, and the simulations below).
- 2. Type of consistency proven
Aside from consistency of the coefficient distribution, Freedman (1981) and Mammen (1993) prove consistency of the Wald statistic for the pairs and wild bootstrap, respectively, based upon the covariance estimate with homoskedastic errors. Djogbenou et al. (2019) prove consistency of the Wald statistic using the cluster/heteroskedasticity robust covariance estimate, which is also asymptotically accurate with homoskedastic errors. This test statistic is asymptotically pivotal and hence provides higher-order asymptotic bootstrap accuracy (Singh, 1981; Hall, 1992). This paper does the same for both the pairs and wild bootstrap, using weaker moment conditions and a unified permutation framework that highlights the similarity between the two methods.
Freedman and Stute allowed for sub-sampling M < N observations in the pairs bootstrap and proved convergence in distribution if M and N both go to infinity. As shown in the on-line Appendix, at the expense of complicating the proofs, the permutation-based pairs bootstrap consistency results can be extended to sub-sampling, with and without replacement, if M/N → 0 and, for some γ* > (1 + γ)^(−1), M is such that liminf M/N^(γ*) > 0. The requirement that M not fall too rapidly relative to N is needed to ensure the existence and convergence of higher moments to those of the normal, as the proof of Theorem II is based upon the method of moments.
Liu (1988) proves consistency of the wild bootstrap second central moment with bounded regressors. Proving such consistency with the unbounded regressors of this paper is trivial. If we assume, as did Liu (1988), that E(δ^w) = 0_C and E(δ^w δ^w′) = I_C (the identity matrix), then taking the expectation with respect to this variable for a given realization of X and ε, we have

E_δ[(β̂^w_C − β̂_C)(β̂^w_C − β̂_C)′] = (X′X)^(−1) [Σ_c X_c′ε̂_c ε̂_c′X_c] (X′X)^(−1) = V̂_C,

where we make use of the fact that X′ε̂ = 0_K, as the OLS estimates in (2) above minimize the sum of squared residuals. Thus, for any sample size the variance of wild bootstrap coefficient estimates equals White’s clustered/heteroskedasticity robust covariance estimate for the sample. Since, under White’s conditions given in Theorem I, V̂_C is a consistent estimator of the asymptotic variance of β̂_C, it follows that for such general inid data the wild bootstrap coefficient variance is a consistent estimator as well, reproducing Liu’s result in a more general framework.
A similar result for the pairs bootstrap is more problematic. The first two moments of the multinomial sampling frequencies δ^p for C draws with replacement from C cluster groups are E(δ^p) = ι (a vector of ones) and Var(δ^p) = I_C − ιι′/C. Examining the moments of the latter half of the pairs bootstrap coefficient deviation, Σ_c δ^p_c X_c′ε̂_c, we see that its mean across draws is 0_K and its covariance matrix is Σ_c X_c′ε̂_c ε̂_c′X_c, as Σ_c X_c′ε̂_c = X′ε̂ = 0_K. Were this term multiplied by (X′X)^(−1), this would prove consistency of the second central moment of pairs bootstrap coefficients, but unfortunately it is multiplied by (Σ_c δ^p_c X_c′X_c)^(−1). However, it is easy to show that Σ_c δ^p_c X_c′X_c /C converges in probability to X′X/C (see Appendix C below). Using this fact, Shao and Tu (1995) prove consistency of the second central moment using the artifice of assuming that when the minimum eigenvalue of Σ_c δ^p_c X_c′X_c /C is less than ½ of the minimum eigenvalue of X′X/C, an event whose probability converges to zero, the former matrix is set equal to the latter.
It is well known that convergence in distribution does not imply convergence of moments, but the fact that the proof of Theorem II regarding the asymptotic permutation distribution of root-C correlation coefficients is based upon the method of moments (see Hoeffding, 1951 and the on-line Appendix of this paper) might lead to the erroneous conclusion that the results here imply consistency of all moments. They do not, as already implied by the discussion of the second moment of the pairs bootstrap. In Appendix C below, Theorem II is used to prove that (11) holds across the equally likely permutations d of a given δ^b, for b (both) = p (pairs) or w (wild), signifying, by the method of proof, that the moments across permutations d of δ of the left-hand side converge to those of the multivariate standard normal. Since this is true for all δ satisfying the requisite conditions, which almost surely hold (see (L2) in Appendix C), we can equally say that across the distribution of δ the moments of (11) converge to those of the multivariate standard normal. For the wild bootstrap, the normalized coefficient deviation consists of (11) multiplied by a term that converges almost surely to 1, and so we can say that all the moments of the normalized wild bootstrap coefficients converge to those of the multivariate normal with the indicated covariance matrix, although these need not be the asymptotic moments of the sample coefficients. In the case of the pairs bootstrap, the normalized coefficient deviation equals (11) multiplied by terms that are only shown to converge in probability, so nothing can be said about its asymptotic moments without the use of an artifice such as that of Shao and Tu (1995) mentioned above.
- 3. Assumptions on the wild bootstrap external variable
Liu (1988) proves the consistency of the second central moment of the wild bootstrap coefficients assuming that the first and second moments of the wild bootstrap external variable δ^w are 0 and 1, respectively.9 This paper extends the proof to consistency in distribution by additionally requiring the existence of moments of δ^w of order θ1 > 1/γ, where γ > 0 is such that the moment conditions of Theorem I on the regressors and on the product of the regressors with the errors hold. As the proof of Theorem II is based on the method of moments, depending upon the existence of higher moments for the regressors, higher moments of δ^w are needed to ensure that all moments of (11) above exist and converge to those of the normal. Proofs of the consistency of wild bootstrap distributions typically assume that the external variable comes from a particular distribution, such as the Rademacher, with moments of all orders (e.g., Mammen, 1993; Canay et al., 2021). A notable exception is Djogbenou et al. (2019), where the proof of convergence in distribution merely requires the existence of a bounded moment of δ^w of some finite order, for some λ > 0. The wild bootstrap external variable, however, is under the control of the practitioner (i.e., it is not a characteristic of the given data) and, at this time, there appear to be no known advantages to using an external variable without higher moments.
6. Monte Carlo Illustration with INID Data
To illustrate the properties and consistency of the bootstraps with fully inid data, I use a data-generating process that departs strongly from the independently and identically distributed ideal. To ensure average moments do not even begin to converge in finite samples, I model underlying distributional parameters as following a random walk across the data. To stress test the theorems above, I use regressors with heavy-tailed distributions that barely satisfy the specified moment conditions. Finally, to further hinder convergence to the normal, I choose an error distribution that departs strongly from the shape of that ideal.
To begin with unclustered data, for i = 1 … N independent observations, I assume the data-generating process (12), where B(a, b) denotes an independent draw from the Beta distribution with parameters a and b (and expectation a/(a + b)), t(v) an independent draw from the t-distribution with v degrees of freedom, and U an independent draw from the uniform distribution on [−0.5, 0.5]. The random walks in a and b (separate for ε and x), with their expanding variances, ensure that the moments of the data do not meaningfully converge in simulation.10 These random walks can be thought of as an underlying population characteristic that develops, say, geographically or intertemporally. Observations, however, are drawn independently from these otherwise related distributions. Beta random variables are bounded on [0, 1] and, depending upon a and b, can be heavily skewed toward 0 or 1, symmetric and unimodal around 0.5, or bimodally concentrated at both 0 and 1, to name just a few possibilities. This departs strongly from the symmetric unimodal normal distribution on the real line. Random variables drawn from a t-distribution only have finite moments up to their degrees of freedom. Thus, the regressors xi only have moments between 2.01 and 3.01, approaching the limits of the assumptions in Theorem I.
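A data-generating process in the spirit of (12) can be sketched as follows. The exact parameterization of (12) is not reproduced here, so every constant below (the random-walk step size, the Beta-parameter floor, the degrees-of-freedom range) is an assumption for illustration only:

```python
import numpy as np

# Illustrative DGP in the spirit of (12): Beta parameters follow random
# walks across observations, regressors are t draws with barely more
# than two moments, disturbances are bounded, variably skewed,
# mean-zero Beta draws.  All constants are assumptions.
def simulate_inid(N, rng):
    a_e, b_e = 1.0, 1.0
    x = np.empty(N)
    eps = np.empty(N)
    for i in range(N):
        # random walks in the (strictly positive) Beta parameters
        a_e = max(abs(a_e + rng.uniform(-0.5, 0.5)), 0.05)
        b_e = max(abs(b_e + rng.uniform(-0.5, 0.5)), 0.05)
        df = 2.01 + rng.uniform()             # df in (2.01, 3.01)
        x[i] = rng.standard_t(df)             # barely two moments
        eps[i] = rng.beta(a_e, b_e) - a_e / (a_e + b_e)  # bounded, mean zero
    return x, eps

rng = np.random.default_rng(5)
x, eps = simulate_inid(1000, rng)
```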
For a clustered data-generating process with dependence within clusters, I generate C clusters with cluster effects that follow (12) above (substituting c for i everywhere in those equations) and observation-level data given by (13), where c(i) denotes the cluster to which observation i belongs. Thus, each regressor and disturbance observation within a cluster is composed of a common cluster effect plus a similarly, but independently, distributed observation effect. The estimated regression model is

y_i = α + x_i β + ε_i.    (14)
Table 2 reports Monte Carlo results for tests of the true null of β = 0 in the OLS regression (14) for the data-generating processes described in (12) and (13) with 10, 100, 1000, 10,000, 100,000, and 1,000,000 observations or clusters (and five observations per cluster). For the conventional test, I report the p-value of the two-sided test using the squared sample t-statistic based upon the clustered/heteroskedasticity robust covariance estimate evaluated using its asymptotic chi-squared distribution. For the bootstraps, I report p-values based upon the bootstrap-c, evaluating the squared coefficient deviation from the null using the percentiles of the squared coefficient deviations of the bootstraps from the mean of their data-generating processes, and the bootstrap-t, evaluating the sample squared t-statistic using the corresponding squared bootstrap test statistics. 99 draws are used for each bootstrap, and an exact test relative to the bootstrap distribution is achieved using a p-value given by (G + (T + 1)U[0,1])/100, where G and T are the number of bootstrap test statistics greater than and equal to, respectively, that of the sample and U[0,1] is a draw from the uniform distribution on [0,1].11 For the wild bootstrap, the external variable is drawn from the Rademacher distribution, which equals +1 or −1 with equal probability and appears to perform better than alternatives (Davidson & Flachaire, 2008). 1000 realizations of the data-generating process are used for each specification.
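The randomized exact p-value formula above generalizes beyond 99 draws. A minimal sketch, with the bootstrap draw count as a parameter:

```python
import random

def exact_bootstrap_pvalue(sample_stat, boot_stats, rng=random):
    """Randomized p-value exact relative to the bootstrap distribution:
    p = (G + (T + 1) * U) / (B + 1), where G and T count bootstrap
    statistics strictly greater than and equal to the sample statistic,
    and U ~ Uniform[0, 1].  With B = 99 draws this is (G + (T + 1)U)/100."""
    G = sum(1 for b in boot_stats if b > sample_stat)
    T = sum(1 for b in boot_stats if b == sample_stat)
    U = rng.random()
    return (G + (T + 1) * U) / (len(boot_stats) + 1)
```

The uniform draw smooths over ties and the discreteness of the bootstrap distribution, so that under the null the p-value is itself uniformly distributed.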
As can be seen in panel (A) of the table, rejection rates using both the conventional chi-squared distribution and those of the bootstraps differ substantially from their nominal values in small samples, but converge to the 0.01, 0.05, and 0.10 levels as the number of observations or clusters increases. The central 95 percent interval of the binomially distributed Monte Carlo rejection probability in 1000 independent draws runs from 0.004 to 0.016 at the 0.01 level, from 0.037 to 0.063 at the 0.05 level, and from 0.082 to 0.118 at the 0.1 level. With 1,000,000 independent observations or clusters, most rejection rates lie within those bounds. As shown in panel (B) of the table, the Kolmogorov–Smirnov test statistic for the null that the p-value distributions are uniform, i.e., the maximum absolute difference between the cumulative distribution function of each set of 1000 p-values and that of the uniform distribution, is less than or equal to 0.028 in all cases, with the p-value of the null that the distributions are uniform on [0,1] exceeding 0.41 in each instance.
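Binomial bounds of this kind can be reproduced directly. The sketch below computes the central interval of the Binomial(n, p) rejection count using one common quantile convention (smallest count whose cumulative probability reaches each tail), which may differ by one count at the margin from other conventions.

```python
from math import comb

def binom_central_interval(n, p, coverage=0.95):
    """Central interval for the Binomial(n, p) count, expressed as
    rejection rates, used to judge whether a Monte Carlo rejection
    rate over n draws is consistent with nominal level p."""
    tail = (1 - coverage) / 2
    acc, lo, hi = 0.0, None, None
    for k in range(n + 1):
        acc += comb(n, k) * p**k * (1 - p) ** (n - k)
        if lo is None and acc >= tail:
            lo = k
        if hi is None and acc >= 1 - tail:
            hi = k
            break
    return lo / n, hi / n
```

For example, `binom_central_interval(1000, 0.01)` yields an interval of width a little over one percentage point around the 0.01 level.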
Theorem V asserts conditional consistency, i.e., the asymptotic distribution of the bootstraps is normal with a covariance matrix equal to that of the conventional estimate. If so, evaluating the conventional test statistic with the full distribution of each bootstrap should asymptotically yield a p-value identical to that found by evaluating the same using the chi-squared distribution.12 Panel (C) of Table 2 reports the correlation between the bootstrap and conventional p-values with 1,000,000 observations or clusters. As can be seen, this is at least 0.9889 in all cases. As the bootstrap p-values are based upon only a sample from their distribution, while exact relative to that distribution, they cannot be expected to equal the conventional chi-squared p-value. Their correlation with the conventional p-value, however, should be the same as that found when evaluating the conventional test statistic using 99 independent draws from the chi-squared distribution and the same formula p = (G + (T + 1)U[0,1])/100.
Panel (C) of Table 2 also reports the probability that a correlation less than or equal to that found between the bootstrap and conventional p-values would arise when evaluating a squared t-statistic using 99 draws from the chi-squared distribution vs. the full chi-squared distribution itself. “p-value1” evaluates the correlation using the distribution conditional on the realized conventional squared t-statistics, i.e., is a test of conditional consistency alone and does not assume consistency of the conventional test. While most p-values are very large, two of the eight are near zero, indicating that the correlation is not yet quite at the level that would be expected from completed conditional convergence.13 “p-value2” evaluates the correlations using the distribution across random draws of the initial conventional test statistics from the chi-squared, i.e., a joint test of convergence of the conventional test statistic and conditional consistency of the bootstraps. Here the smallest p-value is 0.045, which, given any adjustment for multiple testing, can be taken as indicating that the tests do not reject the joint null implied by the theorems above at the 0.05 level.
Table 2 illustrates the consistency of the bootstrap procedures with a highly challenging data-generating process. While the previous results cited above require the existence of at least fourth moments of the regressors for convergence of both coefficients and Wald statistics in environments with iid data or inid data with convergent average moments, regressor moments only slightly greater than the second are actually sufficient for fully inid data whose moments need not converge to anything, as specified in the theorems above and illustrated in these Monte Carlo simulations.
As can also be seen in Table 2, in small samples with heavy-tailed regressors the conventional clustered/heteroskedasticity robust test statistic performs poorly, with rejection probabilities far above nominal values. In such environments, the bootstraps often perform better (Davidson & Flachaire, 2008; Young, 2019). In the simulations of Table 2, this is clearly the case for the pairs bootstrap and, to a lesser degree, for the wild bootstrap using the asymptotically pivotal t-statistic.14 Thus, while providing the same asymptotic assurances as conventional inference methods, the bootstraps often provide a better approximation to the distribution of test statistics in small finite-sample environments. It should be noted, however, that other methods also exist for improving the finite-sample performance of the conventional test, such as the HC bias corrections of MacKinnon and White (1985) and the effective degrees of freedom corrections of Bell and McCaffrey (2002), Pustejovsky and Tipton (2018), and Young (2016).15
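For reference, the flavor of the HC corrections of MacKinnon and White (1985) can be shown in the simplest setting. The sketch below (a single regressor without an intercept, hypothetical helper name) computes the HC0–HC3 variants of the robust variance estimate for the slope, with HC2 and HC3 inflating squared residuals by leverage to offset their downward bias in small samples.

```python
def hc_variances(x, resid):
    """HC0-HC3 robust variance estimates for the slope in a no-intercept,
    single-regressor OLS.  Leverage is h_i = x_i^2 / sum_j x_j^2; HC2 and
    HC3 divide squared residuals by (1 - h_i) and (1 - h_i)^2."""
    n, k = len(x), 1
    sxx = sum(xi * xi for xi in x)
    h = [xi * xi / sxx for xi in x]

    def meat(power):
        # sum of (x_i e_i)^2 with leverage-adjusted squared residuals
        return sum((xi * ei) ** 2 / (1 - hi) ** power
                   for xi, ei, hi in zip(x, resid, h))

    hc0 = meat(0) / sxx**2
    return {"HC0": hc0,
            "HC1": hc0 * n / (n - k),       # simple degrees-of-freedom scaling
            "HC2": meat(1) / sxx**2,
            "HC3": meat(2) / sxx**2}
```

Since each leverage adjustment weakly inflates the squared residuals, HC3 ≥ HC2 ≥ HC0, making the higher-numbered variants progressively more conservative.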