1. Introduction
When the bootstrap sample size B is small or moderate, bootstrap confidence regions, including bootstrap confidence intervals, tend to have undercoverage: the probability that the confidence region contains the  parameter vector  is less than the nominal large-sample coverage probability . Then, coverage can be increased by increasing the nominal coverage of the large-sample bootstrap confidence region. For example, if the undercoverage of the nominal large-sample 95% bootstrap confidence region with  is 2%, the coverage is increased to 97%. This procedure is known as calibrating the confidence region. Calibration tends to be difficult since the amount of undercoverage is usually unknown. This paper provides a simple method for improving the coverage and provides a method for visualizing some bootstrap confidence regions.
Using correction factors for large-sample confidence intervals, tests, prediction intervals, prediction regions, and confidence regions improves the coverage performance for a moderate sample size n. If confidence regions are used for hypothesis testing, then this calibration reduces the type I error. For a random variable X, let . Note that correction factors  as  are used in large-sample confidence intervals and large-sample tests if the limiting distribution is  or , but a  or  cutoff is used:  with  and  with  if  as . For moderate n, the test or confidence interval with the correction factor  has better level or coverage than the test or confidence interval that does not use the correction factor, in that the simulated level or coverage is closer to the nominal level or coverage.
Sometimes, the test statistic has a  or  distribution under normality, but the test statistic (possibly scaled by multiplying by k) is asymptotically normal or asymptotically  for a large class of distributions. The t test and t confidence interval for the sample mean are examples where the asymptotic normality holds by the central limit theorem. Many F tests for linear models, experimental design models, and multivariate analyses also satisfy  as , where  is the test statistic. See, for example, Olive (2017) [1].
Section 1.1 reviews prediction intervals, prediction regions, confidence intervals, and confidence regions. Several of these methods use correction factors to improve the coverage, and several bootstrap confidence intervals and regions are obtained by applying prediction intervals and regions to the bootstrap sample. Section 1.2 reviews a bootstrap theorem and shows that some bootstrap confidence regions are asymptotically equivalent. Section 2.1 gives a new bootstrap confidence region with a simple correction factor, while Section 2.2 shows how to visualize some bootstrap confidence regions. Section 3 presents some simulation results.
1.1. Prediction Regions and Confidence Regions
Consider predicting a future test value  given past training data , where  are independent and identically distributed (iid). A large-sample  prediction interval (PI) for  is , where the coverage  is eventually bounded below by  as . We often want  as . A large-sample  PI is asymptotically optimal if it has the shortest asymptotic length: the length of  converges to  as , where  is the population shorth, the shortest interval covering at least  of the mass.
Let the data  have joint probability density function or probability mass function  with parameter space  and support . Let  and  be statistics such that . Then, the interval  is a large-sample  confidence interval (CI) for  if
        
        is eventually bounded below by  for all  as the sample size .
Consider predicting a  future test value , given past training data , where  are iid. A large-sample  prediction region is a set  such that  is eventually bounded below by  as . A prediction region is asymptotically optimal if its volume converges in probability to the volume of the minimum volume covering region or the highest density region of the distribution of 
A large-sample  confidence region for a  vector of parameters  is a set  such that  is eventually bounded below by  as  For testing  versus , we fail to reject  if  is in the confidence region and reject  if  is not in the confidence region.
For prediction intervals, let  be the order statistics of the training data. Open intervals need more regularity conditions than closed intervals. For the following prediction interval, if the open interval  were used, we would need to add the regularity condition that the population percentiles  and  are continuity points of the cumulative distribution function . See Frey (2013) [2] for references.
Let  and , where . A large-sample  percentile prediction interval for  is
        
The bootstrap percentile confidence interval given by Equation (2) is obtained by applying the percentile prediction interval (1) to the bootstrap sample , where  is a test statistic. See Efron (1982) [3].
A large-sample  bootstrap percentile confidence interval for  is an interval  containing  of the . Let  and . A common choice is
        
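As a concrete illustration of the percentile method, the following Python sketch (the paper's software is in R, so this is only an illustration; the index choice below is one common convention and may differ slightly from the exact indices in Equation (2)) sorts the bootstrap statistics and clips roughly half the nominal error from each tail:

```python
import numpy as np

def percentile_ci(boot_stats, delta=0.05):
    """Percentile interval from a bootstrap sample T*_1, ..., T*_B:
    sort the bootstrap statistics and cut roughly delta/2 of the
    mass from each tail.  The index choice is one common convention,
    not necessarily the exact Equation (2) indices."""
    t = np.sort(np.asarray(boot_stats, dtype=float))
    B = len(t)
    lo = max(int(np.ceil(B * delta / 2)) - 1, 0)        # 0-based lower index
    hi = min(int(np.ceil(B * (1 - delta / 2))) - 1, B - 1)
    return t[lo], t[hi]

# toy example: nonparametric bootstrap of the sample mean of iid N(0,1) data
rng = np.random.default_rng(0)
x = rng.normal(size=200)
boots = np.array([rng.choice(x, size=200, replace=True).mean()
                  for _ in range(1000)])
low, high = percentile_ci(boots, delta=0.05)
```

By construction, the interval contains about  of the bootstrap statistics, which is the defining property of the percentile interval above.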
The shorth(c) estimator of the population shorth is useful for making asymptotically optimal prediction intervals. For a large-sample  PI, the nominal coverage is . Undercoverage occurs if the actual coverage is below the nominal coverage. For example, if the actual coverage is 0.93 for a large-sample 95% PI, then the undercoverage is 0.02. Consider intervals that contain c cases . Compute . Then, the shorth(c) estimator  is the interval with the shortest length. The shorth(c) interval is a large-sample  PI if  as  and often has the asymptotically shortest length. Let . Frey (2013) [2] showed that for large  and iid data, the large-sample  shorth() prediction interval has maximum undercoverage ≈, and then used the large-sample  PI shorth(c) =
        
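A minimal Python sketch of the shorth(c) computation follows (the paper's software is in R). The correction term 1.12√(δ/n) used below is a hedged reading of Frey's undercoverage bound; the exact c used in PI (3) may differ:

```python
import numpy as np

def shorth_pi(y, delta=0.05, correct=True):
    """Shorth(c) prediction interval: among the intervals
    [y_(i), y_(i+c-1)] containing c order statistics, return the
    shortest one.  With correct=True, c is inflated using the
    1.12*sqrt(delta/n) term (a hedged reading of Frey's bound);
    otherwise c = ceil(n(1 - delta))."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    q = 1 - delta + (1.12 * np.sqrt(delta / n) if correct else 0.0)
    c = min(int(np.ceil(n * q)), n)
    widths = y[c - 1:] - y[:n - c + 1]      # lengths of all candidate intervals
    i = int(np.argmin(widths))
    return y[i], y[i + c - 1]

rng = np.random.default_rng(1)
y = rng.normal(size=500)
lo, hi = shorth_pi(y, delta=0.05)
```

Applying the same routine to a bootstrap sample  gives the shorth confidence interval of the next paragraph.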
The shorth confidence interval is a practical implementation of Hall’s (1988) [4] shortest bootstrap percentile interval based on all possible bootstrap samples, and is obtained by applying the shorth PI (3) to the bootstrap sample . See Pelawa Watagoda and Olive (2021) [5]. The large-sample  shorth(c) CI =
        
To describe Olive’s (2013) [6] nonparametric prediction region, Mahalanobis distances will be useful. Let the  column vector  be a multivariate location estimator, and let the  symmetric positive definite matrix  be a dispersion estimator. Then, the ith squared sample Mahalanobis distance is the scalar
        
        for each observation , where . Notice that the Euclidean distance of  from the estimate of center T is , where  is the  identity matrix. The classical Mahalanobis distance  uses , the sample mean and sample covariance matrix, where
        
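The squared sample distances can be computed directly. A short Python sketch with the classical choices (sample mean and sample covariance) is:

```python
import numpy as np

def mahalanobis_sq(X, T=None, C=None):
    """Squared sample Mahalanobis distances
    D_i^2 = (x_i - T)' C^{-1} (x_i - T) for each row x_i of X.
    Defaults give the classical distances: T = sample mean,
    C = sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    if T is None:
        T = X.mean(axis=0)
    if C is None:
        C = np.cov(X, rowvar=False)
    diff = X - T
    # row-wise quadratic forms diff_i' C^{-1} diff_i
    return np.einsum('ij,jk,ik->i', diff, np.linalg.inv(C), diff)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
d2 = mahalanobis_sq(X)
```

A useful check on the classical distances: they sum to exactly (n − 1)p, since the sum of the quadratic forms equals the trace of the inverse covariance times (n − 1) times the covariance.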
Let the  location vector be , which is often the population mean, and let the  dispersion matrix be , which is often the population covariance matrix. If x is a random vector, then the population squared Mahalanobis distance is
        
Like prediction intervals, prediction regions often need correction factors. For iid data from a distribution with a  nonsingular covariance matrix, it was found that the simulated maximum undercoverage of prediction region (9) without the correction factor was about 0.05 when . Hence, correction factor (8) is used to obtain better coverage for small n. Let  for  and
        
If  and , set . Let  be the th sample quantile of the , where . Olive (2013) [6] suggests that  may be needed for the following prediction region to have a good volume, and  for good coverage. Of course, for any n, there are distributions that will have severe undercoverage.
The large-sample  nonparametric prediction region for a future value  given iid data  is
        
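A Python sketch of the region's membership test follows. Since correction factor (8) is not reproduced here, the uncorrected quantile level 1 − δ is used in its place, so this is only an illustration of the mechanics:

```python
import numpy as np

def np_prediction_region(X, delta=0.05):
    """Nonparametric prediction region sketch: a future z is in the
    region iff D_z^2(xbar, S) <= the empirical quantile of the
    training D_i^2.  The paper's small-sample correction factor (8)
    raises the quantile level; this sketch uses 1 - delta uncorrected."""
    X = np.asarray(X, dtype=float)
    T = X.mean(axis=0)
    Cinv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - T
    d2 = np.einsum('ij,jk,ik->i', diff, Cinv, diff)
    cutoff = np.quantile(d2, 1 - delta)

    def contains(z):
        dz = np.asarray(z, dtype=float) - T
        return float(dz @ Cinv @ dz) <= cutoff

    return contains, cutoff

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 2))
contains, cutoff = np_prediction_region(X, delta=0.05)
```

For iid N(0, I) data in two dimensions, the cutoff is near the 0.95 quantile of a chi-squared distribution with 2 degrees of freedom, about 5.99.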
Olive’s (2017, 2018) [1,7] prediction region method confidence region applies prediction region (9) to the bootstrap sample. Let the bootstrap sample be . Let  and  be the sample mean and sample covariance matrix of the bootstrap sample.
The large-sample  prediction region method confidence region for  is
        
        where the cutoff  is the th sample quantile of the  for . Note that the corresponding test for  rejects  if .
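The corresponding test can be sketched in Python as follows (again with the uncorrected quantile level 1 − δ, so this only illustrates the mechanics of region (10)):

```python
import numpy as np

def pr_method_reject(boot, theta0, delta=0.05):
    """Prediction region method test sketch: reject H0: theta = theta0
    iff theta0 lies outside confidence region (10), i.e. iff the
    squared Mahalanobis distance of theta0 from the bootstrap sample
    mean exceeds the sample quantile of the bootstrap distances.
    Correction factors are omitted in this sketch."""
    boot = np.asarray(boot, dtype=float)
    Tbar = boot.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(boot, rowvar=False))
    diff = boot - Tbar
    d2 = np.einsum('ij,jk,ik->i', diff, Sinv, diff)
    cutoff = np.quantile(d2, 1 - delta)
    d0 = float((theta0 - Tbar) @ Sinv @ (theta0 - Tbar))
    return d0 > cutoff          # True means reject H0

rng = np.random.default_rng(4)
boot = rng.normal(size=(1000, 2))   # stand-in for a bootstrap sample
```

A point far from the bootstrap data cloud is rejected, while the cloud's own center is not.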
Olive’s (2017, 2018) [1,7] large-sample  modification of Bickel and Ren’s (2001) [8] confidence region is
        
        where the cutoff  is the th sample quantile of the . Note that the corresponding test for  rejects  if .
Shift region (9) to have center , or equivalently, change the cutoff of region (11) to , to obtain Pelawa Watagoda and Olive’s (2021) [5] large-sample  hybrid confidence region,
        
Note that the corresponding test for  rejects  if .
Rajapaksha and Olive (2024) [9] gave the following two confidence regions. The names of these confidence regions were chosen since they are similar to Bickel and Ren’s and the prediction region method’s confidence regions.
The large-sample  BR confidence region is
        
        where the cutoff  is the th sample quantile of the . Note that the corresponding test for  rejects  if .
The large-sample  PR confidence region for  is
        
        where  is the th sample quantile of the  for . Note that the corresponding test for  rejects  if .
Assume that  are iid . Then, Chew’s (1966) [10] large-sample  classical prediction region for multivariate normal data is
        
The next bootstrap confidence region is similar to what would be obtained if the classical prediction region (15) for multivariate normal data were applied to the bootstrap sample. The large-sample  standard bootstrap confidence region for  is
        
        where  or , where  as .
If , then a hyperellipsoid is an interval, and confidence intervals are special cases of confidence regions. Suppose the parameter of interest is , and there is a bootstrap sample , where the statistic  is an estimator of  based on a sample of size n. Let  and let . Let  and  be the sample mean and variance of . Then, the squared Mahalanobis distance  is equivalent to , which is an interval centered at  just long enough to cover  of the . Efron (2014) [11] used a similar large-sample  confidence interval assuming that  is asymptotically normal. Then, the large-sample  PR CI is . The large-sample  BR CI is , which is an interval centered at  just long enough to cover  of the . The large-sample  hybrid CI is .
The following prediction region will be used to develop a new correction factor for bootstrap confidence regions. See Section 2.1. Data splitting divides the training data  into two sets: H and the validation set V, where H has  of the cases and V has the remaining  cases .
The estimator  is computed using data set H. Then, the squared validation distances  are computed for the  cases in the validation set V. Let  be the th order statistic of the , where
        
Haile, Zhang, and Olive’s (2024) [12] large-sample  data splitting prediction region for  is
        
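The data splitting computation can be sketched in Python. The exact order-statistic index in Equation (18) is not reproduced in this excerpt, so k = ⌈n_V(1 − δ)⌉ is a hedged stand-in:

```python
import numpy as np

def split_cutoff(X, delta=0.05, frac_H=0.5, seed=0):
    """Data splitting sketch: estimate (T, C) on the set H, compute
    the squared validation distances on V, and take the order
    statistic D^2_(k) as the cutoff.  k = ceil(n_V (1 - delta)) is a
    hedged stand-in for the exact index in Equation (18)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    idx = rng.permutation(len(X))
    nH = int(len(X) * frac_H)
    H, V = X[idx[:nH]], X[idx[nH:]]
    T = H.mean(axis=0)
    Cinv = np.linalg.inv(np.cov(H, rowvar=False))
    diff = V - T
    d2 = np.sort(np.einsum('ij,jk,ik->i', diff, Cinv, diff))
    k = min(int(np.ceil(len(V) * (1 - delta))), len(V))
    return d2[k - 1]

rng = np.random.default_rng(5)
cut = split_cutoff(rng.normal(size=(1000, 2)), delta=0.05)
```

Because the validation distances are computed from an estimator fitted on a disjoint set, they behave like iid distances of new cases, which is what makes the order statistic a valid cutoff.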
1.2. Some Confidence Region Theory
Some large-sample theory for bootstrap confidence regions is given in the references in Section 1.1. The following theorem of Pelawa Watagoda and Olive (2021) [5] and its proof are useful.
Theorem 1.  (a) Suppose as , (i) , and (ii)  with  and . Then, (iii) , (iv) , and (v) .
(b) Then, the prediction region method gives a large-sample confidence region for  provided that  and the sample percentile  of the  is a consistent estimator of the percentile  of the random variable  in that 
Proof.  With respect to the bootstrap sample,  is a constant, and the  are iid for . Fix B. Then,
          
          where the  are iid with the same distribution as u. For fixed B, the average of the  is
          
          by the Continuous Mapping Theorem, where  is an asymptotic multivariate normal approximation. Note that if , then
          
Hence, as ,  and (iii), (iv), and (v) hold. Hence, (b) follows.    □
Under regularity conditions, Bickel and Ren (2001), Olive (2017, 2018), and Pelawa Watagoda and Olive (2021) [1,5,7,8] proved that (10), (11), and (12) are large-sample confidence regions. For Theorem 1, usually (i) and (ii) are proven using large-sample theory. Then,
        
        are well behaved. If , then , and (13) and (14) are large-sample confidence regions. If  is “not too ill conditioned,” then  for large n, and confidence regions (13) and (14) will have coverage near . See Rajapaksha and Olive (2024) [9].
If  and , where U has a unimodal probability density function symmetric about zero, then the confidence intervals from Section 1.1, including (2) and (3), are asymptotically equivalent (use the central proportion of the bootstrap sample, asymptotically). See Pelawa Watagoda and Olive (2021) [5].
  3. Results
Example 1.  We generated  for . The coordinate-wise median was the statistic . The nonparametric bootstrap was used with  for the 90% confidence region (10). Then, the th sample quantile of the  is the 90.4% quantile. The DD plot of the bootstrap sample is shown in Figure 1. This bootstrap sample was a rather poor sample: the plotted points cluster about the identity line, but for most bootstrap samples, the clustering is tighter (as in Figure 2). The vertical line MD = 2.9098 is the cutoff for the prediction region method 90% confidence region (10). Hence, the points to the left of the vertical line correspond to , which are inside confidence region (10), while the points to the right of the vertical line correspond to , which are outside of confidence region (10). The long horizontal line RD = 3.0995 is the cutoff using the robust estimator. When , under mild regularity conditions, . The short horizontal line is RD = 2.8074, and MD = 2.8074 =  is approximately the cutoff  that would be used by the standard bootstrap confidence region (mentally drop a vertical line from where the short horizontal line ends at the identity line). Variability in DD plots increases as MD increases.
Inference after variable selection is an example where the undercoverage of confidence regions can be quite high. See, for example, Kabaila (2009) [23]. Variable selection methods often use the Schwarz (1978) [24] BIC criterion, the Mallows (1973) [25]  criterion, or lasso due to Tibshirani (1996) [26]. To describe a variable selection model, we will follow Rathnayake and Olive (2023) [27] closely. Consider regression models where the response variable Y depends on the  vector of predictors x only through . Multiple linear regression models, generalized linear models, and proportional hazards regression models are examples of such regression models. Then, a model for variable selection can be described by
      
      where  is a  vector of predictors,  is an  vector, and  is a  vector. Given that  is in the model, , and E denotes the subset of terms that can be eliminated given that the subset S is in the model. Since S is unknown, candidate subsets will be examined. Let  be the vector of a terms from a candidate subset indexed by I, and let  be the vector of the remaining predictors (out of the candidate submodel). Then,
      
Suppose that S is a subset of I and that model (20) holds. Then,
      
      where  denotes the predictors in I that are not in . Underfitting occurs if submodel I does not contain S.
To clarify the notation, suppose that , a constant  corresponding to , is always in the model, and . Then, there are  possible subsets of  that contain 1, including  and . There are  subsets such that . Let  and  The full model uses 
Let  correspond to the set of predictors selected by a variable selection method such as forward selection or lasso variable selection. If  is , use zero padding to form the  vector  from  by adding 0s corresponding to the omitted variables. For example, if  and , then the observed variable selection estimator  As a statistic,  with probabilities  for , where there are J subsets, e.g., . Then, the variable selection estimator , and  with probabilities  for , where there are J subsets.
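Zero padding is mechanical; the following short Python sketch illustrates it (the index set and coefficient values are hypothetical, and the paper's software is in R):

```python
import numpy as np

def zero_pad(beta_I, idx_I, p):
    """Zero padding: embed the a x 1 submodel estimate beta_hat_I
    (coefficients for the predictors indexed by idx_I) into a p x 1
    vector, with 0s for the omitted variables."""
    beta = np.zeros(p)
    beta[np.asarray(idx_I)] = np.asarray(beta_I, dtype=float)
    return beta

# hypothetical submodel keeping predictors 0 and 2 of a p = 4 model
padded = zero_pad([1.5, -0.7], [0, 2], 4)
```

The padded vector has the submodel coefficients in their original positions and zeros elsewhere, so all bootstrap replicates live in the same p-dimensional space.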
Assume p is fixed. Suppose model (20) holds, and that if , where the dimension of  is , then , where  is the covariance matrix of the asymptotic multivariate normal distribution. Then,
      
      where  adds columns and rows of zeros corresponding to the  not in , and  is singular unless  corresponds to the full model. This large-sample theory holds for many models.
If  are pairwise disjoint and if , then the collection of sets  is a partition of . Then, the Law of Total Probability states that if  form a partition of S such that  for , then
      
Let sets  satisfy  for . Define  if . Then, a Generalized Law of Total Probability is
      
Pötscher (1991) [28] used the conditional distribution of  to find the distribution of . Let  be a random vector from the conditional distribution . Let . Denote  by . Then, Pötscher (1991) [28] used the Generalized Law of Total Probability to prove that the cumulative distribution function (cdf) of  is
      
Hence,  has a mixture distribution of the  with probabilities , and  has a mixture distribution of the  with probabilities 
For the following Rathnayake and Olive (2023) [27] theorem, the first assumption is  as . Then, the variable selection estimator corresponding to  underfits with probability going to zero, and the assumption holds under regularity conditions if BIC or AIC is used for many parametric regression models such as GLMs. See Charkhi and Claeskens (2018) [29] and Claeskens and Hjort (2008, pp. 70, 101, 102, 114, 232) [30]. This assumption is a necessary condition for a variable selection estimator to be a consistent estimator. See Zhao and Yu (2006) [31]. Thus, if a sparse estimator that performs variable selection is a consistent estimator of , then  as . Hence, Theorem 2 proves that the lasso variable selection estimator is a  consistent estimator of  if lasso is consistent. Charkhi and Claeskens (2018) [29] showed that  if  for the maximum likelihood estimator with AIC, and gave a forward selection example. For a multiple linear regression model where S is the model with exactly one predictor that can be deleted, only  and  are positive. If the  criterion is used, then it can be shown that  and . Theorem 2 proves that w is a mixture distribution of the  with probabilities .
Theorem 2.  Assume  as , and let  with probabilities , where  as . Denote the positive  by . Assume . Then,
      
      where the cdf of w is .
Rathnayake and Olive (2023) [27] suggested the following bootstrap procedure. Use a bootstrap method for the full model, such as the nonparametric bootstrap or the residual bootstrap, and then compute the full model estimator and the variable selection estimator from the bootstrap data set. Repeat this B times to obtain the bootstrap samples for the full model and for the variable selection model. They could only prove that the bootstrap procedure works under very strong regularity conditions such as a  in Theorem 2, where  is known as the oracle property. See Claeskens and Hjort (2008, pp. 101–114) [30] for references for the oracle property. For many statistics, a bootstrap data cloud  and a data cloud from B iid statistics  tend to have similar variability. Rathnayake and Olive (2023) [27] suggested that when T is the variable selection estimator , the bootstrap data cloud often has more variability than the iid data cloud, and that this result tends to increase the bootstrap confidence region coverage.
For variable selection with the  vector , consider testing  versus  with , where oftentimes, . Then, let  and let  for . The shorth estimator can be applied to a bootstrap sample  to obtain a confidence interval for . Here,  and . The simulations used , , and . Let the multiple linear regression model  for . Hence,  with  ones and  zeros.
The regression models used the residual bootstrap with the forward selection estimator . Table 1 gives results for when the iid errors  with , , and . Table 1 shows two rows for each model, giving the observed confidence interval coverages and average lengths of the confidence intervals. The nominal coverage was 95%. The term “reg” is for the full model regression, and the term “vs” is for forward selection. The last six columns give results for the tests. The terms pr, hyb, and br are for prediction region method (10), hybrid region (12), and Bickel and Ren region (11). The 0 indicates that the test was  versus , while the 1 indicates that the test was  versus . The length and coverage = P(fail to reject ) for the interval  or , where  or  is the cutoff for the confidence region. The cutoff will often be near  if the statistic T is asymptotically normal. Note that  is close to 2.45 for the full model regression bootstrap tests. For the full model,  len  as  for the simulated data, and the shorth 95% confidence intervals have simulated length . The variable selection estimator and the full model estimator were similar for  and . The two estimators differed for  and  because  often occurred for  and 4. In particular, the confidence interval coverages for the variable selection estimator were very high, but the average lengths were shorter than those for the full model. If  was never selected, then  for all runs, and the confidence interval would be [0, 0] with 100% coverage and zero length.
Note that for the variable selection estimator with , the average cutoff values were near 2.7 and 3.0, which are larger than the  cutoff 2.448. Hence, using the standard bootstrap confidence region (16) would result in undercoverage. For , the bootstrap estimator often appeared to be approximately multivariate normal. Example 2 illustrates this result with a DD plot.
Example 2.  We generated  and  for  with the  iid  and . Then, we examined several bootstrap methods for multiple linear regression variable selection. The nonparametric bootstrap draws n cases  with replacement from the n original cases and then selects variables on the resulting data set, resulting in . If  is , use zero padding to form the  vector  from  by adding 0s corresponding to the omitted variables. Repeat  times to obtain the bootstrap sample . Typically, the full model  or the submodel  that omitted  was selected. The residual bootstrap using the full model residuals was also used, where  for , where the  are sampled with replacement from the full model residuals . Forward selection and backward elimination could be used with the  or BIC criterion, or lasso could be used to perform the variable selection. Let  be obtained from  by leaving out the fifth value. Hence, if , then . Figure 2 shows the DD plot for the confidence region corresponding to the  using forward selection with the  criterion. This confidence region corresponds to the test , e.g., . Plots created with backward elimination and lasso were similar. Rathnayake and Olive (2023) [27] obtained the large-sample theory for the variable selection estimators  for multiple linear regression and many other regression methods. The limiting distribution is a complicated non-normal mixture distribution by Theorem 2, but in simulations where S is known, the  often appeared to have an approximate multivariate normal distribution.
A small simulation study was conducted on large-sample 95% confidence regions. The coordinate-wise median was used since this statistic is moderately difficult to bootstrap. We used 5000 runs. Then, coverage within [0.94, 0.96] suggests that the true coverage is near the nominal coverage 0.95. The simulation used 10 distributions, where xtype = 1 for ; xtype = 2, 3, 4, and 5 for ; xtype = 6, 7, 8, and 9 for a multivariate  with d = 3, 5, 19, or d given by the user; and xtype = 10 for a log-normal distribution shifted to have the coordinate-wise median = 0. If w corresponds to one of the above distributions, then  with . Then, the population coordinate-wise median is 0 for each distribution. Table 2 shows the coverages and average cutoffs for four large-sample confidence regions: (10); (19) with ; (19) with ; and (19) with . The coverage is the proportion of times that the confidence region contained , where  is a  vector. Each confidence region has a cutoff  that depends on the bootstrap sample, and the average of the 5000 cutoffs is given. Here,  for confidence region (10), while  for confidence region (19), where the cutoff also depends on . The coverages were usually between 0.94 and 0.96. The average cutoffs for the prediction region method’s large-sample 95% confidence region tended to be very close to the average cutoffs for confidence region (19) with . Note that  and  are the cutoffs for the standard bootstrap confidence region (16). The ratio of volumes of the two confidence regions is volume (10)/volume (19) .
  4. Discussion
The bootstrap is due to Efron (1979) [32]. Also, see Efron (1982) [3] and Bickel and Freedman (1981) [33]. Ghosh and Polansky (2014) and Politis and Romano (1994) [34,35] are useful references for bootstrap confidence regions. For a small dimension p, nonparametric density estimation can be used to construct confidence regions and prediction regions. See, for example, Hall (1987) and Hyndman (1986) [36,37]. Visualizing a bootstrap confidence region is useful for checking whether the asymptotic normal approximation for the statistic is good, since the plotted points will then tend to cluster tightly about the identity line. Making five plots corresponding to five bootstrap samples can be used to check the variability of the plots and the probability of obtaining a bad sample. For Example 1, most of the bootstrap samples produced plots that had tighter clustering about the identity line than the clustering in Figure 1.
The new bootstrap confidence region (19) used the fact that bootstrap confidence region (10) is simultaneously a prediction region for a future bootstrap statistic  and a confidence region for  with the same asymptotic coverage . Hence, increasing the coverage as a prediction region also increases the coverage as a confidence region. The data splitting technique used to increase the coverage only depends on the  being iid with respect to the bootstrap distribution. Correction factor (8) increases the coverage, but this calibration technique needed intensive simulation.
Calibrating a bootstrap confidence region is useful for several reasons. For simulations, computation time can be reduced if B can be reduced. Using correction factor (8) is faster than using the two-sample bootstrap of Section 2.1, but the two-sample bootstrap can be used to check the accuracy of (8), as in Table 2 with . For a nominal 95% prediction region, correction factor (8) increases the coverage on the training data to at most 97.5%. Coverage for test data  tends to be worse than coverage for training data. Using the cutoff  of (8) gives better coverage than using cutoff  with . The two calibration methods in this paper were first applied to prediction regions, and they work for bootstrap confidence regions (10) and (11) since those two regions are also prediction regions for .
Plots and simulations were conducted in R. See R Core Team (2020) [38]. Welagedara (2023) [39] lists some R functions for bootstrapping several statistics. The programs used are in the collection of functions slpack.txt. See http://parker.ad.siu.edu/Olive/slpack.txt, accessed on 1 August 2024. The function ddplot4 applied to the bootstrap sample can be used to visualize the bootstrap prediction region method’s confidence region. The function medbootsim was used for Table 2. Some functions for bootstrapping multiple linear regression variable selection with the residual bootstrap are belimboot for backward elimination using , bicboot for forward selection using BIC, fselboot for forward selection using , lassoboot for lasso variable selection, and vselboot for all-subsets variable selection with .