Article

Calibrating and Visualizing Some Bootstrap Confidence Regions

by Welagedara Arachchilage Dhanushka M. Welagedara ¹ and David J. Olive ²,*
¹ Department of Mathematics, Hampton University, Hampton, VA 23668, USA
² Mathematical & Statistical Sciences, Southern Illinois University, Carbondale, IL 62901, USA
* Author to whom correspondence should be addressed.
Axioms 2024, 13(10), 659; https://doi.org/10.3390/axioms13100659
Submission received: 16 August 2024 / Revised: 17 September 2024 / Accepted: 19 September 2024 / Published: 25 September 2024
(This article belongs to the Special Issue New Perspectives in Mathematical Statistics)

Abstract

When the bootstrap sample size is moderate, bootstrap confidence regions tend to have undercoverage. Improving the coverage is known as calibrating the confidence region. Consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. We reject $H_0$ only if $\theta_0$ is not contained in a large-sample 95% confidence region. If the confidence region has 3% undercoverage for the data set sample size, then the type I error is 8% instead of the nominal 5%. Hence, calibrating confidence regions is also useful for testing hypotheses. Several bootstrap confidence regions are also prediction regions for a future value of a bootstrap statistic. A new bootstrap confidence region uses a simple prediction region calibration technique to improve the coverage. The DD plot for visualizing prediction regions can also be used to visualize some bootstrap confidence regions.
MSC: 62F40

1. Introduction

When the bootstrap sample size $B$ is small or moderate, bootstrap confidence regions, including bootstrap confidence intervals, tend to have undercoverage: the probability that the confidence region contains the $p \times 1$ parameter vector $\theta$ is less than the nominal large-sample coverage probability $1-\delta$. Then, the coverage can be increased by increasing the nominal coverage of the large-sample bootstrap confidence region. For example, if the nominal large-sample 95% bootstrap confidence region with $B = 1000$ has 2% undercoverage, then the nominal coverage can be increased to 97%. This procedure is known as calibrating the confidence region. Calibration tends to be difficult since the amount of undercoverage is usually unknown. This paper provides a simple method for improving the coverage and a method for visualizing some bootstrap confidence regions.
Using correction factors for large-sample confidence intervals, tests, prediction intervals, prediction regions, and confidence regions improves the coverage performance for a moderate sample size $n$. If confidence regions are used for hypothesis testing, then this calibration brings the type I error closer to the nominal level. For a random variable $X$, let $P(X \le x_{1-\delta}) = 1-\delta$. Note that correction factors $b_n \to 1$ as $n \to \infty$ are used in large-sample confidence intervals and large-sample tests if the limiting distribution is $Z \sim N(0,1)$ or $X \sim \chi^2_k$, but a $t_{d_n}$ or $k F_{k,d_n}$ cutoff is used: $t_{d_n,1-\delta} = (t_{d_n,1-\delta}/z_{1-\delta})\, z_{1-\delta}$ with $b_n = t_{d_n,1-\delta}/z_{1-\delta} \to 1$, and $k F_{k,d_n,1-\delta} = (k F_{k,d_n,1-\delta}/\chi^2_{k,1-\delta})\, \chi^2_{k,1-\delta}$ with $b_n = k F_{k,d_n,1-\delta}/\chi^2_{k,1-\delta} \to 1$, if $d_n \to \infty$ as $n \to \infty$. For moderate $n$, the test or confidence interval with the correction factor $b_n > 1$ has better level or coverage than the test or confidence interval that does not use the correction factor, in that the simulated level or coverage is closer to the nominal level or coverage.
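As a worked illustration of the two correction factors, take $d_n = 30$ and $k = 4$ (values chosen here only for illustration) and read the quantiles from standard $t$, $F$, and $\chi^2$ tables. Both factors exceed 1 for moderate $d_n$:

```latex
b_n = \frac{t_{30,0.975}}{z_{0.975}} \approx \frac{2.042}{1.960} \approx 1.042,
\qquad
b_n = \frac{4\,F_{4,30,0.95}}{\chi^2_{4,0.95}} \approx \frac{4(2.69)}{9.488} \approx 1.134 .
```

Both ratios tend to 1 as $d_n \to \infty$, since $t_{d_n} \to N(0,1)$ and $k F_{k,d_n} \to \chi^2_k$ in distribution.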
Sometimes, the test statistic has a $t_{d_n}$ or $F_{k,d_n}$ distribution under normality, but the test statistic (possibly scaled by multiplying by $k$) is asymptotically normal or asymptotically $\chi^2_k$ for a large class of distributions. The $t$ test and $t$ confidence interval based on the sample mean are examples where the asymptotic normality holds by the central limit theorem. Many $F$ tests for linear models, experimental design models, and multivariate analyses also satisfy $k F_0 \xrightarrow{D} \chi^2_k$ as $n \to \infty$, where $F_0$ is the test statistic. See, for example, Olive (2017) [1].
Section 1.1 reviews prediction intervals, prediction regions, confidence intervals, and confidence regions. Several of these methods use correction factors to improve the coverage, and several bootstrap confidence intervals and regions are obtained by applying prediction intervals and regions to the bootstrap sample. Section 1.2 reviews a bootstrap theorem and shows that some bootstrap confidence regions are asymptotically equivalent.
Section 2.1 gives a new bootstrap confidence region with a simple correction factor, while Section 2.2 shows how to visualize some bootstrap confidence regions. Section 3 presents some simulation results.

1.1. Prediction Regions and Confidence Regions

Consider predicting a future test value $Y_f$ given past training data $Y_1, \ldots, Y_n$, where $Y_1, \ldots, Y_n, Y_f$ are independent and identically distributed (iid). A large-sample $100(1-\delta)\%$ prediction interval (PI) for $Y_f$ is $[L_n, U_n]$, where the coverage $P(L_n \le Y_f \le U_n) = 1-\delta_n$ is eventually bounded below by $1-\delta$ as $n \to \infty$. We often want $1-\delta_n \to 1-\delta$ as $n \to \infty$. A large-sample $100(1-\delta)\%$ PI is asymptotically optimal if it has the shortest asymptotic length: the length of $[\hat{L}_n, \hat{U}_n]$ converges to $U_s - L_s$ as $n \to \infty$, where $[L_s, U_s]$ is the population shorth, the shortest interval covering at least $100(1-\delta)\%$ of the mass.
Let the data $Y = (Y_1, \ldots, Y_n)^T$ have joint probability density function or probability mass function $f(y|\theta)$ with parameter space $\Theta$ and support $\mathcal{Y}$. Let $L_n = L_n(Y)$ and $U_n = U_n(Y)$ be statistics such that $L_n(y) \le U_n(y)$ for all $y \in \mathcal{Y}$. Then, the interval $[L_n(y), U_n(y)]$ is a large-sample $100(1-\delta)\%$ confidence interval (CI) for $\theta$ if
$$P_\theta(L_n(Y) \le \theta \le U_n(Y))$$
is eventually bounded below by $1-\delta$ for all $\theta \in \Theta$ as the sample size $n \to \infty$.
Consider predicting a $p \times 1$ future test value $x_f$, given past training data $x_1, \ldots, x_n$, where $x_1, \ldots, x_n, x_f$ are iid. A large-sample $100(1-\delta)\%$ prediction region is a set $A_n$ such that $P(x_f \in A_n)$ is eventually bounded below by $1-\delta$ as $n \to \infty$. A prediction region is asymptotically optimal if its volume converges in probability to the volume of the minimum volume covering region or the highest density region of the distribution of $x_f$.
A large-sample $100(1-\delta)\%$ confidence region for a $p \times 1$ vector of parameters $\theta$ is a set $A_n$ such that $P(\theta \in A_n)$ is eventually bounded below by $1-\delta$ as $n \to \infty$. For testing $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$, we fail to reject $H_0$ if $\theta_0$ is in the confidence region and reject $H_0$ if $\theta_0$ is not in the confidence region.
For prediction intervals, let $Y_{(1)} \le Y_{(2)} \le \cdots \le Y_{(n)}$ be the order statistics of the training data. Open intervals need more regularity conditions than closed intervals. For the following prediction interval, if the open interval $(Y_{(k_1)}, Y_{(k_2)})$ were used, we would need to add the regularity condition that the population percentiles $Y_{\delta/2}$ and $Y_{1-\delta/2}$ are continuity points of the cumulative distribution function $F_Y(y)$. See Frey (2013) [2] for references.
Let $k_1 = \lceil n\delta/2 \rceil$ and $k_2 = \lceil n(1-\delta/2) \rceil$, where $0 < \delta < 1$. A large-sample $100(1-\delta)\%$ percentile prediction interval for $Y_f$ is
$$[Y_{(k_1)}, Y_{(k_2)}]. \tag{1}$$
The bootstrap percentile confidence interval given by Equation (2) is obtained by applying the percentile prediction interval (1) to the bootstrap sample $T_1^*, \ldots, T_B^*$, where $T = T_n$ is a test statistic. See Efron (1982) [3].
A large-sample $100(1-\delta)\%$ bootstrap percentile confidence interval for $\theta$ is an interval $[T^*_{(k_L)}, T^*_{(k_U)}]$ containing approximately $B(1-\delta)$ of the $T_i^*$. Let $k_1 = \lceil B\delta/2 \rceil$ and $k_2 = \lceil B(1-\delta/2) \rceil$. A common choice is
$$[T^*_{(k_1)}, T^*_{(k_2)}]. \tag{2}$$
The shorth(c) estimator of the population shorth is useful for making asymptotically optimal prediction intervals. For a large-sample $100(1-\delta)\%$ PI, the nominal coverage is $100(1-\delta)\%$. Undercoverage occurs if the actual coverage is below the nominal coverage. For example, if the actual coverage is 0.93 for a large-sample 95% PI, then the undercoverage is 0.02. Consider the intervals that contain $c$ cases: $[Y_{(1)}, Y_{(c)}], [Y_{(2)}, Y_{(c+1)}], \ldots, [Y_{(n-c+1)}, Y_{(n)}]$. Compute $Y_{(c)} - Y_{(1)}, Y_{(c+1)} - Y_{(2)}, \ldots, Y_{(n)} - Y_{(n-c+1)}$. Then, the estimator shorth(c) $= [Y_{(s)}, Y_{(s+c-1)}]$ is the interval with the shortest length. The shorth(c) interval is a large-sample $100(1-\delta)\%$ PI if $c/n \to 1-\delta$ as $n \to \infty$, and it often has the asymptotically shortest length. Let $k_n = \lceil n(1-\delta) \rceil$. Frey (2013) [2] showed that for large $n\delta$ and iid data, the large-sample $100(1-\delta)\%$ shorth($k_n$) prediction interval has maximum undercoverage $\approx 1.12\sqrt{\delta/n}$, and then used the large-sample $100(1-\delta)\%$ PI shorth(c) =
$$[Y_{(s)}, Y_{(s+c-1)}] \quad \text{with} \quad c = \min\left(n, \left\lceil n\left[1-\delta+1.12\sqrt{\delta/n}\,\right]\right\rceil\right). \tag{3}$$
The shorth confidence interval is a practical implementation of Hall’s (1988) [4] shortest bootstrap percentile interval based on all possible bootstrap samples, and is obtained by applying shorth PI (3) to the bootstrap sample $T_1^*, \ldots, T_B^*$. See Pelawa Watagoda and Olive (2021) [5]. The large-sample $100(1-\delta)\%$ shorth(c) CI =
$$[T^*_{(s)}, T^*_{(s+c-1)}] \quad \text{where} \quad c = \min\left(B, \left\lceil B\left[1-\delta+1.12\sqrt{\delta/B}\,\right]\right\rceil\right). \tag{4}$$
To describe Olive’s (2013) [6] nonparametric prediction region, Mahalanobis distances will be useful. Let the $p \times 1$ column vector $T = T_n$ be a multivariate location estimator, and let the $p \times p$ symmetric positive definite matrix $C_n$ be a dispersion estimator. Then, the $i$th squared sample Mahalanobis distance is the scalar
$$D_i^2 = D_i^2(T_n, C_n) = D_{x_i}^2(T_n, C_n) = (x_i - T_n)^T C_n^{-1} (x_i - T_n)$$
for each observation $x_i$, where $i = 1, \ldots, n$. Notice that the Euclidean distance of $x_i$ from the estimate of center $T_n$ is $D_i(T_n, I_p)$, where $I_p$ is the $p \times p$ identity matrix. The classical Mahalanobis distance $D_i$ uses $(T_n, C_n) = (\bar{x}, S)$, the sample mean and sample covariance matrix, where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad S = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T.$$
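A minimal NumPy sketch of the squared sample Mahalanobis distances, assuming the data are stored as an $n \times p$ array with one case per row. It solves $C_n z = x_i - T_n$ rather than forming $C_n^{-1}$ explicitly, which is a numerical design choice, not something the text prescribes. The function names are hypothetical.

```python
import numpy as np


def mahalanobis_sq(x, center, disp):
    """D_i^2 = (x_i - T)^T C^{-1} (x_i - T) for each row x_i of x (a single p-vector is also accepted)."""
    diff = np.atleast_2d(np.asarray(x, dtype=float)) - center
    sol = np.linalg.solve(disp, diff.T)      # C^{-1}(x_i - T) via a linear solve; columns are cases
    return np.einsum("ij,ji->i", diff, sol)  # row-wise inner products


def classical_distances_sq(x):
    """Classical squared distances using (T_n, C_n) = (xbar, S), with divisor n - 1 as in S above."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean(axis=0)
    S = np.atleast_2d(np.cov(x, rowvar=False))
    return mahalanobis_sq(x, xbar, S)
```

A quick sanity check: since $\sum_i (x_i-\bar{x})^T S^{-1}(x_i-\bar{x}) = \operatorname{tr}\!\left(S^{-1}(n-1)S\right)$, the classical squared distances always sum to $(n-1)p$, and with $C_n = I_p$ they reduce to squared Euclidean distances.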
Let the $p \times 1$ location vector be $\mu$, which is often the population mean, and let the $p \times p$ dispersion matrix be $\Sigma$, which is often the population covariance matrix. If $x$ is a random vector, then the population squared Mahalanobis distance is
$$D_x^2(\mu, \Sigma) = (x - \mu)^T \Sigma^{-1} (x - \mu).$$
Like prediction intervals, prediction regions often need correction factors. For iid data from a distribution with a $p \times p$ nonsingular covariance matrix, it was found that the simulated maximum undercoverage of prediction region (9) without the correction factor was about 0.05 when $n = 20p$. Hence, correction factor (8) is used to obtain better coverage for small $n$. Let $q_n = \min(1-\delta+0.05,\ 1-\delta+p/n)$ for $\delta > 0.1$ and
$$q_n = \min(1-\delta/2,\ 1-\delta+10\delta p/n) \quad \text{otherwise}. \tag{8}$$
If $1-\delta < 0.999$ and $q_n < 1-\delta+0.001$, set $q_n = 1-\delta$. Let $D_{(U_n)}$ be the $100 q_n$th sample quantile of the $D_i$, where $i = 1, \ldots, n$. Olive (2013) [6] suggests that $n \ge 50p$ may be needed for the following prediction region to have a good volume, and $n \ge 20p$ for good coverage. Of course, for any $n$, there are distributions that will have severe undercoverage.
The large-sample $100(1-\delta)\%$ nonparametric prediction region for a future value $x_f$ given iid data $x_1, \ldots, x_n$ is
$$\{z : (z-\bar{x})^T S^{-1} (z-\bar{x}) \le D^2_{(U_n)}\} = \{z : D_z^2(\bar{x}, S) \le D^2_{(U_n)}\}. \tag{9}$$
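A minimal sketch of correction factor (8) and nonparametric prediction region (9), assuming NumPy. One labeled assumption: the $100q$th sample quantile is taken to be the order statistic with index $\lceil nq \rceil$; other quantile conventions give nearly the same cutoff for large $n$. Because the $D_i$ are nonnegative, squaring preserves their order, so the code works with the $D_i^2$ directly. The function names are hypothetical.

```python
import math

import numpy as np


def q_n(n, p, delta=0.05):
    """Correction factor (8): the inflated quantile level used for the cutoff D_(U_n)."""
    if delta > 0.1:
        q = min(1 - delta + 0.05, 1 - delta + p / n)
    else:
        q = min(1 - delta / 2, 1 - delta + 10 * delta * p / n)
    if 1 - delta < 0.999 and q < 1 - delta + 0.001:
        q = 1 - delta
    return q


def order_stat_quantile(d, q):
    """Assumed convention: the 100q-th sample quantile is the order statistic d_(ceil(len(d) * q))."""
    d = np.sort(np.asarray(d, dtype=float))
    k = math.ceil(len(d) * q - 1e-9)  # tiny offset guards against floating-point round-up
    return d[min(max(k, 1), len(d)) - 1]


def np_prediction_region(x, delta=0.05):
    """Region (9): returns (xbar, S, D_(U_n)^2)."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    xbar = x.mean(axis=0)
    S = np.atleast_2d(np.cov(x, rowvar=False))
    diff = x - xbar
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(S, diff.T))
    # Squaring is monotone on D_i >= 0, so the quantile of the D_i^2 is the squared quantile of the D_i.
    return xbar, S, order_stat_quantile(d2, q_n(n, p, delta))


def in_region(z, center, disp, cut2):
    """True if D_z^2(center, disp) <= cut2."""
    diff = np.asarray(z, dtype=float) - center
    return float(diff @ np.linalg.solve(disp, diff)) <= cut2
```

For example, `q_n(1000, 4, 0.1)` returns 0.904, the level used in Example 1, while for very large $n$ the correction vanishes and $q_n = 1-\delta$.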
Olive’s (2017, 2018) [1,7] prediction region method confidence region applies prediction region (9) to the bootstrap sample. Let the bootstrap sample be $T_1^*, \ldots, T_B^*$. Let $\bar{T}^*$ and $S_T^*$ be the sample mean and sample covariance matrix of the bootstrap sample.
The large-sample $100(1-\delta)\%$ prediction region method confidence region for $\theta$ is
$$\{w : (w-\bar{T}^*)^T [S_T^*]^{-1} (w-\bar{T}^*) \le D^2_{(U_B)}\} = \{w : D_w^2(\bar{T}^*, S_T^*) \le D^2_{(U_B)}\} \tag{10}$$
where the cutoff $D^2_{(U_B)}$ is the $100 q_B$th sample quantile of the $D_i^2 = (T_i^* - \bar{T}^*)^T [S_T^*]^{-1} (T_i^* - \bar{T}^*)$ for $i = 1, \ldots, B$. Note that the corresponding test for $H_0: \theta = \theta_0$ rejects $H_0$ if $(\bar{T}^* - \theta_0)^T [S_T^*]^{-1} (\bar{T}^* - \theta_0) > D^2_{(U_B)}$.
Olive’s (2017, 2018) [1,7] large-sample $100(1-\delta)\%$ modification of Bickel and Ren’s (2001) [8] confidence region is
$$\{w : (w-T_n)^T [S_T^*]^{-1} (w-T_n) \le D^2_{(U_{B,T})}\} = \{w : D_w^2(T_n, S_T^*) \le D^2_{(U_{B,T})}\} \tag{11}$$
where the cutoff $D^2_{(U_{B,T})}$ is the $100 q_B$th sample quantile of the $D_i^2 = (T_i^* - T_n)^T [S_T^*]^{-1} (T_i^* - T_n)$. Note that the corresponding test for $H_0: \theta = \theta_0$ rejects $H_0$ if $(T_n - \theta_0)^T [S_T^*]^{-1} (T_n - \theta_0) > D^2_{(U_{B,T})}$.
Shift region (10) to have center $T_n$, or equivalently, change the cutoff of region (11) to $D^2_{(U_B)}$, to obtain Pelawa Watagoda and Olive’s (2021) [5] large-sample $100(1-\delta)\%$ hybrid confidence region,
$$\{w : (w-T_n)^T [S_T^*]^{-1} (w-T_n) \le D^2_{(U_B)}\} = \{w : D_w^2(T_n, S_T^*) \le D^2_{(U_B)}\}. \tag{12}$$
Note that the corresponding test for $H_0: \theta = \theta_0$ rejects $H_0$ if $(T_n - \theta_0)^T [S_T^*]^{-1} (T_n - \theta_0) > D^2_{(U_B)}$.
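The tests based on the prediction region method, Bickel and Ren, and hybrid regions share the dispersion $S_T^*$ and differ only in the center and in which distances supply the cutoff. A minimal NumPy sketch, assuming `t_star` holds the bootstrap sample as a $B \times p$ array, applying correction factor (8) with $n = B$, and taking the $100 q_B$th sample quantile to be the order statistic with index $\lceil B q_B \rceil$ (an assumed convention). All function names are hypothetical.

```python
import math

import numpy as np


def q_n(n, p, delta=0.05):
    """Correction factor (8), applied here with n = B."""
    if delta > 0.1:
        q = min(1 - delta + 0.05, 1 - delta + p / n)
    else:
        q = min(1 - delta / 2, 1 - delta + 10 * delta * p / n)
    return 1 - delta if (1 - delta < 0.999 and q < 1 - delta + 0.001) else q


def order_stat(d, q):
    """Assumed quantile convention: the order statistic d_(ceil(len(d) * q))."""
    d = np.sort(d)
    k = min(max(math.ceil(len(d) * q - 1e-9), 1), len(d))
    return d[k - 1]


def bootstrap_tests(t_n, t_star, theta0, delta=0.05):
    """Reject flags for H0: theta = theta0 using regions (10), (11), and (12)."""
    t_star = np.asarray(t_star, dtype=float)
    if t_star.ndim == 1:
        t_star = t_star[:, None]  # p = 1: a single column
    B, p = t_star.shape
    t_bar = t_star.mean(axis=0)
    S_T = np.atleast_2d(np.cov(t_star, rowvar=False))

    def d2(points, center):
        diff = np.atleast_2d(points) - center
        return np.einsum("ij,ji->i", diff, np.linalg.solve(S_T, diff.T))

    q = q_n(B, p, delta)
    cut_B = order_stat(d2(t_star, t_bar), q)  # D_(U_B)^2: distances from the bootstrap mean
    cut_BT = order_stat(d2(t_star, t_n), q)   # D_(U_B,T)^2: distances from T_n
    return {
        "pr": d2(theta0, t_bar)[0] > cut_B,     # prediction region method (10)
        "br": d2(theta0, t_n)[0] > cut_BT,      # Bickel and Ren modification (11)
        "hybrid": d2(theta0, t_n)[0] > cut_B,   # hybrid region (12)
    }


# Usage: bootstrap the coordinate-wise median of bivariate normal data.
rng = np.random.default_rng(4)
x = rng.normal(size=(100, 2))
t_n = np.median(x, axis=0)
t_star = np.array([np.median(x[rng.integers(0, 100, size=100)], axis=0) for _ in range(500)])
far = bootstrap_tests(t_n, t_star, theta0=np.array([10.0, 10.0]))
at_tn = bootstrap_tests(t_n, t_star, theta0=t_n)
```

A value of $\theta_0$ far from the bootstrap cloud is rejected by all three tests, while $\theta_0 = T_n$ is never rejected by the two regions centered at $T_n$.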
Rajapaksha and Olive (2024) [9] gave the following two confidence regions. The names of these confidence regions were chosen since they are similar to Bickel and Ren’s and the prediction region method’s confidence regions.
The large-sample $100(1-\delta)\%$ BR confidence region is
$$\{w : n(w-T_n)^T C_n^{-1} (w-T_n) \le D^2_{(U_{B,T})}\} = \{w : D_w^2(T_n, C_n/n) \le D^2_{(U_{B,T})}\} \tag{13}$$
where the cutoff $D^2_{(U_{B,T})}$ is the $100 q_B$th sample quantile of the $D_i^2 = n(T_i^* - T_n)^T C_n^{-1} (T_i^* - T_n)$. Note that the corresponding test for $H_0: \theta = \theta_0$ rejects $H_0$ if $n(T_n - \theta_0)^T C_n^{-1} (T_n - \theta_0) > D^2_{(U_{B,T})}$.
The large-sample $100(1-\delta)\%$ PR confidence region for $\theta$ is
$$\{w : n(w-\bar{T}^*)^T C_n^{-1} (w-\bar{T}^*) \le D^2_{(U_B)}\} = \{w : D_w^2(\bar{T}^*, C_n/n) \le D^2_{(U_B)}\} \tag{14}$$
where $D^2_{(U_B)}$ is the $100 q_B$th sample quantile of the $D_i^2 = n(T_i^* - \bar{T}^*)^T C_n^{-1} (T_i^* - \bar{T}^*)$ for $i = 1, \ldots, B$. Note that the corresponding test for $H_0: \theta = \theta_0$ rejects $H_0$ if $n(\bar{T}^* - \theta_0)^T C_n^{-1} (\bar{T}^* - \theta_0) > D^2_{(U_B)}$.
Assume that $x_1, \ldots, x_n, x_f$ are iid $N_p(\mu, \Sigma_x)$. Then, Chew’s (1966) [10] large-sample $100(1-\delta)\%$ classical prediction region for multivariate normal data is
$$\{z : D_z^2(\bar{x}, S) \le \chi^2_{p,1-\delta}\}. \tag{15}$$
The next bootstrap confidence region is similar to what would be obtained if the classical prediction region (15) for multivariate normal data was applied to the bootstrap sample. The large-sample $100(1-\delta)\%$ standard bootstrap confidence region for $\theta$ is
$$\{w : (w-T_n)^T [S_T^*]^{-1} (w-T_n) \le D^2_{1-\delta}\} = \{w : D_w^2(T_n, S_T^*) \le D^2_{1-\delta}\} \tag{16}$$
where $D^2_{1-\delta} = \chi^2_{p,1-\delta}$ or $D^2_{1-\delta} = p F_{p,d_n,1-\delta}$, where $d_n \to \infty$ as $n \to \infty$.
If $p = 1$, then a hyperellipsoid is an interval, and confidence intervals are special cases of confidence regions. Suppose the parameter of interest is $\theta$, and there is a bootstrap sample $T_1^*, \ldots, T_B^*$, where the statistic $T_n$ is an estimator of $\theta$ based on a sample of size $n$. Let $a_i = |T_i^* - \bar{T}^*|$ and let $b_i = |T_i^* - T_n|$. Let $\bar{T}^*$ and $S_T^{*2}$ be the sample mean and variance of the $T_i^*$. Then, the squared Mahalanobis distance $D_\theta^2 = (\theta - \bar{T}^*)^2 / S_T^{*2} \le D^2_{(U_B)}$ is equivalent to $\theta \in [\bar{T}^* - S_T^* D_{(U_B)},\ \bar{T}^* + S_T^* D_{(U_B)}] = [\bar{T}^* - a_{(U_B)},\ \bar{T}^* + a_{(U_B)}]$, which is an interval centered at $\bar{T}^*$ just long enough to cover $U_B$ of the $T_i^*$. Efron (2014) [11] used a similar large-sample $100(1-\delta)\%$ confidence interval assuming that $\bar{T}^*$ is asymptotically normal. Then, the large-sample $100(1-\delta)\%$ PR CI is $[\bar{T}^* - a_{(U_B)},\ \bar{T}^* + a_{(U_B)}]$. The large-sample $100(1-\delta)\%$ BR CI is $[T_n - b_{(U_{B,T})},\ T_n + b_{(U_{B,T})}]$, which is an interval centered at $T_n$ just long enough to cover $U_{B,T}$ of the $T_i^*$. The large-sample $100(1-\delta)\%$ hybrid CI is $[T_n - a_{(U_B)},\ T_n + a_{(U_B)}]$.
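For $p = 1$, dividing by $S_T^*$ does not change the ordering of the distances, so $S_T^* D_{(U_B)} = a_{(U_B)}$ and the three intervals can be computed from the $a_i$ and $b_i$ directly. A minimal NumPy sketch, assuming $U_B = U_{B,T} = \lceil B q_B \rceil$ (an assumed quantile convention) with $q_B$ from correction factor (8); the function names are hypothetical.

```python
import math

import numpy as np


def q_n(n, p, delta=0.05):
    """Correction factor (8)."""
    if delta > 0.1:
        q = min(1 - delta + 0.05, 1 - delta + p / n)
    else:
        q = min(1 - delta / 2, 1 - delta + 10 * delta * p / n)
    return 1 - delta if (1 - delta < 0.999 and q < 1 - delta + 0.001) else q


def one_dim_cis(t_n, t_star, delta=0.05):
    """PR, BR, and hybrid CIs for p = 1 from the bootstrap sample t_star."""
    t_star = np.asarray(t_star, dtype=float)
    B = len(t_star)
    t_bar = t_star.mean()
    k = min(max(math.ceil(B * q_n(B, 1, delta) - 1e-9), 1), B)  # U_B = U_B,T (assumed convention)
    a_U = np.sort(np.abs(t_star - t_bar))[k - 1]  # a_(U_B)
    b_U = np.sort(np.abs(t_star - t_n))[k - 1]    # b_(U_B,T)
    return {
        "pr": (t_bar - a_U, t_bar + a_U),
        "br": (t_n - b_U, t_n + b_U),
        "hybrid": (t_n - a_U, t_n + a_U),
    }


# Usage: bootstrap the sample mean of skewed data.
rng = np.random.default_rng(5)
y = rng.exponential(size=60)
t_n = y.mean()
t_star = np.array([rng.choice(y, size=len(y)).mean() for _ in range(1000)])
cis = one_dim_cis(t_n, t_star)
```

The PR and hybrid intervals always have the same length $2a_{(U_B)}$; they differ only in their centers, $\bar{T}^*$ versus $T_n$.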
The following prediction region will be used to develop a new correction factor for bootstrap confidence regions. See Section 2.1. Data splitting divides the training data $x_1, \ldots, x_n$ into two sets: $H$ and the validation set $V$, where $H$ has $n_H$ of the cases and $V$ has the remaining $n_V = n - n_H$ cases $i_1, \ldots, i_{n_V}$.
The estimator $(T_H, C_H)$ is computed using data set $H$. Then, the squared validation distances $D_j^2 = D^2_{x_{i_j}}(T_H, C_H) = (x_{i_j} - T_H)^T C_H^{-1} (x_{i_j} - T_H)$ are computed for the $j = 1, \ldots, n_V$ cases in the validation set $V$. Let $D^2_{(U_V)}$ be the $U_V$th order statistic of the $D_j^2$, where
$$U_V = \min\left(n_V, \lceil (n_V+1)(1-\delta) \rceil\right). \tag{17}$$
Haile, Zhang, and Olive’s (2024) [12] large-sample $100(1-\delta)\%$ data splitting prediction region for $x_f$ is
$$\{z : D_z^2(T_H, C_H) \le D^2_{(U_V)}\}. \tag{18}$$
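A minimal NumPy sketch of data splitting prediction region (18). It assumes $(T_H, C_H)$ is the sample mean and sample covariance matrix of $H$ (one choice; the text allows other estimators) and that the rows of the data are already in random order, so $H$ is simply the first $n_H$ rows. The function names are hypothetical.

```python
import math

import numpy as np


def split_cutoff_index(n_V, delta=0.05):
    """U_V = min(n_V, ceil((n_V + 1)(1 - delta))) from (17)."""
    return min(n_V, math.ceil((n_V + 1) * (1 - delta) - 1e-9))  # offset guards against round-up


def data_splitting_region(x, n_H, delta=0.05):
    """Region (18) with (T_H, C_H) = (mean, covariance) of H; returns (T_H, C_H, D_(U_V)^2)."""
    x = np.asarray(x, dtype=float)
    H, V = x[:n_H], x[n_H:]  # assumes rows are already in random order
    T_H = H.mean(axis=0)
    C_H = np.atleast_2d(np.cov(H, rowvar=False))
    diff = V - T_H
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(C_H, diff.T))  # validation distances D_j^2
    return T_H, C_H, np.sort(d2)[split_cutoff_index(len(V), delta) - 1]


def guaranteed_coverage(n_V, delta=0.05):
    """Coverage U_V / (n_V + 1), attained when ties have probability zero."""
    return split_cutoff_index(n_V, delta) / (n_V + 1)
```

With $\delta = 0.05$, $n_V = 99$ gives $U_V = 95$ and coverage exactly $0.95$, while $n_V = 49$ gives $U_V = 48$ and coverage $48/50 = 0.96$.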

1.2. Some Confidence Region Theories

Some large-sample theories for bootstrap confidence regions are given in the references in Section 1.1. The following theorem of Pelawa Watagoda and Olive (2021) [5] and its proof are useful.
Theorem 1. 
(a) Suppose that as $n \to \infty$, (i) $\sqrt{n}(T_n - \theta) \xrightarrow{D} u$, and (ii) $\sqrt{n}(T_i^* - T_n) \xrightarrow{D} u$ with $E(u) = 0$ and $\operatorname{Cov}(u) = \Sigma_u$. Then, (iii) $\sqrt{n}(\bar{T}^* - \theta) \xrightarrow{D} u$, (iv) $\sqrt{n}(T_i^* - \bar{T}^*) \xrightarrow{D} u$, and (v) $\sqrt{n}(\bar{T}^* - T_n) \xrightarrow{P} 0$.
(b) Then, the prediction region method gives a large-sample confidence region for $\theta$ provided that $B \to \infty$ and the sample percentile $\hat{D}^2_{1-\delta}$ of the $D^2_{T_i^*}(\bar{T}^*, S_T^*) = \sqrt{n}(T_i^* - \bar{T}^*)^T (n S_T^*)^{-1} \sqrt{n}(T_i^* - \bar{T}^*)$ is a consistent estimator of the percentile $D^2_{n,1-\delta}$ of the random variable $D^2_\theta(\bar{T}^*, S_T^*) = \sqrt{n}(\theta - \bar{T}^*)^T (n S_T^*)^{-1} \sqrt{n}(\theta - \bar{T}^*)$ in that $\hat{D}^2_{1-\delta} - D^2_{n,1-\delta} \xrightarrow{P} 0$.
Proof. 
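With respect to the bootstrap sample, $T_n$ is a constant, and the $\sqrt{n}(T_i^* - T_n)$ are iid for $i = 1, \ldots, B$. Fix $B$. Then,
$$\begin{pmatrix} \sqrt{n}(T_1^* - T_n) \\ \vdots \\ \sqrt{n}(T_B^* - T_n) \end{pmatrix} \xrightarrow{D} \begin{pmatrix} v_1 \\ \vdots \\ v_B \end{pmatrix}$$
where the $v_i$ are iid with the same distribution as $u$. For fixed $B$, the average of the $\sqrt{n}(T_i^* - T_n)$ is
$$\sqrt{n}(\bar{T}^* - T_n) \xrightarrow{D} \frac{1}{B}\sum_{i=1}^{B} v_i \sim AN_g\left(0, \frac{\Sigma_u}{B}\right)$$
by the Continuous Mapping Theorem, where $z \sim AN_g(0, \Sigma)$ is an asymptotic multivariate normal approximation. Note that if $u \sim N_g(0, \Sigma_u)$, then
$$\sqrt{n}(\bar{T}^* - T_n) \xrightarrow{D} N_g\left(0, \frac{\Sigma_u}{B}\right).$$
Hence, as $B \to \infty$, $\sqrt{n}(\bar{T}^* - T_n) \xrightarrow{P} 0$, and (iii), (iv), and (v) hold. Hence, (b) follows.    □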
Under regularity conditions, Bickel and Ren (2001), Olive (2017, 2018), and Pelawa Watagoda and Olive (2021) [1,5,7,8] proved that (10), (11), and (12) are large-sample confidence regions. For Theorem 1, usually (i) and (ii) are proven using large-sample theory. Then,
$$D_1^2 = D^2_{T_i^*}(\bar{T}^*, C_n/n) = \sqrt{n}(T_i^* - \bar{T}^*)^T C_n^{-1} \sqrt{n}(T_i^* - \bar{T}^*),$$
$$D_2^2 = D^2_\theta(T_n, C_n/n) = \sqrt{n}(T_n - \theta)^T C_n^{-1} \sqrt{n}(T_n - \theta),$$
$$D_3^2 = D^2_\theta(\bar{T}^*, C_n/n) = \sqrt{n}(\bar{T}^* - \theta)^T C_n^{-1} \sqrt{n}(\bar{T}^* - \theta), \quad \text{and}$$
$$D_4^2 = D^2_{T_i^*}(T_n, C_n/n) = \sqrt{n}(T_i^* - T_n)^T C_n^{-1} \sqrt{n}(T_i^* - T_n)$$
are well behaved. If $C_n^{-1} \xrightarrow{P} C^{-1}$, then $D_j^2 \xrightarrow{D} D^2 = u^T C^{-1} u$, and (13) and (14) are large-sample confidence regions. If $C_n^{-1}$ is "not too ill conditioned", then $D_j^2 \approx u^T C_n^{-1} u$ for large $n$, and confidence regions (13) and (14) will have coverage near $1-\delta$. See Rajapaksha and Olive (2024) [9].
If $\sqrt{n}(T_n - \theta) \xrightarrow{D} U$ and $\sqrt{n}(T_i^* - T_n) \xrightarrow{D} U$, where $U$ has a unimodal probability density function symmetric about zero, then the confidence intervals from Section 1.1, including (2) and (4), are asymptotically equivalent (asymptotically, they all use the central proportion of the bootstrap sample). See Pelawa Watagoda and Olive (2021) [5].

2. Materials and Methods

2.1. The Two-Sample Bootstrap

Correction factors for calibrating confidence regions and prediction regions are often difficult to obtain. For prediction regions, see Barndorff-Nielsen and Cox (1996); Beran (1990); Fonseca, Giummole, and Vidoni (2012); Frey (2013); Hall, Peng, and Tajvidi (1999); Hall and Rieck (2001); and Ueki and Fueda (2007) [2,13,14,15,16,17,18]. For confidence regions, see DiCiccio and Efron (1996) and Loh (1987, 1991) [19,20,21]. Simulation was used to obtain correction factor (8). The bootstrap confidence regions (2), (4), and (10) were obtained by applying prediction regions (1), (3), and (9), respectively, to the bootstrap sample. By Theorem 1, bootstrap confidence regions (11) and (12) are asymptotically equivalent to (10). Hence, these large-sample confidence regions for $\theta$ are also large-sample prediction regions for a future value of the bootstrap statistic $T_F^*$.
Haile, Zhang, and Olive (2024) [12] proved that the data splitting prediction region (18) has coverage $\ge \min(n_V, \lceil (n_V+1)(1-\delta) \rceil)/(n_V+1)$, with equality if the probability of ties is zero. Hence, data splitting can be used to calibrate some prediction regions. The new confidence region obtains $(T_H, C_H)$ from the bootstrap data set $T_1^*, \ldots, T_B^*$ using $n_H = B$. For example, $(T_H, C_H) = (\bar{T}^*, S_T^*)$. Then, a second bootstrap sample $T_{2,1}^*, \ldots, T_{2,n_V}^*$ is drawn. Then, the new large-sample $100(1-\delta)\%$ two-sample bootstrap confidence region is
$$\{w : D_w^2(T_H, C_H) \le D^2_{(U_V)}\}. \tag{19}$$
This result holds since if $(T_H, C_H) = (\bar{T}^*, S_T^*)$, then both (10) and (19) are also $100(1-\delta)\%$ prediction regions for a future value of $T_F^*$, and only differ by the cutoff used: $D^2_{(U_B)}$ or $D^2_{(U_V)}$. See the following paragraph. Hence, as $n \to \infty$, $B \to \infty$, and $n_V \to \infty$, $D^2_{(U_B)} - D^2_{(U_V)} \xrightarrow{P} 0$, and confidence regions (10) and (19) are asymptotically equivalent. For a large-sample 95% confidence region, we recommend $n_V = 49$, 99, or $B$.
The two-sample bootstrap confidence region applies the data splitting prediction region to $T_1^*, \ldots, T_B^*, T_{2,1}^*, \ldots, T_{2,n_V}^*$ with $n_H = B$ and validation size $n_V$, where $H$ uses the first $B$ cases, and $V$ uses the remaining $n_V$ cases. A random selection of cases is not needed since the $T^*$s are iid with respect to the bootstrap sample. For (19) to be a large-sample $100(1-\delta)\%$ confidence region, the region applied to the first sample $H$ needs to be both a large-sample $100(1-\delta)\%$ confidence region for $\theta$ and a large-sample $100(1-\delta)\%$ prediction region for $T_F^*$. Using $(T_H, C_H) = (\bar{T}^*, S_T^*)$ corresponds to (10), while using $(T_H, C_H) = (T_n, S_T^*)$ corresponds to (11). Thus, the two-sample bootstrap confidence region corresponding to (10) is
$$\{w : (w-\bar{T}^*)^T [S_T^*]^{-1} (w-\bar{T}^*) \le D^2_{(U_V)}\} = \{w : D_w^2(\bar{T}^*, S_T^*) \le D^2_{(U_V)}\}.$$
Hence, the sample percentile $D^2_{(U_B)}$ in (10) is replaced by the order statistic $D^2_{(U_V)}$.
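A minimal NumPy sketch of the two-sample bootstrap confidence region corresponding to (10), assuming the nonparametric bootstrap and a statistic that maps an $n \times p$ data array to a length-$p$ vector. The first bootstrap sample fixes the center and shape $(\bar{T}^*, S_T^*)$; the second sample of size $n_V$ only supplies the cutoff $D^2_{(U_V)}$. The function names and the default $n_V = 99$ (one of the recommended values) are illustrative choices, not the authors' code.

```python
import math

import numpy as np


def two_sample_bootstrap_region(x, stat, B=1000, n_V=99, delta=0.05, seed=0):
    """Region (19) with (T_H, C_H) = (mean, covariance) of the first bootstrap sample.

    Returns (T_H, C_H, D_(U_V)^2); the second bootstrap sample only supplies the cutoff.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)

    def boot(m):
        # m nonparametric bootstrap replicates of the statistic, one row each
        return np.array([np.atleast_1d(stat(x[rng.integers(0, n, size=n)])) for _ in range(m)])

    t1 = boot(B)    # H: T_1*, ..., T_B*
    t2 = boot(n_V)  # V: T_{2,1}*, ..., T_{2,n_V}*
    T_H = t1.mean(axis=0)
    C_H = np.atleast_2d(np.cov(t1, rowvar=False))
    diff = t2 - T_H
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(C_H, diff.T))
    U_V = min(n_V, math.ceil((n_V + 1) * (1 - delta) - 1e-9))
    return T_H, C_H, np.sort(d2)[U_V - 1]


def rejects(theta0, T_H, C_H, cut2):
    """Test of H0: theta = theta0 based on region (19)."""
    diff = np.atleast_1d(np.asarray(theta0, dtype=float)) - T_H
    return float(diff @ np.linalg.solve(C_H, diff)) > cut2


# Usage: coordinate-wise median of n = 250 draws from a bivariate standard normal.
data = np.random.default_rng(7).normal(size=(250, 2))
T_H, C_H, cut2 = two_sample_bootstrap_region(data, lambda s: np.median(s, axis=0))
```

The only change from the prediction region method is where the cutoff comes from: an order statistic of $n_V$ fresh bootstrap distances instead of a corrected percentile of the same $B$ distances used to estimate the center and shape.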

2.2. Visualizing Some Bootstrap Confidence Regions

Olive (2013) [6] showed how to visualize nonparametric prediction region (9) with the Rousseeuw and Van Driessen (1999) [22] DD plot of classical distances (MD) on the horizontal axis versus robust distances (RD) on the vertical axis. Since (10) is prediction region (9) applied to the bootstrap sample, the same method can be used to visualize bootstrap confidence region (10).
If a good robust estimator is used, the plotted points in a DD plot cluster about the identity line with zero intercept and unit slope if the $x_i$ are iid from a multivariate normal distribution with nonsingular covariance matrix, while the plotted points cluster about some other line through the origin if the $x_i$ are iid from a large family of non-normal elliptically contoured distributions. For the robust estimator of the multivariate location and dispersion, we recommend the RFCH or RMVN estimator. See Olive (2017) [1]. These two estimators $(T_n, C_n)$ are such that $C_n$ is a $\sqrt{n}$ consistent estimator of $a\,\operatorname{Cov}(x)$ for a large class of elliptically contoured distributions, where the constant $a > 0$ depends on the elliptically contoured distribution and the estimator RFCH or RMVN, and $a = 1$ for the multivariate normal distribution with a nonsingular covariance matrix. We used the RMVN estimator in the simulations.
Example 1, in the following section, shows how to use a DD plot to visualize some bootstrap confidence regions. Often, $\sqrt{n}(T_n - \theta) \xrightarrow{D} N_p(0, \Sigma_T)$, $\sqrt{n}(T_i^* - T_n) \xrightarrow{D} N_p(0, \Sigma_T)$, and $\sqrt{n}(T_i^* - \bar{T}^*) \xrightarrow{D} N_p(0, \Sigma_T)$. Then, the plotted points in the DD plot tend to cluster about the identity line. Note that $\{w : D_w^2(\bar{T}^*, S_T^*) \le D^2_{(U_B)}\} = \{w : D_w(\bar{T}^*, S_T^*) \le D_{(U_B)}\}$. Hence, the $T_i^*$ such that $D_{T_i^*}(\bar{T}^*, S_T^*) \le D_{(U_B)}$ are in confidence region (10). These $T_i^*$ correspond to the points to the left of the vertical line $MD = D_{(U_B)}$ in the DD plot.

3. Results

Example 1. 
We generated $x_i \sim N_4(0, I)$ for $i = 1, \ldots, 250$. The coordinate-wise median was the statistic $T_n$. The nonparametric bootstrap was used with $B = 1000$ for the 90% confidence region (10). Then, the $100 q_B$th sample quantile of the $D_i$ is the 90.4% quantile. The DD plot of the bootstrap sample is shown in Figure 1. This bootstrap sample was a rather poor sample: the plotted points cluster about the identity line, but for most bootstrap samples, the clustering is tighter (as in Figure 2). The vertical line MD = 2.9098 is the cutoff for the prediction region method 90% confidence region (10). Hence, the points to the left of the vertical line correspond to $T_i^*$ that are inside confidence region (10), while the points to the right of the vertical line correspond to $T_i^*$ that are outside of confidence region (10). The long horizontal line RD = 3.0995 is the cutoff using the robust estimator. When $\sqrt{n}(T_n - \theta) \xrightarrow{D} N_p(0, \Sigma_T)$, under mild regularity conditions, $\sqrt{n}(T_n - \bar{T}_n^*) \xrightarrow{P} 0$. The short horizontal line is RD = 2.8074, and MD $= 2.8074 = \sqrt{\chi^2_{4,0.904}}$ is approximately the cutoff $\sqrt{\chi^2_{4,0.9}} = 2.7892$ that would be used by the standard bootstrap confidence region (16) (mentally drop a vertical line from where the short horizontal line ends at the identity line). Variability in DD plots increases as MD increases.
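Two numerical checks on Example 1, written as a hedged sketch. First, for 4 degrees of freedom the chi-square cdf has the closed form $P(\chi^2_4 \le x) = 1 - e^{-x/2}(1 + x/2)$, so the quoted cutoffs $\sqrt{\chi^2_{4,0.904}}$ and $\sqrt{\chi^2_{4,0.9}}$ can be recomputed by bisection with only the standard library (this gives about 2.807 and 2.789). Second, the code redraws a data set with the same design ($n = 250$, $p = 4$, coordinate-wise median, $B = 1000$, $q_B = 0.904$) and computes the classical distances MD on the horizontal axis of the DD plot; it uses different random numbers than the paper, so its cutoff will not equal 2.9098. The robust distances RD need the RMVN or RFCH estimator, which is not reproduced here.

```python
import math

import numpy as np


def chi2_4_cdf(x):
    """Closed form for 4 degrees of freedom: P(chi^2_4 <= x) = 1 - exp(-x/2) * (1 + x/2)."""
    return 1.0 - math.exp(-x / 2) * (1 + x / 2)


def chi2_4_quantile(prob, lo=0.0, hi=100.0, tol=1e-12):
    """Invert the chi^2_4 cdf by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if chi2_4_cdf(mid) < prob else (lo, mid)
    return (lo + hi) / 2


# Standard bootstrap cutoffs on the distance (MD) scale quoted in Example 1.
md_904 = math.sqrt(chi2_4_quantile(0.904))  # about 2.807
md_900 = math.sqrt(chi2_4_quantile(0.900))  # about 2.789

# A fresh replication of the Example 1 design (different random numbers than the paper).
rng = np.random.default_rng(8)
x = rng.normal(size=(250, 4))
B = 1000
t_star = np.array([np.median(x[rng.integers(0, 250, size=250)], axis=0) for _ in range(B)])
t_bar = t_star.mean(axis=0)
diff = t_star - t_bar
S_T = np.cov(t_star, rowvar=False)
md = np.sqrt(np.einsum("ij,ji->i", diff, np.linalg.solve(S_T, diff.T)))  # MD_i in the DD plot
cut = np.sort(md)[math.ceil(B * 0.904 - 1e-9) - 1]  # D_(U_B), order-statistic convention assumed
inside = md <= cut  # T_i* to the left of the vertical line, i.e., inside region (10)
```

Plotting `md` against robust distances from a robust estimator would reproduce the DD plot; the vertical line sits at `cut`.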
Inference after variable selection is an example where the undercoverage of confidence regions can be quite high. See, for example, Kabaila (2009) [23]. Variable selection methods often use the Schwarz (1978) [24] BIC criterion, the Mallows (1973) [25] $C_p$ criterion, or lasso due to Tibshirani (1996) [26]. To describe a variable selection model, we will follow Rathnayake and Olive (2023) [27] closely. Consider regression models where the response variable $Y$ depends on the $p \times 1$ vector of predictors $x$ only through $x^T\beta$. Multiple linear regression models, generalized linear models, and proportional hazards regression models are examples of such regression models. Then, a model for variable selection can be described by
$$x^T\beta = x_S^T\beta_S + x_E^T\beta_E = x_S^T\beta_S \tag{20}$$
where $x = (x_S^T, x_E^T)^T$ is a $p \times 1$ vector of predictors, $x_S$ is an $a_S \times 1$ vector, and $x_E$ is a $(p - a_S) \times 1$ vector. Given that $x_S$ is in the model, $\beta_E = 0$, and $E$ denotes the subset of terms that can be eliminated given that the subset $S$ is in the model. Since $S$ is unknown, candidate subsets will be examined. Let $x_I$ be the vector of $a$ terms from a candidate subset indexed by $I$, and let $x_O$ be the vector of the remaining predictors (out of the candidate submodel). Then,
$$x^T\beta = x_I^T\beta_I + x_O^T\beta_O.$$
Suppose that $S$ is a subset of $I$ and that model (20) holds. Then,
$$x^T\beta = x_S^T\beta_S = x_S^T\beta_S + x_{I/S}^T\beta_{(I/S)} + x_O^T 0 = x_I^T\beta_I$$
where $x_{I/S}$ denotes the predictors in $I$ that are not in $S$. Underfitting occurs if submodel $I$ does not contain $S$.
To clarify the notation, suppose that $p = 4$, a constant $x_1 = 1$ corresponding to $\beta_1$ is always in the model, and $\beta = (\beta_1, \beta_2, 0, 0)^T$. Then, there are $J = 2^{p-1} = 8$ possible subsets of $\{1, 2, \ldots, p\}$ that contain 1, including $I_1 = \{1\}$ and $S = I_2 = \{1, 2\}$. There are $2^{p-a_S} = 4$ subsets such that $S \subseteq I_j$. Let $\hat\beta_{I_2} = (\hat\beta_1, \hat\beta_2)^T$ and $x_{I_2} = (x_1, x_2)^T$. The full model uses $\beta_F = \beta$.
Let $I_{min}$ correspond to the set of predictors selected by a variable selection method such as forward selection or lasso variable selection. If $\hat\beta_I$ is $a \times 1$, use zero padding to form the $p \times 1$ vector $\hat\beta_{I,0}$ from $\hat\beta_I$ by adding 0s corresponding to the omitted variables. For example, if $p = 4$ and $\hat\beta_{I_{min}} = (\hat\beta_1, \hat\beta_3)^T$, then the observed variable selection estimator is $\hat\beta_{VS} = \hat\beta_{I_{min},0} = (\hat\beta_1, 0, \hat\beta_3, 0)^T$. As a statistic, $\hat\beta_{VS} = \hat\beta_{I_k,0}$ with probabilities $\pi_{kn} = P(I_{min} = I_k)$ for $k = 1, \ldots, J$, where there are $J$ subsets, e.g., $J = 2^{p-1}$.
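The subset counts in the $p = 4$ example and the zero padding step can be checked with a short sketch. It assumes NumPy and represents a subset $I$ as a tuple of 1-based predictor indices; the estimate values passed to `zero_pad` are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations

import numpy as np


def subsets_containing_1(p):
    """All index subsets of {1, ..., p} that contain 1 (the constant is always in the model)."""
    others = range(2, p + 1)
    return [(1,) + c for r in range(p) for c in combinations(others, r)]


def zero_pad(beta_I, I, p):
    """Form the p x 1 vector beta_{I,0}: entries of beta_I at positions I (1-based), zeros elsewhere."""
    beta0 = np.zeros(p)
    beta0[np.asarray(I) - 1] = beta_I
    return beta0


subsets = subsets_containing_1(4)
S = {1, 2}
n_supersets = sum(S <= set(I) for I in subsets)  # subsets I_j with S contained in I_j
beta_vs = zero_pad([2.5, -1.0], (1, 3), 4)         # hypothetical estimates of beta_1 and beta_3
```

This yields $J = 8$ candidate subsets, 4 of which contain $S = \{1, 2\}$, and the padded vector has the form $(\hat\beta_1, 0, \hat\beta_3, 0)^T$.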
Assume $p$ is fixed. Suppose model (20) holds, and that if $S \subseteq I_j$, where the dimension of $I_j$ is $a_j$, then $\sqrt{n}(\hat\beta_{I_j} - \beta_{I_j}) \xrightarrow{D} N_{a_j}(0, V_j)$, where $V_j$ is the covariance matrix of the asymptotic multivariate normal distribution. Then,
$$\sqrt{n}(\hat\beta_{I_j,0} - \beta) \xrightarrow{D} N_p(0, V_{j,0})$$
where $V_{j,0}$ adds columns and rows of zeros corresponding to the $x_i$ not in $I_j$, and $V_{j,0}$ is singular unless $I_j$ corresponds to the full model. This large-sample theory holds for many models.
If $A_1, A_2, \ldots, A_k$ are pairwise disjoint and if $\bigcup_{i=1}^{k} A_i = \mathcal{S}$, then the collection of sets $A_1, A_2, \ldots, A_k$ is a partition of $\mathcal{S}$. Then, the Law of Total Probability states that if $A_1, A_2, \ldots, A_k$ form a partition of $\mathcal{S}$ such that $P(A_i) > 0$ for $i = 1, \ldots, k$, then
$$P(B) = \sum_{j=1}^{k} P(B \cap A_j) = \sum_{j=1}^{k} P(B \mid A_j) P(A_j).$$
Let sets $A_{k+1}, \ldots, A_m$ satisfy $P(A_i) = 0$ for $i = k+1, \ldots, m$. Define $P(B \mid A_j) = 0$ if $P(A_j) = 0$. Then, a Generalized Law of Total Probability is
$$P(B) = \sum_{j=1}^{m} P(B \cap A_j) = \sum_{j=1}^{m} P(B \mid A_j) P(A_j).$$
Pötscher (1991) [28] used the conditional distribution of $\hat\beta_{VS} \mid (\hat\beta_{VS} = \hat\beta_{I_k,0})$ to find the distribution of $w_n = \sqrt{n}(\hat\beta_{VS} - \beta)$. Let $\hat\beta^C_{I_k,0}$ be a random vector from the conditional distribution $\hat\beta_{I_k,0} \mid (\hat\beta_{VS} = \hat\beta_{I_k,0})$. Let $w_{kn} = \sqrt{n}(\hat\beta_{I_k,0} - \beta) \mid (\hat\beta_{VS} = \hat\beta_{I_k,0}) \sim \sqrt{n}(\hat\beta^C_{I_k,0} - \beta)$. Denote $F_z(t) = P(z_1 \le t_1, \ldots, z_p \le t_p)$ by $P(z \le t)$. Then, Pötscher (1991) [28] used the Generalized Law of Total Probability to prove that the cumulative distribution function (cdf) of $w_n$ is
$$F_{w_n}(t) = P[n^{1/2}(\hat\beta_{VS} - \beta) \le t] = \sum_{k=1}^{J} F_{w_{kn}}(t)\, \pi_{kn}.$$
Hence, $\hat\beta_{VS}$ has a mixture distribution of the $\hat\beta^C_{I_k,0}$ with probabilities $\pi_{kn}$, and $w_n$ has a mixture distribution of the $w_{kn}$ with probabilities $\pi_{kn}$.
For the following Rathnayake and Olive (2023) [27] theorem, the first assumption is $P(S \subseteq I_{min}) \to 1$ as $n \to \infty$. Then, the variable selection estimator corresponding to $I_{min}$ underfits with probability going to zero, and the assumption holds under regularity conditions if BIC or AIC is used for many parametric regression models such as GLMs. See Charkhi and Claeskens (2018) [29] and Claeskens and Hjort (2008, pp. 70, 101, 102, 114, 232) [30]. This assumption is a necessary condition for a variable selection estimator to be a consistent estimator. See Zhao and Yu (2006) [31]. Thus, if a sparse estimator that performs variable selection is a consistent estimator of $\beta$, then $P(S \subseteq I_{min}) \to 1$ as $n \to \infty$. Hence, Theorem 2 proves that the lasso variable selection estimator is a $\sqrt{n}$ consistent estimator of $\beta$ if lasso is consistent. Charkhi and Claeskens (2018) [29] showed that $w_{jn} = \sqrt{n}(\hat\beta^C_{I_j,0} - \beta) \xrightarrow{D} w_j$ if $S \subseteq I_j$ for the maximum likelihood estimator with AIC, and gave a forward selection example. For a multiple linear regression model where $S$ is the model with exactly one predictor that can be deleted, only $\pi_S$ and $\pi_F$ are positive. If the $C_p$ criterion is used, then it can be shown that $\pi_S = P(\chi^2_1 < 2) = 0.8427$ and $\pi_F = 1 - \pi_S = 0.1573$. Theorem 2 proves that $w$ is a mixture distribution of the $w_j$ with probabilities $\pi_j$.
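One way to see the $C_p$ probabilities: with $C_p(I) = SSE(I)/MSE(F) + 2k - n$ for a submodel with $k$ predictors, and a full model $F$ with one more predictor than $S$, one gets $C_p(S) - C_p(F) = F_0 - 2$, where $F_0$ is the partial $F$ statistic for deleting that predictor. So $C_p$ picks $S$ exactly when $F_0 < 2$, and under model $S$, $F_0 \xrightarrow{D} \chi^2_1 = Z^2$. A short standard-library sketch evaluates $P(\chi^2_1 < 2) = P(|Z| < \sqrt{2}) = \operatorname{erf}(1)$ and checks it by Monte Carlo; the function name is hypothetical.

```python
import math
import random


def chi2_1_cdf(c):
    """P(chi^2_1 <= c) = P(|Z| <= sqrt(c)) = erf(sqrt(c / 2)) for Z ~ N(0, 1)."""
    return math.erf(math.sqrt(c / 2))


pi_S = chi2_1_cdf(2.0)  # limiting probability that C_p picks S
pi_F = 1 - pi_S         # limiting probability that C_p picks the full model

# Monte Carlo check: C_p picks S exactly when the partial F statistic, approximately Z^2, is below 2.
random.seed(0)
draws = 200_000
mc_pi_S = sum(random.gauss(0.0, 1.0) ** 2 < 2 for _ in range(draws)) / draws
```

Here `pi_S` is $\operatorname{erf}(1) \approx 0.8427$, matching the value stated above.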
Theorem 2. 
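Assume $P(S \subseteq I_{min}) \to 1$ as $n \to \infty$, and let $\hat{\beta}_{VS} = \hat{\beta}_{I_k,0}$ with probabilities $\pi_{kn}$, where $\pi_{kn} \to \pi_k$ as $n \to \infty$. Denote the positive $\pi_k$ by $\pi_j$. Assume $w_{jn} = \sqrt{n}(\hat{\beta}^C_{I_j,0} - \beta) \xrightarrow{D} w_j$. Then,
$$w_n = \sqrt{n}(\hat{\beta}_{VS} - \beta) \xrightarrow{D} w,$$
where the cdf of $w$ is $F_w(t) = \sum_j \pi_j F_{w_j}(t)$.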
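A toy illustration of Theorem 2 follows. It uses simplifying assumptions that are ours, not the paper's. Take multiple linear regression with one deletable predictor $x_p$ whose true coefficient is 0. With $C_p(I) = SSE(I)/MSE(F) + 2k_I - n$, one finds $C_p(S) - C_p(F) = t_p^2 - 2$, where $t_p$ is the usual $t$ statistic for $x_p$. So $C_p$ picks $S$ exactly when $t_p^2 < 2$. Assume further that $t_p \to N(0,1)$, and scale the $p$th coordinate so that its full-model limit is $N(0,1)$. Then the limit $w$ of that coordinate is $Z \cdot 1(Z^2 \ge 2)$. This is a mixture of a point mass at 0 (weight $\pi_S$, from zero padding) and a normal conditioned on $|z| \ge \sqrt{2}$ (weight $\pi_F$), and it is clearly non-normal. The sketch compares the mixture cdf with simulated draws. All names are hypothetical.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf
C = math.sqrt(2.0)      # C_p keeps x_p exactly when t_p^2 >= 2, i.e. |t_p| >= sqrt(2)
PI_S = math.erf(1.0)    # P(Z^2 < 2): weight of the submodel S
PI_F = 1.0 - PI_S       # weight of the full model F


def cdf_wS(t):
    """w_S is a point mass at 0: the deleted coefficient is padded with 0."""
    return 1.0 if t >= 0.0 else 0.0


def cdf_wF(t):
    """w_F is Z conditioned on |Z| >= sqrt(2)."""
    if t < -C:
        num = PHI(t)
    elif t < C:
        num = PHI(-C)
    else:
        num = PHI(-C) + PHI(t) - PHI(C)
    return num / PI_F


def cdf_w(t):
    """Mixture cdf F_w(t) = pi_S F_{w_S}(t) + pi_F F_{w_F}(t), as in Theorem 2."""
    return PI_S * cdf_wS(t) + PI_F * cdf_wF(t)


def draw_w(rng):
    """One draw of the limit: 0 if S is selected, otherwise Z itself."""
    z = rng.gauss(0.0, 1.0)
    return z if z * z >= 2.0 else 0.0


rng = random.Random(2024)
draws = [draw_w(rng) for _ in range(100_000)]
for t in (-2.0, -0.5, 0.0, 1.0, 2.0):
    emp = sum(d <= t for d in draws) / len(draws)
    print(f"t = {t:+.1f}: empirical {emp:.3f}, mixture cdf {cdf_w(t):.3f}")
```

About 84% of the draws are exactly 0, and the remaining mass lies outside $(-\sqrt{2}, \sqrt{2})$. A normal approximation would therefore be poor for this coordinate.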
Rathnayake and Olive (2023) [27] suggested the following bootstrap procedure. Use a bootstrap method for the full model, such as the nonparametric bootstrap or the residual bootstrap. Then, compute the full model estimator and the variable selection estimator from the bootstrap data set. Repeat this $B$ times to obtain the bootstrap samples for the full model and for the variable selection model. They could only prove that the bootstrap procedure works under very strong regularity conditions, such as $\pi_i = 1$ for some $i$ in Theorem 2; the case $\pi_S = 1$ is known as the oracle property. See Claeskens and Hjort (2008, pp. 101–114) [30] for references on the oracle property. For many statistics, a bootstrap data cloud $T_1^*, \ldots, T_B^*$ and a data cloud from $B$ iid statistics $T_1, \ldots, T_B$ tend to have similar variability. Rathnayake and Olive (2023) [27] suggested that when $T$ is the variable selection estimator $\hat{\beta}_{VS}$, the bootstrap data cloud often has more variability than the iid data cloud, and that this tends to increase the bootstrap confidence region coverage.
For variable selection with the $p \times 1$ vector $\hat{\beta}_{I_{min},0}$, consider testing $H_0: A\beta = \theta_0$ versus $H_1: A\beta \ne \theta_0$ with $\theta = A\beta$, where often $\theta_0 = 0$. Then, let $T_n = A\hat{\beta}_{I_{min},0}$ and let $T_i^* = A\hat{\beta}^*_{I_{min},0,i}$ for $i = 1, \ldots, B$. The shorth estimator can be applied to a bootstrap sample $\hat{\beta}^*_{i1}, \ldots, \hat{\beta}^*_{iB}$ to obtain a confidence interval for $\beta_i$. Here, $T_n = \hat{\beta}_i$ and $\theta = \beta_i$. The simulations used $\theta = A\beta = \beta_i$, $\theta = A\beta = \beta_S = 1$, and $\theta = A\beta = \beta_E = 0$. Let the multiple linear regression model be $Y_i = 1 + 1x_{i,2} + \cdots + 1x_{i,k+1} + e_i$ for $i = 1, \ldots, n$. Hence, $\beta = (1, \ldots, 1, 0, \ldots, 0)^T$ with $k+1$ ones and $p-k-1$ zeros.
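A minimal sketch of a shorth confidence interval for one coordinate $\beta_i$ follows. Sort the bootstrap values $\hat{\beta}^*_{i1}, \ldots, \hat{\beta}^*_{iB}$ and report the shortest interval that covers $c$ consecutive order statistics. The paper's choice of $c$ may include a small-sample correction that is not given in this section, so the sketch simply assumes $c = \lceil B(1-\delta) \rceil$. The bootstrap sample is a hypothetical $N(1, 0.1^2)$ stand-in, whose 95% interval length is about $2(1.96)(0.1) \approx 0.392$.

```python
import math
import random


def shorth_interval(boot, delta=0.05):
    """Shortest window of c = ceil(B(1 - delta)) consecutive order statistics.

    Assumption: no small-sample correction is applied to c here.
    """
    t = sorted(boot)
    B = len(t)
    c = min(B, math.ceil(B * (1.0 - delta)))
    # Start index s of the narrowest window t[s], ..., t[s + c - 1].
    s = min(range(B - c + 1), key=lambda i: t[i + c - 1] - t[i])
    return t[s], t[s + c - 1]


# Hypothetical stand-in for a bootstrap sample beta*_{i1}, ..., beta*_{iB} with B = 1000.
rng = random.Random(3)
boot_i = [rng.gauss(1.0, 0.1) for _ in range(1000)]
lo, hi = shorth_interval(boot_i)
print(f"95% shorth CI for beta_i: [{lo:.3f}, {hi:.3f}], length {hi - lo:.3f}")
```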
The regression models used the residual bootstrap with the forward selection estimator $\hat{\beta}_{I_{min},0}$. Table 1 gives results when the iid errors are $e_i \sim N(0,1)$, with $n = 100$, $p = 4$, and $k = 1$. Table 1 shows two rows for each model, giving the observed confidence interval coverages and the average lengths of the confidence intervals. The nominal coverage was 95%. The term "reg" is for the full model regression, and the term "vs" is for forward selection. The last six columns give results for the tests. The terms pr, hyb, and br are for the prediction region method (10), the hybrid region (12), and the Bickel and Ren region (11). The 0 indicates that the test was $H_0: \beta_E = (\beta_3, \beta_4)^T = 0$ versus $H_1: \beta_E \ne 0$, while the 1 indicates that the test was $H_0: \beta_S = (\beta_1, \beta_2)^T = 1$ versus $H_1: \beta_S \ne 1$. For the tests, the reported length and coverage = P(fail to reject $H_0$) refer to the interval $[0, D_{(U_B)}]$ or $[0, D_{(U_B,T)}]$, where $D_{(U_B)}$ or $D_{(U_B,T)}$ is the cutoff for the confidence region. The cutoff will often be near $\sqrt{\chi^2_{g,0.95}}$ if the statistic $T$ is asymptotically normal, where $g$ is the dimension of $T$. Note that $\sqrt{\chi^2_{2,0.95}} = 2.448$ is close to the average cutoffs of about 2.45 for the full model regression bootstrap tests. For the full model, $\sqrt{n}\,\text{len} \to 3.92$ as $n \to \infty$ for the simulated data, and the shorth 95% confidence intervals have simulated length $0.398 \approx 3.92/10 = 0.392$. The variable selection estimator and the full model estimator were similar for $\beta_1$, $\beta_2$, and $\beta_S$. The two estimators differed for $\beta_3$, $\beta_4$, and $\beta_E$ because $\hat{\beta}^*_i = 0$ often occurred for $i = 3$ and 4. In particular, the confidence interval coverages for the variable selection estimator were very high, but the average lengths were shorter than those for the full model. If $x_3$ was never selected, then $\hat{\beta}^*_3 = 0$ for all runs, and the confidence interval would be [0, 0] with 100% coverage and zero length.
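The reference values in this paragraph can be reproduced with the Python standard library. These are $\sqrt{\chi^2_{2,0.95}} \approx 2.448$, the value $\sqrt{\chi^2_{4,0.95}} \approx 3.0802$ used later with Table 2, and $3.92/\sqrt{100} = 0.392$. For an even number of degrees of freedom $g$, the $\chi^2_g$ cdf has the closed form $1 - e^{-x/2}\sum_{j=0}^{g/2-1} (x/2)^j/j!$, so the quantile can be found by bisection. The constant 3.92 is $2z_{0.975}$, the length of a 95% normal interval with unit standard deviation. Reading the $\sqrt{n}\,\text{len}$ limit this way is our interpretation. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_cdf_even(x, g):
    """chi^2_g cdf for even g: 1 - exp(-x/2) * sum_{j=0}^{g/2-1} (x/2)^j / j!."""
    if g <= 0 or g % 2:
        raise ValueError("closed form needs an even, positive g")
    h = x / 2.0
    return 1.0 - math.exp(-h) * sum(h ** j / math.factorial(j) for j in range(g // 2))


def chi2_quantile_even(q, g):
    """Invert chi2_cdf_even by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, g) < q:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, g) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for g in (2, 4):
    print(g, round(math.sqrt(chi2_quantile_even(0.95, g)), 4))  # 2 2.4477, then 4 3.0802

two_z = 2.0 * NormalDist().inv_cdf(0.975)  # length of a 95% normal interval, unit sd
print(round(two_z, 2), round(two_z / math.sqrt(100), 3))  # 3.92 0.392
```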
Note that for the variable selection estimator with $H_0: \beta_E = 0$, the average cutoff values were near 2.7 and 3.0, which are larger than the $\sqrt{\chi^2_{2,0.95}}$ cutoff 2.448. Hence, using the standard bootstrap confidence region (16) would result in undercoverage. For $H_0: \beta_S = 1$, the bootstrap estimator often appeared to be approximately multivariate normal. Example 2 illustrates this result with a DD plot.
Example 2. 
We generated $x_i \sim N_4(0, I)$ and $Y_i = 1 + x_{i1} + x_{i2} + x_{i3} + e_i$ for $i = 1, \ldots, n = 1000$, with the $e_i$ iid $N(0,1)$ and $\beta = \beta_F = (1, 1, 1, 1, 0)^T$. Then, we examined several bootstrap methods for multiple linear regression variable selection. The nonparametric bootstrap draws $n$ cases $(x_j^T, Y_j)^T$ with replacement from the $n$ original cases and then selects variables on the resulting data set, giving $\hat{\beta}^*_I$. If $\hat{\beta}^*_I$ is $a \times 1$, use zero padding to form the $p \times 1 = 5 \times 1$ vector $\hat{\beta}^*_{I,0} = \hat{\beta}^*_1$ from $\hat{\beta}^*_I$ by adding 0s corresponding to the omitted variables. Repeat this $B = 1000$ times to obtain the bootstrap sample $\hat{\beta}^*_1, \ldots, \hat{\beta}^*_B$. Typically, the full model $I = F$ or the submodel $I = S$ that omitted $x_{i4}$ was selected. The residual bootstrap using the full model residuals was also used, where $Y_i^* = (1\ x_i^T)\hat{\beta} + r_i^*$ for $i = 1, \ldots, n$, and the $r_i^*$ are sampled with replacement from the full model residuals $r_1, \ldots, r_n$. Forward selection and backward elimination could be used with the $C_p$ or BIC criterion, or lasso could be used to perform the variable selection. Let $\hat{\beta}^*_{S:I,0}$ be obtained from $\hat{\beta}^*_{I,0}$ by leaving out the fifth value. Hence, if $\hat{\beta}^*_{I,0} = (0.9351, 1.0252, 0.9251, 0.9542, 0)^T$, then $\hat{\beta}^*_{S:I,0} = (0.9351, 1.0252, 0.9251, 0.9542)^T$. Figure 2 shows the DD plot for the confidence region corresponding to the $\hat{\beta}^*_{S:I,0}$ using forward selection with the $C_p$ criterion. This confidence region corresponds to the test $H_0: \beta_S = b = (b_1, b_2, b_3, b_4)^T$, e.g., $b = 1$. Plots created with backward elimination and lasso were similar. Rathnayake and Olive (2023) [27] obtained the large-sample theory for the variable selection estimators $\hat{\beta}_{VS} = \hat{\beta}_{I,0}$ for multiple linear regression and many other regression methods.
The limiting distribution is a complicated non-normal mixture distribution by Theorem 2. However, in simulations where $S$ is known, the $\hat{\beta}^*_{S:I,0}$ often appeared to have an approximate multivariate normal distribution.
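A minimal sketch of the residual bootstrap in Example 2 follows. It covers the full model fit, resampled residuals, refitting with variable selection, zero padding, and dropping the fifth value to get $\hat{\beta}^*_{S:I,0}$. Several assumptions apply, and none of this is the slpack.txt code. For brevity, all-subsets $C_p$ selection (intercept always kept) replaces forward selection, and these are different procedures. Least squares uses the normal equations, which is adequate for this well-conditioned toy design but not for general use. The sketch uses $n = 100$ and $B = 30$ instead of 1000 to keep pure-Python run time short. All helper names are hypothetical.

```python
import itertools
import random


def ols(X, y):
    """Least squares coefficients from X^T X b = X^T y (Gauss-Jordan elimination)."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_and_sse(X, y, cols):
    """OLS on the columns in `cols`; returns (coefficients, SSE)."""
    Xi = [[row[j] for j in cols] for row in X]
    b = ols(Xi, y)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2 for r, yi in zip(Xi, y))
    return b, sse


def full_and_cp_selected(X, y):
    """Full OLS fit and the all-subsets C_p fit, zero padded to p x 1."""
    n, p = len(X), len(X[0])
    b_full, sse_full = fit_and_sse(X, y, list(range(p)))
    mse_full = sse_full / (n - p)
    best = None
    for m in range(p):
        for sub in itertools.combinations(range(1, p), m):
            cols = [0, *sub]  # column 0 is the constant
            b, sse = fit_and_sse(X, y, cols)
            cp = sse / mse_full + 2 * len(cols) - n
            if best is None or cp < best[0]:
                best = (cp, cols, b)
    _, cols, b = best
    padded = [0.0] * p  # zero padding: omitted predictors get 0
    for j, bj in zip(cols, b):
        padded[j] = bj
    return b_full, padded


def residual_bootstrap(X, y, B, rng):
    """Y*_i = (1 x_i^T) beta_hat + r*_i, r*_i resampled from the full model residuals."""
    b_full, _ = full_and_cp_selected(X, y)
    fitted = [sum(bj * xj for bj, xj in zip(b_full, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    full_boot, vs_boot = [], []
    for _ in range(B):
        y_star = [fi + rng.choice(resid) for fi in fitted]
        bf, bv = full_and_cp_selected(X, y_star)
        full_boot.append(bf)
        vs_boot.append(bv)
    return full_boot, vs_boot


# Data as in Example 2, with smaller n and B.
rng = random.Random(1)
n, B = 100, 30
X = [[1.0] + [rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n)]
y = [1.0 + row[1] + row[2] + row[3] + rng.gauss(0.0, 1.0) for row in X]
full_boot, vs_boot = residual_bootstrap(X, y, B, rng)
beta_S_boot = [b[:4] for b in vs_boot]  # leave out the fifth value
print(sum(b[4] == 0.0 for b in vs_boot), "of", B, "bootstrap fits omitted x_4")
```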
A small simulation study was conducted on large-sample 95% confidence regions. The coordinate-wise median was used since this statistic is moderately difficult to bootstrap. We used 5000 runs, so the binomial standard error of an observed coverage near 0.95 is about 0.003, and a coverage within [0.94, 0.96] suggests that the true coverage is near the nominal coverage 0.95. The simulation used 10 distributions: xtype = 1 for $N_p(0, I)$; xtype = 2, 3, 4, and 5 for $(1-\delta)N_p(0, I) + \delta N_p(0, 25I)$; xtype = 6, 7, 8, and 9 for a multivariate $t_d$ with $d = 3$, 5, 19, or $d$ given by the user; and xtype = 10 for a log-normal distribution shifted to have coordinate-wise median 0. If $w$ corresponds to one of the above distributions, then $x = Aw$ with $A = \mathrm{diag}(1, \sqrt{2}, \ldots, \sqrt{p})$. Then, the population coordinate-wise median is 0 for each distribution. Table 2 shows the coverages and average cutoffs for four large-sample confidence regions: (10), (19) with $n_V = B = 1000$, (19) with $n_V = B = 49$, and (19) with $n_V = B = 99$. The coverage is the proportion of times that the confidence region contained $\theta = 0$, where $\theta$ is a $p \times 1$ vector. Each confidence region has a cutoff $D = \sqrt{D^2}$ that depends on the bootstrap sample, and the average of the 5000 cutoffs is given. Here, $D^2 = D^2_{(U_B)}$ for confidence region (10), while $D^2 = D^2_{(U_V)}$ for confidence region (19), where the cutoff also depends on $n_V$. The coverages were usually between 0.94 and 0.96. The average cutoffs for the prediction region method's large-sample 95% confidence region tended to be very close to the average cutoffs for confidence region (19) with $n_V = B = 1000$. Note that $\sqrt{\chi^2_{2,0.95}} = 2.4477$ and $\sqrt{\chi^2_{4,0.95}} = 3.0802$ are the cutoffs for the standard bootstrap confidence region (15). The ratio of volumes of the two confidence regions is volume (10)/volume (19) $= (D_{(U_B)}/D_{(U_V)})^p$.
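A minimal sketch of the prediction region method's confidence region (10) follows. As we understand it, this region takes the sample mean $\overline{T}^*$ and sample covariance $S^*_T$ of the bootstrap sample and contains every $\theta$ with $(\theta - \overline{T}^*)^T [S^*_T]^{-1} (\theta - \overline{T}^*) \le D^2_{(U_B)}$. Correction factor (8), which defines $U_B$, is not given in this section, so the sketch uses the uncorrected $U = \lceil B(1-\delta) \rceil$ as a placeholder. The bootstrap sample is a hypothetical normal stand-in. The volume ratio is evaluated at the average cutoffs from the first row of Table 2 only to illustrate the formula. Helper names are hypothetical.

```python
import math
import random


def mean_vec(xs):
    return [sum(col) / len(xs) for col in zip(*xs)]


def cov_mat(xs, m):
    p = len(m)
    return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in xs) / (len(xs) - 1)
             for j in range(p)] for i in range(p)]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def mahal_sq(t, m, S):
    """Squared Mahalanobis distance (t - m)^T S^{-1} (t - m)."""
    d = [ti - mi for ti, mi in zip(t, m)]
    return sum(di * si for di, si in zip(d, solve(S, d)))


def prediction_region_cr(boot, delta=0.05):
    """Center, dispersion, and cutoff D_(U) with U = ceil(B(1 - delta)) (uncorrected)."""
    m = mean_vec(boot)
    S = cov_mat(boot, m)
    d2 = sorted(mahal_sq(t, m, S) for t in boot)
    U = min(len(boot), math.ceil(len(boot) * (1.0 - delta)))
    return m, S, math.sqrt(d2[U - 1])


def contains(theta0, region):
    """theta0 is in the region iff its squared distance is at most D_(U)^2."""
    m, S, cutoff = region
    return mahal_sq(theta0, m, S) <= cutoff ** 2


def volume_ratio(d_ub, d_uv, p):
    """(D_(U_B) / D_(U_V))^p for two hyperellipsoids with the same center and shape."""
    return (d_ub / d_uv) ** p


# Hypothetical bootstrap sample of a p = 2 statistic, B = 1000.
rng = random.Random(5)
boot = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 2.0)] for _ in range(1000)]
region = prediction_region_cr(boot)
print(f"cutoff D_(U) = {region[2]:.3f} (compare sqrt(chi2_2,0.95) = 2.448)")
print("contains 0:", contains([0.0, 0.0], region))
print(f"volume ratio at Table 2 averages (n = 100, p = 2, N): {volume_ratio(2.4931, 2.5015, 2):.4f}")
```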

4. Discussion

The bootstrap was introduced by Efron (1979) [32]. Also, see Efron (1982) [3] and Bickel and Freedman (1981) [33]. Ghosh and Polansky (2014) [34] and Politis and Romano (1994) [35] are useful references for bootstrap confidence regions. For a small dimension $p$, nonparametric density estimation can be used to construct confidence regions and prediction regions. See, for example, Hall (1987) [36] and Hyndman (1996) [37]. Visualizing a bootstrap confidence region is useful for checking whether the asymptotic normal approximation for the statistic is good, since the plotted points will then tend to cluster tightly about the identity line. Making five plots corresponding to five bootstrap samples can be used to check the variability of the plots and the probability of obtaining a bad sample. For Example 1, most of the bootstrap samples produced plots that had tighter clustering about the identity line than the clustering in Figure 1.
The new bootstrap confidence region (19) used the fact that bootstrap confidence region (10) is simultaneously a prediction region for a future bootstrap statistic $T_f^*$ and a confidence region for $\theta$ with the same asymptotic coverage $1 - \delta$. Hence, increasing the coverage as a prediction region also increases the coverage as a confidence region. The data splitting technique used to increase the coverage only depends on the $T_i^*$ being iid with respect to the bootstrap distribution. Correction factor (8) also increases the coverage, but this calibration technique needed intensive simulation.
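The exact form of (19) is given in Section 2.1 and is not reproduced here. To illustrate why a data-splitting cutoff only needs the $T_i^*$ to be iid, the sketch below assumes a split-conformal-style rule. First, estimate the center and scale from one bootstrap sample. Next, compute distances for a second, independent sample of size $n_V$. Finally, use the order statistic of rank $U_V = \lceil (n_V+1)(1-\delta) \rceil$ as the cutoff. If the validation statistics and a future statistic $T_f^*$ are iid and continuous, $T_f^*$ falls inside with probability exactly $U_V/(n_V+1)$, given the first sample. This gives 48/50 = 0.96 for $n_V = 49$ and 95/100 = 0.95 for $n_V = 99$, which can be compared with the corresponding columns of Table 2. To keep the code short, it uses $p = 1$ and $|t - \text{center}|/\text{scale}$ in place of a Mahalanobis distance. The rank rule and all names are assumptions.

```python
import math
import random
import statistics
from fractions import Fraction


def split_rank(n_v, delta=Fraction(1, 20)):
    """Assumed rank U_V = ceil((n_V + 1)(1 - delta)), capped at n_V (exact arithmetic)."""
    return min(n_v, math.ceil((n_v + 1) * (1 - delta)))


def fit_center_scale(train):
    """Center and scale estimated from the first bootstrap sample only."""
    return statistics.fmean(train), statistics.stdev(train)


def validation_cutoff(valid, center, scale, delta=Fraction(1, 20)):
    """Cutoff D_(U_V) from the distances of the second, independent sample."""
    d = sorted(abs(t - center) / scale for t in valid)
    return d[split_rank(len(valid), delta) - 1]


for n_v in (49, 99, 1000):
    k = split_rank(n_v)
    print(n_v, k, float(Fraction(k, n_v + 1)))  # rank and exact coverage U_V / (n_V + 1)

# Coverage check with iid N(0, 1) stand-ins for scalar bootstrap statistics (p = 1).
rng = random.Random(7)
center, scale = fit_center_scale([rng.gauss(0.0, 1.0) for _ in range(1000)])
reps, hits = 4000, 0
for _ in range(reps):
    valid = [rng.gauss(0.0, 1.0) for _ in range(49)]  # second sample, n_V = 49
    cut = validation_cutoff(valid, center, scale)
    t_f = rng.gauss(0.0, 1.0)  # future statistic T*_f
    hits += abs(t_f - center) / scale <= cut
print(f"simulated coverage with n_V = 49: {hits / reps:.3f}")
```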
Calibrating a bootstrap confidence region is useful for several reasons. For simulations, computation time can be reduced if $B$ can be reduced. Using correction factor (8) is faster than using the two-sample bootstrap of Section 2.1, but the two-sample bootstrap can be used to check the accuracy of (8), as in Table 2 with $n_V = B$. For a nominal 95% prediction region, correction factor (8) increases the coverage of the training data to at most 97.5%. Coverage for test data $x_f$ tends to be worse than coverage for training data. Using the cutoff $D^2_{(U_B)}$ of (8) gives better coverage than using the cutoff $D^2_{(U)}$ with $U = \lceil B(1-\delta) \rceil$. The two calibration methods in this paper were first applied to prediction regions, and they work for bootstrap confidence regions (10) and (11) since those two regions are also prediction regions for $T_f^*$.
Plots and simulations were made in R. See R Core Team (2020) [38]. Welagedara (2023) [39] lists some R functions for bootstrapping several statistics. The programs used are in the collection of functions slpack.txt. See http://parker.ad.siu.edu/Olive/slpack.txt, accessed on 1 August 2024. The function ddplot4, applied to the bootstrap sample, can be used to visualize the bootstrap prediction region method's confidence region. The function medbootsim was used for Table 2. Some functions for bootstrapping multiple linear regression variable selection with the residual bootstrap are belimboot for backward elimination using $C_p$, bicboot for forward selection using BIC, fselboot for forward selection using $C_p$, lassoboot for lasso variable selection, and vselboot for all subsets variable selection with $C_p$.

Author Contributions

Conceptualization, W.A.D.M.W. and D.J.O.; methodology, W.A.D.M.W. and D.J.O.; software D.J.O.; validation, W.A.D.M.W. and D.J.O.; formal analysis, W.A.D.M.W. and D.J.O.; investigation, W.A.D.M.W.; writing—original draft, W.A.D.M.W. and D.J.O.; writing—review & editing, W.A.D.M.W. and D.J.O.; All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

See slpack.txt for programs for simulating the data.

Acknowledgments

The authors thank the editors and referees for their work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Olive, D.J. Robust Multivariate Analysis; Springer: New York, NY, USA, 2017.
  2. Frey, J. Data-driven nonparametric prediction intervals. J. Stat. Plan. Inference 2013, 143, 1039–1048.
  3. Efron, B. The Jackknife, the Bootstrap and Other Resampling Plans; SIAM: Philadelphia, PA, USA, 1982.
  4. Hall, P. Theoretical comparisons of bootstrap confidence intervals. Ann. Stat. 1988, 16, 927–985.
  5. Pelawa Watagoda, L.C.R.; Olive, D.J. Bootstrapping multiple linear regression after variable selection. Stat. Pap. 2021, 62, 681–700.
  6. Olive, D.J. Asymptotically optimal regression prediction intervals and prediction regions for multivariate data. Int. J. Stat. Probab. 2013, 2, 90–100.
  7. Olive, D.J. Applications of hyperellipsoidal prediction regions. Stat. Pap. 2018, 59, 913–931.
  8. Bickel, P.J.; Ren, J.J. The Bootstrap in hypothesis testing. In State of the Art in Probability and Statistics: Festschrift for William R. van Zwet; de Gunst, M., Klaassen, C., van der Vaart, A., Eds.; The Institute of Mathematical Statistics: Hayward, CA, USA, 2001; pp. 91–112.
  9. Rajapaksha, K.W.G.D.H.; Olive, D.J. Wald type tests with the wrong dispersion matrix. Commun. Stat.-Theory Methods 2024, 53, 2236–2251.
  10. Chew, V. Confidence, prediction and tolerance regions for the multivariate normal distribution. J. Am. Stat. Assoc. 1966, 61, 605–617.
  11. Efron, B. Estimation and accuracy after model selection. J. Am. Stat. Assoc. 2014, 109, 991–1007.
  12. Haile, M.G.; Zhang, L.; Olive, D.J. Predicting random walks and a data splitting prediction region. Stats 2024, 7, 23–33.
  13. Barndorff-Nielsen, O.E.; Cox, D.R. Prediction and asymptotics. Bernoulli 1996, 2, 319–340.
  14. Beran, R. Calibrating prediction regions. J. Am. Stat. Assoc. 1990, 85, 715–723.
  15. Fonseca, G.; Giummole, F.; Vidoni, P. A note about calibrated prediction regions and distributions. J. Stat. Plan. Inference 2012, 142, 2726–2734.
  16. Hall, P.; Peng, L.; Tajvidi, N. On prediction intervals based on predictive likelihood or bootstrap methods. Biometrika 1999, 86, 871–880.
  17. Hall, P.; Rieck, A. Improving coverage accuracy of nonparametric prediction intervals. J. R. Stat. Soc. B 2001, 63, 717–725.
  18. Ueki, M.; Fueda, K. Adjusting estimative prediction limits. Biometrika 1996, 94, 509–511.
  19. DiCiccio, T.J.; Efron, B. Bootstrap confidence intervals. Stat. Sci. 1996, 11, 189–228.
  20. Loh, W.Y. Calibrating confidence coefficients. J. Am. Stat. Assoc. 1987, 82, 155–162.
  21. Loh, W.Y. Bootstrap calibration for confidence interval construction and selection. Stat. Sin. 1991, 1, 477–491.
  22. Rousseeuw, P.J.; Van Driessen, K. A fast algorithm for the minimum covariance determinant estimator. Technometrics 1999, 41, 212–223.
  23. Kabaila, P. The coverage properties of confidence regions after model selection. Int. Stat. Rev. 2009, 77, 405–414.
  24. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  25. Mallows, C. Some comments on Cp. Technometrics 1973, 15, 661–676.
  26. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 1996, 58, 267–288.
  27. Rathnayake, R.C.; Olive, D.J. Bootstrapping some GLMs and survival regression models after variable selection. Commun. Stat.-Theory Methods 2023, 52, 2625–2645.
  28. Pötscher, B. Effects of model selection on inference. Econom. Theory 1991, 7, 163–185.
  29. Charkhi, A.; Claeskens, G. Asymptotic post-selection inference for the Akaike information criterion. Biometrika 2018, 105, 645–664.
  30. Claeskens, G.; Hjort, N.L. Model Selection and Model Averaging; Cambridge University Press: New York, NY, USA, 2008.
  31. Zhao, P.; Yu, B. On model selection consistency of lasso. J. Mach. Learn. Res. 2006, 7, 2541–2563.
  32. Efron, B. Bootstrap methods, another look at the jackknife. Ann. Stat. 1979, 7, 1–26.
  33. Bickel, P.J.; Freedman, D.A. Some asymptotic theory for the bootstrap. Ann. Stat. 1981, 9, 1196–1217.
  34. Ghosh, S.; Polansky, A.M. Smoothed and iterated bootstrap confidence regions for parameter vectors. J. Mult. Anal. 2014, 132, 171–182.
  35. Politis, D.N.; Romano, J.P. Large sample confidence regions based on subsamples under minimal assumptions. Ann. Stat. 1994, 22, 2031–2050.
  36. Hall, P. On the bootstrap and likelihood-based confidence regions. Biometrika 1987, 74, 481–493.
  37. Hyndman, R.J. Computing and graphing highest density regions. Am. Stat. 1996, 50, 120–126.
  38. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020; Available online: www.r-project.org (accessed on 1 August 2024).
  39. Welagedara, W.A.D.M. Model Selection, Data Splitting for ARMA Time Series, and Visualizing Some Bootstrap Confidence Regions. Ph.D. Thesis, Southern Illinois University, Carbondale, IL, USA, 2023. Available online: http://parker.ad.siu.edu/Olive/swelagedara.pdf (accessed on 1 August 2024).
Figure 1. Visualizing the confidence region with a DD plot.
Figure 2. Visualizing the forward selection confidence region for $\beta_S$.
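Table 1. Bootstrapping OLS forward selection with $C_p$, $e_i \sim N(0,1)$.

|     | $\beta_1$ | $\beta_2$ | $\beta_{p-1}$ | $\beta_p$ | pr0   | hyb0  | br0   | pr1   | hyb1  | br1   |
|-----|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| reg | 0.946 | 0.950 | 0.947 | 0.948 | 0.940 | 0.941 | 0.941 | 0.937 | 0.936 | 0.937 |
| len | 0.396 | 0.399 | 0.399 | 0.398 | 2.451 | 2.451 | 2.452 | 2.450 | 2.450 | 2.451 |
| vs  | 0.948 | 0.950 | 0.997 | 0.996 | 0.991 | 0.979 | 0.991 | 0.938 | 0.939 | 0.940 |
| len | 0.395 | 0.398 | 0.323 | 0.323 | 2.699 | 2.699 | 3.002 | 2.450 | 2.450 | 2.457 |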
Table 2. Coverages and average cutoffs for some large-sample 95% confidence regions, B = 1000. Each cell gives (coverage, average cutoff). Dist: N = normal, LN = log-normal.

| $n$ | $p$ | Dist | CR (10) | (19), $n_V = 1000$ | (19), $n_V = 49$ | (19), $n_V = 99$ |
|-----|-----|------|------------------|------------------|------------------|------------------|
| 100 | 2 | N  | (0.9430, 2.4931) | (0.9450, 2.5015) | (0.9536, 2.7127) | (0.9452, 2.5351) |
| 100 | 2 | LN | (0.9494, 2.5025) | (0.9488, 2.5088) | (0.9598, 2.7401) | (0.9500, 2.5539) |
| 100 | 4 | N  | (0.9386, 3.1738) | (0.9384, 3.1795) | (0.9522, 3.3922) | (0.9384, 3.2177) |
| 100 | 4 | LN | (0.9456, 3.2012) | (0.9466, 3.2046) | (0.9598, 3.4512) | (0.9468, 3.2543) |
| 200 | 4 | N  | (0.9476, 3.1489) | (0.9480, 3.1575) | (0.9590, 3.3510) | (0.9490, 3.1948) |
| 200 | 4 | LN | (0.9432, 3.1673) | (0.9440, 3.1700) | (0.9554, 3.3861) | (0.9440, 3.2065) |