Re-sampling Methods for Statistical Inference of the 2020s

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: closed (31 March 2022) | Viewed by 43554

Special Issue Editor


Prof. Dr. Fulvia Mecatti
Guest Editor
Department of Statistics, Università degli Studi di Milano‐Bicocca, Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy
Interests: sampling statistics; bootstrap and stochastic simulations; gender statistics; statistical inference

Special Issue Information

Dear Colleagues,

Re-sampling methods are fellows of the previous century, with the Bootstrap, no doubt, the most popular among them, now well into its forties. Following the rocketing evolution of technology, computational power and data sharing, statistical inference as we knew it at the beginning of the twenty-first century has extraordinarily expanded its scope and applications, and the global Covid-19 crisis appears to have accelerated the process even further. With this Special Issue I am soliciting contributions, advancements and critical reviews on Bootstrap and re-sampling methods that address the statistical needs of the 2020s and envision future research directions. Manuscripts covering, though not limited to, topics in data science, statistical learning, statistical modelling, epidemiology, observational studies, circular economy, sustainable development and inequalities are particularly welcome.

I look forward to receiving your submissions.

Sincerely,

Prof. Dr. Fulvia Mecatti
Guest Editor

Message from Prof. Bradley Efron:

This is a propitious moment for Stats' special issue on resampling methods. Data sets are bigger than ever, computation is faster and cheaper than ever, and the demand for statistical analysis seems to multiply every year. Computer-intensive statistical methods — the substitution of computational power for routine and tedious paper-and-pencil calculations — are a growth industry in the current scientific environment, particularly as the complexity of our estimators and tests has outpaced theory.

Resampling plans were the original computer-intensive statistical methodology (so named in Efron and Diaconis' 1983 Scientific American article). Their widespread adoption encouraged other computer-based success stories, Markov Chain Monte Carlo being particularly notable. The term "resampling", in its current sense, seems to have been introduced in the title of my 1982 monograph "The Jackknife, the Bootstrap and Other Resampling Plans". Besides the jackknife and the bootstrap, several older resampling methods were discussed there: cross-validation, half-sampling, typical value theory, the infinitesimal jackknife, and balanced repeated replications.

The resampling story of the last forty years has been one of new uses more than new methods. The original, modest goal of computationally attaching standard errors to statistical estimators was expanded to bootstrap confidence intervals (paralleling an ambitious theoretical development of likelihood-based intervals). Bootstrap smoothing, aka "bagging" or "bootstrap aggregation", aimed at improving unsmooth estimators such as those obtained from model selection, and became central to Leo Breiman's popular machine learning package "random forests". Massive prediction algorithms, especially "deep learning", required new cross-validation techniques carried out at Herculean scales. As we will see in this volume, applications of resampling have spread to an enormous variety of scientific studies, generating a diversity of specialized techniques as well as an improved theoretical understanding of how the methods perform.

To say that this is a good time for publishing a resampling issue doesn't mean it's easy to do so. I'm grateful to Professor Fulvia Mecatti for conceiving, organizing, and carrying out the task so successfully.

Bradley Efron

Stanford

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • estimation
  • big data
  • public health
  • data integration
  • sampling statistics
  • statistical assessment
  • model selection
  • empirical Bayes
  • population studies

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (11 papers)

16 pages, 396 KiB  
Article
Some Empirical Results on Nearest-Neighbour Pseudo-populations for Resampling from Spatial Populations
by Sara Franceschi, Rosa Maria Di Biase, Agnese Marcelli and Lorenzo Fattorini
Stats 2022, 5(2), 385-400; https://doi.org/10.3390/stats5020022 - 15 Apr 2022
Cited by 3 | Viewed by 2276
Abstract
In finite populations, pseudo-population bootstrap is the sole method preserving the spirit of the original bootstrap performed from iid observations. In spatial sampling, theoretical results about the convergence of bootstrap distributions to the actual distributions of estimators are lacking, owing to the failure of spatially balanced sampling designs to converge to the maximum entropy design. In addition, the issue of creating pseudo-populations able to mimic the characteristics of real populations is challenging in spatial frameworks where spatial trends, relationships, and similarities among neighbouring locations are invariably present. In this paper, we propose the use of the nearest-neighbour interpolation of spatial populations for constructing pseudo-populations that converge to real populations under mild conditions. The effectiveness of these proposals with respect to traditional pseudo-populations is empirically checked by a simulation study. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
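
To fix ideas, the following is a minimal sketch of pseudo-population resampling with a nearest-neighbour rule, assuming a toy spatial population, simple random sampling and the sample mean as estimator; it illustrates only the general mechanism, not the authors' exact construction or simulation design.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Toy spatial population: N locations with a smooth spatial trend plus noise.
N, n, B = 2500, 100, 500
xy = rng.uniform(0, 1, size=(N, 2))
y = np.sin(3 * xy[:, 0]) + np.cos(2 * xy[:, 1]) + rng.normal(0, 0.2, N)

# Original sample (simple random sampling without replacement, for illustration only).
s = rng.choice(N, size=n, replace=False)

# Pseudo-population: every location inherits the value of its nearest sampled
# neighbour (nearest-neighbour interpolation of the sample onto the whole frame).
tree = cKDTree(xy[s])
_, donor = tree.query(xy)           # index, within the sample, of the nearest sampled unit
y_pseudo = y[s][donor]              # pseudo-population of size N

# Bootstrap: redraw samples from the pseudo-population with the original design
# and recompute the estimator (here the sample mean, estimating the population mean).
boot = np.empty(B)
for b in range(B):
    sb = rng.choice(N, size=n, replace=False)
    boot[b] = y_pseudo[sb].mean()

print("point estimate:", round(y[s].mean(), 4))
print("bootstrap SE:  ", round(boot.std(ddof=1), 4))
```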

18 pages, 705 KiB  
Article
Bootstrap Assessment of Crop Area Estimates Using Satellite Pixels Counting
by Cristiano Ferraz, Jacques Delincé, André Leite and Raydonal Ospina
Stats 2022, 5(2), 422-439; https://doi.org/10.3390/stats5020025 - 25 Apr 2022
Viewed by 2061
Abstract
Crop area estimates based on counting pixels over classified satellite images are a promising application of remote sensing to agriculture. However, such area estimates are biased, and their variance is a function of the error rates of the classification rule. To redress the bias, estimators (direct and inverse) relying on the so-called confusion matrix have been proposed, but analytic estimators for variances can be tricky to derive. This article proposes a bootstrap method for assessing statistical properties of such estimators based on information from a sample confusion matrix. The proposed method can be applied to any other type of estimator that is built upon confusion matrix information. The resampling procedure is illustrated in a small study to assess the biases and variances of estimates using purely pixel counting and estimates provided by both direct and inverse estimators. The method has the advantage of being simple to implement even when the sample confusion matrix is generated under unequal probability sample design. The results show the limitations of estimates based solely on pixel counting as well as respective advantages and drawbacks of the direct and inverse estimators with respect to their feasibility, unbiasedness, and variance. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
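
As a rough illustration of the mechanism, the sketch below bootstraps a toy sample confusion matrix (resampling reference pixels within each true class) and propagates it through a generic confusion-matrix-based area estimator. The simple reallocation rule used as `area_estimate` is a placeholder, not the paper's direct or inverse estimator, and all counts are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: two land-cover classes and the map-wide pixel counts per classified class.
pixel_counts = np.array([620_000, 380_000])   # pixels classified as class 0 / class 1

# Sample confusion matrix from reference data: rows = true class, columns = classified class.
conf = np.array([[420, 60],
                 [40, 480]])

def area_estimate(conf, pixel_counts):
    # Illustrative confusion-matrix correction: estimate the share of each true class
    # among pixels classified as j and reallocate the pixel counts accordingly.
    p_true_given_classified = conf / conf.sum(axis=0, keepdims=True)
    return p_true_given_classified @ pixel_counts

def bootstrap(conf, pixel_counts, B=2000):
    # Resample the reference pixels within each true class (row-wise multinomial draws)
    # and recompute the estimator from each bootstrap confusion matrix.
    ests = np.empty((B, conf.shape[0]))
    for b in range(B):
        conf_b = np.vstack([rng.multinomial(row.sum(), row / row.sum()) for row in conf])
        ests[b] = area_estimate(conf_b, pixel_counts)
    return ests

est = area_estimate(conf, pixel_counts)
boot = bootstrap(conf, pixel_counts)
print("pixel counting:     ", pixel_counts)
print("corrected estimate: ", est.round())
print("bootstrap SE:       ", boot.std(axis=0, ddof=1).round())
print("bootstrap bias:     ", (boot.mean(axis=0) - est).round())
```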

18 pages, 428 KiB  
Article
Opening the Black Box: Bootstrapping Sensitivity Measures in Neural Networks for Interpretable Machine Learning
by Michele La Rocca and Cira Perna
Stats 2022, 5(2), 440-457; https://doi.org/10.3390/stats5020026 - 25 Apr 2022
Cited by 3 | Viewed by 2464
Abstract
Artificial neural networks are powerful tools for data analysis, particularly in the context of highly nonlinear regression models. However, their utility is critically limited due to the lack of interpretation of the model given its black-box nature. To partially address the problem, the paper focuses on the important problem of feature selection. It proposes and discusses a statistical test procedure for selecting a set of input variables that are relevant to the model while taking into account the multiple testing nature of the problem. The approach is within the general framework of sensitivity analysis and uses the conditional expectation of functions of the partial derivatives of the output with respect to the inputs as a sensitivity measure. The proposed procedure extensively uses the bootstrap to approximate the test statistic distribution under the null while controlling the familywise error rate to correct for data snooping arising from multiple testing. In particular, a pair bootstrap scheme was implemented in order to obtain consistent results when using misspecified statistical models, a typical characteristic of neural networks. Numerical examples and a Monte Carlo simulation were carried out to verify the ability of the proposed test procedure to correctly identify the set of relevant features. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
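
The following sketch conveys the flavour of the approach: a pairs bootstrap around a small scikit-learn network, with the average squared (finite-difference) partial derivative of the output with respect to each input as the sensitivity measure. The network size, the derivative approximation, and the crude ratio-based screening at the end are illustrative simplifications, not the paper's formal multiple-testing procedure.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Toy data: only the first two of five inputs actually enter the regression function.
n, p = 300, 5
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, n)

def sensitivities(X, y, eps=1e-3):
    # Fit a small network and compute, for each input, the average squared
    # finite-difference partial derivative of the fitted output w.r.t. that input.
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, y)
    sens = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        d = (net.predict(Xp) - net.predict(Xm)) / (2 * eps)
        sens[j] = np.mean(d ** 2)
    return sens

obs = sensitivities(X, y)

# Pairs bootstrap: resample (x_i, y_i) pairs, refit the network, recompute the measures.
B = 50                                   # kept small here; larger in practice
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, n)
    boot[b] = sensitivities(X[idx], y[idx])

se = boot.std(axis=0, ddof=1)
print("sensitivities:", obs.round(3))
print("bootstrap SE: ", se.round(3))
print("ratios (large values suggest relevant inputs):", (obs / se).round(2))
```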

17 pages, 322 KiB  
Article
A Comparison of Existing Bootstrap Algorithms for Multi-Stage Sampling Designs
by Sixia Chen, David Haziza and Zeinab Mashreghi
Stats 2022, 5(2), 521-537; https://doi.org/10.3390/stats5020031 - 6 Jun 2022
Cited by 3 | Viewed by 2277
Abstract
Multi-stage sampling designs are often used in household surveys because a sampling frame of elements may not be available or for cost considerations when data collection involves face-to-face interviews. In this context, variance estimation is a complex task as it relies on the availability of second-order inclusion probabilities at each stage. To cope with this issue, several bootstrap algorithms have been proposed in the literature in the context of a two-stage sampling design. In this paper, we describe some of these algorithms and compare them empirically in terms of bias, stability, and coverage probability. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
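
A minimal sketch of one such algorithm, resampling whole first-stage clusters with replacement (ignoring finite-population corrections), together with a small Monte Carlo check of the kind used to compare bias and stability; the two-stage design and the mean estimator below are toy assumptions, not the specific algorithms compared in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy two-stage population: 200 clusters (PSUs) of 50 elements each, with a cluster effect.
NI, M = 200, 50
pop = rng.normal(0, 1, NI)[:, None] + rng.normal(0, 1, (NI, M))

def two_stage_sample(nI=20, m=10):
    psus = rng.choice(NI, nI, replace=False)                    # stage 1: SRS of PSUs
    return np.array([rng.choice(pop[i], m, replace=False) for i in psus])  # stage 2: SRS within PSUs

def boot_se(sample, B=300):
    # Resample whole first-stage clusters with replacement and recompute the mean.
    nI = sample.shape[0]
    stats = np.array([sample[rng.integers(0, nI, nI)].mean() for _ in range(B)])
    return stats.std(ddof=1)

# Small Monte Carlo check: compare the bootstrap SE with the true sampling SD of the estimator.
R = 200
ests = np.empty(R)
ses = np.empty(R)
for r in range(R):
    s = two_stage_sample()
    ests[r], ses[r] = s.mean(), boot_se(s)

print("true SD of estimator (Monte Carlo):", round(ests.std(ddof=1), 4))
print("average bootstrap SE:              ", round(ses.mean(), 4))
```
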
11 pages, 292 KiB  
Article
Bayesian Bootstrap in Multiple Frames
by Daniela Cocchi, Lorenzo Marchi and Riccardo Ievoli
Stats 2022, 5(2), 561-571; https://doi.org/10.3390/stats5020034 - 15 Jun 2022
Cited by 3 | Viewed by 2135
Abstract
Multiple frames are becoming increasingly relevant due to the spread of surveys conducted via registers. In this regard, estimators of population quantities have been proposed, including the multiplicity estimator. In all cases, variance estimation still remains a matter of debate. This paper explores the potential of Bayesian bootstrap techniques for computing such estimators. The suitability of the method, which is compared to the existing frequentist bootstrap, is shown by conducting a small-scale simulation study and a case study. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
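
For readers unfamiliar with the Bayesian bootstrap, the sketch below contrasts Rubin's Dirichlet-weight scheme with the ordinary frequentist bootstrap for the mean of a single toy sample; the multiple-frame setting and the multiplicity adjustments discussed in the paper are not shown.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.lognormal(mean=0.0, sigma=0.8, size=120)     # toy sample from a single frame

B = 4000

# Bayesian bootstrap (Rubin, 1981): random Dirichlet(1, ..., 1) weights on the observed units.
w = rng.dirichlet(np.ones(len(y)), size=B)
bb = w @ y

# Ordinary (frequentist) bootstrap of the mean, for comparison.
fb = np.array([rng.choice(y, len(y), replace=True).mean() for _ in range(B)])

print("Bayesian bootstrap:    mean %.3f, SE %.3f" % (bb.mean(), bb.std(ddof=1)))
print("frequentist bootstrap: mean %.3f, SE %.3f" % (fb.mean(), fb.std(ddof=1)))
```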

11 pages, 274 KiB  
Article
A Multi-Aspect Permutation Test for Goodness-of-Fit Problems
by Rosa Arboretti, Elena Barzizza, Nicolò Biasetton, Riccardo Ceccato, Livio Corain and Luigi Salmaso
Stats 2022, 5(2), 572-582; https://doi.org/10.3390/stats5020035 - 17 Jun 2022
Cited by 3 | Viewed by 1821
Abstract
Parametric techniques commonly rely on specific distributional assumptions. It is therefore fundamental to preliminarily identify possible violations of such assumptions, and appropriate testing procedures are required for this purpose to deal with the goodness-of-fit (GoF) problem. This task can be quite challenging, especially with small sample sizes and multivariate data. Previous studies showed how a GoF problem can be easily represented through a traditional two-sample system of hypotheses. Following this idea, in this paper, we propose a multi-aspect permutation-based test to deal with the multivariate goodness-of-fit problem, taking advantage of the nonparametric combination (NPC) methodology. A simulation study is then conducted to evaluate the performance of our proposal and to identify possible critical scenarios. Finally, a real data application is considered. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
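
A hedged sketch of the underlying idea: the GoF null is recast as a two-sample comparison between the observed data and a sample generated from the hypothesized distribution, several "aspects" are tested by permutation, and the partial p-values are combined with Fisher's combining function in the spirit of the NPC framework. The two aspects used here (location and spread) and the fitted normal null are illustrative choices, not the paper's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

x = rng.gamma(shape=2.0, scale=1.0, size=60)    # observed sample (truly non-normal)
# Reference sample drawn from the hypothesized (fitted normal) distribution.
z = stats.norm(loc=x.mean(), scale=x.std(ddof=1)).rvs(size=60, random_state=6)

def aspects(a, b):
    # Two illustrative aspects of the comparison: location and (log) spread.
    return np.array([abs(a.mean() - b.mean()),
                     abs(np.log(a.var(ddof=1)) - np.log(b.var(ddof=1)))])

pooled = np.concatenate([x, z])
n = len(x)
obs = aspects(x, z)

B = 5000
perm = np.empty((B, 2))
for b in range(B):
    idx = rng.permutation(len(pooled))
    perm[b] = aspects(pooled[idx[:n]], pooled[idx[n:]])

# Partial permutation p-values, one per aspect.
p_partial = (1 + (perm >= obs).sum(axis=0)) / (B + 1)

# Nonparametric combination: Fisher's combining function applied to the partial
# p-values, with the permutation distribution of the combined statistic as reference.
p_perm = (1 + np.argsort(np.argsort(-perm, axis=0), axis=0)) / (B + 1)
T_obs = -2 * np.log(p_partial).sum()
T_perm = (-2 * np.log(p_perm)).sum(axis=1)
p_global = (1 + (T_perm >= T_obs).sum()) / (B + 1)

print("partial p-values:", p_partial.round(4), "  global NPC p-value:", round(p_global, 4))
```
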
25 pages, 467 KiB  
Article
Resampling Plans and the Estimation of Prediction Error
by Bradley Efron
Stats 2021, 4(4), 1091-1115; https://doi.org/10.3390/stats4040063 - 20 Dec 2021
Cited by 5 | Viewed by 4801
Abstract
This article was prepared for the Special Issue on Resampling methods for statistical inference of the 2020s. Modern algorithms such as random forests and deep learning are automatic machines for producing prediction rules from training data. Resampling plans have been the key technology for evaluating a rule’s prediction accuracy. After a careful description of the measurement of prediction error the article discusses the advantages and disadvantages of the principal methods: cross-validation, the nonparametric bootstrap, covariance penalties (Mallows’ Cp and the Akaike Information Criterion), and conformal inference. The emphasis is on a broad overview of a large subject, featuring examples, simulations, and a minimum of technical detail. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
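
A compact illustration of three of the estimators compared in the article, for an ordinary least-squares prediction rule on simulated data: the apparent (training) error, K-fold cross-validation, and a Mallows' Cp-type covariance penalty with known noise variance; all settings below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy regression data and an ordinary least-squares rule with p = 6 coefficients.
n, p, sigma = 100, 6, 1.0
X = np.c_[np.ones(n), rng.normal(size=(n, p - 1))]
beta = np.array([1.0, 2.0, 0.0, 0.0, -1.0, 0.0])
y = X @ beta + rng.normal(0, sigma, n)

def fit_predict(Xtr, ytr, Xte):
    coef, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return Xte @ coef

# Apparent error: the rule is evaluated on the same data used to fit it (optimistic).
apparent = np.mean((y - fit_predict(X, y, X)) ** 2)

# K-fold cross-validation estimate of prediction error.
K = 10
folds = np.array_split(rng.permutation(n), K)
cv_err = np.mean([np.mean((y[te] - fit_predict(np.delete(X, te, 0), np.delete(y, te), X[te])) ** 2)
                  for te in folds])

# Covariance-penalty (Mallows' Cp type) estimate with known noise variance:
# apparent error plus 2 * p * sigma^2 / n.
cp = apparent + 2 * p * sigma ** 2 / n

print("apparent error:", round(apparent, 3))
print("10-fold CV:    ", round(cv_err, 3))
print("Cp estimate:   ", round(cp, 3))
```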

13 pages, 360 KiB  
Article
Conditional Inference in Small Sample Scenarios Using a Resampling Approach
by Clemens Draxler and Andreas Kurz
Stats 2021, 4(4), 837-849; https://doi.org/10.3390/stats4040049 - 15 Oct 2021
Cited by 2 | Viewed by 2504
Abstract
This paper discusses a non-parametric resampling technique in the context of multidimensional or multiparameter hypothesis testing of assumptions of the Rasch model. It is based on conditional distributions and it is suggested in small sample size scenarios as an alternative to the application of asymptotic or large sample theory. The exact sampling distribution of various well-known chi-square test statistics like Wald, likelihood ratio, score, and gradient tests as well as others can be arbitrarily well approximated in this way. A procedure to compute the power function of the tests is also presented. A number of examples of scenarios are discussed in which the power function of the test does not converge to 1 with an increasing deviation of the true values of the parameters of interest from the values specified in the hypothesis to be tested. Finally, an attempt to modify the critical region of the tests is made aiming at improving the power and an R package is provided. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
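
The sketch below illustrates the conditional-resampling idea in a simplified form: binary item-response matrices with the same row and column margins as the data are generated by random 2x2 "checkerboard" swaps, and the observed value of a statistic is referred to this conditional distribution. Both the swap sampler and the statistic are illustrative stand-ins, not the specific procedure or R package described in the paper.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy 0/1 item-response matrix: 30 persons by 8 items.
P = 0.2 + 0.6 * rng.random((30, 1)) * rng.random((1, 8))
Y = (rng.random((30, 8)) < P).astype(int)

def swap_sample(Y, n_swaps=1000):
    # Random 2x2 "checkerboard" swaps preserve all row and column totals, i.e. they
    # move within the conditional sample space of the Rasch model given the margins.
    Z = Y.copy()
    n, k = Z.shape
    for _ in range(n_swaps):
        r = rng.choice(n, 2, replace=False)
        c = rng.choice(k, 2, replace=False)
        sub = Z[np.ix_(r, c)]
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
            Z[np.ix_(r, c)] = sub[::-1]      # flip the checkerboard pattern
    return Z

def statistic(Z):
    # Illustrative statistic: variance of the inter-item correlations
    # (any test statistic of interest could be plugged in here).
    with np.errstate(invalid="ignore", divide="ignore"):
        C = np.corrcoef(Z.T)
    return np.nanvar(C[np.triu_indices_from(C, 1)])

obs = statistic(Y)
null = np.array([statistic(swap_sample(Y)) for _ in range(200)])
p_value = (1 + (null >= obs).sum()) / (len(null) + 1)
print("observed statistic:", round(obs, 4), "  conditional p-value:", round(p_value, 4))
```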

25 pages, 442 KiB  
Article
The One Standard Error Rule for Model Selection: Does It Work?
by Yuchen Chen and Yuhong Yang
Stats 2021, 4(4), 868-892; https://doi.org/10.3390/stats4040051 - 5 Nov 2021
Cited by 18 | Viewed by 14059
Abstract
Previous research provided a lot of discussion on the selection of regularization parameters when it comes to the application of regularization methods for high-dimensional regression. The popular “One Standard Error Rule” (1se rule) used with cross validation (CV) is to select the most parsimonious model whose prediction error is not much worse than the minimum CV error. This paper examines the validity of the 1se rule from a theoretical angle and also studies its estimation accuracy and performance in applications of regression estimation and variable selection, particularly for Lasso in a regression framework. Our theoretical result shows that when a regression procedure produces the regression estimator converging relatively fast to the true regression function, the standard error estimation formula in the 1se rule is justified asymptotically. The numerical results show the following: 1. the 1se rule in general does not necessarily provide a good estimation for the intended standard deviation of the cross validation error. The estimation bias can be 50–100% upwards or downwards in various situations; 2. the results tend to support that the 1se rule usually outperforms the regular CV in sparse variable selection and alleviates the over-selection tendency of Lasso; 3. in regression estimation or prediction, the 1se rule often performs worse. In addition, comparisons are made over two real data sets: Boston Housing Prices (large sample size n, small/moderate number of variables p) and Bardet–Biedl data (large p, small n). Data-guided simulations are done to provide insight on the relative performances of the 1se rule and the regular CV. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
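
For concreteness, a small sketch of the 1se rule with Lasso and K-fold cross-validation using scikit-learn: among penalties whose CV error is within one standard error of the minimum, the largest (most parsimonious) one is chosen. The data-generating setup and the penalty grid are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(9)

# Toy sparse regression: only 3 of 30 predictors are active.
n, p = 150, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(0, 1.0, n)

alphas = np.logspace(0.5, -2, 30)          # penalty grid, from strongest to weakest
K = 10
fold_err = np.zeros((K, len(alphas)))
for k, (tr, te) in enumerate(KFold(K, shuffle=True, random_state=0).split(X)):
    for j, a in enumerate(alphas):
        fit = Lasso(alpha=a, max_iter=10000).fit(X[tr], y[tr])
        fold_err[k, j] = np.mean((y[te] - fit.predict(X[te])) ** 2)

cv_mean = fold_err.mean(axis=0)
cv_se = fold_err.std(axis=0, ddof=1) / np.sqrt(K)    # the SE estimate the 1se rule relies on

j_min = cv_mean.argmin()
# 1se rule: the most parsimonious (largest-penalty) model within one SE of the minimum.
j_1se = np.min(np.where(cv_mean <= cv_mean[j_min] + cv_se[j_min])[0])

for label, j in [("min-CV", j_min), ("1se   ", j_1se)]:
    nz = (Lasso(alpha=alphas[j], max_iter=10000).fit(X, y).coef_ != 0).sum()
    print(f"{label}: alpha={alphas[j]:.3f}, CV error={cv_mean[j]:.3f}, nonzero coefficients={nz}")
```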

19 pages, 369 KiB  
Article
A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used at the Second Phase
by Jean-François Beaumont and Nelson Émond
Stats 2022, 5(2), 339-357; https://doi.org/10.3390/stats5020019 - 22 Mar 2022
Cited by 9 | Viewed by 5164
Abstract
The bootstrap method is often used for variance estimation in sample surveys with a stratified multistage sampling design. It is typically implemented by producing a set of bootstrap weights that is made available to users and that accounts for the complexity of the sampling design. The Rao–Wu–Yue method is often used to produce the required bootstrap weights. It is valid under stratified with-replacement sampling at the first stage or fixed-size without-replacement sampling provided the first-stage sampling fractions are negligible. Some surveys use designs that do not satisfy these conditions. We propose a simple and unified bootstrap method that addresses this limitation of the Rao–Wu–Yue bootstrap weights. This method is applicable to any multistage sampling design as long as valid bootstrap weights can be produced for each distinct stage of sampling. Our method is also applicable to two-phase sampling designs provided that Poisson sampling is used at the second phase. We use this design to model survey nonresponse and derive bootstrap weights that account for nonresponse weighting. The properties of our bootstrap method are evaluated in three limited simulation studies. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
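
A minimal sketch of the Rao–Wu–Yue rescaling bootstrap in its common form (draw n_h - 1 PSUs with replacement in each first-stage stratum and rescale the design weights by n_h/(n_h - 1) times the selection multiplicity), applied to a toy stratified cluster sample. The paper's extensions to general multistage designs, Poisson second-phase sampling, and nonresponse adjustment are not shown.

```python
import numpy as np

rng = np.random.default_rng(10)

# Toy stratified cluster sample, flattened to element level: each element carries its
# stratum, the index of its first-stage PSU within the stratum, a design weight and a value.
strata = {"A": 8, "B": 12}                  # number of sampled PSUs per stratum
stratum, psu_idx, w, y = [], [], [], []
for h, n_h in strata.items():
    for i in range(n_h):
        m_i = int(rng.integers(5, 15))      # elements observed in this PSU
        stratum += [h] * m_i
        psu_idx += [i] * m_i
        w += [float(rng.uniform(20, 60))] * m_i   # design weight (constant within PSU here)
        y += list(rng.normal(10 + 2 * (h == "B"), 3, m_i))
stratum, psu_idx = np.array(stratum), np.array(psu_idx)
w, y = np.array(w), np.array(y)

def rao_wu_yue_weights():
    # Common form of the Rao-Wu-Yue rescaling bootstrap: within each stratum draw
    # n_h - 1 PSUs with replacement and rescale the design weights by
    # (n_h / (n_h - 1)) times each PSU's selection multiplicity.
    wb = np.empty_like(w)
    for h, n_h in strata.items():
        mult = np.bincount(rng.integers(0, n_h, n_h - 1), minlength=n_h)
        in_h = stratum == h
        wb[in_h] = w[in_h] * (n_h / (n_h - 1)) * mult[psu_idx[in_h]]
    return wb

theta_hat = np.sum(w * y) / np.sum(w)       # weighted (Hajek-type) mean estimator
B = 2000
boot = np.empty(B)
for b in range(B):
    wb = rao_wu_yue_weights()
    boot[b] = np.sum(wb * y) / np.sum(wb)

print("estimate: %.3f   bootstrap SE: %.3f" % (theta_hat, boot.std(ddof=1)))
```
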
12 pages, 293 KiB  
Review
Resampling under Complex Sampling Designs: Roots, Development and the Way Forward
by Pier Luigi Conti and Fulvia Mecatti
Stats 2022, 5(1), 258-269; https://doi.org/10.3390/stats5010016 - 8 Mar 2022
Cited by 1 | Viewed by 2422
Abstract
In the present paper, resampling for finite populations under an iid sampling design is reviewed. Our attention is mainly focused on pseudo-population-based resampling due to its properties. A principled appraisal of the main theoretical foundations and results is given and discussed, together with important computational aspects. Finally, a discussion on open problems and research perspectives is provided. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)