Re-sampling Methods for Statistical Inference of the 2020s

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: closed (31 March 2022) | Viewed by 43554

Special Issue Editor


Prof. Dr. Fulvia Mecatti
Guest Editor
Department of Statistics, Università degli Studi di Milano‐Bicocca, Via Bicocca degli Arcimboldi 8, 20126 Milano, Italy
Interests: sampling statistics; bootstrap and stochastic simulations; gender statistics; statistical inference

Special Issue Information

Dear Colleagues,

Re-sampling methods are fellows of the previous century, with the Bootstrap, no doubt, the most popular among them, now well into its forties. Following the rocketing evolution of technology, computational power and data sharing, statistical inference as we knew it at the beginning of the twenty-first century has extraordinarily expanded its scope and applications, and the global Covid-19 crisis appears to have accelerated the process even further. With this Special Issue I am soliciting contributions, advancements and critical reviews on Bootstrap and re-sampling methods that address the statistical needs of the 2020s and envision future research directions. Manuscripts covering, though not limited to, topics in data science, statistical learning, statistical modelling, epidemiology, observational studies, circular economy, sustainable development and inequalities are particularly welcome.

I look forward to receiving your submissions.

Sincerely,

Prof. Dr. Fulvia Mecatti
Guest Editor

Message from Prof. Bradley Efron:

This is a propitious moment for Stats' special issue on resampling methods. Data sets are bigger than ever, computation is faster and cheaper than ever, and the demand for statistical analysis seems to multiply every year. Computer-intensive statistical methods — the substitution of computational power for routine and tedious paper-and-pencil calculations — are a growth industry in the current scientific environment, particularly as the complexity of our estimators and tests has outpaced theory.

Resampling plans were the original computer-intensive statistical methodology (so named in Efron and Diaconis' 1983 Scientific American article). Their widespread adoption encouraged other computer-based success stories, Markov Chain Monte Carlo being particularly notable. The term "resampling", in its current sense, seems to have been introduced in the title of my 1982 monograph "The Jackknife, the Bootstrap and Other Resampling Plans". Besides the jackknife and the bootstrap, several older resampling methods were discussed there: cross-validation, half-sampling, typical value theory, the infinitesimal jackknife, and balanced repeated replications.

The resampling story of the last forty years has been one of new uses more than new methods. The original, modest goal of computationally attaching standard errors to statistical estimators was expanded to bootstrap confidence intervals (paralleling an ambitious theoretical development of likelihood-based intervals). Bootstrap smoothing, aka "bagging" or "bootstrap aggregation", aimed at improving unsmooth estimators such as those obtained from model selection, and became central to Leo Breiman's popular machine learning package "random forests". Massive prediction algorithms, especially "deep learning", required new cross-validation techniques carried out at Herculean scales. As we will see in this volume, applications of resampling have spread to an enormous variety of scientific studies, generating a diversity of specialized techniques as well as an improved theoretical understanding of how the methods perform.

To say that this is a good time for publishing a resampling issue doesn't mean it's easy to do so. I'm grateful to Professor Fulvia Mecatti for conceiving, organizing, and carrying out the task so successfully.

Bradley Efron

Stanford

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • estimation
  • big data
  • public health
  • data integration
  • sampling statistics
  • statistical assessment
  • model selection
  • empirical Bayes
  • population studies

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (11 papers)

16 pages, 396 KiB  
Article
Some Empirical Results on Nearest-Neighbour Pseudo-populations for Resampling from Spatial Populations
by Sara Franceschi, Rosa Maria Di Biase, Agnese Marcelli and Lorenzo Fattorini
Stats 2022, 5(2), 385-400; https://doi.org/10.3390/stats5020022 - 15 Apr 2022
Cited by 3 | Viewed by 2276
Abstract
In finite populations, pseudo-population bootstrap is the sole method preserving the spirit of the original bootstrap performed from iid observations. In spatial sampling, theoretical results about the convergence of bootstrap distributions to the actual distributions of estimators are lacking, owing to the failure of spatially balanced sampling designs to converge to the maximum entropy design. In addition, the issue of creating pseudo-populations able to mimic the characteristics of real populations is challenging in spatial frameworks where spatial trends, relationships, and similarities among neighbouring locations are invariably present. In this paper, we propose the use of the nearest-neighbour interpolation of spatial populations for constructing pseudo-populations that converge to real populations under mild conditions. The effectiveness of these proposals with respect to traditional pseudo-populations is empirically checked by a simulation study. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
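
To fix ideas, the following is a minimal sketch of pseudo-population resampling with a nearest-neighbour rule, assuming a toy spatial population, simple random sampling and the sample mean as estimator; it illustrates only the general mechanism, not the authors' exact construction or simulation design.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)

# Toy spatial population: N locations with a smooth spatial trend plus noise.
N, n, B = 2500, 100, 500
xy = rng.uniform(0, 1, size=(N, 2))
y = np.sin(3 * xy[:, 0]) + np.cos(2 * xy[:, 1]) + rng.normal(0, 0.2, N)

# Original sample (simple random sampling without replacement, for illustration only).
s = rng.choice(N, size=n, replace=False)

# Pseudo-population: every location inherits the value of its nearest sampled
# neighbour (nearest-neighbour interpolation of the sample onto the whole frame).
tree = cKDTree(xy[s])
_, donor = tree.query(xy)           # index, within the sample, of the nearest sampled unit
y_pseudo = y[s][donor]              # pseudo-population of size N

# Bootstrap: redraw samples from the pseudo-population with the original design
# and recompute the estimator (here the sample mean, estimating the population mean).
boot = np.empty(B)
for b in range(B):
    sb = rng.choice(N, size=n, replace=False)
    boot[b] = y_pseudo[sb].mean()

print("point estimate:", round(y[s].mean(), 4))
print("bootstrap SE:  ", round(boot.std(ddof=1), 4))
```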

18 pages, 705 KiB  
Article
Bootstrap Assessment of Crop Area Estimates Using Satellite Pixels Counting
by Cristiano Ferraz, Jacques Delincé, André Leite and Raydonal Ospina
Stats 2022, 5(2), 422-439; https://doi.org/10.3390/stats5020025 - 25 Apr 2022
Viewed by 2061
Abstract
Crop area estimates based on counting pixels over classified satellite images are a promising application of remote sensing to agriculture. However, such area estimates are biased, and their variance is a function of the error rates of the classification rule. To redress the bias, estimators (direct and inverse) relying on the so-called confusion matrix have been proposed, but analytic estimators for variances can be tricky to derive. This article proposes a bootstrap method for assessing statistical properties of such estimators based on information from a sample confusion matrix. The proposed method can be applied to any other type of estimator that is built upon confusion matrix information. The resampling procedure is illustrated in a small study to assess the biases and variances of estimates using purely pixel counting and estimates provided by both direct and inverse estimators. The method has the advantage of being simple to implement even when the sample confusion matrix is generated under unequal probability sample design. The results show the limitations of estimates based solely on pixel counting as well as respective advantages and drawbacks of the direct and inverse estimators with respect to their feasibility, unbiasedness, and variance. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
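
As a rough illustration of the mechanism, the sketch below bootstraps a toy sample confusion matrix (resampling reference pixels within each true class) and propagates it through a generic confusion-matrix-based area estimator. The simple reallocation rule used as `area_estimate` is a placeholder, not the paper's direct or inverse estimator, and all counts are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: two land-cover classes and the map-wide pixel counts per classified class.
pixel_counts = np.array([620_000, 380_000])   # pixels classified as class 0 / class 1

# Sample confusion matrix from reference data: rows = true class, columns = classified class.
conf = np.array([[420, 60],
                 [40, 480]])

def area_estimate(conf, pixel_counts):
    # Illustrative confusion-matrix correction: estimate the share of each true class
    # among pixels classified as j and reallocate the pixel counts accordingly.
    p_true_given_classified = conf / conf.sum(axis=0, keepdims=True)
    return p_true_given_classified @ pixel_counts

def bootstrap(conf, pixel_counts, B=2000):
    # Resample the reference pixels within each true class (row-wise multinomial draws)
    # and recompute the estimator from each bootstrap confusion matrix.
    ests = np.empty((B, conf.shape[0]))
    for b in range(B):
        conf_b = np.vstack([rng.multinomial(row.sum(), row / row.sum()) for row in conf])
        ests[b] = area_estimate(conf_b, pixel_counts)
    return ests

est = area_estimate(conf, pixel_counts)
boot = bootstrap(conf, pixel_counts)
print("pixel counting:     ", pixel_counts)
print("corrected estimate: ", est.round())
print("bootstrap SE:       ", boot.std(axis=0, ddof=1).round())
print("bootstrap bias:     ", (boot.mean(axis=0) - est).round())
```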

18 pages, 428 KiB  
Article
Opening the Black Box: Bootstrapping Sensitivity Measures in Neural Networks for Interpretable Machine Learning
by Michele La Rocca and Cira Perna
Stats 2022, 5(2), 440-457; https://doi.org/10.3390/stats5020026 - 25 Apr 2022
Cited by 3 | Viewed by 2464
Abstract
Artificial neural networks are powerful tools for data analysis, particularly in the context of highly nonlinear regression models. However, their utility is critically limited due to the lack of interpretation of the model given its black-box nature. To partially address the problem, the paper focuses on the important problem of feature selection. It proposes and discusses a statistical test procedure for selecting a set of input variables that are relevant to the model while taking into account the multiple testing nature of the problem. The approach is within the general framework of sensitivity analysis and uses the conditional expectation of functions of the partial derivatives of the output with respect to the inputs as a sensitivity measure. The proposed procedure extensively uses the bootstrap to approximate the test statistic distribution under the null while controlling the familywise error rate to correct for data snooping arising from multiple testing. In particular, a pair bootstrap scheme was implemented in order to obtain consistent results when using misspecified statistical models, a typical characteristic of neural networks. Numerical examples and a Monte Carlo simulation were carried out to verify the ability of the proposed test procedure to correctly identify the set of relevant features. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
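
The following sketch conveys the flavour of the approach: a pairs bootstrap around a small scikit-learn network, with the average squared (finite-difference) partial derivative of the output with respect to each input as the sensitivity measure. The network size, the derivative approximation, and the crude ratio-based screening at the end are illustrative simplifications, not the paper's formal multiple-testing procedure.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Toy data: only the first two of five inputs actually enter the regression function.
n, p = 300, 5
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, n)

def sensitivities(X, y, eps=1e-3):
    # Fit a small network and compute, for each input, the average squared
    # finite-difference partial derivative of the fitted output w.r.t. that input.
    net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0).fit(X, y)
    sens = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        d = (net.predict(Xp) - net.predict(Xm)) / (2 * eps)
        sens[j] = np.mean(d ** 2)
    return sens

obs = sensitivities(X, y)

# Pairs bootstrap: resample (x_i, y_i) pairs, refit the network, recompute the measures.
B = 50                                   # kept small here; larger in practice
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, n)
    boot[b] = sensitivities(X[idx], y[idx])

se = boot.std(axis=0, ddof=1)
print("sensitivities:", obs.round(3))
print("bootstrap SE: ", se.round(3))
print("ratios (large values suggest relevant inputs):", (obs / se).round(2))
```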

17 pages, 322 KiB  
Article
A Comparison of Existing Bootstrap Algorithms for Multi-Stage Sampling Designs
by Sixia Chen, David Haziza and Zeinab Mashreghi
Stats 2022, 5(2), 521-537; https://doi.org/10.3390/stats5020031 - 6 Jun 2022
Cited by 3 | Viewed by 2277
Abstract
Multi-stage sampling designs are often used in household surveys because a sampling frame of elements may not be available or for cost considerations when data collection involves face-to-face interviews. In this context, variance estimation is a complex task as it relies on the availability of second-order inclusion probabilities at each stage. To cope with this issue, several bootstrap algorithms have been proposed in the literature in the context of a two-stage sampling design. In this paper, we describe some of these algorithms and compare them empirically in terms of bias, stability, and coverage probability. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
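
A minimal sketch of one such algorithm, resampling whole first-stage clusters with replacement (ignoring finite-population corrections), together with a small Monte Carlo check of the kind used to compare bias and stability; the two-stage design and the mean estimator below are toy assumptions, not the specific algorithms compared in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy two-stage population: 200 clusters (PSUs) of 50 elements each, with a cluster effect.
NI, M = 200, 50
pop = rng.normal(0, 1, NI)[:, None] + rng.normal(0, 1, (NI, M))

def two_stage_sample(nI=20, m=10):
    psus = rng.choice(NI, nI, replace=False)                    # stage 1: SRS of PSUs
    return np.array([rng.choice(pop[i], m, replace=False) for i in psus])  # stage 2: SRS within PSUs

def boot_se(sample, B=300):
    # Resample whole first-stage clusters with replacement and recompute the mean.
    nI = sample.shape[0]
    stats = np.array([sample[rng.integers(0, nI, nI)].mean() for _ in range(B)])
    return stats.std(ddof=1)

# Small Monte Carlo check: compare the bootstrap SE with the true sampling SD of the estimator.
R = 200
ests = np.empty(R)
ses = np.empty(R)
for r in range(R):
    s = two_stage_sample()
    ests[r], ses[r] = s.mean(), boot_se(s)

print("true SD of estimator (Monte Carlo):", round(ests.std(ddof=1), 4))
print("average bootstrap SE:              ", round(ses.mean(), 4))
```
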
11 pages, 292 KiB  
Article
Bayesian Bootstrap in Multiple Frames
by Daniela Cocchi, Lorenzo Marchi and Riccardo Ievoli
Stats 2022, 5(2), 561-571; https://doi.org/10.3390/stats5020034 - 15 Jun 2022
Cited by 3 | Viewed by 2135
Abstract
Multiple frames are becoming increasingly relevant due to the spread of surveys conducted via registers. In this regard, estimators of population quantities have been proposed, including the multiplicity estimator. In all cases, variance estimation still remains a matter of debate. This paper explores the potential of Bayesian bootstrap techniques for computing such estimators. The suitability of the method, which is compared to the existing frequentist bootstrap, is shown by conducting a small-scale simulation study and a case study. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
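
For readers unfamiliar with the Bayesian bootstrap, the sketch below contrasts Rubin's Dirichlet-weight scheme with the ordinary frequentist bootstrap for the mean of a single toy sample; the multiple-frame setting and the multiplicity adjustments discussed in the paper are not shown.

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.lognormal(mean=0.0, sigma=0.8, size=120)     # toy sample from a single frame

B = 4000

# Bayesian bootstrap (Rubin, 1981): random Dirichlet(1, ..., 1) weights on the observed units.
w = rng.dirichlet(np.ones(len(y)), size=B)
bb = w @ y

# Ordinary (frequentist) bootstrap of the mean, for comparison.
fb = np.array([rng.choice(y, len(y), replace=True).mean() for _ in range(B)])

print("Bayesian bootstrap:    mean %.3f, SE %.3f" % (bb.mean(), bb.std(ddof=1)))
print("frequentist bootstrap: mean %.3f, SE %.3f" % (fb.mean(), fb.std(ddof=1)))
```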

11 pages, 274 KiB  
Article
A Multi-Aspect Permutation Test for Goodness-of-Fit Problems
by Rosa Arboretti, Elena Barzizza, Nicolò Biasetton, Riccardo Ceccato, Livio Corain and Luigi Salmaso
Stats 2022, 5(2), 572-582; https://doi.org/10.3390/stats5020035 - 17 Jun 2022
Cited by 3 | Viewed by 1821
Abstract
Parametric techniques commonly rely on specific distributional assumptions. It is therefore fundamental to preliminarily identify possible violations of such assumptions, and appropriate testing procedures are required for this purpose to deal with the goodness-of-fit (GoF) problem. This task can be quite challenging, especially with small sample sizes and multivariate data. Previous studies showed how a GoF problem can be easily represented through a traditional two-sample system of hypotheses. Following this idea, in this paper, we propose a multi-aspect permutation-based test to deal with the multivariate goodness-of-fit problem, taking advantage of the nonparametric combination (NPC) methodology. A simulation study is then conducted to evaluate the performance of our proposal and to identify possible critical scenarios. Finally, a real data application is considered. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
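
A hedged sketch of the underlying idea: the GoF null is recast as a two-sample comparison between the observed data and a sample generated from the hypothesized distribution, several "aspects" are tested by permutation, and the partial p-values are combined with Fisher's combining function in the spirit of the NPC framework. The two aspects used here (location and spread) and the fitted normal null are illustrative choices, not the paper's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

x = rng.gamma(shape=2.0, scale=1.0, size=60)    # observed sample (truly non-normal)
# Reference sample drawn from the hypothesized (fitted normal) distribution.
z = stats.norm(loc=x.mean(), scale=x.std(ddof=1)).rvs(size=60, random_state=6)

def aspects(a, b):
    # Two illustrative aspects of the comparison: location and (log) spread.
    return np.array([abs(a.mean() - b.mean()),
                     abs(np.log(a.var(ddof=1)) - np.log(b.var(ddof=1)))])

pooled = np.concatenate([x, z])
n = len(x)
obs = aspects(x, z)

B = 5000
perm = np.empty((B, 2))
for b in range(B):
    idx = rng.permutation(len(pooled))
    perm[b] = aspects(pooled[idx[:n]], pooled[idx[n:]])

# Partial permutation p-values, one per aspect.
p_partial = (1 + (perm >= obs).sum(axis=0)) / (B + 1)

# Nonparametric combination: Fisher's combining function applied to the partial
# p-values, with the permutation distribution of the combined statistic as reference.
p_perm = (1 + np.argsort(np.argsort(-perm, axis=0), axis=0)) / (B + 1)
T_obs = -2 * np.log(p_partial).sum()
T_perm = (-2 * np.log(p_perm)).sum(axis=1)
p_global = (1 + (T_perm >= T_obs).sum()) / (B + 1)

print("partial p-values:", p_partial.round(4), "  global NPC p-value:", round(p_global, 4))
```
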
25 pages, 467 KiB  
Article
Resampling Plans and the Estimation of Prediction Error
by Bradley Efron
Stats 2021, 4(4), 1091-1115; https://doi.org/10.3390/stats4040063 - 20 Dec 2021
Cited by 5 | Viewed by 4801
Abstract
This article was prepared for the Special Issue on Resampling methods for statistical inference of the 2020s. Modern algorithms such as random forests and deep learning are automatic machines for producing prediction rules from training data. Resampling plans have been the key technology for evaluating a rule’s prediction accuracy. After a careful description of the measurement of prediction error the article discusses the advantages and disadvantages of the principal methods: cross-validation, the nonparametric bootstrap, covariance penalties (Mallows’ Cp and the Akaike Information Criterion), and conformal inference. The emphasis is on a broad overview of a large subject, featuring examples, simulations, and a minimum of technical detail. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
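
A compact illustration of three of the estimators compared in the article, for an ordinary least-squares prediction rule on simulated data: the apparent (training) error, K-fold cross-validation, and a Mallows' Cp-type covariance penalty with known noise variance; all settings below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy regression data and an ordinary least-squares rule with p = 6 coefficients.
n, p, sigma = 100, 6, 1.0
X = np.c_[np.ones(n), rng.normal(size=(n, p - 1))]
beta = np.array([1.0, 2.0, 0.0, 0.0, -1.0, 0.0])
y = X @ beta + rng.normal(0, sigma, n)

def fit_predict(Xtr, ytr, Xte):
    coef, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return Xte @ coef

# Apparent error: the rule is evaluated on the same data used to fit it (optimistic).
apparent = np.mean((y - fit_predict(X, y, X)) ** 2)

# K-fold cross-validation estimate of prediction error.
K = 10
folds = np.array_split(rng.permutation(n), K)
cv_err = np.mean([np.mean((y[te] - fit_predict(np.delete(X, te, 0), np.delete(y, te), X[te])) ** 2)
                  for te in folds])

# Covariance-penalty (Mallows' Cp type) estimate with known noise variance:
# apparent error plus 2 * p * sigma^2 / n.
cp = apparent + 2 * p * sigma ** 2 / n

print("apparent error:", round(apparent, 3))
print("10-fold CV:    ", round(cv_err, 3))
print("Cp estimate:   ", round(cp, 3))
```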

13 pages, 360 KiB  
Article
Conditional Inference in Small Sample Scenarios Using a Resampling Approach
by Clemens Draxler and Andreas Kurz
Stats 2021, 4(4), 837-849; https://doi.org/10.3390/stats4040049 - 15 Oct 2021
Cited by 2 | Viewed by 2504
Abstract
This paper discusses a non-parametric resampling technique in the context of multidimensional or multiparameter hypothesis testing of assumptions of the Rasch model. It is based on conditional distributions and it is suggested in small sample size scenarios as an alternative to the application of asymptotic or large sample theory. The exact sampling distribution of various well-known chi-square test statistics like Wald, likelihood ratio, score, and gradient tests as well as others can be arbitrarily well approximated in this way. A procedure to compute the power function of the tests is also presented. A number of examples of scenarios are discussed in which the power function of the test does not converge to 1 with an increasing deviation of the true values of the parameters of interest from the values specified in the hypothesis to be tested. Finally, an attempt to modify the critical region of the tests is made aiming at improving the power and an R package is provided. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
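
The sketch below illustrates the conditional-resampling idea in a simplified form: binary item-response matrices with the same row and column margins as the data are generated by random 2x2 "checkerboard" swaps, and the observed value of a statistic is referred to this conditional distribution. Both the swap sampler and the statistic are illustrative stand-ins, not the specific procedure or R package described in the paper.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy 0/1 item-response matrix: 30 persons by 8 items.
P = 0.2 + 0.6 * rng.random((30, 1)) * rng.random((1, 8))
Y = (rng.random((30, 8)) < P).astype(int)

def swap_sample(Y, n_swaps=1000):
    # Random 2x2 "checkerboard" swaps preserve all row and column totals, i.e. they
    # move within the conditional sample space of the Rasch model given the margins.
    Z = Y.copy()
    n, k = Z.shape
    for _ in range(n_swaps):
        r = rng.choice(n, 2, replace=False)
        c = rng.choice(k, 2, replace=False)
        sub = Z[np.ix_(r, c)]
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
            Z[np.ix_(r, c)] = sub[::-1]      # flip the checkerboard pattern
    return Z

def statistic(Z):
    # Illustrative statistic: variance of the inter-item correlations
    # (any test statistic of interest could be plugged in here).
    with np.errstate(invalid="ignore", divide="ignore"):
        C = np.corrcoef(Z.T)
    return np.nanvar(C[np.triu_indices_from(C, 1)])

obs = statistic(Y)
null = np.array([statistic(swap_sample(Y)) for _ in range(200)])
p_value = (1 + (null >= obs).sum()) / (len(null) + 1)
print("observed statistic:", round(obs, 4), "  conditional p-value:", round(p_value, 4))
```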

25 pages, 442 KiB  
Article
The One Standard Error Rule for Model Selection: Does It Work?
by Yuchen Chen and Yuhong Yang
Stats 2021, 4(4), 868-892; https://doi.org/10.3390/stats4040051 - 5 Nov 2021
Cited by 18 | Viewed by 14059
Abstract
Previous research provided a lot of discussion on the selection of regularization parameters when it comes to the application of regularization methods for high-dimensional regression. The popular “One Standard Error Rule” (1se rule) used with cross validation (CV) is to select the most parsimonious model whose prediction error is not much worse than the minimum CV error. This paper examines the validity of the 1se rule from a theoretical angle and also studies its estimation accuracy and performance in applications of regression estimation and variable selection, particularly for Lasso in a regression framework. Our theoretical result shows that when a regression procedure produces the regression estimator converging relatively fast to the true regression function, the standard error estimation formula in the 1se rule is justified asymptotically. The numerical results show the following: 1. the 1se rule in general does not necessarily provide a good estimation for the intended standard deviation of the cross validation error. The estimation bias can be 50–100% upwards or downwards in various situations; 2. the results tend to support that the 1se rule usually outperforms the regular CV in sparse variable selection and alleviates the over-selection tendency of Lasso; 3. in regression estimation or prediction, the 1se rule often performs worse. In addition, comparisons are made over two real data sets: Boston Housing Prices (large sample size n, small/moderate number of variables p) and Bardet–Biedl data (large p, small n). Data-guided simulations are done to provide insight on the relative performances of the 1se rule and the regular CV. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
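
For concreteness, a small sketch of the 1se rule with Lasso and K-fold cross-validation using scikit-learn: among penalties whose CV error is within one standard error of the minimum, the largest (most parsimonious) one is chosen. The data-generating setup and the penalty grid are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(9)

# Toy sparse regression: only 3 of 30 predictors are active.
n, p = 150, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(0, 1.0, n)

alphas = np.logspace(0.5, -2, 30)          # penalty grid, from strongest to weakest
K = 10
fold_err = np.zeros((K, len(alphas)))
for k, (tr, te) in enumerate(KFold(K, shuffle=True, random_state=0).split(X)):
    for j, a in enumerate(alphas):
        fit = Lasso(alpha=a, max_iter=10000).fit(X[tr], y[tr])
        fold_err[k, j] = np.mean((y[te] - fit.predict(X[te])) ** 2)

cv_mean = fold_err.mean(axis=0)
cv_se = fold_err.std(axis=0, ddof=1) / np.sqrt(K)    # the SE estimate the 1se rule relies on

j_min = cv_mean.argmin()
# 1se rule: the most parsimonious (largest-penalty) model within one SE of the minimum.
j_1se = np.min(np.where(cv_mean <= cv_mean[j_min] + cv_se[j_min])[0])

for label, j in [("min-CV", j_min), ("1se   ", j_1se)]:
    nz = (Lasso(alpha=alphas[j], max_iter=10000).fit(X, y).coef_ != 0).sum()
    print(f"{label}: alpha={alphas[j]:.3f}, CV error={cv_mean[j]:.3f}, nonzero coefficients={nz}")
```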

19 pages, 369 KiB  
Article
A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used at the Second Phase
by Jean-François Beaumont and Nelson Émond
Stats 2022, 5(2), 339-357; https://doi.org/10.3390/stats5020019 - 22 Mar 2022
Cited by 9 | Viewed by 5164
Abstract
The bootstrap method is often used for variance estimation in sample surveys with a stratified multistage sampling design. It is typically implemented by producing a set of bootstrap weights that is made available to users and that accounts for the complexity of the sampling design. The Rao–Wu–Yue method is often used to produce the required bootstrap weights. It is valid under stratified with-replacement sampling at the first stage or fixed-size without-replacement sampling provided the first-stage sampling fractions are negligible. Some surveys use designs that do not satisfy these conditions. We propose a simple and unified bootstrap method that addresses this limitation of the Rao–Wu–Yue bootstrap weights. This method is applicable to any multistage sampling design as long as valid bootstrap weights can be produced for each distinct stage of sampling. Our method is also applicable to two-phase sampling designs provided that Poisson sampling is used at the second phase. We use this design to model survey nonresponse and derive bootstrap weights that account for nonresponse weighting. The properties of our bootstrap method are evaluated in three limited simulation studies. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
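
A minimal sketch of the Rao–Wu–Yue rescaling bootstrap in its common form (draw n_h - 1 PSUs with replacement in each first-stage stratum and rescale the design weights by n_h/(n_h - 1) times the selection multiplicity), applied to a toy stratified cluster sample. The paper's extensions to general multistage designs, Poisson second-phase sampling, and nonresponse adjustment are not shown.

```python
import numpy as np

rng = np.random.default_rng(10)

# Toy stratified cluster sample, flattened to element level: each element carries its
# stratum, the index of its first-stage PSU within the stratum, a design weight and a value.
strata = {"A": 8, "B": 12}                  # number of sampled PSUs per stratum
stratum, psu_idx, w, y = [], [], [], []
for h, n_h in strata.items():
    for i in range(n_h):
        m_i = int(rng.integers(5, 15))      # elements observed in this PSU
        stratum += [h] * m_i
        psu_idx += [i] * m_i
        w += [float(rng.uniform(20, 60))] * m_i   # design weight (constant within PSU here)
        y += list(rng.normal(10 + 2 * (h == "B"), 3, m_i))
stratum, psu_idx = np.array(stratum), np.array(psu_idx)
w, y = np.array(w), np.array(y)

def rao_wu_yue_weights():
    # Common form of the Rao-Wu-Yue rescaling bootstrap: within each stratum draw
    # n_h - 1 PSUs with replacement and rescale the design weights by
    # (n_h / (n_h - 1)) times each PSU's selection multiplicity.
    wb = np.empty_like(w)
    for h, n_h in strata.items():
        mult = np.bincount(rng.integers(0, n_h, n_h - 1), minlength=n_h)
        in_h = stratum == h
        wb[in_h] = w[in_h] * (n_h / (n_h - 1)) * mult[psu_idx[in_h]]
    return wb

theta_hat = np.sum(w * y) / np.sum(w)       # weighted (Hajek-type) mean estimator
B = 2000
boot = np.empty(B)
for b in range(B):
    wb = rao_wu_yue_weights()
    boot[b] = np.sum(wb * y) / np.sum(wb)

print("estimate: %.3f   bootstrap SE: %.3f" % (theta_hat, boot.std(ddof=1)))
```
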
12 pages, 293 KiB  
Review
Resampling under Complex Sampling Designs: Roots, Development and the Way Forward
by Pier Luigi Conti and Fulvia Mecatti
Stats 2022, 5(1), 258-269; https://doi.org/10.3390/stats5010016 - 8 Mar 2022
Cited by 1 | Viewed by 2422
Abstract
In the present paper, resampling for finite populations under an iid sampling design is reviewed. Our attention is mainly focused on pseudo-population-based resampling due to its properties. A principled appraisal of the main theoretical foundations and results is given and discussed, together with important computational aspects. Finally, a discussion on open problems and research perspectives is provided. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)