Stats, Volume 5, Issue 2 (June 2022) – 17 articles

Cover Story: Reinforcement learning provides a framework for autonomous learning and decision making in control problems, including quantitative trading, where large volumes of data are analyzed and thousands of trades can be made every day. In quantitative trading, transaction costs matter to investors because they are a key determinant of net returns. A new, realistic near-quadratic transaction cost function accounting for slippage is designed, together with a convolutional deep Q-learning network with a stacked-prices strategy. The connection between convolution in deep learning and technical analysis in traditional finance is then addressed. Furthermore, a random perturbation method is proposed to modify the learning network and resolve the instability intrinsic to the deep Q-learning network.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • Papers are published in both HTML and PDF forms, with PDF as the official format. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
11 pages, 274 KiB  
Article
A Multi-Aspect Permutation Test for Goodness-of-Fit Problems
by Rosa Arboretti, Elena Barzizza, Nicolò Biasetton, Riccardo Ceccato, Livio Corain and Luigi Salmaso
Stats 2022, 5(2), 572-582; https://doi.org/10.3390/stats5020035 - 17 Jun 2022
Cited by 2 | Viewed by 1296
Abstract
Parametric techniques commonly rely on specific distributional assumptions. It is therefore fundamental to first identify possible violations of such assumptions, and appropriate testing procedures are required for this purpose to deal with the goodness-of-fit (GoF) problem. This task can be quite challenging, especially with small sample sizes and multivariate data. Previous studies showed how a GoF problem can be easily represented through a traditional two-sample system of hypotheses. Following this idea, in this paper we propose a multi-aspect permutation-based test to deal with the multivariate goodness-of-fit problem, taking advantage of the nonparametric combination (NPC) methodology. A simulation study is then conducted to evaluate the performance of our proposal and to identify critical scenarios. Finally, a real data application is considered. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
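The two-sample representation of the GoF problem that the abstract mentions can be sketched as follows: the observed data are compared against a synthetic sample drawn from the hypothesized distribution, and a permutation distribution of a test statistic is built over the pooled sample. This is a minimal one-aspect illustration with a mean-difference statistic, not the authors' multi-aspect NPC procedure; all function names are illustrative.

```python
import numpy as np

def permutation_gof_test(sample, null_sampler, n_perm=999, rng=None):
    """One-aspect permutation GoF test via the two-sample representation:
    draw a synthetic sample from the hypothesized distribution, then test
    whether the two samples look exchangeable using a mean-difference
    statistic. (The paper combines several aspects via NPC.)"""
    rng = np.random.default_rng(rng)
    sample = np.asarray(sample, dtype=float)
    synthetic = null_sampler(len(sample), rng)   # draw from the H0 distribution
    pooled = np.concatenate([sample, synthetic])
    n = len(sample)
    observed = abs(sample.mean() - synthetic.mean())
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)           # relabel under exchangeability
        stat = abs(perm[:n].mean() - perm[n:].mean())
        if stat >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)            # permutation p-value

rng = np.random.default_rng(0)
data_null = rng.normal(0.0, 1.0, 50)             # data truly from N(0, 1)
data_shift = rng.normal(1.5, 1.0, 50)            # data far from N(0, 1)
sampler = lambda n, r: r.normal(0.0, 1.0, n)     # hypothesized N(0, 1)
p_null = permutation_gof_test(data_null, sampler, rng=1)
p_shift = permutation_gof_test(data_shift, sampler, rng=1)
```

A clear violation of the hypothesized distribution (the shifted sample) yields a small p-value, while conforming data typically do not.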
11 pages, 292 KiB  
Article
Bayesian Bootstrap in Multiple Frames
by Daniela Cocchi, Lorenzo Marchi and Riccardo Ievoli
Stats 2022, 5(2), 561-571; https://doi.org/10.3390/stats5020034 - 15 Jun 2022
Cited by 3 | Viewed by 1570
Abstract
Multiple frames are becoming increasingly relevant due to the spread of surveys conducted via registers. In this regard, estimators of population quantities have been proposed, including the multiplicity estimator. In all cases, variance estimation still remains a matter of debate. This paper explores the potential of Bayesian bootstrap techniques for computing such estimators. The suitability of the method, which is compared to the existing frequentist bootstrap, is shown by conducting a small-scale simulation study and a case study. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
15 pages, 2751 KiB  
Article
Quantitative Trading through Random Perturbation Q-Network with Nonlinear Transaction Costs
by Tian Zhu and Wei Zhu
Stats 2022, 5(2), 546-560; https://doi.org/10.3390/stats5020033 - 10 Jun 2022
Cited by 4 | Viewed by 2342
Abstract
In recent years, reinforcement learning (RL) has seen increasing applications in the financial industry, especially in quantitative trading and portfolio optimization, where the focus is on the long-term reward rather than short-term profit. Sequential decision making and Markov decision processes are well suited to this type of application. Through trial and error based on historical data, an agent can learn the characteristics of the market and evolve an algorithm to maximize the cumulative returns. In this work, we propose a novel RL trading algorithm utilizing random perturbation of the Q-network and accounting for the more realistic nonlinear transaction costs. In summary, we first design a new near-quadratic transaction cost function considering the slippage. Next, we develop a convolutional deep Q-learning network (CDQN) with multiple price inputs based on this cost function. We further propose a random perturbation (rp) method to modify the learning network to solve the instability issue intrinsic to the deep Q-learning network. Finally, we use this newly developed CDQN-rp algorithm to make trading decisions based on the daily stock prices of Apple (AAPL), Meta (FB), and Bitcoin (BTC) and demonstrate its strengths over other quantitative trading methods. Full article
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
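The abstract does not give the exact functional form of the near-quadratic cost, but the idea — a proportional commission plus a slippage term that grows roughly quadratically with trade size — can be sketched as below. All coefficients and the exponent are hypothetical placeholders, not the paper's calibrated values.

```python
def transaction_cost(volume, price, c_prop=0.001, c_slip=1e-9, alpha=2.0):
    """Hypothetical near-quadratic transaction cost: a proportional
    commission on the traded notional plus a slippage term growing
    ~quadratically in notional. The paper's exact form may differ."""
    notional = abs(volume) * price
    return c_prop * notional + c_slip * notional ** alpha

# larger trades pay disproportionately more per share because of slippage
small = transaction_cost(100, 150.0)      # 100 shares at $150
large = transaction_cost(10_000, 150.0)   # 10,000 shares at $150
```

The per-share cost of the large trade exceeds that of the small one, which is exactly why a linear cost model understates the penalty on high-volume strategies.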
Show Figures

Figure 1

8 pages, 280 KiB  
Article
Evaluation of the Gauss Integral
by Dmitri Martila and Stefan Groote
Stats 2022, 5(2), 538-545; https://doi.org/10.3390/stats5020032 - 10 Jun 2022
Cited by 1 | Viewed by 2705
Abstract
The normal or Gaussian distribution plays a prominent role in almost all fields of science. However, it is well known that the Gauss (or Euler–Poisson) integral over a finite boundary, as is necessary, for instance, for the error function or the cumulative distribution of the normal distribution, cannot be expressed by analytic functions. This is proven by the Risch algorithm. Regardless, there are proposals for approximate solutions. In this paper, we give a new solution in terms of normal distributions by applying a geometric procedure iteratively to the problem. Full article
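The quantities discussed in this abstract are easy to illustrate numerically. The sketch below is not the paper's iterative geometric construction; it merely shows the exact cumulative normal via the error function, Φ(x) = (1 + erf(x/√2))/2, next to one classic closed-form approximation (Pólya's), of the kind the authors' new solution competes with.

```python
import math

def phi_exact(x):
    """Standard normal CDF via the error function:
    Phi(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi_polya(x):
    """Polya's closed-form approximation for x >= 0:
    Phi(x) ~ (1 + sqrt(1 - exp(-2 x^2 / pi))) / 2."""
    return 0.5 * (1.0 + math.sqrt(1.0 - math.exp(-2.0 * x * x / math.pi)))
```

For instance, `phi_exact(1.96)` recovers the familiar ~0.975 tail value, and the approximation stays within a few thousandths of the exact CDF over the whole positive axis.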
17 pages, 322 KiB  
Article
A Comparison of Existing Bootstrap Algorithms for Multi-Stage Sampling Designs
by Sixia Chen, David Haziza and Zeinab Mashreghi
Stats 2022, 5(2), 521-537; https://doi.org/10.3390/stats5020031 - 06 Jun 2022
Cited by 1 | Viewed by 1533
Abstract
Multi-stage sampling designs are often used in household surveys because a sampling frame of elements may not be available or for cost considerations when data collection involves face-to-face interviews. In this context, variance estimation is a complex task as it relies on the availability of second-order inclusion probabilities at each stage. To cope with this issue, several bootstrap algorithms have been proposed in the literature in the context of a two-stage sampling design. In this paper, we describe some of these algorithms and compare them empirically in terms of bias, stability, and coverage probability. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
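The simplest member of the family of algorithms this paper compares resamples whole primary sampling units (PSUs) with replacement and recomputes the estimator each time — valid when first-stage sampling fractions are negligible. This is a deliberately naive sketch; the algorithms studied in the paper add rescaling corrections (e.g., Rao–Wu–Yue), and all names here are illustrative.

```python
import numpy as np

def psu_bootstrap_variance(psu_totals, n_boot=1000, rng=None):
    """Naive first-stage bootstrap for a two-stage design: resample
    whole PSUs with replacement and recompute the expansion estimator
    of the population total each replicate. A simplified sketch only;
    it ignores rescaling corrections used in practice."""
    rng = np.random.default_rng(rng)
    psu_totals = np.asarray(psu_totals, dtype=float)
    n = len(psu_totals)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample PSUs with replacement
        boot[b] = n * psu_totals[idx].mean()   # expansion estimator of the total
    return boot.var(ddof=1)

psu_totals = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0]  # estimated totals per sampled PSU
v = psu_bootstrap_variance(psu_totals, rng=42)
```

For this toy sample the bootstrap variance lands near the analytic with-replacement value n·σ̂² (here 73.5), up to Monte Carlo noise.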
14 pages, 540 KiB  
Article
Goodness-of-Fit and Generalized Estimating Equation Methods for Ordinal Responses Based on the Stereotype Model
by Daniel Fernández, Louise McMillan, Richard Arnold, Martin Spiess and Ivy Liu
Stats 2022, 5(2), 507-520; https://doi.org/10.3390/stats5020030 - 01 Jun 2022
Cited by 1 | Viewed by 1993
Abstract
Background: Data with ordinal categories occur in many diverse areas, but methodologies for modeling ordinal data lag severely behind equivalent methodologies for continuous data. There are advantages to using a model specifically developed for ordinal data, such as making fewer assumptions and having greater power for inference. Methods: The ordered stereotype model (OSM) is an ordinal regression model that is more flexible than the popular proportional odds ordinal model. The primary benefit of the OSM is that it uses numeric encoding of the ordinal response categories without assuming the categories are equally-spaced. Results: This article summarizes two recent advances in the OSM: (1) three novel tests to assess goodness-of-fit; (2) a new Generalized Estimating Equations approach to estimate the model for longitudinal studies. These methods use the new spacing of the ordinal categories indicated by the estimated score parameters of the OSM. Conclusions: The recent advances presented can be applied to several fields. We illustrate their use with the well-known arthritis clinical trial dataset. These advances fill a gap in methodologies available for ordinal responses and may be useful for practitioners in many applied fields. Full article
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
13 pages, 414 KiB  
Article
The Missing Indicator Approach for Accelerated Failure Time Model with Covariates Subject to Limits of Detection
by Norah Alyabs and Sy Han Chiou
Stats 2022, 5(2), 494-506; https://doi.org/10.3390/stats5020029 - 10 May 2022
Viewed by 1640
Abstract
The limit of detection (LOD) is commonly encountered in observational studies when one or more covariate values fall outside the measuring ranges. Although the complete-case (CC) approach is widely employed in the presence of missing values, it can result in biased estimation or even become inapplicable in small sample studies. On the other hand, approaches such as the missing indicator (MDI) approach are attractive alternatives, as they preserve sample sizes. This paper compares the effectiveness of different alternatives to the CC approach under different LOD settings with a survival outcome. These alternatives include substitution methods, multiple imputation (MI) methods, MDI approaches, and MDI-embedded MI approaches. Through extensive simulation, we found that the MDI approach outperformed its competitors in terms of bias and mean squared error in small sample sizes. Full article
(This article belongs to the Special Issue Survival Analysis: Models and Applications)
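The missing indicator idea is mechanically simple: keep every observation by filling values below the LOD with a fixed substitute and adding a 0/1 column flagging which values were censored. The sketch below shows only this design-matrix step, using the common LOD/√2 fill-in as an assumed choice; it is not the paper's full estimation procedure.

```python
import numpy as np

def mdi_design(x, lod):
    """Missing indicator (MDI) design for a covariate with a lower limit
    of detection: values below the LOD are replaced by a fixed fill-in
    (here LOD/sqrt(2), a common substitution) and a 0/1 indicator column
    records which values fell below the LOD."""
    x = np.asarray(x, dtype=float)
    below = x < lod
    filled = np.where(below, lod / np.sqrt(2.0), x)
    return np.column_stack([filled, below.astype(float)])

# two values fall below an LOD of 0.5 and are flagged
X = mdi_design([0.2, 1.4, 0.8, 0.1, 2.3], lod=0.5)
```

Both columns then enter the survival regression, so the model can estimate a separate effect for below-LOD observations instead of discarding them.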
17 pages, 356 KiB  
Article
Bayesian Semiparametric Regression Analysis of Multivariate Panel Count Data
by Chunling Wang and Xiaoyan Lin
Stats 2022, 5(2), 477-493; https://doi.org/10.3390/stats5020028 - 10 May 2022
Viewed by 1721
Abstract
Panel count data often occur in a long-term recurrent event study, where the exact occurrence time of the recurrent events is unknown, but only the occurrence count between any two adjacent observation time points is recorded. Most traditional methods only handle panel count data for a single type of event. In this paper, we propose a Bayesian semiparametric approach to analyze panel count data for multiple types of events. For each type of recurrent event, the proportional mean model is adopted to model the mean count of the event, where its baseline mean function is approximated by monotone I-splines. The correlation between multiple types of events is modeled by common frailty terms and scale parameters. Unlike many frequentist estimating equation methods, our approach is based on the observed likelihood and makes no assumption on the relationship between the recurrent process and the observation process. Under the Poisson counting process assumption, we develop an efficient Gibbs sampler based on novel data augmentation for the Markov chain Monte Carlo sampling. Simulation studies show good estimation performance of the baseline mean functions and the regression coefficients; meanwhile, the importance of including the scale parameter to flexibly accommodate the correlation between events is also demonstrated. Finally, a skin cancer data example is fully analyzed to illustrate the proposed methods. Full article
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
19 pages, 1768 KiB  
Article
Repeated-Measures Analysis in the Context of Heteroscedastic Error Terms with Factors Having Both Fixed and Random Levels
by Lyson Chaka and Peter Njuho
Stats 2022, 5(2), 458-476; https://doi.org/10.3390/stats5020027 - 06 May 2022
Viewed by 1941
Abstract
The design and analysis of experiments which involve factors each consisting of both fixed and random levels fit into linear mixed models. The assumed linear mixed-model design matrix takes either a full-rank or less-than-full-rank form. The complexity of the data structures of such experiments falls in the model-selection and parameter-estimation process. The fundamental consideration in the estimation process of linear models is the special case in which elements of the error vector are assumed equal and uncorrelated. However, different assumptions on the structure of the variance–covariance matrix of error vector in the estimation of parameters of a linear mixed model may be considered. We conceptualise a repeated-measures design with multiple between-subjects factors, in which each of these factors has both fixed and random levels. We focus on the construction of linear mixed-effects models, the estimation of variance components, and hypothesis testing in which the default covariance structure of homoscedastic error terms is not appropriate. We illustrate the proposed approach using longitudinal data fitted to a three-factor linear mixed-effects model. The novelty of this approach lies in the exploration of the fixed and random levels of the same factor and in the subsequent interaction effects of the fixed levels. In addition, we assess the differences between levels of the same factor and determine the proportion of the total variation accounted for by the random levels of the same factor. Full article
18 pages, 428 KiB  
Article
Opening the Black Box: Bootstrapping Sensitivity Measures in Neural Networks for Interpretable Machine Learning
by Michele La Rocca and Cira Perna
Stats 2022, 5(2), 440-457; https://doi.org/10.3390/stats5020026 - 25 Apr 2022
Cited by 2 | Viewed by 1674
Abstract
Artificial neural networks are powerful tools for data analysis, particularly in the context of highly nonlinear regression models. However, their utility is critically limited due to the lack of interpretation of the model given its black-box nature. To partially address the problem, the paper focuses on the important problem of feature selection. It proposes and discusses a statistical test procedure for selecting a set of input variables that are relevant to the model while taking into account the multiple testing nature of the problem. The approach is within the general framework of sensitivity analysis and uses the conditional expectation of functions of the partial derivatives of the output with respect to the inputs as a sensitivity measure. The proposed procedure extensively uses the bootstrap to approximate the test statistic distribution under the null while controlling the familywise error rate to correct for data snooping arising from multiple testing. In particular, a pair bootstrap scheme was implemented in order to obtain consistent results when using misspecified statistical models, a typical characteristic of neural networks. Numerical examples and a Monte Carlo simulation were carried out to verify the ability of the proposed test procedure to correctly identify the set of relevant features. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
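The pair bootstrap this abstract relies on resamples (x, y) pairs jointly, which keeps inference valid when the fitted model is misspecified. As a stand-in for the neural network, the sketch below bootstraps a sensitivity measure (squared partial derivative of the fitted output) for a plain linear fit; it is only an illustration of the resampling scheme, not the authors' test procedure, and all names are illustrative.

```python
import numpy as np

def pair_bootstrap_sensitivity(X, y, n_boot=500, rng=None):
    """Pair bootstrap of a simple sensitivity measure. A linear model
    stands in for the neural network: each input's sensitivity is its
    squared coefficient (the squared partial derivative of the fitted
    output). Resampling (x, y) pairs jointly keeps the procedure valid
    under model misspecification."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])             # add intercept
    sens = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)              # resample pairs
        beta, *_ = np.linalg.lstsq(Xc[idx], y[idx], rcond=None)
        sens[b] = beta[1:] ** 2                       # squared partial derivatives
    return sens.mean(axis=0), sens.std(axis=0, ddof=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)   # second feature irrelevant
mean_sens, se_sens = pair_bootstrap_sensitivity(X, y, rng=1)
```

The bootstrap distribution cleanly separates the relevant feature (sensitivity near 9) from the irrelevant one (near 0), which is the behaviour the paper's multiple-testing procedure builds on.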
18 pages, 705 KiB  
Article
Bootstrap Assessment of Crop Area Estimates Using Satellite Pixels Counting
by Cristiano Ferraz, Jacques Delincé, André Leite and Raydonal Ospina
Stats 2022, 5(2), 422-439; https://doi.org/10.3390/stats5020025 - 25 Apr 2022
Viewed by 1581
Abstract
Crop area estimates based on counting pixels over classified satellite images are a promising application of remote sensing to agriculture. However, such area estimates are biased, and their variance is a function of the error rates of the classification rule. To redress the bias, estimators (direct and inverse) relying on the so-called confusion matrix have been proposed, but analytic estimators for variances can be tricky to derive. This article proposes a bootstrap method for assessing statistical properties of such estimators based on information from a sample confusion matrix. The proposed method can be applied to any other type of estimator that is built upon confusion matrix information. The resampling procedure is illustrated in a small study to assess the biases and variances of estimates using purely pixel counting and estimates provided by both direct and inverse estimators. The method has the advantage of being simple to implement even when the sample confusion matrix is generated under unequal probability sample design. The results show the limitations of estimates based solely on pixel counting as well as respective advantages and drawbacks of the direct and inverse estimators with respect to their feasibility, unbiasedness, and variance. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
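The confusion-matrix calibration and the bootstrap step can be sketched as follows. The "direct" estimator here redistributes each mapped class's pixel total to true classes in proportion to the corresponding column of the sample confusion matrix, and the bootstrap treats the reference sample as a simple multinomial draw; both are simplified readings of the general idea (the paper also covers inverse estimators and unequal-probability designs), and all names are illustrative.

```python
import numpy as np

def direct_area_estimate(pixel_counts, confusion):
    """Direct (calibration) estimator of crop areas from pixel counting.
    confusion[i, j] counts reference units of true class i mapped to
    class j; each mapped class's pixel total is redistributed to true
    classes in proportion to its column of the confusion matrix."""
    confusion = np.asarray(confusion, dtype=float)
    col_props = confusion / confusion.sum(axis=0, keepdims=True)
    return col_props @ np.asarray(pixel_counts, dtype=float)

def bootstrap_area_se(pixel_counts, confusion, n_boot=500, rng=None):
    """Bootstrap the reference sample behind the confusion matrix
    (multinomial resampling of its cells, assuming simple random
    sampling) to get standard errors of the corrected areas."""
    rng = np.random.default_rng(rng)
    confusion = np.asarray(confusion, dtype=float)
    n = int(confusion.sum())
    p = (confusion / n).ravel()
    ests = np.empty((n_boot, confusion.shape[0]))
    for b in range(n_boot):
        resampled = rng.multinomial(n, p).reshape(confusion.shape)
        ests[b] = direct_area_estimate(pixel_counts, resampled)
    return ests.std(axis=0, ddof=1)

pixels = [8000, 2000]          # pixels classified as crop / other
conf = [[90, 10],              # reference data: true crop row
        [5, 95]]               # true other row
areas = direct_area_estimate(pixels, conf)
se = bootstrap_area_se(pixels, conf, rng=0)
```

Note that the corrected areas still sum to the total pixel count; the correction only moves area between classes to undo classification error.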
14 pages, 1158 KiB  
Article
Omnibus Tests for Multiple Binomial Proportions via Doubly Sampled Framework with Under-Reported Data
by Dewi Rahardja
Stats 2022, 5(2), 408-421; https://doi.org/10.3390/stats5020024 - 23 Apr 2022
Viewed by 2043
Abstract
Rahardja (2020), the first entry in the reference list, developed a pairwise multiple comparison procedure (MCP) to determine which pairs of multiple binomial proportions (with under-reported data) the significant differences came from. Such an MCP test is generally the second part of a two-stage sequential test. In this paper, we derive two omnibus tests (i.e., tests of the overall equality of multiple proportions) to serve as the first part of this two-stage sequential test with under-reported data. Using two likelihood-based approaches, we obtain two Wald-type omnibus tests to compare multiple binomial proportions in the presence of under-reported data. Our closed-form algorithm is easy to implement and not computationally burdensome. We apply our algorithm to a vehicle-accident data example. Full article
(This article belongs to the Special Issue Multivariate Statistics and Applications)
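The skeleton of a Wald-type omnibus test for H0: p₁ = … = p_k is shown below for fully observed binomial counts. The paper's actual contribution — adjusting this for under-reported data via a doubly sampled framework — is not reproduced here; this is just the standard chi-square form the omnibus tests build on.

```python
import numpy as np

def wald_omnibus(successes, trials):
    """Wald-type omnibus test of H0: p_1 = ... = p_k for k binomial
    samples, without the paper's under-reporting correction (which
    requires doubly sampled data). Returns the chi-square statistic,
    with k - 1 degrees of freedom under H0."""
    x = np.asarray(successes, dtype=float)
    n = np.asarray(trials, dtype=float)
    p = x / n
    w = n / (p * (1 - p))               # inverse Wald variances
    p_bar = (w * p).sum() / w.sum()     # variance-weighted common estimate
    return float((w * (p - p_bar) ** 2).sum())

stat_equal = wald_omnibus([30, 32, 29], [100, 100, 100])  # near-equal proportions
stat_diff = wald_omnibus([30, 60, 29], [100, 100, 100])   # one proportion stands out
# compare against the chi-square critical value with df = 2: 5.991 at the 5% level
```

Only when the omnibus test rejects does the second stage — the pairwise MCP of Rahardja (2020) — ask which pairs drive the difference.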
7 pages, 423 KiB  
Article
Has the Market Started to Collapse or Will It Resist?
by Yao Kuang and Raphael Douady
Stats 2022, 5(2), 401-407; https://doi.org/10.3390/stats5020023 - 23 Apr 2022
Viewed by 5490
Abstract
Many people are concerned about the stock market in 2022 as it faces several threats, from rising inflation rates to geopolitical events. The S&P 500 Index has already dropped about 10% from its peak in early January 2022 to the end of February 2022. This paper updates the crisis indicator we developed in Crisis Risk Prediction with Concavity from Polymodel (2022), which predicts when the market may experience a significant drawdown. This indicator uses regime switching and Polymodel theory to calculate the market concavity. We found that concavity had not increased in the past six months. We conclude that, at present, the market does not bear inherent dynamic instability. This does not exclude a possible collapse due to external events unrelated to financial markets. Full article
(This article belongs to the Special Issue Feature Paper Special Issue: Quantitative Finance)
16 pages, 396 KiB  
Article
Some Empirical Results on Nearest-Neighbour Pseudo-populations for Resampling from Spatial Populations
by Sara Franceschi, Rosa Maria Di Biase, Agnese Marcelli and Lorenzo Fattorini
Stats 2022, 5(2), 385-400; https://doi.org/10.3390/stats5020022 - 15 Apr 2022
Cited by 2 | Viewed by 1632
Abstract
In finite populations, pseudo-population bootstrap is the sole method preserving the spirit of the original bootstrap performed from iid observations. In spatial sampling, theoretical results about the convergence of bootstrap distributions to the actual distributions of estimators are lacking, owing to the failure of spatially balanced sampling designs to converge to the maximum entropy design. In addition, the issue of creating pseudo-populations able to mimic the characteristics of real populations is challenging in spatial frameworks where spatial trends, relationships, and similarities among neighbouring locations are invariably present. In this paper, we propose the use of the nearest-neighbour interpolation of spatial populations for constructing pseudo-populations that converge to real populations under mild conditions. The effectiveness of these proposals with respect to traditional pseudo-populations is empirically checked by a simulation study. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)
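The interpolation step behind the proposed pseudo-populations is straightforward: every population unit receives the value of its nearest sampled neighbour, so sampled units keep their own values and unsampled units inherit spatially close ones. The 2-D sketch below shows only this step (the bootstrap then resamples from the resulting pseudo-population); names and data are illustrative.

```python
import numpy as np

def nn_pseudo_population(coords_all, sampled_idx, sampled_values):
    """Nearest-neighbour pseudo-population: assign each population unit
    the value of its nearest sampled unit (ties go to the first match).
    Sampled units are their own nearest neighbour and keep their values."""
    coords_all = np.asarray(coords_all, dtype=float)
    sampled_coords = coords_all[sampled_idx]
    # squared distances from every unit to every sampled unit
    d2 = ((coords_all[:, None, :] - sampled_coords[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    return np.asarray(sampled_values, dtype=float)[nearest]

# five units, two of which (indices 0 and 2) were sampled
coords = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [0.0, 0.4]])
pseudo = nn_pseudo_population(coords, [0, 2], [10.0, 99.0])
```

Units clustered near a sampled location inherit its value, which is how the pseudo-population preserves spatial trends and local similarity.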
14 pages, 399 KiB  
Article
ordinalbayes: Fitting Ordinal Bayesian Regression Models to High-Dimensional Data Using R
by Kellie J. Archer, Anna Eames Seffernick, Shuai Sun and Yiran Zhang
Stats 2022, 5(2), 371-384; https://doi.org/10.3390/stats5020021 - 15 Apr 2022
Cited by 1 | Viewed by 2184
Abstract
The stage of cancer is a discrete ordinal response that indicates the aggressiveness of disease and is often used by physicians to determine the type and intensity of treatment to be administered. For example, the FIGO stage in cervical cancer is based on the size and depth of the tumor as well as the level of spread. It may be of clinical relevance to identify molecular features from high-throughput genomic assays that are associated with the stage of cervical cancer to elucidate pathways related to tumor aggressiveness, identify improved molecular features that may be useful for staging, and identify therapeutic targets. High-throughput RNA-Seq data and corresponding clinical data (including stage) for cervical cancer patients have been made available through The Cancer Genome Atlas Project (TCGA). We recently described penalized Bayesian ordinal response models that can be used for variable selection for over-parameterized datasets, such as the TCGA-CESC dataset. Herein, we describe our ordinalbayes R package, available from the Comprehensive R Archive Network (CRAN), which enhances the runjags R package by enabling users to easily fit cumulative logit models when the outcome is ordinal and the number of predictors exceeds the sample size, P>N, such as for TCGA and other high-throughput genomic data. We demonstrate the use of this package by applying it to the TCGA cervical cancer dataset. Our ordinalbayes package can be used to fit models to high-dimensional datasets, and it effectively performs variable selection. Full article
(This article belongs to the Special Issue Statistics, Analytics, and Inferences for Discrete Data)
13 pages, 694 KiB  
Article
Multiple Imputation of Composite Covariates in Survival Studies
by Lily Clements, Alan C. Kimber and Stefanie Biedermann
Stats 2022, 5(2), 358-370; https://doi.org/10.3390/stats5020020 - 29 Mar 2022
Viewed by 1901
Abstract
Missing covariate values are a common problem in survival studies, and the method of choice when handling such incomplete data is often multiple imputation. However, it is not obvious how this can be used most effectively when an incomplete covariate is a function of other covariates. For example, body mass index (BMI) is the ratio of weight and height-squared. In this situation, the following question arises: Should a composite covariate such as BMI be imputed directly, or is it advantageous to impute its constituents, weight and height, first and to construct BMI afterwards? We address this question through a carefully designed simulation study that compares various approaches to multiple imputation of composite covariates in a survival context. We discuss advantages and limitations of these approaches for various types of missingness and imputation models. Our results are a first step towards providing much needed guidance to practitioners for analysing their incomplete survival data effectively. Full article
(This article belongs to the Special Issue Survival Analysis: Models and Applications)
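One of the strategies compared in this paper — impute the constituents first, then construct the composite — can be sketched as below for BMI. The imputation model here (drawing from the observed values) is deliberately simplistic and the pooling is a plain average of completed datasets, so this only illustrates the workflow, not the paper's imputation models or Rubin's full pooling rules.

```python
import numpy as np

def impute_constituents_then_combine(weight, height, n_imp=20, rng=None):
    """Impute missing weight and height separately (here by drawing from
    the observed values of each variable, a deliberately simple
    imputation model), construct BMI = weight / height^2 in each
    completed dataset, and average across the imputations."""
    rng = np.random.default_rng(rng)
    weight = np.asarray(weight, dtype=float)
    height = np.asarray(height, dtype=float)
    bmi_sets = []
    for _ in range(n_imp):
        w = weight.copy()
        h = height.copy()
        w[np.isnan(w)] = rng.choice(weight[~np.isnan(weight)], np.isnan(w).sum())
        h[np.isnan(h)] = rng.choice(height[~np.isnan(height)], np.isnan(h).sum())
        bmi_sets.append(w / h ** 2)         # build the composite afterwards
    return np.mean(bmi_sets, axis=0)        # pooled completed-data BMI

weight = np.array([70.0, np.nan, 85.0, 60.0])   # kg, one value missing
height = np.array([1.75, 1.80, np.nan, 1.65])   # m, one value missing
bmi = impute_constituents_then_combine(weight, height, rng=0)
```

The competing strategy would impute BMI directly as a single variable; the paper's simulation study is precisely about when each route is preferable.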
19 pages, 369 KiB  
Article
A Bootstrap Variance Estimation Method for Multistage Sampling and Two-Phase Sampling When Poisson Sampling Is Used at the Second Phase
by Jean-François Beaumont and Nelson Émond
Stats 2022, 5(2), 339-357; https://doi.org/10.3390/stats5020019 - 22 Mar 2022
Cited by 4 | Viewed by 3543
Abstract
The bootstrap method is often used for variance estimation in sample surveys with a stratified multistage sampling design. It is typically implemented by producing a set of bootstrap weights that is made available to users and that accounts for the complexity of the sampling design. The Rao–Wu–Yue method is often used to produce the required bootstrap weights. It is valid under stratified with-replacement sampling at the first stage or fixed-size without-replacement sampling provided the first-stage sampling fractions are negligible. Some surveys use designs that do not satisfy these conditions. We propose a simple and unified bootstrap method that addresses this limitation of the Rao–Wu–Yue bootstrap weights. This method is applicable to any multistage sampling design as long as valid bootstrap weights can be produced for each distinct stage of sampling. Our method is also applicable to two-phase sampling designs provided that Poisson sampling is used at the second phase. We use this design to model survey nonresponse and derive bootstrap weights that account for nonresponse weighting. The properties of our bootstrap method are evaluated in three limited simulation studies. Full article
(This article belongs to the Special Issue Re-sampling Methods for Statistical Inference of the 2020s)