Small Area Estimation: Theories, Methods and Applications

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: closed (30 June 2022) | Viewed by 24388

Special Issue Editor


Prof. Dr. Balgobin Nandram
Guest Editor
Department of Mathematical Sciences, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609, USA
Interests: small area estimation; survey sampling; categorical data analysis; nonresponse; agricultural statistics; data science; statistical education

Special Issue Information

Dear Colleagues,

Small area estimation is important to many government agencies across the world because it is a vehicle for building official statistics. Its importance has grown enormously over the past fifty years, and it is widely practiced today as the demand for small area statistics has increased significantly worldwide. Currently, there is a strong need for small area statistics in formulating policies and programs for the allocation of government funds and regional planning.

Recent advances in statistical methodology and computation, coupled with strong demand for disaggregated estimates, have led to the field of small area estimation expanding in many directions. Robust estimators and generalized linear mixed models accommodate non-normal distributions in unit-level responses. Non-parametric models allow for further flexibility in describing distributional forms and indirect relationships to covariates. Measurement error models and variable selection methods aid in the use of increasingly complex auxiliary data sources. Spatio-temporal and multivariate models capture dependencies among areas and variables. Proper studies on nonresponse and selection bias guard against biased predictors when the selection mechanisms are related to the characteristic of interest. The bootstrap and the Bayesian paradigm facilitate the construction of accurate prediction intervals and mean square errors of estimators, even for complex models, where parameters are nonlinear functions of response variables and there are clustering effects. Developments in statistical methods have enabled many important applications that rely on small area estimates. With all these new methods, judicious pooling of small areas is essential. Non-probability sampling is an emerging area.

I would like this Special Issue to contain papers on small area estimation with strong theory, methods, or applications, or a mix of these broad areas. Therefore, I welcome papers that develop innovative small area methods or that demonstrate sound applications of small area estimation to problems of practical interest. Some prominent researchers in small area estimation have already agreed to submit papers for possible publication. I look forward to receiving your submission.

Prof. Dr. Balgobin Nandram
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • correlated data
  • nonlinear model
  • non-parametric model
  • hierarchical and multi-level models
  • nonresponse
  • selection bias
  • Bayesian methodology
  • bootstrap and resampling
  • spatio-temporal models

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)

Research

18 pages, 511 KiB  
Article
Smoothing County-Level Sampling Variances to Improve Small Area Models’ Outputs
by Lu Chen, Luca Sartore, Habtamu Benecha, Valbona Bejleri and Balgobin Nandram
Stats 2022, 5(3), 898-915; https://doi.org/10.3390/stats5030052 - 11 Sep 2022
Cited by 1 | Viewed by 1801
Abstract
The use of hierarchical Bayesian small area models, which take survey estimates along with auxiliary data as input to produce official statistics, has increased in recent years. Survey estimates for small domains are usually unreliable due to small sample sizes, and the corresponding sampling variances can also be imprecise and unreliable. This affects the performance of the model (i.e., the model will not produce an estimate or will produce a low-quality modeled estimate), which results in a reduced number of official statistics published by a government agency. To mitigate the unreliable sampling variances, these survey-estimated variances are typically modeled against the direct estimates wherever a relationship between the two is present. However, this is not always the case. This paper explores different alternatives to mitigate the unreliable (beyond some threshold) sampling variances. A Bayesian approach under the area-level model set-up and a distribution-free technique based on bootstrap sampling are proposed to update the survey data. An application to the county-level corn yield data from the County Agricultural Production Survey of the United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) is used to illustrate the proposed approaches. The final county-level model-based estimates for small area domains, produced based on updated survey data from each method, are compared with county-level model-based estimates produced based on the original survey data and the official statistics published in 2016.
(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)
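The abstract above notes that unreliable sampling variances degrade area-level models. A minimal sketch of why, assuming an intercept-only area-level model (of the Fay-Herriot type) with the between-area variance treated as known: each model-based estimate is a weighted average of the direct estimate and a pooled mean, and the weight is set by the area's sampling variance. The function name `area_level_shrinkage`, its arguments, and the toy numbers are illustrative assumptions, not the authors' implementation or data.

```python
def area_level_shrinkage(direct, samp_var, model_var):
    """Composite estimates under an intercept-only area-level model.

    direct    : direct survey estimates y_i, one per area
    samp_var  : sampling variances D_i, treated as known
    model_var : between-area variance A (assumed known here for
                illustration; in practice it must be estimated)
    """
    # Precision weights for the weighted least squares pooled mean.
    weights = [1.0 / (model_var + d) for d in samp_var]
    pooled_mean = sum(w * y for w, y in zip(weights, direct)) / sum(weights)

    estimates = []
    for y, d in zip(direct, samp_var):
        # gamma is the weight placed on the area's own direct estimate.
        gamma = model_var / (model_var + d)
        estimates.append(gamma * y + (1.0 - gamma) * pooled_mean)
    return pooled_mean, estimates


# Toy usage: two areas, the second with a much larger sampling variance.
pooled, est = area_level_shrinkage([10.0, 20.0], [1.0, 9.0], 1.0)
print(pooled, est)
```

In this toy example the second area, whose sampling variance is larger, is pulled much closer to the pooled mean than the first. Because the weight depends directly on the sampling variance, a badly estimated variance changes how much each area is shrunk.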

17 pages, 872 KiB  
Article
Model-Based Estimates for Farm Labor Quantities
by Lu Chen, Nathan B. Cruze and Linda J. Young
Stats 2022, 5(3), 738-754; https://doi.org/10.3390/stats5030043 - 3 Aug 2022
Cited by 1 | Viewed by 2306
Abstract
The United States Department of Agriculture’s (USDA’s) National Agricultural Statistics Service (NASS) conducts the Farm Labor Survey to produce estimates of the number of workers, duration of the workweek, and wage rates for all agricultural workers. Traditionally, expert opinion is used to integrate auxiliary information, such as the previous year’s estimates, with the survey’s direct estimates. Alternatively, implementing small area models for integrating survey estimates with additional sources of information provides more reliable official estimates and valid measures of uncertainty for each type of estimate. In this paper, several hierarchical Bayesian subarea-level models are developed in support of different estimates of interest in the Farm Labor Survey. A 2020 case study illustrates the improvement of the direct survey estimates for areas with small sample sizes by using auxiliary information and borrowing information across areas and subareas. The resulting framework provides a complete set of coherent estimates for all required geographic levels. These methods were incorporated into the official Farm Labor publication for the first time in 2020.

25 pages, 1526 KiB  
Article
A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency
by Weijia Ren, Jianzhu Li, Andreea Erciulescu, Tom Krenzke and Leyla Mohadjer
Stats 2022, 5(3), 689-713; https://doi.org/10.3390/stats5030041 - 27 Jul 2022
Cited by 3 | Viewed by 2496
Abstract
In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.

16 pages, 395 KiB  
Article
Multivariate Global-Local Priors for Small Area Estimation
by Tamal Ghosh, Malay Ghosh, Jerry J. Maples and Xueying Tang
Stats 2022, 5(3), 673-688; https://doi.org/10.3390/stats5030040 - 25 Jul 2022
Viewed by 2145
Abstract
It is now widely recognized that small area estimation (SAE) needs to be model-based. Global-local (GL) shrinkage priors for random effects are important in sparse situations where many areas’ level effects do not have a significant impact on the response beyond what is offered by covariates. We propose in this paper a hierarchical multivariate model with GL priors. We prove the propriety of the posterior density when the regression coefficient matrix has an improper uniform prior. Some concentration inequalities are derived for the tail probabilities of the shrinkage estimators. The proposed method is illustrated via both data analysis and simulations.

15 pages, 659 KiB  
Article
Analysis of Household Pulse Survey Public-Use Microdata via Unit-Level Models for Informative Sampling
by Alexander Sun, Paul A. Parker and Scott H. Holan
Stats 2022, 5(1), 139-153; https://doi.org/10.3390/stats5010010 - 7 Feb 2022
Cited by 8 | Viewed by 4287
Abstract
The Household Pulse Survey, recently released by the U.S. Census Bureau, gathers information about the respondents’ experiences regarding employment status, food security, housing, physical and mental health, access to health care, and education disruption. Design-based estimates are produced for all 50 states and the District of Columbia (DC), as well as 15 Metropolitan Statistical Areas (MSAs). Using public-use microdata, this paper explores the effectiveness of using unit-level model-based estimators that incorporate spatial dependence for the Household Pulse Survey. In particular, we consider Bayesian hierarchical model-based spatial estimates for both a binomial and a multinomial response under informative sampling. Importantly, we demonstrate that these models can be easily estimated using Hamiltonian Monte Carlo through the Stan software package. In doing so, these models can readily be implemented in a production environment. For both the binomial and multinomial responses, an empirical simulation study is conducted, which compares spatial and non-spatial models. Finally, using public-use Household Pulse Survey micro-data, we provide an analysis that compares both design-based and model-based estimators and demonstrates a reduction in standard errors for the model-based approaches.

11 pages, 310 KiB  
Article
Selection of Auxiliary Variables for Three-Fold Linking Models in Small Area Estimation: A Simple and Effective Method
by Song Cai and J.N.K. Rao
Stats 2022, 5(1), 128-138; https://doi.org/10.3390/stats5010009 - 5 Feb 2022
Cited by 5 | Viewed by 2436
Abstract
Model-based estimation of small area means can lead to reliable estimates when the area sample sizes are small. This is accomplished by borrowing strength across related areas using models linking area means to related covariates and random area effects. The effective selection of variables to be included in the linking model is important in small area estimation. The main purpose of this paper is to extend the earlier work on variable selection for area level and two-fold subarea level models to three-fold sub-subarea models linking sub-subarea means to related covariates and random effects at the area, sub-area, and sub-subarea levels. The proposed variable selection method transforms the sub-subarea means to reduce the linking model to a standard regression model and applies commonly used criteria for variable selection, such as AIC and BIC, to the reduced model. The resulting criteria depend on the unknown sub-subarea means, which are then estimated using the sample sub-subarea means. Then, the estimated selection criteria are used for variable selection. Simulation results on the performance of the proposed variable selection method relative to methods based on area level and two-fold subarea level models are also presented.

12 pages, 1476 KiB
Article
Estimating the RMSE of Small Area Estimates without the Tears
by Diane Hindmarsh and David Steel
Stats 2021, 4(4), 931-942; https://doi.org/10.3390/stats4040054 - 17 Nov 2021
Cited by 1 | Viewed by 2949
Abstract
Small area estimation (SAE) methods can provide information that conventional direct survey estimation methods cannot. The use of small area estimates based on linear and generalized linear mixed models is still very limited, possibly because of the perceived complexity of estimating the root mean square errors (RMSEs) of the estimates. This paper outlines a study used to determine the conditions under which the estimated RMSEs, produced as part of statistical output (‘plug-in’ estimates of RMSEs) could be considered appropriate for a practical application of SAE methods where one of the main requirements was to use SAS software. We first show that the estimated RMSEs created using an EBLUP model in SAS and those obtained using a parametric bootstrap are similar to the published estimated RMSEs for the corn data in the seminal paper by Battese, Harter and Fuller. We then compare plug-in estimates of RMSEs from SAS procedures used to create EBLUP and EBP estimators against estimates of RMSEs obtained from a parametric bootstrap. For this comparison we created estimates of current smoking in males for 153 local government areas (LGAs) using data from the NSW Population Health Survey in Australia. Demographic variables from the survey data were included as covariates, with LGA-level population proportions, obtained mainly from the Australian Census used for prediction. For the EBLUP, the estimated plug-in estimates of RMSEs can be used, provided the sample size for the small area is more than seven. For the EBP, the plug-in estimates of RMSEs are suitable for all in-sample areas; out-of-sample areas need to use estimated RMSEs that use the parametric bootstrap.

20 pages, 9243 KiB  
Article
A Bayesian Approach to Linking a Survey and a Census via Small Areas
by Balgobin Nandram
Stats 2021, 4(2), 509-528; https://doi.org/10.3390/stats4020031 - 9 Jun 2021
Cited by 1 | Viewed by 2418
Abstract
We predict the finite population proportion of a small area when individual-level data are available from a survey and more extensive household-level (not individual-level) data (covariates but not responses) are available from a census. The census and the survey consist of the same strata and primary sampling units (PSU, or wards) that are matched, but the households are not matched. There are some common covariates at the household level in the survey and the census and these covariates are used to link the households within wards. There are also covariates at the ward level, and the wards are the same in the survey and the census. Using a two-stage procedure, we study the multinomial counts in the sampled households within the wards and a projection method to infer about the non-sampled wards. This is accommodated by a multinomial-Dirichlet–Dirichlet model, a three-stage hierarchical Bayesian model for multinomial counts, as it is necessary to account for heterogeneity among the households. The key theoretical contribution of this paper is to develop a computational algorithm to sample the joint posterior density of the multinomial-Dirichlet–Dirichlet model. Specifically, we obtain samples from the distributions of the proportions for each multinomial cell. The second key contribution is to use two projection procedures (parametric based on the nested error regression model and non-parametric based on iterative re-weighted least squares), on these proportions to link the survey to the census, thereby providing a copy of the census counts. We compare the multinomial-Dirichlet–Dirichlet (heterogeneous) model and the multinomial-Dirichlet (homogeneous) model without household effects via these two projection methods. An example of the second Nepal Living Standards Survey is presented.

Other

17 pages, 1120 KiB  
Project Report
Using Small Area Estimation to Produce Official Statistics
by Linda J. Young and Lu Chen
Stats 2022, 5(3), 881-897; https://doi.org/10.3390/stats5030051 - 8 Sep 2022
Cited by 4 | Viewed by 2651
Abstract
The USDA National Agricultural Statistics Service (NASS) and other federal statistical agencies have used probability-based surveys as the foundation for official statistics for over half a century. Non-survey data that can be used to improve the accuracy and precision of estimates such as administrative, remotely sensed, and retail data have become increasingly available. Both frequentist and Bayesian models are used to combine survey and non-survey data in a principled manner. NASS has recently adopted Bayesian subarea models for three of its national programs: farm labor, crop county estimates, and cash rent county estimates. Each program provides valuable estimates at multiple scales of geography. For each program, technical challenges had to be met and a strenuous review completed before models could be adopted as the foundation for official statistics. Moving models out of the research phase into production required major changes in the production process and a cultural shift. With the implemented models, NASS now has measures of uncertainty, transparency, and reproducibility of its official statistics.