Smoothing County-Level Sampling Variances to Improve Small Area Models’ Outputs
Abstract
1. Introduction
2. Motivation and Background
2.1. County Agricultural Production Survey
2.2. Exploring the Relationships between Direct Estimates and Their Variances
3. Generating Survey Variances of the Yields for Anomalous Counties
3.1. A Bayesian Approach for Generating Sampling Variances in the Yields for Anomalous Counties
3.2. A Non-Parametric Approach to Generating Sampling Variances for Anomalous Counties
4. Case Study
4.1. Subarea-Level Models of the NASS
4.2. Results
- The survey (direct) estimates;
- The original model in Equation (5);
- The updated model in Equation (4), with the sampling variances smoothed by the Bayesian method as input;
- The updated model in Equation (4), with the sampling variances smoothed by the bootstrap method as input.
5. Discussion and Final Remarks
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Mathematical Features for Model (2)
Appendix B. Gibbs Sampler for Equation (2)
- 1.
- ,where ;
- 2.
- ;
- 3.
- , where is the size of .
Appendix C. RJAGS Code for the Models in Equations (4) and (5)
```
model{
  Xbeta <- cX %*% beta
  for (j in 1:n) {
    # county level: direct estimate with its known sampling variance;
    # JAGS dnorm() takes a precision (1/variance) as its second argument
    thetahatij[j] ~ dnorm(thetaij[j], 1/vhat.dirij[j])
    thetaij[j] ~ dnorm(thetaij0[j], sigma2u.inv)
    # linear predictor plus the random effect of area id[j]
    thetaij0[j] <- Xbeta[j] + vi[id[j]]
  }
  ## Priors:
  for (i in 1:m) {
    vi[i] ~ dnorm(0, sigma2v.inv)
  }
  sigma2v ~ dunif(0, 10^10)
  sigma2v.inv <- pow(sigma2v, -1)
  sigma2u ~ dunif(0, 10^10)
  sigma2u.inv <- pow(sigma2u, -1)
  # prior centred at betahat, covariance Sigmahatbeta inflated by 10^3
  beta ~ dmnorm(betahat, Sigmahatbeta.inv)
  Sigmahatbeta.inv <- inverse(Sigmahatbeta * 10^3)
}
```

```
model{
  Xbeta <- cX %*% beta
  # counties indexed by nc: direct sampling variances treated as known
  for (j in 1:nt) {
    thetahatij[nc[j]] ~ dnorm(thetaij[nc[j]], 1/vhat.dirij[nc[j]])
    thetaij[nc[j]] ~ dnorm(thetaij0[nc[j]], sigma2u.inv)
    thetaij0[nc[j]] <- Xbeta[nc[j]] + vi[id[nc[j]]]
  }
  # counties indexed by nm: one common, unknown sampling precision tau
  for (k in 1:ntm) {
    thetahatij[nm[k]] ~ dnorm(thetaij[nm[k]], tau)
    thetaij[nm[k]] ~ dnorm(thetaij0[nm[k]], sigma2u.inv)
    thetaij0[nm[k]] <- Xbeta[nm[k]] + vi[id[nm[k]]]
  }
  ## Priors:
  for (i in 1:m) {
    vi[i] ~ dnorm(0, sigma2v.inv)
  }
  sigma2v ~ dunif(0, 10^10)
  sigma2v.inv <- pow(sigma2v, -1)
  sigma2u ~ dunif(0, 10^10)
  sigma2u.inv <- pow(sigma2u, -1)
  beta ~ dmnorm(betahat, Sigmahatbeta.inv)
  Sigmahatbeta.inv <- inverse(Sigmahatbeta * 10^3)
  tau ~ dgamma(0.001, 0.001)
  vhat <- pow(tau, -1)   # implied common sampling variance for the nm counties
}
```
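For readers less familiar with JAGS syntax, the first specification can be transcribed into distribution notation as below. This is a reading of the code only, not a reproduction of the paper's own equations; because JAGS's `dnorm` and `dmnorm` take precisions, the second arguments are written here as variances. The symbols are assumptions chosen to mirror the code: $j$ indexes counties, $\mathrm{id}[j]$ is the area containing county $j$, $\mathbf{x}_j$ is the $j$-th row of `cX`, and $\hat v^{\mathrm{dir}}_j$ is `vhat.dirij[j]`.

$$
\begin{aligned}
\hat\theta_j \mid \theta_j &\sim \mathrm{N}\big(\theta_j,\ \hat v^{\mathrm{dir}}_j\big), \qquad j = 1, \dots, n,\\
\theta_j \mid \boldsymbol\beta, v_{\mathrm{id}[j]}, \sigma_u^2 &\sim \mathrm{N}\big(\mathbf{x}_j^\top\boldsymbol\beta + v_{\mathrm{id}[j]},\ \sigma_u^2\big),\\
v_i \mid \sigma_v^2 &\sim \mathrm{N}\big(0,\ \sigma_v^2\big), \qquad i = 1, \dots, m,\\
\sigma_u^2,\ \sigma_v^2 &\sim \mathrm{U}\big(0,\ 10^{10}\big), \qquad \boldsymbol\beta \sim \mathrm{N}\big(\hat{\boldsymbol\beta},\ 10^{3}\,\hat{\boldsymbol\Sigma}_\beta\big).
\end{aligned}
$$

The second specification differs only in its first line: counties indexed by `nc` keep $\hat v^{\mathrm{dir}}_j$, while every county indexed by `nm` shares one unknown variance, $\hat\theta_j \mid \theta_j \sim \mathrm{N}(\theta_j,\ 1/\tau)$ with $\tau \sim \mathrm{Gamma}(0.001,\ 0.001)$ in JAGS's shape and rate parameterization.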
Appendix D. R Code for the Bootstrap Methodology
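```r
# Trimmed, order-statistic bootstrap of the sampling variances within each group
getBSEstim <- function(ss, sfip, scale = 2, BS_ITER = 1000L) {
  tapply(ss, sfip, function(xx) {
    or <- order(xx)                  # ranks of the original values
    nn <- length(xx)
    fn <- fivenum(xx)                # Tukey five-number summary
    iq <- IQR(xx)
    # Boxplot-style fences, clipped to the observed range
    cc <- c(max(fn[1], fn[2] - scale * iq),
            min(fn[5], fn[4] + scale * iq))
    # Keep only the values strictly inside the fences
    smp <- xx[xx > cc[1L] & xx < cc[2L]]
    # Each replicate is a sorted bootstrap sample of size nn drawn from smp;
    # sample.int() avoids sample()'s 1:x behaviour when smp has one element
    estim <- replicate(BS_ITER, {
      sort(smp[sample.int(length(smp), size = nn, replace = TRUE)])
    })
    # Average each order statistic over the replicates (matrix() covers nn == 1)
    estim <- as.vector(rowMeans(matrix(estim, nrow = nn)))
    # The k-th smallest average goes to the unit with the k-th smallest value
    estim[or] <- sort(estim)
    return(list(bounds = cc, sample = xx, estim = estim))
  })
}

sigmas <- as.numeric(dta$se2_G)
thetas <- as.numeric(dta$ratio2_G)
# Counties with a positive estimate and a non-missing variance
keep <- !is.na(sigmas) & !is.na(thetas) & thetas > 0
res <- getBSEstim(sigmas[keep], dta$state_cd[keep], 1.5)
dta$BS_estim <- 0 * sigmas
for (i in names(res)) {
  wh <- keep & dta$state_cd == i
  dta[which(wh), "BS_estim"] <- res[[i]]$estim
}
```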
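As a cross-check of the R routine's per-group logic, the following Python/NumPy sketch reproduces it for a single group of variances. It is an illustrative translation under assumptions, not the authors' implementation: `fivenum` and `trimmed_order_stat_bootstrap` are hypothetical helper names, R's `fivenum` hinge rule is recomputed by hand, and R's default `IQR` (quantile type 7) is matched by NumPy's default linear `np.percentile`.

```python
import numpy as np


def fivenum(x):
    """Tukey five-number summary, following the hinge rule of R's fivenum()."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    n4 = np.floor((n + 3) / 2) / 2
    # R's 1-based positions, shifted to 0-based indices
    d = np.array([1, n4, (n + 1) / 2, n + 1 - n4, n]) - 1
    return 0.5 * (xs[np.floor(d).astype(int)] + xs[np.ceil(d).astype(int)])


def trimmed_order_stat_bootstrap(x, scale=2.0, n_boot=1000, rng=None):
    """Hypothetical single-group translation of getBSEstim's inner function."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(rng)
    n = x.size
    fn = fivenum(x)
    q1, q3 = np.percentile(x, [25, 75])  # same as R's default IQR (type 7)
    iq = q3 - q1
    # Boxplot-style fences, clipped to the observed minimum and maximum
    lo = max(fn[0], fn[1] - scale * iq)
    hi = min(fn[4], fn[3] + scale * iq)
    # Strict inequalities, as in the R code
    pool = x[(x > lo) & (x < hi)]
    if pool.size == 0:
        raise ValueError("no values lie strictly inside the fences")
    # Each row is one sorted bootstrap sample of size n drawn from the pool
    draws = np.sort(rng.choice(pool, size=(n_boot, n), replace=True), axis=1)
    # Average each order statistic across the bootstrap replicates
    means = draws.mean(axis=0)
    # The k-th smallest average goes to the position of the k-th smallest value
    est = np.empty(n)
    est[np.argsort(x, kind="stable")] = np.sort(means)
    return (float(lo), float(hi)), est


if __name__ == "__main__":
    toy = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]
    bounds, est = trimmed_order_stat_bootstrap(toy, scale=1.5, n_boot=500, rng=1)
    print(bounds)  # (0.0, 13.75)
    print(np.round(est, 2))
```

Because the fences are applied with strict inequalities, as in the R code, the extreme values are left out of the resampling pool whenever a fence is clipped to the observed range. In the toy call, both the zero and the outlying 100 receive smoothed values built only from the retained values 1 to 8, while the rank order of the original values is preserved.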
| | Statistic | Survey | Bayesian | Bootstrap |
|---|---|---|---|---|
| Anomalous Counties | Min | 0.00 | 10.09 | 1.74 |
| | 1st Qu. | 0.00 | 83.13 | 60.33 |
| | Median | 6.25 × 10 | 124.05 | 142.36 |
| | 3rd Qu. | 0.71 | 167.16 | 289.66 |
| | Max | 5264.01 | 1426.81 | 1685.06 |
| | Statistic | Survey | Original | Bayesian | Bootstrap |
|---|---|---|---|---|---|
| Anomalous Counties | Min | 0.00 | 0.02 | 0.02 | 0.01 |
| | 1st Qu. | 3.63 | 4.96 | 2.62 | 2.43 |
| | Median | 21.05 | 11.65 | 7.93 | 6.71 |
| | 3rd Qu. | 100.00 | 24.00 | 21.95 | 22.60 |
| | Max | 140.66 | 199.70 | 187.43 | 195.60 |
| All Counties | Min | 0.00 | 0.00 | 0.00 | 0.00 |
| | 1st Qu. | 0.60 | 0.60 | 0.57 | 0.51 |
| | Median | 1.85 | 1.73 | 1.62 | 1.48 |
| | 3rd Qu. | 5.45 | 5.13 | 4.46 | 4.06 |
| | Max | 140.66 | 199.70 | 187.43 | 195.60 |
| | Survey | Original Method | Bayesian Method | Bootstrap Method |
|---|---|---|---|---|
| Published | 0.8627 | 0.9577 | 0.9566 | 0.9485 |
| Survey | | 0.8611 | 0.8716 | 0.8894 |
| Original Method | | | 0.9897 | 0.9790 |
| Bayesian Method | | | | 0.9913 |
| | Statistic | Survey | Original | Bayesian | Bootstrap |
|---|---|---|---|---|---|
| Anomalous Counties | Min | 1.01 × 10 | 9.29 | 6.15 | 4.23 |
| | 1st Qu. | 5.00 × 10 | 14.13 | 8.85 | 8.27 |
| | Median | 0.25 | 16.36 | 10.85 | 9.88 |
| | 3rd Qu. | 61.25 | 21.07 | 13.63 | 10.47 |
| | Max | 97.70 | 46.42 | 39.36 | 16.06 |
| All Counties | Min | 1.01 × 10 | 0.70 | 0.70 | 0.76 |
| | 1st Qu. | 2.59 | 2.51 | 2.54 | 2.54 |
| | Median | 4.41 | 4.26 | 4.30 | 4.20 |
| | 3rd Qu. | 8.27 | 7.56 | 7.71 | 7.31 |
| | Max | 97.70 | 46.42 | 52.93 | 40.09 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Chen, L.; Sartore, L.; Benecha, H.; Bejleri, V.; Nandram, B. Smoothing County-Level Sampling Variances to Improve Small Area Models’ Outputs. Stats 2022, 5, 898-915. https://doi.org/10.3390/stats5030052