Regression Models in Complex Survey Sampling for Sensitive Quantitative Variables
Abstract
1. Introduction
2. Randomized Response Survey Designs for Quantitative Variables
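As a rough illustration of the kind of design this section covers, the sketch below simulates two multiplicative scrambling schemes for a quantitative variable, together with the transformed response whose expectation equals the true value. It assumes the forms usually attributed to Eichhorn and Hayre (the respondent reports $z = yS$) and to Bar-Lev, Bobovitch and Boukai (the respondent reports $y$ with probability $p$ and $yS$ otherwise). The function names, the uniform scrambling distribution, and the parameter values are hypothetical choices made for the example, not taken from the article.

```python
import random


def eh_report(y, draw_s):
    """EH-style report: the true value multiplied by a random scrambling draw."""
    return y * draw_s()


def bbb_report(y, draw_s, p, rng):
    """BBB-style report: the true value with probability p, scrambled otherwise."""
    return y if rng.random() < p else y * draw_s()


def eh_transform(z, mean_s):
    """Rescale an EH report so that its expectation equals the true value."""
    return z / mean_s


def bbb_transform(z, mean_s, p):
    """Rescale a BBB report, using E[z] = y * (p + (1 - p) * mean_s)."""
    return z / (p + (1.0 - p) * mean_s)


def average_transformed(y, p=0.3, reps=20000, seed=1):
    """Average the transformed reports over many simulated interviews."""
    rng = random.Random(seed)
    mean_s = 2.0  # mean of the assumed scrambling law below

    def draw_s():
        # Assumed scrambling distribution: uniform on [1, 3], so E[S] = 2.
        return rng.uniform(1.0, 3.0)

    eh = sum(eh_transform(eh_report(y, draw_s), mean_s) for _ in range(reps)) / reps
    bbb = sum(
        bbb_transform(bbb_report(y, draw_s, p, rng), mean_s, p) for _ in range(reps)
    ) / reps
    return eh, bbb
```

Calling `average_transformed(100.0)` should return two averages close to 100. Under the stated assumptions, this is the property that lets transformed responses stand in for the unobserved sensitive values when a regression is fitted.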
3. Regression for RR Models
3.1. Estimation of the Regression Coefficients
- A.1. The survey design satisfies for any .
- A.2. The survey design ensures that is asymptotically normally distributed with mean and entries of the variance-covariance matrix at the order for any .
- A.3. The survey design satisfies and for any .
3.2. The Homoscedastic Linear Model
3.3. The Ratio Model
4. Simulation Study
5. Real Application
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest

 | BBB Method | | | | EH Method | | | |
---|---|---|---|---|---|---|---|---|
n | \|RB\| | RMSE | \|RB\| | RMSE | \|RB\| | RMSE | \|RB\| | RMSE |
250 | 4.374 | 9.152 | 1.51 | 1.44 | 7.83 | 14.73 | 2.89 | 2.25 |
500 | 2.99 | 4.13 | 0.56 | 0.07 | 6.06 | 7.07 | 1.89 | 1.08 |
750 | 1.46 | 2.2 | 0.07 | 0.86 | 1.56 | 3.27 | 1.22 | 0.89 |
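
The |RB| and RMSE columns in the table above summarize Monte Carlo replicates of an estimator. A minimal sketch of how such summaries are commonly computed follows. It assumes the usual definitions (absolute relative bias of the average estimate, and root mean squared error around the true value); the article may scale them differently, for example as percentages. The function names and the sample values are hypothetical.

```python
import math


def abs_relative_bias(estimates, true_value):
    """|RB|: absolute difference between the mean estimate and the truth, relative to the truth."""
    mean_estimate = sum(estimates) / len(estimates)
    return abs((mean_estimate - true_value) / true_value)


def rmse(estimates, true_value):
    """RMSE: square root of the average squared error over the replicates."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse)


# Hypothetical replicates of a coefficient estimate whose true value is 2.0.
replicates = [2.2, 2.0, 2.4, 2.2]
rb = abs_relative_bias(replicates, 2.0)
err = rmse(replicates, 2.0)
```

In a simulation study, `replicates` would hold one estimate per simulated sample, and the two functions would be applied once for each sample size and each randomized response method.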

 | Asymptotic Variance | | | | Jackknife | | | | Bootstrap | | | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
n | L | Cov | L | Cov | L | Cov | L | Cov | L | Cov | L | Cov |
BBB method | | | | | | | | | | | | |
250 | 0.161 | 0.967 | 0.085 | 0.952 | 0.122 | 0.936 | 0.066 | 0.931 | 0.129 | 0.954 | 0.070 | 0.940 |
500 | 0.116 | 0.969 | 0.060 | 0.965 | 0.085 | 0.926 | 0.045 | 0.924 | 0.095 | 0.950 | 0.051 | 0.953 |
750 | 0.082 | 0.982 | 0.043 | 0.971 | 0.058 | 0.911 | 0.031 | 0.905 | 0.070 | 0.960 | 0.038 | 0.966 |
EH method | | | | | | | | | | | | |
250 | 0.189 | 0.952 | 0.101 | 0.956 | 0.153 | 0.922 | 0.083 | 0.930 | 0.163 | 0.933 | 0.089 | 0.939 |
500 | 0.133 | 0.957 | 0.069 | 0.954 | 0.107 | 0.931 | 0.057 | 0.930 | 0.120 | 0.958 | 0.064 | 0.960 |
750 | 0.092 | 0.976 | 0.049 | 0.958 | 0.072 | 0.912 | 0.039 | 0.920 | 0.087 | 0.964 | 0.047 | 0.964 |
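
The L and Cov columns above can be read as the average length and the empirical coverage of confidence intervals across simulation replicates. The sketch below shows one way to compute both. It assumes normal-theory intervals at a nominal 95% level, which is an assumption suggested by coverage values near 0.95 and not something stated in the table. The function names and the example intervals are hypothetical.

```python
import math


def normal_interval(estimate, variance, z=1.96):
    """Normal-theory interval: estimate +/- z * sqrt(estimated variance)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


def interval_summary(intervals, true_value):
    """Return (average length, share of intervals that contain the true value)."""
    lengths = [upper - lower for lower, upper in intervals]
    covered = [lower <= true_value <= upper for lower, upper in intervals]
    return sum(lengths) / len(lengths), sum(covered) / len(covered)


# Hypothetical intervals from three replicates, with true value 1.0.
example = [(0.9, 1.1), (1.05, 1.25), (0.8, 1.0)]
avg_length, coverage = interval_summary(example, 1.0)
```

Any variance estimate (linearization, jackknife, or bootstrap) can be passed to `normal_interval`, so the same summary applies to each of the three column groups.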

n | \|RB\| | RMSE | \|RB\| | RMSE | \|RB\| | RMSE | \|RB\| | RMSE | \|RB\| | RMSE | \|RB\| | RMSE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
BBB method | | | | | | | | | | | | |
250 | 0.667 | 1.023 | 0.616 | 1.017 | 0.076 | 0.082 | 0.062 | 0.093 | 0.039 | 0.099 | 0.061 | 0.118 |
500 | 0.616 | 0.619 | 0.530 | 0.546 | 0.143 | 0.077 | 0.139 | 0.074 | 0.081 | 0.094 | 0.091 | 0.095 |
750 | 0.562 | 0.450 | 0.484 | 0.382 | 0.228 | 0.070 | 0.231 | 0.071 | 0.126 | 0.075 | 0.130 | 0.071 |
EH method | | | | | | | | | | | | |
250 | 0.391 | 0.489 | 0.397 | 0.534 | 0.109 | 0.043 | 0.071 | 0.044 | 0.009 | 0.048 | 0.057 | 0.061 |
500 | 0.353 | 0.251 | 0.303 | 0.238 | 0.129 | 0.042 | 0.119 | 0.039 | 0.094 | 0.052 | 0.109 | 0.053 |
750 | 0.263 | 0.145 | 0.244 | 0.149 | 0.233 | 0.040 | 0.222 | 0.032 | 0.121 | 0.046 | 0.141 | 0.050 |

 | BBB Method | | | | EH Method | | | |
---|---|---|---|---|---|---|---|---|
n | \|RB\| | RMSE | \|RB\| | RMSE | \|RB\| | RMSE | \|RB\| | RMSE |
250 | 0.042 | 0.090 | 0.083 | 0.092 | 0.083 | 0.050 | 0.085 | 0.051 |
500 | 0.128 | 0.047 | 0.158 | 0.048 | 0.132 | 0.026 | 0.129 | 0.027 |
750 | 0.168 | 0.029 | 0.201 | 0.030 | 0.119 | 0.016 | 0.116 | 0.017 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Rueda, M.d.M.; Cobo, B.; Arcos, A. Regression Models in Complex Survey Sampling for Sensitive Quantitative Variables. Mathematics 2021, 9, 609. https://doi.org/10.3390/math9060609