# Regression Models in Complex Survey Sampling for Sensitive Quantitative Variables


## Abstract


## 1. Introduction

## 2. Randomized Response Survey Designs for Quantitative Variables

## 3. Regression for RR Models

#### 3.1. Estimation of the Regression Coefficients

- A.1. The survey design satisfies $\widehat{\mathbf{U}}\left(\beta \right)-\mathbf{U}\left(\beta \right)={O}_{p}\left({n}^{-1/2}\right)$ for any $\beta \in \Theta $.
- A.2. The survey design ensures that $\widehat{\mathbf{U}}\left(\beta \right)$ is asymptotically normally distributed with mean $\mathbf{U}\left(\beta \right)$ and entries of the variance-covariance matrix at the order ${n}^{-1}$ for any $\beta \in \Theta $.
- A.3. The survey design satisfies $\frac{\partial \widehat{\mathbf{U}}}{\partial \beta}={O}_{p}\left(1\right)$ and $\frac{{\partial}^{2}\widehat{\mathbf{U}}}{\partial \beta \partial {\beta}^{\prime}}={O}_{p}\left(1\right)$ for any $\beta \in \Theta $.
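To indicate what conditions of this type are typically used for, the following is a sketch of the standard Taylor linearization argument for estimating equations in complex surveys (in the spirit of Binder [38]); it is not a reproduction of the theorem stated in this section. The symbols $\beta_N$, $\widehat{\beta}$ and $\mathbf{H}$ are introduced here as assumptions: $\beta_N$ is taken to solve the census equation $\mathbf{U}(\beta_N)=\mathbf{0}$, $\widehat{\beta}$ to solve the sample equation $\widehat{\mathbf{U}}(\widehat{\beta})=\mathbf{0}$, and $\mathbf{H}(\beta)=\partial \mathbf{U}(\beta)/\partial \beta$ is assumed to be invertible at $\beta_N$.

$$
\mathbf{0}=\widehat{\mathbf{U}}(\widehat{\beta})\approx \widehat{\mathbf{U}}(\beta_N)+\mathbf{H}(\beta_N)\,(\widehat{\beta}-\beta_N)
\quad\Longrightarrow\quad
\widehat{\beta}-\beta_N\approx -\mathbf{H}(\beta_N)^{-1}\,\widehat{\mathbf{U}}(\beta_N),
$$

$$
V(\widehat{\beta})\approx \mathbf{H}(\beta_N)^{-1}\,V\{\widehat{\mathbf{U}}(\beta_N)\}\,\{\mathbf{H}(\beta_N)^{-1}\}^{\prime}.
$$

Under these assumptions, A.1 gives $\widehat{\mathbf{U}}(\beta_N)={O}_{p}({n}^{-1/2})$, A.3 controls the derivative terms of the expansion, and A.2 transfers asymptotic normality from $\widehat{\mathbf{U}}(\beta_N)$ to $\widehat{\beta}$, with a variance of order ${n}^{-1}$.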

**Theorem 1.**

**Proof.**

**Remark 1.**

**Remark 2.**

#### 3.2. The Homoscedastic Linear Model

#### 3.3. The Ratio Model

## 4. Simulation Study

In the first study, the variables were simulated using the **R** package simstudy ([50]), and the samples were selected with the sampling package ([51]).
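As a rough illustration of this kind of experiment, the sketch below simulates a finite population, scrambles a sensitive variable, draws repeated samples, and summarizes a design-weighted regression estimator. It is written in Python with NumPy rather than with the R packages cited above, and it is not the authors' code. Everything specific in it is an assumption made for illustration: the population model, the normal scrambling variable, simple random sampling without replacement, the function names, and the particular definitions of relative bias and relative root mean squared error. The two scrambling functions follow the usual textbook forms of the multiplicative (EH-type) and partial-scrambling (BBB-type) devices, in which the reported value is divided by its known expected multiplier to obtain an unbiased transformed response.

```python
import numpy as np


def scramble_eh(y, rng, mu_s=1.0, sd_s=0.2):
    """EH-type device (assumed form): report z = s * y, return r = z / E[s]."""
    s = rng.normal(mu_s, sd_s, size=y.shape)
    return (s * y) / mu_s


def scramble_bbb(y, rng, p=0.6, mu_s=1.0, sd_s=0.2):
    """BBB-type device (assumed form): report y with probability p, else s * y.

    E[z | y] = y * (p + (1 - p) * mu_s), so dividing by that factor
    gives a transformed response that is unbiased for y.
    """
    s = rng.normal(mu_s, sd_s, size=y.shape)
    truthful = rng.random(y.shape) < p
    z = np.where(truthful, y, s * y)
    return z / (p + (1.0 - p) * mu_s)


def weighted_ls(x, r, w):
    """Design-weighted least squares of r on (1, x) with weights w."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # scales the column for each unit by its weight
    return np.linalg.solve(XtW @ X, XtW @ r)


def simulate(n=500, N=10_000, reps=200, seed=1, device=scramble_eh):
    """Monte Carlo summary under simple random sampling (illustrative setup)."""
    rng = np.random.default_rng(seed)
    # Hypothetical population: y depends linearly on an auxiliary variable x.
    x = rng.gamma(4.0, 2.0, size=N)
    y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, size=N)
    beta_pop = weighted_ls(x, y, np.ones(N))  # census coefficients (target)

    est = np.empty((reps, 2))
    for b in range(reps):
        idx = rng.choice(N, size=n, replace=False)
        r = device(y[idx], rng)      # only scrambled data are observed
        w = np.full(n, N / n)        # inverse inclusion probabilities
        est[b] = weighted_ls(x[idx], r, w)

    rel_bias = np.abs(est.mean(axis=0) - beta_pop) / np.abs(beta_pop)
    rel_rmse = np.sqrt(((est - beta_pop) ** 2).mean(axis=0)) / np.abs(beta_pop)
    return beta_pop, rel_bias, rel_rmse


if __name__ == "__main__":
    for n in (250, 500, 750):
        _, rb, rrmse = simulate(n=n)
        print(n, rb, rrmse)
```

Other designs could be explored in this sketch by changing how `idx` and `w` are generated; unequal-probability or stratified selection would require the corresponding inclusion probabilities in place of the constant `N / n`.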

## 5. Real Application

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Arnab, R. Randomized response trials: A unified approach for qualitative data. Commun. Stat. Theory Methods **1996**, 25, 1173–1183.
2. Barabesi, L.; Marcheselli, M. A practical implementation and Bayesian estimation in Franklin’s randomized response procedure. Commun. Stat. Simul. Comput. **2006**, 35, 563–573.
3. Barabesi, L. A design-based randomized response procedure for the estimation of population proportion and sensitivity level. J. Stat. Plan. Inference **2008**, 138, 2398–2408.
4. Perri, P. Modified randomized devices for Simmons’ model. Model Assist. Stat. Appl. **2008**, 3, 233–239.
5. Lee, C.; Sedory, S.; Singh, S. Estimating at least seven measures of qualitative variables from a single sample using randomized response technique. Stat. Probab. Lett. **2013**, 83, 399–409.
6. Liu, Y.; Tian, G. Multi-category parallel models in the design of surveys with sensitive questions. Stat. Interface **2013**, 6, 137–142.
7. Greenberg, B.; Kuebler, R.; Abernathy, J.; Horvitz, D. Application of the randomized response technique in obtaining quantitative data. J. Am. Stat. Assoc. **1971**, 66, 243–250.
8. Eriksson, S. A new model for randomized response. Int. Stat. Rev. **1973**, 41, 40–43.
9. Pollock, K.; Bek, Y. A comparison of three randomized response models for quantitative data. J. Am. Stat. Assoc. **1976**, 71, 884–886.
10. Eichhorn, B.; Hayre, L. Scrambled randomized response methods for obtaining sensitive quantitative data. J. Stat. Plan. Inference **1983**, 7, 307–316.
11. Bar-Lev, S.; Bobovitch, E.; Boukai, B. A note on randomized response models for quantitative data. Metrika **2004**, 60, 255–260.
12. Gjestvang, R.; Singh, S. A new randomized response model. J. R. Stat. Soc. B **2006**, 68, 523–530.
13. Saha, A. A simple randomized response technique in complex surveys. Metron **2007**, LXV, 59–66.
14. Singh, S.; Kim, J. A pseudo-empirical log-likelihood estimator using scrambled responses. Statist. Probab. Lett. **2007**, 81, 345–351.
15. Huang, K. Estimation for sensitive characteristics using optional randomized response technique. Qual. Quant. **2008**, 42, 679–686.
16. Bouza, C. Ranked set sampling and randomized response procedures for estimating the mean of a sensitive quantitative character. Metrika **2009**, 70, 267–277.
17. Diana, G.; Perri, P. A new scrambled response models for estimating the mean of a sensitive quantitative character. J. Appl. Stat. **2010**, 37, 1875–1890.
18. Diana, G.; Perri, P. Calibration-based approach to sensitive data: A simulation study. J. Appl. Stat. **2012**, 39, 53–65.
19. Gupta, S.; Shabbir, J.; Sehra, S. Mean and sensitivity estimation in optional randomized response models. J. Stat. Plan. Inference **2010**, 140, 2870–2874.
20. Odumade, O.; Singh, S. An alternative to the Bar-Lev, Bobovitch, and Boukai randomized response model. Sociol. Methods Res. **2010**, 20, 1–16.
21. Arcos, A.; Rueda, M.; Singh, S. A generalized approach to randomised response for quantitative variables. Qual. Quant. **2015**, 49, 1239–1256.
22. Fox, J.; Tracy, P. Randomized Response: A Method for Sensitive Survey; Sage Publication, Inc.: Thousand Oaks, CA, USA, 1986.
23. Chaudhuri, A.; Mukerjee, R. Randomized Response: Theory and Techniques; Marcel Dekker, Inc.: New York, NY, USA, 1988.
24. Chaudhuri, A. Randomized Response and Indirect Questioning Techniques in Surveys; Chapman & Hall: London, UK, 2011.
25. Chaudhuri, A.; Christofides, T. Indirect Questioning in Sample Surveys; Springer: Berlin/Heidelberg, Germany, 2013.
26. Chaudhuri, A.; Christofides, T.; Rao, C. Data Gathering, Analysis and Protection of Privacy through Randomized Response Techniques: Qualitative and Quantitative Human Traits; Elsevier: Amsterdam, The Netherlands, 2016; Volume 34.
27. Scheers, N.; Dayton, C. Improved estimation of academic cheating behavior using the randomized response technique. Res. High. Educ. **1987**, 26, 61–69.
28. Blair, G.; Imai, K.; Zhou, Y. Design and Analysis of randomized response technique. J. Am. Stat. Assoc. **2005**, 110, 1304–1319.
29. Van den Hout, A.; van der Heijden, P.; Gilchrist, R. The logistic regression model with response variables subject to randomized response. Comput. Stat. Data Anal. **2007**, 51, 6060–6069.
30. Fox, J.; Veen, D.; Klotzke, K. Generalized Linear Mixed Models for Randomized Responses. Methodology **2019**, 15, 1–18.
31. Hsieh, S.; Lee, S.; Shen, P. Logistic regression analysis of randomized response data with missing covariates. J. Stat. Plan. Inference **2010**, 140, 927–940.
32. Singh, S.; Joarder, A.; King, M. Regression analysis using scrambled response. Aust. N. Z. J. Stat. **1996**, 38, 201–211.
33. Van der Hout, A.; Kooiman, P. Estimating the linear regression model with categorical covariates subject to randomized response. Comput. Stat. Data Anal. **2006**, 50, 3311–3323.
34. Arnab, R. Non-negative variance estimator in randomized response surveys. Commun. Stat. Theory Method **1994**, 23, 1743–1752.
35. Barabesi, L.; Diana, G.; Perri, P. Design-based distribution function estimation for stigmatized populations. Metrika **2013**, 76, 919–935.
36. Hájek, J. Comment on An essay on the logical foundations of survey sampling by Basu, D. In Foundations of Statistical Inference; Godambe, V.P., Sprott, D.A., Eds.; Springer: Berlin/Heidelberg, Germany, 1971.
37. Särndal, C.E.; Swensson, B.; Wretman, J. Model Assisted Survey Sampling (Springer Series in Statistics); Springer: Berlin/Heidelberg, Germany, 1992.
38. Binder, D. On the Variances of Asymptotically Normal Estimators from Complex Surveys. Int. Stat. Rev. Rev. Int. Stat. **1983**, 51, 279–292.
39. Lumley, T. Package ‘survey’: Analysis of Complex Survey Samples. Available online: https://cran.r-project.org/web/packages/survey/index.html (accessed on 15 December 2020).
40. Tukey, J. Bias and confidence in not-quite large samples. Ann. Math. Stat. **1958**, 29, 614.
41. Efron, B. Bootstrap methods: Another look at the jackknife. Ann. Stat. **1979**, 7, 1–26.
42. Wolter, K. Introduction to Variance Estimation; Springer: Berlin/Heidelberg, Germany, 2007.
43. Arnab, R.; Cobo, B. Variance jackknife estimation for randomized response surveys: A simulation study and an application to explore cheating in exams and bullying. Comput. Math. Methods **2020**, 2, e1073.
44. Rueda, M.; Cobo, B.; Perri, P.F. Randomized response estimation in multiple frame surveys. Int. J. Comput. Math. **2020**, 97, 189–206.
45. Booth, J.; Butler, R.; Hall, P. Bootstrap methods for finite populations. J. Am. Stat. Assoc. **1994**, 89, 1282–1289.
46. Antal, E.; Tillé, Y. A direct bootstrap method for complex sampling designs from a finite population. J. Am. Stat. Assoc. **2011**, 106, 534–543.
47. Antal, E.; Tillé, Y. A new resampling method for sampling designs without replacement: The doubled half bootstrap. Comput. Stat. **2014**, 29, 1345–1363.
48. Zhao, P.; Haziza, D.; Wu, C. Survey weighted estimating equation inference with nuisance functionals. J. Econom. **2020**, 216, 516–536.
49. Wu, C.; Thompson, M.E. Resampling and Replication Methods. In Sampling Theory and Practice. ICSA Book Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2020.
50. Goldfeld, K. Package ‘simstudy’: Simulation of Study Data. Available online: https://cran.r-project.org/web/packages/simstudy/index.html (accessed on 15 December 2020).
51. Tillé, Y.; Matei, A. Package ‘sampling’: Survey Sampling. Available online: https://cran.r-project.org/web/packages/sampling/index.html (accessed on 15 December 2020).

**Figure 1.** Boxplot for ${\widehat{\beta}}_{W1}$ and ${\widehat{\beta}}_{W2}$ in SRSS in the BBB model (**left**) and EH model (**right**).

**Figure 2.** Boxplot for AV, JK and BS variances of ${\widehat{\beta}}_{W1}$ and ${\widehat{\beta}}_{W2}$ in SRSS in the BBB and EH models.

**Table 1.** Absolute relative bias and relative mean squared error in percent for ${\widehat{\beta}}_{W1}$ and ${\widehat{\beta}}_{W2}$ in SRSS for the BBB and EH models.

| n | BBB, ${\widehat{\beta}}_{W1}$: \|RB\| | BBB, ${\widehat{\beta}}_{W1}$: RMSE | BBB, ${\widehat{\beta}}_{W2}$: \|RB\| | BBB, ${\widehat{\beta}}_{W2}$: RMSE | EH, ${\widehat{\beta}}_{W1}$: \|RB\| | EH, ${\widehat{\beta}}_{W1}$: RMSE | EH, ${\widehat{\beta}}_{W2}$: \|RB\| | EH, ${\widehat{\beta}}_{W2}$: RMSE |
|---|---|---|---|---|---|---|---|---|
| 250 | 4.374 | 9.152 | 1.51 | 1.44 | 7.83 | 14.73 | 2.89 | 2.25 |
| 500 | 2.99 | 4.13 | 0.56 | 0.07 | 6.06 | 7.07 | 1.89 | 1.08 |
| 750 | 1.46 | 2.2 | 0.07 | 0.86 | 1.56 | 3.27 | 1.22 | 0.89 |

**Table 2.** Average length and coverage, relative bias and relative mean squared error for AV, JK and BS variances of ${\widehat{\beta}}_{W1}$ and ${\widehat{\beta}}_{W2}$ in SRSS for the BBB and EH models.

AV: asymptotic variance; JK: jackknife; BS: bootstrap.

Average length (L) and coverage (Cov):

| Model | n | AV, ${\widehat{\beta}}_{W1}$: L | AV, ${\widehat{\beta}}_{W1}$: Cov | AV, ${\widehat{\beta}}_{W2}$: L | AV, ${\widehat{\beta}}_{W2}$: Cov | JK, ${\widehat{\beta}}_{W1}$: L | JK, ${\widehat{\beta}}_{W1}$: Cov | JK, ${\widehat{\beta}}_{W2}$: L | JK, ${\widehat{\beta}}_{W2}$: Cov | BS, ${\widehat{\beta}}_{W1}$: L | BS, ${\widehat{\beta}}_{W1}$: Cov | BS, ${\widehat{\beta}}_{W2}$: L | BS, ${\widehat{\beta}}_{W2}$: Cov |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BBB | 250 | 0.161 | 0.967 | 0.085 | 0.952 | 0.122 | 0.936 | 0.066 | 0.931 | 0.129 | 0.954 | 0.070 | 0.940 |
| BBB | 500 | 0.116 | 0.969 | 0.060 | 0.965 | 0.085 | 0.926 | 0.045 | 0.924 | 0.095 | 0.950 | 0.051 | 0.953 |
| BBB | 750 | 0.082 | 0.982 | 0.043 | 0.971 | 0.058 | 0.911 | 0.031 | 0.905 | 0.070 | 0.960 | 0.038 | 0.966 |
| EH | 250 | 0.189 | 0.952 | 0.101 | 0.956 | 0.153 | 0.922 | 0.083 | 0.930 | 0.163 | 0.933 | 0.089 | 0.939 |
| EH | 500 | 0.133 | 0.957 | 0.069 | 0.954 | 0.107 | 0.931 | 0.057 | 0.930 | 0.120 | 0.958 | 0.064 | 0.960 |
| EH | 750 | 0.092 | 0.976 | 0.049 | 0.958 | 0.072 | 0.912 | 0.039 | 0.920 | 0.087 | 0.964 | 0.047 | 0.964 |

Absolute relative bias (|RB|) and relative mean squared error (RMSE):

| Model | n | AV, ${\widehat{\beta}}_{W1}$: \|RB\| | AV, ${\widehat{\beta}}_{W1}$: RMSE | AV, ${\widehat{\beta}}_{W2}$: \|RB\| | AV, ${\widehat{\beta}}_{W2}$: RMSE | JK, ${\widehat{\beta}}_{W1}$: \|RB\| | JK, ${\widehat{\beta}}_{W1}$: RMSE | JK, ${\widehat{\beta}}_{W2}$: \|RB\| | JK, ${\widehat{\beta}}_{W2}$: RMSE | BS, ${\widehat{\beta}}_{W1}$: \|RB\| | BS, ${\widehat{\beta}}_{W1}$: RMSE | BS, ${\widehat{\beta}}_{W2}$: \|RB\| | BS, ${\widehat{\beta}}_{W2}$: RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BBB | 250 | 0.667 | 1.023 | 0.616 | 1.017 | 0.076 | 0.082 | 0.062 | 0.093 | 0.039 | 0.099 | 0.061 | 0.118 |
| BBB | 500 | 0.616 | 0.619 | 0.530 | 0.546 | 0.143 | 0.077 | 0.139 | 0.074 | 0.081 | 0.094 | 0.091 | 0.095 |
| BBB | 750 | 0.562 | 0.450 | 0.484 | 0.382 | 0.228 | 0.070 | 0.231 | 0.071 | 0.126 | 0.075 | 0.130 | 0.071 |
| EH | 250 | 0.391 | 0.489 | 0.397 | 0.534 | 0.109 | 0.043 | 0.071 | 0.044 | 0.009 | 0.048 | 0.057 | 0.061 |
| EH | 500 | 0.353 | 0.251 | 0.303 | 0.238 | 0.129 | 0.042 | 0.119 | 0.039 | 0.094 | 0.052 | 0.109 | 0.053 |
| EH | 750 | 0.263 | 0.145 | 0.244 | 0.149 | 0.233 | 0.040 | 0.222 | 0.032 | 0.121 | 0.046 | 0.141 | 0.050 |

**Table 3.** Absolute relative bias and relative mean squared error in percent for ${\widehat{\beta}}_{R}$ and ${\widehat{\beta}}_{W}$ in SRS for the BBB and EH models.

| n | BBB, ${\widehat{\beta}}_{R}$: \|RB\| | BBB, ${\widehat{\beta}}_{R}$: RMSE | BBB, ${\widehat{\beta}}_{W}$: \|RB\| | BBB, ${\widehat{\beta}}_{W}$: RMSE | EH, ${\widehat{\beta}}_{R}$: \|RB\| | EH, ${\widehat{\beta}}_{R}$: RMSE | EH, ${\widehat{\beta}}_{W}$: \|RB\| | EH, ${\widehat{\beta}}_{W}$: RMSE |
|---|---|---|---|---|---|---|---|---|
| 250 | 0.042 | 0.090 | 0.083 | 0.092 | 0.083 | 0.050 | 0.085 | 0.051 |
| 500 | 0.128 | 0.047 | 0.158 | 0.048 | 0.132 | 0.026 | 0.129 | 0.027 |
| 750 | 0.168 | 0.029 | 0.201 | 0.030 | 0.119 | 0.016 | 0.116 | 0.017 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Rueda, M.d.M.; Cobo, B.; Arcos, A.
Regression Models in Complex Survey Sampling for Sensitive Quantitative Variables. *Mathematics* **2021**, *9*, 609.
https://doi.org/10.3390/math9060609
