1. Introduction
Research in socioeconomics and biometrics often involves exploring highly sensitive issues such as substance abuse, induced abortion, HIV status, sexual behavior, domestic violence, illegitimacy of birth, impaired driving, and social welfare fraud. When addressed through traditional face-to-face interviews, these topics frequently result in inaccurate responses or complete non-response due to fear of stigma or social judgment. This presents a significant challenge in collecting accurate and reliable data. To address these limitations, specifically to ensure greater data accuracy, protect respondent confidentiality, and reduce high non-response rates, Randomized Response (RR) techniques have been developed. The Randomized Response Technique (RRT) was first introduced by [1] as a method to mitigate the bias introduced by direct questioning on sensitive topics. RRT incorporates a deliberate element of randomness into the survey process, allowing respondents to maintain anonymity and feel less pressure to provide socially desirable answers. This randomization helps encourage truthful responses, thereby enhancing both the privacy of respondents and the reliability of the data collected. Since Warner’s original work, the technique has been further developed and refined by several researchers, and its applicability in real-life settings has been demonstrated across a variety of sensitive topics. For example, ref. [2] investigated the illegitimacy of offspring; ref. [3] studied the incidence of induced abortions; ref. [4] examined drug usage; ref. [5] focused on drinking and driving; and ref. [6] explored social security fraud.
Over time, the RRT has undergone significant refinements aimed at improving respondent cooperation and enhancing the accuracy of estimates for sensitive survey variables. Initial efforts to address the collection of quantitative sensitive data were made by [7], who extended the unrelated question model of [2] to handle numerical responses. Ref. [8] introduced additive and multiplicative models, allowing respondents to mask their true values by incorporating a random variable drawn from a known distribution. This idea was further expanded by [9], who formalized the multiplicative version as the scrambled response method. Ref. [10] advanced the methodology by proposing an optional randomized response model that demonstrated improved efficiency over earlier methods. Ref. [11] contributed a generalization of existing models through a design parameter, resulting in estimators with uniformly lower variance under mild assumptions. Further advancements in RRT have focused on incorporating auxiliary information to enhance estimation precision. Ref. [12] studied such techniques within the framework of sampling with unequal probabilities. In a subsequent study, ref. [13] proposed estimators for the mean of a sensitive quantitative variable, enabling data collection without compromising respondent confidentiality. Building on this foundational work, several researchers have extended and refined optional randomized response (ORR) models. Notable contributions include those by [14,15,16,17], among others, who proposed various improvements aimed at enhancing estimation efficiency, increasing respondent cooperation, and expanding the applicability of RRT in diverse survey contexts.
There are two primary approaches to the RRT: the Compulsory Randomized Response Technique (CRRT) and the Optional Randomized Response Technique (ORRT). In CRRT, every respondent is required to provide a randomized (scrambled) response, regardless of the nature of the question. In contrast, ORRT allows respondents to choose whether to answer a question directly or to use a randomization device to provide a scrambled response, depending on their perception of the question’s sensitivity. This flexibility is particularly valuable because the perceived sensitivity of a question varies across individuals: what one respondent considers sensitive, another may be willing to answer directly. To accommodate this variation, ORRT provides respondents with a randomization mechanism (supplied by the interviewer), which they may use if they feel uncomfortable answering a question directly.
Partial RRT (PRRT) is another model, in which some of the respondents provide a true response without using RRT and the rest provide a randomized response. PRRT looks similar to ORRT but is fundamentally different. In PRRT, the researcher decides what proportion of the respondents will provide a true response, so this proportion is assumed known. In ORRT, the proportion of respondents who provide a true response without using RRT is unknown. PRRT has been discussed by many authors, such as [10,14,18,19,20,21].
Beyond the sensitivity of survey topics, measurement error represents a significant concern in many data collection contexts. For example, the diagnosis of conditions such as hepatitis, breast cancer, or AIDS often relies on medical tests, like imaging procedures or blood analyses, that are not perfectly accurate. Such tests can yield erroneous results due to calibration issues or inherent limitations in the diagnostic tools. Likewise, variables such as income, expenditure, and agricultural input levels (e.g., fertilizer or water usage) are also susceptible to reporting errors, even when randomized response techniques are not employed. Recognizing the importance of this issue, a number of researchers have investigated the role of measurement error within sampling theory. Key contributions include those of [16,17,22,23,24,25,26], among others. Their work has helped to highlight and quantify the potential distortions that measurement errors can introduce in survey estimates.
The present study aims to address the sensitivity of the study characteristic by applying three randomized response techniques—Compulsory Randomized Response Technique (CRRT), Optional Randomized Response Technique (ORRT), and Partial Randomized Response Technique (PRRT)—in conjunction with various calibration estimators, including ratio-type and exponential-type estimators, under measurement error. A three-stage simulation study is conducted using real COVID-19 infection data to assess the performance of the proposed estimators: (i) under measurement error, (ii) without measurement error, and (iii) without both randomization and measurement error. The findings indicate that the ORRT model consistently outperforms CRRT and PRRT in terms of efficiency, underscoring its reliability and practical effectiveness in handling sensitive survey data.
2. Survey Design and Notations
Assume a finite population consisting of N identifiable units. Let Y and X respectively be the sensitive variable under study and a non-sensitive auxiliary variable. The population means of Y and X are denoted by and , respectively. Our aim is to estimate in the presence of measurement errors.
Let a sample of size n be selected from the population using a sampling design d having individual and joint probabilities and . Let
Based on this sampling design, we intend to apply CRRT, ORRT and PRRT to handle sensitivity of the study variable in detail.
PRRT, CRRT and ORRT Models
Let Z be the coded response variable corresponding to sensitive variable Y. Let and be mutually independent scrambling variables which are also independent of the sensitive variable Y such that and Let P be the probability that a respondent provides a direct response without using randomization, as in ORRT and PRRT.
Following the approach of [
17], a randomized response technique is employed to estimate the population mean of a sensitive variable. Their original model is adapted here to allow respondents the option to either report a direct response or a randomized (scrambled) response, depending on the outcome of a randomization device.
Each respondent is asked to rotate a spinner bearing the following two instructions:
- Report the sensitive variable Y (with probability P);
- Report the scrambled response () (with probability 1 − P).
can be written as
Assuming the mutual independence of the random variables in Equation (1) and taking expectations, we get
Hence, the population mean of the sensitive variable under PRRT (with known P) is given by
For P = 0 in Equation (1), the model becomes CRRT. So, the response received from the jth respondent using CRRT is given as
In the above equation, taking the mean on both sides, we get
Hence, the population mean of sensitive variable under CRRT is given by
Proposed ORRT Model: The response provided by respondent j is given as
such that
Note that P is an unknown parameter in this case, determined solely by the proportion of the population that considers the question sensitive. Taking the mean,
Note that the crucial element of this randomization is that the unknown P is not needed in estimating
.
Equations (2), (4) and (5) easily provide
estimators through estimators of
, which can be estimated by the sample mean of reported responses.
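As an illustration of how the sample mean of the reported responses leads to an estimate of the sensitive mean, the following Python sketch simulates coded responses and inverts their expectation. It assumes, purely for illustration, a scrambled response of the form T·Y + S with known scrambling means E(T) and E(S); this form, the function names, and all parameter values are assumptions and are not taken from the models above. Under that assumption, E(Z) = [P + (1 − P)E(T)]·E(Y) + (1 − P)E(S).

```python
import numpy as np


def simulate_responses(y, p_direct, mu_t, sd_t, mu_s, sd_s, rng):
    """Coded responses: Y with probability p_direct, else T*Y + S (assumed form)."""
    n = y.shape[0]
    t = rng.normal(mu_t, sd_t, n)          # multiplicative scrambling variable
    s = rng.normal(mu_s, sd_s, n)          # additive scrambling variable
    direct = rng.random(n) < p_direct      # True -> respondent reports Y directly
    return np.where(direct, y, t * y + s)


def estimate_sensitive_mean(z, p_direct, mu_t, mu_s):
    """Invert E(Z) = [P + (1 - P) mu_t] E(Y) + (1 - P) mu_s for E(Y)."""
    scale = p_direct + (1.0 - p_direct) * mu_t
    shift = (1.0 - p_direct) * mu_s
    return (np.mean(z) - shift) / scale


if __name__ == "__main__":
    rng = np.random.default_rng(2021)
    y = rng.normal(10.0, 2.0, 50_000)      # hypothetical sensitive variable
    z = simulate_responses(y, 0.4, 1.5, 0.2, 2.0, 1.0, rng)
    print("true mean:", y.mean())
    print("estimate :", estimate_sensitive_mean(z, 0.4, 1.5, 2.0))
```

Setting `p_direct = 0` corresponds to a compulsory design. When E(T) = 1 and E(S) = 0, the correction no longer depends on P, which is the situation in which an unknown P does not need to be estimated.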
We now consider the case where the observed coded response variables and auxiliary variable are subject to measurement errors. For the PRRT, CRRT, and ORRT frameworks, let
and
represent the observed counterparts of the true variables Z and X, respectively. The measurement error model specifies the relationship between these observed values and their true underlying counterparts. The classical additive measurement error model is given as follows:
Assume that the observational errors
and
are normally distributed with mean 0 and variances
and
respectively. Additionally, the observational errors in the study and auxiliary variables are assumed to be correlated.
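The additive error model can be sketched in Python as follows. The sketch draws a pair of zero-mean normal errors for each unit and adds them to the true values; the function name, the error correlation `rho`, and the numerical settings are hypothetical choices made only for illustration.

```python
import numpy as np


def add_measurement_error(z_true, x_true, var_u, var_v, rho, rng):
    """Return observed values: true value plus zero-mean bivariate normal error.

    var_u, var_v : error variances for the coded response and auxiliary variable
    rho          : assumed correlation between the two errors (hypothetical)
    """
    cov_uv = rho * np.sqrt(var_u * var_v)
    cov = np.array([[var_u, cov_uv], [cov_uv, var_v]])
    errors = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=len(z_true))
    return z_true + errors[:, 0], x_true + errors[:, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    z_true = rng.normal(10.0, 2.0, 10_000)
    x_true = rng.normal(5.0, 1.0, 10_000)
    z_obs, x_obs = add_measurement_error(z_true, x_true, 4.0, 4.0, 0.3, rng)
    # The mean is essentially unchanged, but the variance is inflated by var_u.
    print(z_true.mean(), z_obs.mean())
    print(z_true.var(), z_obs.var())
```

Because the errors have mean zero, the observed mean stays close to the true mean, while the variance of the observed variable is approximately the true variance plus the error variance.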
3. Proposed Estimators in Presence of Measurement Error
Horvitz–Thompson [27] estimator for the coded response variable: In the proposed sampling framework, the Horvitz–Thompson [27] estimator of the sensitive population mean in the presence of measurement error is given by
Calibration estimators for coded response variable: Calibration is widely regarded as one of the most effective and commonly employed techniques in survey sampling for parameter estimation, as it enables efficient and reliable inference by incorporating auxiliary information. The success of calibration largely hinges on the quality of these auxiliary variables, particularly their accuracy, availability, and, most crucially, their correlation with the study variable. When auxiliary variables are strongly correlated with the variable of interest, they can significantly reduce the variance of estimators and help mitigate non-sampling errors, including measurement error. Ref. [28] introduced the calibration framework using a chi-square-type distance function to adjust the original survey weights such that they conform to known population totals. Building on this foundational approach, calibration estimators have been extended to accommodate scenarios involving measurement error, especially when estimating sensitive or coded response variables. These advancements have expanded the applicability of calibration techniques to more complex and error-prone survey environments.
To enhance the performance of the traditional Horvitz–Thompson estimator [27], we adopt a calibration approach that refines the initial design weights. Specifically, the original weight
is replaced by a calibrated weight
obtained using known auxiliary information. In this context, we propose the following basic calibration estimator, ratio-type calibration estimator, and exponential-type calibration estimator in the presence of measurement error:
and
The following distance measure is considered in order to find the calibration weights
:
Calibration constraint based on sample
is given as
where
is an arbitrarily chosen constant. Our objective is to determine the calibrated weight
such that it remains as close as possible to the original design weight
by minimizing the distance function
, subject to the calibration constraint specified in Equation (11). This leads to an optimization problem that can be addressed through the minimization of the following Lagrangian function:
Differentiating
in Equation (12) with respect to the calibration weight
and equating to zero, the calibration weight is obtained as
Solving Equation (13), the Lagrange multiplier
is obtained as
and substituting the value of
in Equation (13), the calibration weight is obtained as
Substituting
from Equation (15) into Equations (7), (8) and (9), respectively, we get the calibrated estimators under measurement error as follows:
and
with
,
and
.
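The Lagrangian argument used to obtain the calibration weights can be illustrated with a short Python sketch. It implements the standard single-constraint chi-square calibration, in which the calibrated weights are required to reproduce a known auxiliary total; the constraint and constants used in this paper may differ, and the function and variable names here are hypothetical.

```python
import numpy as np


def chi_square_calibration_weights(d, x, x_total, q=None):
    """Minimize sum((w - d)**2 / (d * q)) subject to sum(w * x) == x_total.

    Setting the derivative of the Lagrangian to zero gives
        w_i = d_i * (1 + lam * q_i * x_i),
    and the constraint then fixes the multiplier lam.
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    q = np.ones_like(d) if q is None else np.asarray(q, dtype=float)
    lam = (x_total - np.sum(d * x)) / np.sum(d * q * x ** 2)
    return d * (1.0 + lam * q * x)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x_pop = rng.uniform(1.0, 10.0, 200)                 # hypothetical auxiliary values
    sample = rng.choice(200, size=20, replace=False)    # SRSWOR sample
    d = np.full(20, 200 / 20)                           # design weights 1 / pi_i
    w = chi_square_calibration_weights(d, x_pop[sample], x_pop.sum())
    print(np.sum(w * x_pop[sample]), x_pop.sum())       # the two totals agree
```

The weights move away from the design weights only as far as needed to satisfy the constraint, which is what the distance-minimization objective expresses.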
5. Properties of Proposed Estimator in the Presence of Measurement Error
In order to study the properties of the estimators
we use the following notation and the corresponding results:
Using this notation, the proposed estimators in Equation (6) under the SRSWOR sampling design can be written as
Squaring both sides, taking expectation, and ignoring the finite population correction factor, we have
Using this notation in Equation (19), we have
Squaring both sides of Equation (23) and using first-order approximations, we get
Taking expectations on both sides of Equation (24), we obtain the variance of
for large
N as
Differentiating Equation (25) with respect to
and equating to zero, we get the optimum value of
as
given by
Substituting the optimum value of
in Equation (25), we obtain
where,
On similar lines, as followed for
, we have the following variance expression for the proposed ratio type calibrated estimator
and exponential type calibrated estimator
in the presence of measurement error:
where,
where,
7. Simulation Study
To evaluate the behavior of the proposed calibration estimators which are used under PRRT, CRRT and ORRT models in presence of measurement error and to compare them, a simulation study has been carried out.
For this purpose, a real population comprising
districts of southern states in India (Andhra Pradesh, Karnataka, Kerala and Tamil Nadu) has been considered. [
Population Source: https://mohfw.gov.in] (accessed on 3 August 2025).
The variables considered in the study are:
The positivity rate of COVID-19 in the ith district in the week from 21 June to 27 June 2021;
The positivity rate of COVID-19 in the ith district in the week from 18 June to 24 June 2021.
The two scrambling variables and are assumed to follow Normal distribution such that , and .
The artificial data for u and v were generated from a Normal distribution with mean 0 and variance 4 each using MATLAB (version 7.4). The parameters of the considered population are: and
For the real data considered above, the simulation study is carried out in MATLAB using 10,000 independent replications of the entire framework.
For the data considered above, the proposed calibration estimator
has been compared to Horvitz–Thompson estimator
, ratio type calibration estimator
and exponential type calibration estimator
under PRRT, CRRT and ORRT models in the presence of measurement error. Therefore, the simulated percent relative efficiencies (PRE) for the same are defined as
where
Similarly, and
are defined.
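A replication-based PRE of this kind can be sketched in Python as follows. The sketch assumes that PRE is 100 times the ratio of the simulated mean squared error of the reference estimator to that of the proposed estimator, so that values above 100 favour the proposed estimator. It uses a synthetic population rather than the COVID-19 data, omits randomization and measurement error for brevity, and all names are hypothetical.

```python
import numpy as np


def percent_relative_efficiency(reference, proposed, true_value):
    """100 * MSE(reference) / MSE(proposed) over simulation replications."""
    mse_ref = np.mean((np.asarray(reference) - true_value) ** 2)
    mse_prop = np.mean((np.asarray(proposed) - true_value) ** 2)
    return 100.0 * mse_ref / mse_prop


def simulate_pre(y, x, n, replications, rng):
    """Compare the sample mean with a chi-square calibrated mean under SRSWOR."""
    N = len(y)
    x_mean = x.mean()                       # known auxiliary population mean
    ht = np.empty(replications)
    cal = np.empty(replications)
    for r in range(replications):
        idx = rng.choice(N, size=n, replace=False)   # SRSWOR sample
        ys, xs = y[idx], x[idx]
        d = np.full(n, 1.0 / n)             # design weights on the mean scale
        ht[r] = np.sum(d * ys)              # Horvitz-Thompson mean under SRSWOR
        lam = (x_mean - np.sum(d * xs)) / np.sum(d * xs ** 2)
        w = d * (1.0 + lam * xs)            # calibrated weights (q_i = 1)
        cal[r] = np.sum(w * ys)
    return percent_relative_efficiency(ht, cal, y.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(11)
    x = rng.uniform(1.0, 10.0, 500)
    y = 2.0 * x + rng.normal(0.0, 0.5, 500)   # strongly correlated with x
    print(simulate_pre(y, x, n=30, replications=2000, rng=rng))
```

In this synthetic setting the auxiliary variable is strongly correlated with the study variable, so the calibrated estimator is far more efficient than the uncalibrated mean; the size of the gain in any real application depends on that correlation.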
The simulation results obtained for
and 3 are presented in
Table 2 and Table 4 respectively.
For the proposed calibrated estimator, to assess the impact of measurement error under the CRRT, PRRT, and ORRT models, the variances of the calibration estimators
and
and
E are compared with the corresponding variances of estimators obtained in the absence of measurement error. The relevant Percent Relative Efficiency (PRE) was computed through a simulation study as follows:
where
and
can be obtained from
,
and
respectively by substituting
The simulation results of
and 3 are shown in
Table 3 and
Table 4 respectively, where
Similarly, and can be computed.
Further, to assess the impact of the randomization process, the variances of the estimators
and
are compared with their counterparts obtained in the absence of both randomization and measurement error, as given by
The estimators
can be obtained from
and
respectively by taking
and dealing with
Y directly instead of coding the response by randomization. The estimator variance is given by
The simulation results of
and 3 are shown in
Table 4 and
Table 5 respectively.
Table 2 and
Table 4 present the calibration estimator, ratio-type calibration estimator, and exponential-type calibration estimator for the PRRT, CRRT, and ORRT models under measurement error for sensitive mean estimation. These results demonstrate the feasibility of applying calibration estimators in the presence of measurement error. It is also observed that the Percent Relative Efficiency (PRE) exceeds 100 when the proposed calibrated estimator
is compared with the Horvitz–Thompson-type estimator
the ratio-type calibrated estimator
and the exponential-type calibrated estimator
under CRRT, PRRT and ORRT models. Furthermore, for varying
P it can be seen that
under CRRT, PRRT and ORRT for both the choices of
n. Additionally, the exponential-type calibration estimator exhibits higher efficiency than the ratio-type calibrated estimator when compared within the same calibration framework under the PRRT and ORRT models. In terms of overall efficiency, the proposed ORRT model outperforms the PRRT model.
However,
Table 3 and
Table 4 clearly demonstrate that the variances of measurement errors have a significant impact under the ORRT, PRRT, and CRRT models. It is observed that the PRE values decrease when calibrated estimators are compared in the presence of measurement error versus their absence.
Table 5 compares the proposed estimators with their direct versions, which do not involve measurement error or randomization. It is evident that all
values are below 100 for the ORRT, PRRT, and CRRT models. This suggests that calibration estimators under these models do not achieve higher efficiency than estimators based on direct responses without measurement errors or randomization. Nevertheless, randomization is indispensable in practice, as the sensitive nature of the questions often results in biased responses or non-response when direct questioning is applied.