Article

Some Calibration Estimators of the Mean of a Sensitive Variable Under Measurement Error

1 Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, NC 27412, USA
2 Department of Applied Sciences, Bharati Vidyapeeth’s College of Engineering, New Delhi 110063, India
3 Department of Mathematical Sciences, Durham University, Durham DH1 3LE, UK
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(15), 2532; https://doi.org/10.3390/math13152532
Submission received: 9 July 2025 / Revised: 25 July 2025 / Accepted: 1 August 2025 / Published: 6 August 2025

Abstract

This study explores the estimation of the mean of a sensitive variable using calibration estimators under measurement error. Three randomized response techniques are evaluated: Partial Randomized Response Technique, Compulsory Randomized Response Technique, and Optional Randomized Response Technique. Theoretical properties of the proposed estimators are analyzed, and a simulation study using real COVID-19 infection data is conducted. Results indicate that the Optional Randomized Response Technique outperforms Partial Randomized Response Technique and Compulsory Randomized Response Technique in terms of efficiency, underscoring its effectiveness and practical utility for improving data quality in sensitive survey settings.

1. Introduction

Research in socioeconomics and biometrics often involves exploring highly sensitive issues such as substance abuse, induced abortion, HIV status, sexual behavior, domestic violence, illegitimacy of birth, impaired driving, and social welfare fraud. When addressed through traditional face-to-face interviews, these topics frequently result in inaccurate responses or complete non-response due to fear of stigma or social judgment. This presents a significant challenge in collecting accurate and reliable data. To address these limitations, specifically to ensure greater data accuracy, protect respondent confidentiality, and reduce high non-response rates, Randomized Response (RR) techniques have been developed. The Randomized Response Technique (RRT) was first introduced by Warner [1] as a method to mitigate the bias introduced by direct questioning on sensitive topics. RRT incorporates a deliberate element of randomness into the survey process, allowing respondents to maintain anonymity and feel less pressure to provide socially desirable answers. This randomization helps encourage truthful responses, thereby enhancing both the privacy of respondents and the reliability of the data collected. Since Warner’s original work, the technique has been further developed and refined by several researchers, and its applicability in real-life settings has been demonstrated across a variety of sensitive topics. For example, ref. [2] investigated the illegitimacy of offspring; ref. [3] studied the incidence of induced abortions; ref. [4] examined drug usage; ref. [5] focused on drinking and driving; and ref. [6] explored social security fraud.
Over time, the Randomized Response Technique (RRT) has undergone significant refinements aimed at improving respondent cooperation and enhancing the accuracy of estimates for sensitive survey variables. Initial efforts to address the collection of quantitative sensitive data were made by [7], who extended the unrelated question model of [2] to handle numerical responses. Ref. [8] introduced additive and multiplicative models, allowing respondents to mask their true values by incorporating a random variable drawn from a known distribution. This idea was further expanded by [9], who formalized the multiplicative version as the scrambled response method. Ref. [10] advanced the methodology by proposing an optional randomized response model that demonstrated improved efficiency over earlier methods. Ref. [11] contributed a generalization of existing models through a design parameter, resulting in estimators with uniformly lower variance under mild assumptions. Further advancements in RRT have focused on incorporating auxiliary information to enhance estimation precision. Ref. [12] studied such techniques within the framework of sampling with unequal probabilities. In a subsequent study, ref. [13] proposed estimators for the mean of a sensitive quantitative variable, enabling data collection without compromising respondent confidentiality. Building on their foundational work, several researchers have extended and refined ORR models. Notable contributions include those by [14,15,16,17] among others who proposed various improvements aimed at enhancing estimation efficiency, increasing respondent cooperation, and expanding the applicability of RRT in diverse survey contexts.
There are two primary approaches to the Randomized Response Technique (RRT): the Compulsory Randomized Response Technique (CRRT) and the Optional Randomized Response Technique (ORRT). In CRRT, every respondent is required to provide a randomized (scrambled) response, regardless of the nature of the question. In contrast, ORRT allows respondents to choose whether to answer a question directly or to use a randomization device to provide a scrambled response, depending on their perception of the question’s sensitivity. This flexibility is particularly valuable because the perceived sensitivity of a question can vary across individuals: what one respondent considers sensitive, another may be willing to answer directly. To accommodate this variation, ORRT provides respondents with a randomization mechanism (supplied by the interviewer), which they may choose to use if they feel uncomfortable answering a question directly.
Partial RRT (PRRT) is another model where some of the respondents provide a true response without using RRT and the rest provide a randomized response. PRRT looks similar to the ORRT but is fundamentally different. In PRRT, the researcher decides what proportion of the respondents will provide a true response. This proportion is assumed known. In ORRT, the proportion of respondents who provide a true response without using RRT is unknown. PRRT has been discussed by many authors such as [10,14,18,19,20,21]. Beyond the sensitivity of survey topics, measurement error represents a significant concern in many data collection contexts. For example, the diagnosis of conditions such as hepatitis, breast cancer, or AIDS often relies on medical tests—like imaging procedures or blood analyses—that are not perfectly accurate. Such tests can yield erroneous results due to calibration issues or inherent limitations in the diagnostic tools. Likewise, variables such as income, expenditure, agricultural input levels (e.g., fertilizer or water usage) are also susceptible to reporting errors, even when randomized response techniques are not employed. Recognizing the importance of this issue, a number of researchers have investigated the role of measurement error within sampling theory. Key contributions include those of [16,17,22,23,24,25,26] among others. Their work has helped to highlight and quantify the potential distortions that measurement errors can introduce in survey estimates.
The present study aims to address the sensitivity of the study characteristic by applying three randomized response techniques—Compulsory Randomized Response Technique (CRRT), Optional Randomized Response Technique (ORRT), and Partial Randomized Response Technique (PRRT)—in conjunction with various calibration estimators, including ratio-type and exponential-type estimators, under measurement error. A three-stage simulation study is conducted using real COVID-19 infection data to assess the performance of the proposed estimators: (i) under measurement error, (ii) without measurement error, and (iii) without both randomization and measurement error. The findings indicate that the ORRT model consistently outperforms CRRT and PRRT in terms of efficiency, underscoring its reliability and practical effectiveness in handling sensitive survey data.

2. Survey Design and Notations

Assume a finite population $U = (U_1, U_2, \ldots, U_N)$ consisting of $N$ identifiable units. Let $Y$ and $X$ respectively be the sensitive variable under study and a non-sensitive auxiliary variable. The population means of $Y$ and $X$ are denoted by $\bar{Y}$ and $\bar{X}$, respectively. Our aim is to estimate $\bar{Y}$ in the presence of measurement errors.
Let a sample $s_n$ of size $n$ be selected from the population using a sampling design $d$ having individual and joint inclusion probabilities $\pi_i = P[U_i \in s_n]$ and $\pi_{ij} = P[U_i, U_j \in s_n]$. Let $\Delta_{ij} = \pi_{ij} - \pi_i \pi_j$.
Based on this sampling design, we intend to apply CRRT, ORRT and PRRT to handle sensitivity of the study variable in detail.

PRRT, CRRT and ORRT Models

Let $Z$ be the coded response variable corresponding to the sensitive variable $Y$. Let $S_1$ and $S_2$ be mutually independent scrambling variables which are also independent of the sensitive variable $Y$, such that $E(S_i) = \bar{S}_i$ and $V(S_i) = \sigma_{S_i}^2$; $i = 1, 2$. Let $P$ be the probability that a respondent provides a direct response without using randomization, as in ORRT and PRRT.
Following the approach of [17], a randomized response technique is employed to estimate the population mean of a sensitive variable. Their original model is adapted here to allow respondents the option to either report a direct response or a randomized (scrambled) response, depending on the outcome of a randomization device.
Each respondent is asked to rotate a spinner bearing the following two instructions:
  • Report the sensitive variable $Y$ (with probability $P$);
  • Report the scrambled response $(Y S_1 + S_2)$ (with probability $1 - P$).
Here, $S_1$ and $S_2$ are scrambling variables, assumed to be independent of $Y$. The response received from the $j$th respondent is denoted by
$$Z_j = \begin{cases} Y_j & \text{with probability } P \\ Y_j S_{1j} + S_{2j} & \text{with probability } 1 - P \end{cases}$$
$Z_j$ can be written as
$$Z_j = Y_j A + (Y_j S_{1j} + S_{2j})(1 - A), \quad A \sim \text{Bernoulli}(P).$$
Assuming the mutual independence of the random variables in Equation (1), and taking expectation, we get
$$E(Z_j) = E(Y_j) E(A) + E(Y_j S_{1j} + S_{2j}) E(1 - A)$$
$$\bar{Z} = \bar{Y} P + (\bar{Y} \bar{S}_1 + \bar{S}_2)(1 - P).$$
Hence, the population mean of the sensitive variable under PRRT (with known $P$) is given by
$$\bar{Y} = \frac{\bar{Z} - (1 - P)\bar{S}_2}{P + (1 - P)\bar{S}_1}. \tag{2}$$
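The PRRT decoding step can be illustrated with a minimal sketch. It assumes normally distributed scrambling variables with hypothetical means and standard deviations, and a synthetic set of sensitive values; the function names `prrt_response` and `prrt_mean` are illustrative and do not come from the article.

```python
import random
import statistics


def prrt_response(y, p, s1_mean, s1_sd, s2_mean, s2_sd, rng):
    """Return Y with probability p, otherwise the scrambled value Y*S1 + S2."""
    if rng.random() < p:
        return y
    s1 = rng.gauss(s1_mean, s1_sd)
    s2 = rng.gauss(s2_mean, s2_sd)
    return y * s1 + s2


def prrt_mean(z_bar, p, s1_mean, s2_mean):
    """Solve Z_bar = Y_bar*P + (Y_bar*S1_bar + S2_bar)*(1 - P) for Y_bar."""
    return (z_bar - (1.0 - p) * s2_mean) / (p + (1.0 - p) * s1_mean)


rng = random.Random(2025)
# Hypothetical sensitive values, not the data analysed in the article.
y_values = [rng.uniform(2.0, 8.0) for _ in range(20000)]
p_known = 0.6  # assumed design proportion of direct responses
z_values = [prrt_response(y, p_known, 1.0, 1.0, 1.0, 1.0, rng) for y in y_values]
y_bar_true = statistics.fmean(y_values)
y_bar_hat = prrt_mean(statistics.fmean(z_values), p_known, 1.0, 1.0)
print(f"true mean {y_bar_true:.3f}, decoded mean {y_bar_hat:.3f}")
```

Setting `p = 0` in `prrt_mean` gives the compulsory-scrambling inversion, and `p = 1` returns the mean of the direct responses unchanged.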
For $P = 0$ in Equation (1), the model becomes CRRT. So, the response received from the $j$th respondent using CRRT is given as
$$Z_j = Y_j S_{1j} + S_{2j}$$
In the above equation, taking the mean on both sides, we get
$$\bar{Z} = \bar{Y} \bar{S}_1 + \bar{S}_2.$$
Hence, the population mean of the sensitive variable under CRRT is given by
$$\bar{Y} = \frac{\bar{Z} - \bar{S}_2}{\bar{S}_1}. \tag{4}$$
Proposed ORRT Model: The response provided by respondent $j$ is given as
$$Z_j = \begin{cases} Y_j & \text{with probability } P \\ \dfrac{Y_j S_{1j} + S_{1j} S_{2j} - \bar{S}_1 \bar{S}_2}{\bar{S}_1} & \text{with probability } 1 - P \end{cases}$$
such that
$$Z_j = A Y_j + (1 - A)\left[\frac{Y_j S_{1j} + S_{1j} S_{2j} - \bar{S}_1 \bar{S}_2}{\bar{S}_1}\right], \quad A \sim \text{Bernoulli}(P).$$
Note that $P$ is an unknown parameter in this case, determined solely by the proportion of the population that considers the question sensitive. Taking expectation,
$$E(Z_j) = P\, E[Y_j] + (1 - P)\, E\left[\frac{Y_j S_{1j} + S_{1j} S_{2j} - \bar{S}_1 \bar{S}_2}{\bar{S}_1}\right] = P \bar{Y} + (1 - P)\left[\bar{Y} + \frac{\bar{S}_1 \bar{S}_2}{\bar{S}_1} - \frac{\bar{S}_1 \bar{S}_2}{\bar{S}_1}\right] = \bar{Y}. \tag{5}$$
Note that the crucial element of this randomization is that the unknown $P$ is not needed in estimating $\bar{Y}$.
Equations (2), (4) and (5) yield estimators of $\bar{Y}$ once $\bar{Z}$ is estimated, for instance by the sample mean of the reported responses.
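A short simulation can make the "unknown P is not needed" property concrete. The sketch below assumes the scrambled ORRT response has the form (Y S1 + S1 S2 - mean(S1) mean(S2)) / mean(S1), with hypothetical normal scrambling variables and synthetic sensitive values; all names and parameter values are illustrative.

```python
import random
import statistics


def orrt_response(y, use_direct, s1_mean, s1_sd, s2_mean, s2_sd, rng):
    """Direct answer Y, or the scrambled value (Y*S1 + S1*S2 - S1_bar*S2_bar) / S1_bar."""
    if use_direct:
        return y
    s1 = rng.gauss(s1_mean, s1_sd)
    s2 = rng.gauss(s2_mean, s2_sd)
    return (y * s1 + s1 * s2 - s1_mean * s2_mean) / s1_mean


def mean_coded_response(y_values, p_direct, rng):
    """Average coded response when a fraction p_direct answers directly."""
    z = [orrt_response(y, rng.random() < p_direct, 1.0, 1.0, 1.0, 1.0, rng)
         for y in y_values]
    return statistics.fmean(z)


rng = random.Random(7)
y_values = [rng.uniform(2.0, 8.0) for _ in range(40000)]  # hypothetical values
y_bar = statistics.fmean(y_values)
# The gap between mean(Z) and mean(Y) should stay near zero for every P.
gaps = {p: mean_coded_response(y_values, p, rng) - y_bar for p in (0.2, 0.5, 0.8)}
for p, gap in gaps.items():
    print(f"P = {p}: mean(Z) - mean(Y) = {gap:+.3f}")
```

Because the coded response is unbiased for the sensitive value whichever branch is taken, the sample mean of the responses can be used directly, without estimating the share of respondents who scramble.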
We now consider the case where the observed coded response variable and auxiliary variable are subject to measurement errors. For the PRRT, CRRT, and ORRT frameworks, let $z_e$ and $x_e$ represent the observed counterparts of the true variables $Z$ and $X$, respectively. The measurement error model specifies the relationship between these observed values and their true underlying counterparts. The classical additive measurement error model is given as follows:
$$z_{ei} = Z_i + u_i, \quad \text{and} \quad x_{ei} = X_i + v_i; \quad \text{for } i = 1, 2, \ldots, N.$$
Assume that the observational errors $u_i$ and $v_i$ are normally distributed with mean 0 and variances $\sigma_u^2$ and $\sigma_v^2$, respectively. Additionally, the observational errors in the study variable and the auxiliary variable are assumed to be correlated.
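The additive error model with correlated errors can be sketched as follows. The error standard deviations, the correlation of 0.3, and the synthetic true values are assumptions chosen only for illustration, and the helper names are hypothetical.

```python
import math
import random
import statistics


def correlated_errors(n, sd_u, sd_v, rho_uv, rng):
    """Draw n pairs (u, v) of zero-mean normal errors with correlation rho_uv."""
    pairs = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        u = sd_u * e1
        v = sd_v * (rho_uv * e1 + math.sqrt(1.0 - rho_uv ** 2) * e2)
        pairs.append((u, v))
    return pairs


def sample_correlation(a, b):
    """Pearson correlation computed from first principles."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(11)
true_z = [rng.uniform(2.0, 8.0) for _ in range(20000)]  # hypothetical coded responses
true_x = [z + rng.gauss(0.0, 0.5) for z in true_z]      # hypothetical auxiliary values
errors = correlated_errors(len(true_z), 2.0, 2.0, 0.3, rng)
z_obs = [z + u for z, (u, _) in zip(true_z, errors)]    # z_e = Z + u
x_obs = [x + v for x, (_, v) in zip(true_x, errors)]    # x_e = X + v

rho_hat = sample_correlation([u for u, _ in errors], [v for _, v in errors])
print(f"error correlation {rho_hat:.3f}")
print(f"var(Z) {statistics.pvariance(true_z):.2f} -> var(z_e) {statistics.pvariance(z_obs):.2f}")
```

The printed variances show the inflation caused by the error term, which is the effect the later variance expressions capture through the error variances.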

3. Proposed Estimators in Presence of Measurement Error

Horvitz–Thompson [27] estimators for coded response variable: First, in the proposed sampling framework, to estimate a sensitive population mean, the Horvitz–Thompson [27] estimator in the presence of measurement error is as given below
$$\hat{T}_{hme} = \frac{1}{N} \sum_{i \in s_n} \alpha_i z_{ei}; \quad \alpha_i = \frac{1}{\pi_i} \tag{6}$$
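As a small illustration of the Horvitz–Thompson form above, the sketch below computes the weighted mean from observed responses and inclusion probabilities. The sample values and the function name `horvitz_thompson_mean` are hypothetical.

```python
def horvitz_thompson_mean(z_obs, inclusion_probs, population_size):
    """(1/N) * sum of z_ei / pi_i over the sampled units."""
    if len(z_obs) != len(inclusion_probs):
        raise ValueError("one inclusion probability is needed per sampled unit")
    total = sum(z / pi for z, pi in zip(z_obs, inclusion_probs))
    return total / population_size


# Hypothetical sample of n = 4 units from N = 20 under equal-probability sampling.
z_sample = [4.2, 5.1, 6.3, 4.8]
N = 20
pi = [len(z_sample) / N] * len(z_sample)
ht = horvitz_thompson_mean(z_sample, pi, N)
print(ht)
```

With equal inclusion probabilities n/N the estimator coincides with the sample mean of the observed responses, which is the form used later under simple random sampling.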
Calibration estimators for coded response variable: Calibration is widely regarded as one of the most effective and commonly employed techniques in survey sampling for parameter estimation, as it enables efficient and reliable inference by incorporating auxiliary information. The success of calibration largely hinges on the quality of these auxiliary variables, particularly their accuracy, availability, and, most crucially, their correlation with the study variable. When auxiliary variables are strongly correlated with the variable of interest, they can significantly reduce the variance of estimators and help mitigate non-sampling errors, including measurement error. Ref. [28] introduced the calibration framework using a chi-square-type distance function to adjust the original survey weights such that they conform to known population totals. Building on this foundational approach, calibration estimators have been extended to accommodate scenarios involving measurement error, especially when estimating sensitive or coded response variables. These advancements have expanded the applicability of calibration techniques to more complex and error-prone survey environments.
To enhance the performance of the traditional Horvitz–Thompson estimator [27], we adopt a calibration approach that refines the initial design weights. Specifically, the original weight $\alpha_i$ is replaced by a calibrated weight $w_i$ obtained using known auxiliary information. In this context, we propose the following basic calibration estimator, ratio-type calibration estimator and exponential-type calibration estimator in the presence of measurement error:
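$$T_{Cme} = \frac{1}{N} \sum_{i \in s_n} w_i z_{ei}, \tag{7}$$
$$T_{Rme} = \frac{1}{N} \sum_{i \in s_n} w_i z_{ei} \frac{X_i}{x_{ei}} \tag{8}$$
and
$$T_{Eme} = \frac{1}{N} \sum_{i \in s_n} w_i z_{ei} \exp\left(\frac{X_i - x_{ei}}{X_i + x_{ei}}\right) \tag{9}$$
The following distance measure is considered in order to find the calibration weights $w_i$:
$$\hat{D}_1(w_i, \alpha_i) = \sum_{i=1}^{n} \frac{(w_i - \alpha_i)^2}{\alpha_i q_i}. \tag{10}$$
The calibration constraint based on sample $s_n$ is given as
$$\frac{1}{N} \sum_{i \in s_n} w_i x_{ei} = \bar{X} \tag{11}$$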
where $q_i$ is an arbitrarily chosen constant. Our objective is to determine the calibrated weight $w_i$ such that it remains as close as possible to the original design weight $\alpha_i$ by minimizing the distance function $\hat{D}_1(w_i, \alpha_i)$, subject to the calibration constraint specified in Equation (11). This leads to an optimization problem that can be addressed through the minimization of the following Lagrangian function:
$$L_{1me} = \hat{D}_1(w_i, \alpha_i) - 2\lambda_1 \left[\frac{1}{N} \sum_{i \in s_n} w_i x_{ei} - \bar{X}\right]. \tag{12}$$
Differentiating $L_{1me}$ in Equation (12) with respect to the calibration weight $w_i$ and equating to zero, the calibration weight is obtained as
$$w_i = \alpha_i + \lambda_1 x_{ei} \alpha_i q_i. \tag{13}$$
Solving Equation (13) together with the constraint, the Lagrange multiplier $\lambda_1$ is obtained as
$$\lambda_1 = \frac{\bar{X} - \sum_{i \in s_n} \alpha_i x_{ei}}{\sum_{i \in s_n} \alpha_i q_i x_{ei}^2} \tag{14}$$
and substituting the value of $\lambda_1$ in Equation (13), the calibration weight is obtained as
$$w_i = \alpha_i + \alpha_i q_i \frac{\left(\bar{X} - \sum_{i \in s_n} \alpha_i x_{ei}\right) x_{ei}}{\sum_{i \in s_n} x_{ei}^2 q_i \alpha_i}. \tag{15}$$
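The weight construction can be sketched numerically. One assumption is made loudly here: the multiplier is solved on the same 1/N scale as the calibration constraint, so that (1/N) times the weighted sum of the observed auxiliary values reproduces the known mean exactly; this normalisation, the data, and the function names are illustrative choices rather than the article's code.

```python
def calibrated_weights(alpha, x_obs, x_bar, population_size, q=None):
    """Chi-square-distance calibration: w_i = alpha_i + lambda * alpha_i * q_i * x_ei.

    lambda is chosen so that (1/N) * sum(w_i * x_ei) equals x_bar (assumed scaling).
    """
    q = [1.0] * len(alpha) if q is None else q
    weighted_x = sum(a * x for a, x in zip(alpha, x_obs))
    denom = sum(a * qi * x * x for a, qi, x in zip(alpha, q, x_obs))
    lam = (population_size * x_bar - weighted_x) / denom
    return [a + lam * a * qi * x for a, qi, x in zip(alpha, q, x_obs)]


def calibration_mean(z_obs, weights, population_size):
    """Basic calibration estimator (1/N) * sum(w_i * z_ei)."""
    return sum(w * z for w, z in zip(weights, z_obs)) / population_size


# Hypothetical sample of n = 4 units from N = 20 with design weights N/n.
N = 20
x_sample = [4.0, 5.5, 6.1, 4.9]
z_sample = [4.2, 5.1, 6.3, 4.8]
alpha = [N / len(x_sample)] * len(x_sample)
x_bar_known = 5.3  # assumed known population mean of the auxiliary variable
w = calibrated_weights(alpha, x_sample, x_bar_known, N)
constraint_value = sum(wi * xi for wi, xi in zip(w, x_sample)) / N
print(constraint_value, calibration_mean(z_sample, w, N))
```

The first printed number should match `x_bar_known` up to rounding, which confirms that the adjusted weights satisfy the calibration constraint.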
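Substituting $w_i$ from Equation (15) into Equations (7), (8) and (9) respectively, we get the calibrated estimators under measurement error as follows:
$$\hat{T}_{Cme} = \frac{1}{N} \sum_{i \in s_n} \alpha_i z_{ei} + b_1 \left[\bar{X} - \frac{1}{N} \sum_{i \in s_n} \alpha_i x_{ei}\right], \tag{16}$$
$$\hat{T}_{Rme} = \frac{1}{N} \sum_{i \in s_n} \alpha_i z_{ei} \frac{X_i}{x_{ei}} + b_2 \left[\bar{X} - \frac{1}{N} \sum_{i \in s_n} \alpha_i x_{ei}\right] \tag{17}$$
and
$$\hat{T}_{Eme} = \frac{1}{N} \sum_{i \in s_n} \alpha_i z_{ei} \exp\left(\frac{X_i - x_{ei}}{X_i + x_{ei}}\right) + b_3 \left[\bar{X} - \frac{1}{N} \sum_{i \in s_n} \alpha_i x_{ei}\right] \tag{18}$$
with
$$b_1 = \frac{\sum_{i \in s_n} x_{ei} z_{ei} \alpha_i q_i}{\sum_{i \in s_n} x_{ei}^2 \alpha_i q_i}, \quad b_2 = \frac{\sum_{i \in s_n} x_{ei} z_{ei} \frac{X_i}{x_{ei}} \alpha_i q_i}{\sum_{i \in s_n} x_{ei}^2 \alpha_i q_i} \quad \text{and} \quad b_3 = \frac{\sum_{i \in s_n} x_{ei} z_{ei} \exp\left(\frac{X_i - x_{ei}}{X_i + x_{ei}}\right) \alpha_i q_i}{\sum_{i \in s_n} x_{ei}^2 \alpha_i q_i}.$$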

4. Study Under Simple Random Sampling Without Replacement (SRSWOR)

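To study the proposed calibration estimators under the SRSWOR scheme, the inclusion probabilities that determine the basic design weight $\alpha_i = 1/\pi_i$ are:
$$\pi_i = \frac{n}{N}, \quad \pi_{ij} = \frac{n(n - 1)}{N(N - 1)}$$
Now assuming $q_i = 1$, the calibrated estimators $\hat{T}_{jme}$; $j \in \{C, R, E\}$ under the SRSWOR scheme take the following forms:
$$\hat{T}_{Cme} = \bar{z}_{en} + B_1 \left(\bar{X} - \bar{x}_{en}\right) \quad \text{with} \quad B_1 = \frac{\sum_{i \in s_n} x_{ei} z_{ei}}{\sum_{i \in s_n} x_{ei}^2} \tag{19}$$
$$\hat{T}_{Rme} = \bar{z}_{en} \frac{\bar{X}}{\bar{x}_{en}} + B_2 \left(\bar{X} - \bar{x}_{en}\right) \quad \text{with} \quad B_2 = \frac{\sum_{i \in s_n} x_{ei} z_{ei} \frac{X_i}{x_{ei}}}{\sum_{i \in s_n} x_{ei}^2} \tag{20}$$
$$\hat{T}_{Eme} = \bar{z}_{en} \exp\left(\frac{\bar{X} - \bar{x}_{en}}{\bar{X} + \bar{x}_{en}}\right) + B_3 \left(\bar{X} - \bar{x}_{en}\right) \quad \text{with} \quad B_3 = \frac{\sum_{i \in s_n} x_{ei} z_{ei} \exp\left(\frac{X_i - x_{ei}}{X_i + x_{ei}}\right)}{\sum_{i \in s_n} x_{ei}^2}. \tag{21}$$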
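The three SRSWOR forms can be evaluated side by side with a short sketch. It assumes the coefficients are ratios of sample sums with the sum of squared observed auxiliary values in the denominator, and that the true auxiliary values of the sampled units are available for the ratio and exponential coefficients; the data and the function name are hypothetical.

```python
import math


def srswor_calibration_estimators(z_obs, x_obs, x_true, x_bar):
    """Return (T_C, T_R, T_E) computed from one simple random sample."""
    n = len(z_obs)
    z_mean = sum(z_obs) / n
    x_mean = sum(x_obs) / n
    sxx = sum(x * x for x in x_obs)
    b1 = sum(x * z for x, z in zip(x_obs, z_obs)) / sxx
    b2 = sum(x * z * (xt / x) for x, z, xt in zip(x_obs, z_obs, x_true)) / sxx
    b3 = sum(x * z * math.exp((xt - x) / (xt + x))
             for x, z, xt in zip(x_obs, z_obs, x_true)) / sxx
    gap = x_bar - x_mean  # known mean minus observed sample mean
    t_c = z_mean + b1 * gap
    t_r = z_mean * (x_bar / x_mean) + b2 * gap
    t_e = z_mean * math.exp(gap / (x_bar + x_mean)) + b3 * gap
    return t_c, t_r, t_e


# Hypothetical observed sample and known auxiliary mean.
z_obs = [4.6, 5.9, 3.8, 6.4, 5.2]
x_obs = [4.9, 6.2, 3.5, 6.0, 5.1]
x_true = [5.0, 6.0, 4.0, 6.1, 5.0]
t_c, t_r, t_e = srswor_calibration_estimators(z_obs, x_obs, x_true, x_bar=5.3)
print(f"T_C = {t_c:.3f}, T_R = {t_r:.3f}, T_E = {t_e:.3f}")
```

When the observed sample mean of the auxiliary variable equals the known population mean, the adjustment terms vanish and all three estimators reduce to the sample mean of the observed responses.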

5. Properties of Proposed Estimator in the Presence of Measurement Error

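In order to study the properties of the estimators $\hat{T}_{jme}$; $j \in \{C, R, E\}$, we use the following notation and the corresponding results:
$$Q_0 = \frac{\bar{z}_{en}}{\bar{Z}} - 1, \quad Q_1 = \frac{\bar{x}_{en}}{\bar{X}} - 1 \quad \text{such that } E(Q_i) = 0; \; i = 0, 1.$$
$$E(Q_0^2) = \left(\frac{1}{n} - \frac{1}{N}\right) \frac{S_z^2 + S_u^2}{\bar{Z}^2}, \quad E(Q_1^2) = \left(\frac{1}{n} - \frac{1}{N}\right) \frac{S_x^2 + S_v^2}{\bar{X}^2} \quad \text{and} \quad E(Q_0 Q_1) = \left(\frac{1}{n} - \frac{1}{N}\right) \frac{\rho_{zx} S_z S_x + \rho_{uv} S_u S_v}{\bar{Z} \bar{X}}.$$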
Using this notation, the Horvitz–Thompson estimator in Equation (6) under the SRSWOR sampling design can be written as
$$\hat{T}_{hme} = \bar{Z}(1 + Q_0)$$
and
$$\hat{T}_{hme} - \bar{Z} = \bar{Z} Q_0.$$
Squaring both sides, taking expectation, and ignoring the finite population correction factor, we have
$$V(\hat{T}_{hme}) \approx \frac{1}{n}\left(S_u^2 + S_z^2\right).$$
Using this notation in Equation (19), we have
$$\hat{T}_{Cme} = \bar{Z}(1 + Q_0) + B_1\left(\bar{X} - \bar{X}(1 + Q_1)\right)$$
and
$$\hat{T}_{Cme} - \bar{Z} = \bar{Z} Q_0 + B_1\left(\bar{X} - \bar{X}(1 + Q_1)\right). \tag{23}$$
Squaring both sides of Equation (23) and using first order approximations, we get
$$\left[\hat{T}_{Cme} - \bar{Z}\right]^2 \approx \left[\bar{Z} Q_0 + B_1\left(\bar{X} - \bar{X}(1 + Q_1)\right)\right]^2. \tag{24}$$
Taking expectation on both sides of Equation (24), we obtain the variance of $\hat{T}_{Cme}$ for large $N$ as
$$V[\hat{T}_{Cme}] = \frac{1}{n}\left[(S_u^2 + S_z^2) + B_1^2 (S_v^2 + S_x^2) - 2 B_1 (S_z S_x \rho_{zx} + S_u S_v \rho_{uv})\right]. \tag{25}$$
Differentiating Equation (25) with respect to $B_1$ and equating to zero, we get the optimum value of $B_1$ as $B_1^*$ given by
$$B_1^* = \frac{S_u S_v \rho_{uv} + S_z S_x \rho_{zx}}{S_v^2 + S_x^2}.$$
Substituting the optimum value $B_1^*$ in Equation (25), we obtain
$$V[\hat{T}_{Cme}]_{opt.} = \frac{1}{n} \theta_0,$$
where $\theta_0 = (S_u^2 + S_z^2) + (B_1^*)^2 (S_v^2 + S_x^2) - 2 B_1^* (S_z S_x \rho_{zx} + S_u S_v \rho_{uv})$.
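The optimisation over the coefficient of the basic calibration estimator can be checked numerically. The sketch below evaluates the first-order variance with a 1/n factor, the optimal coefficient, and the resulting minimum; all standard deviations and correlations are hypothetical inputs, and the function names are illustrative.

```python
def cross_term(s_z, s_x, s_u, s_v, rho_zx, rho_uv):
    """S_z*S_x*rho_zx + S_u*S_v*rho_uv, the covariance-type term in the variance."""
    return s_z * s_x * rho_zx + s_u * s_v * rho_uv


def variance_tc(b1, n, s_z, s_x, s_u, s_v, rho_zx, rho_uv):
    """First-order variance of the basic calibration estimator for a given B_1."""
    c = cross_term(s_z, s_x, s_u, s_v, rho_zx, rho_uv)
    return ((s_u ** 2 + s_z ** 2) + b1 ** 2 * (s_v ** 2 + s_x ** 2) - 2.0 * b1 * c) / n


def optimal_b1(s_z, s_x, s_u, s_v, rho_zx, rho_uv):
    """Value of B_1 that minimises variance_tc."""
    return cross_term(s_z, s_x, s_u, s_v, rho_zx, rho_uv) / (s_v ** 2 + s_x ** 2)


# Hypothetical population quantities.
params = dict(s_z=3.0, s_x=2.5, s_u=2.0, s_v=2.0, rho_zx=0.6, rho_uv=0.3)
n = 30
b_star = optimal_b1(**params)
v_opt = variance_tc(b_star, n, **params)
v_ht = variance_tc(0.0, n, **params)  # B_1 = 0 gives the Horvitz-Thompson variance
print(f"B1* = {b_star:.4f}, V_opt = {v_opt:.4f}, V_HT = {v_ht:.4f}")
```

Substituting the optimal coefficient back in, the bracketed quantity simplifies to (S_u^2 + S_z^2) minus the squared cross term divided by (S_v^2 + S_x^2), so the calibrated variance never exceeds the Horvitz–Thompson variance under this approximation.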
On similar lines, as followed for T ^ C m e , we have the following variance expression for the proposed ratio type calibrated estimator T ^ R m e and exponential type calibrated estimator T ^ E m e in the presence of measurement error:
V [ T ^ R m e ] o p t . = 1 n [ ( S u 2 + S z 2 ) + ( B 2 * ) 2 ( S v 2 + S x 2 ) + ( S v 2 + S x 2 ) 2 B 2 * ( S z S x ρ z x + S u S v ρ u v ) 2 S z S x ρ z x + S u S v ρ u v ]
where, B 2 * = ( 2 S u S v ρ u v S z S x ρ z x ) ( S v 2 + S x 2 ) .
V [ T ^ E m e ] o p t . = 1 n [ ( S u 2 + S z 2 ) + 1 4 ( S v 2 + S x 2 ) + ( B 3 * ) 2 ( S v 2 + S x 2 ) + ( S v 2 + S x 2 ) 2 B 3 * ( S z S x ρ z x + S u S v ρ u v ) S z S x ρ z x + S u S v ρ u v ]
where, B 3 * = ( 2 S u S v ρ u v S z S x ρ z x ) 1 2 ( S v 2 + S x 2 ) ( S v 2 + S x 2 ) .

6. Estimators of the Mean for Sensitive Variables

Substituting the population mean of the coded response variable $\bar{Z}$ in Equations (2), (4) and (5) with the Horvitz–Thompson estimator $\hat{T}_{hme}$ and the calibration estimators $\hat{T}_{jme}$; $j = C, R, E$, yields the mean estimators $(\hat{\bar{Y}}_h)_{PRRT}$, $(\hat{\bar{Y}}_j)_{PRRT}$ and $(\hat{\bar{Y}}_h)_{ORRT}$, $(\hat{\bar{Y}}_j)_{ORRT}$ under the PRRT and ORRT models, respectively, as well as $(\hat{\bar{Y}}_h)_{CRRT}$ and $(\hat{\bar{Y}}_j)_{CRRT}$ under the CRRT model. These estimators are detailed in Table 1.

7. Simulation Study

To evaluate and compare the behavior of the proposed calibration estimators under the PRRT, CRRT and ORRT models in the presence of measurement error, a simulation study has been carried out.
For this purpose, a real population comprising $N = 94$ districts of southern states in India (Andhra Pradesh, Karnataka, Kerala and Tamil Nadu) has been considered. [Population Source: https://mohfw.gov.in] (accessed on 3 August 2025).
The variables considered in the study are:
  • $Y_i$: The positivity rate of COVID-19 in the $i$th district in the week from 21 June to 27 June 2021;
  • $X_i$: The positivity rate of COVID-19 in the $i$th district in the week from 18 June to 24 June 2021.
The two scrambling variables $S_1$ and $S_2$ are assumed to follow the Normal distribution such that $S_1 \sim \text{Normal}(1, 1)$ and $S_2 \sim \text{Normal}(1, 2)$.
The artificial data for $u$ and $v$ have also been generated from the Normal distribution with mean 0 and variance 4 each using MATLAB (version 7.4). The parameters of the considered population are: $N = 94$, $\bar{Y} = 5.1756$, $\bar{X} = 5.1050$, $\rho_{yx} = 0.95$, and $n = \{30, 50\}$.
The simulation is carried out in MATLAB with 10,000 independent replications of the entire framework.
For the data considered above, the proposed calibration estimator $\hat{T}_{Cme}$ has been compared to the Horvitz–Thompson estimator $\hat{T}_{hme}$, the ratio-type calibration estimator $\hat{T}_{Rme}$ and the exponential-type calibration estimator $\hat{T}_{Eme}$ under the PRRT, CRRT and ORRT models in the presence of measurement error. The simulated percent relative efficiencies (PRE) are defined as
$$PRE_{k1} = \frac{V[(\hat{\bar{Y}}_h)_k]}{V[(\hat{\bar{Y}}_C)_k]} \times 100, \quad PRE_{k2} = \frac{V[(\hat{\bar{Y}}_R)_k]}{V[(\hat{\bar{Y}}_C)_k]} \times 100 \quad \text{and} \quad PRE_{k3} = \frac{V[(\hat{\bar{Y}}_E)_k]}{V[(\hat{\bar{Y}}_C)_k]} \times 100; \quad k = \begin{cases} 1 & \text{for PRRT} \\ 2 & \text{for CRRT} \\ 3 & \text{for ORRT} \end{cases}$$
where
$$V[(\hat{\bar{Y}}_h)_{CRRT}] = \frac{1}{10{,}000} \sum_{i=1}^{10{,}000} \left[(\hat{\bar{Y}}_{hi})_{CRRT} - \bar{Z}_i\right]^2.$$
Similarly, $V[(\hat{\bar{Y}}_h)_{PRRT}]$, $V[(\hat{\bar{Y}}_h)_{ORRT}]$, $V[(\hat{\bar{Y}}_j)_{CRRT}]$, $V[(\hat{\bar{Y}}_j)_{PRRT}]$ and $V[(\hat{\bar{Y}}_j)_{ORRT}]$; $j \in \{C, R, E\}$ are defined.
The simulation results obtained for $PRE_{kr}$; $r = 1, 2$ and $3$ are presented in Table 2 and Table 4 respectively.
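The structure of such a Monte Carlo comparison can be sketched as below. This is not the study's MATLAB code: it uses a synthetic population in place of the district data, independent measurement errors, the ORRT coding with assumed scrambling settings, and squared deviations from the true population mean; every name and parameter value is an assumption made for illustration.

```python
import random
import statistics

S1_MEAN, S1_SD = 1.0, 1.0  # assumed scrambling settings
S2_MEAN, S2_SD = 1.0, 1.0


def orrt_scramble(y, p_direct, rng):
    """Coded ORRT response: direct with probability p_direct, else scrambled."""
    if rng.random() < p_direct:
        return y
    s1 = rng.gauss(S1_MEAN, S1_SD)
    s2 = rng.gauss(S2_MEAN, S2_SD)
    return (y * s1 + s1 * s2 - S1_MEAN * S2_MEAN) / S1_MEAN


def simulate_pre(pop_y, pop_x, n, p_direct, sd_err, reps, seed):
    """PRE (in percent) of the basic calibration estimator relative to the sample mean."""
    rng = random.Random(seed)
    y_bar = statistics.fmean(pop_y)
    x_bar = statistics.fmean(pop_x)
    sq_err_ht = 0.0
    sq_err_c = 0.0
    for _ in range(reps):
        idx = rng.sample(range(len(pop_y)), n)  # SRSWOR draw
        z_obs = [orrt_scramble(pop_y[i], p_direct, rng) + rng.gauss(0.0, sd_err) for i in idx]
        x_obs = [pop_x[i] + rng.gauss(0.0, sd_err) for i in idx]
        z_mean = statistics.fmean(z_obs)
        x_mean = statistics.fmean(x_obs)
        b1 = sum(x * z for x, z in zip(x_obs, z_obs)) / sum(x * x for x in x_obs)
        t_c = z_mean + b1 * (x_bar - x_mean)
        sq_err_ht += (z_mean - y_bar) ** 2
        sq_err_c += (t_c - y_bar) ** 2
    return 100.0 * sq_err_ht / sq_err_c


# Synthetic population of 94 units with strongly related x and y (illustrative only).
pop_rng = random.Random(3)
pop_x = [pop_rng.uniform(1.0, 10.0) for _ in range(94)]
pop_y = [x + pop_rng.gauss(0.0, 0.5) for x in pop_x]
pre = simulate_pre(pop_y, pop_x, n=30, p_direct=0.5, sd_err=2.0, reps=2000, seed=1)
print(f"PRE of calibration vs. Horvitz-Thompson: {pre:.1f}")
```

A value above 100 would indicate that the calibration estimator has the smaller simulated mean squared error in that configuration; the number obtained depends entirely on the assumed population and settings.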
For the proposed calibrated estimator, to assess the impact of measurement error under the CRRT, PRRT, and ORRT models, the variances of the calibration estimators $V[(\hat{\bar{Y}}_j)_{CRRT}]$, $V[(\hat{\bar{Y}}_j)_{PRRT}]$ and $V[(\hat{\bar{Y}}_j)_{ORRT}]$; $j = C, R$ and $E$ are compared with the corresponding variances of estimators obtained in the absence of measurement error. The relevant Percent Relative Efficiency (PRE) was computed through a simulation study as follows:
$$PRE_{k1} = \frac{V[(\hat{t}_c)_k]}{V[(\hat{\bar{Y}}_C)_k]} \times 100, \quad PRE_{k2} = \frac{V[(\hat{t}_r)_k]}{V[(\hat{\bar{Y}}_R)_k]} \times 100 \quad \text{and} \quad PRE_{k3} = \frac{V[(\hat{t}_e)_k]}{V[(\hat{\bar{Y}}_E)_k]} \times 100; \quad k = \begin{cases} 1 & \text{for PRRT} \\ 2 & \text{for CRRT} \\ 3 & \text{for ORRT} \end{cases}$$
where $(\hat{t}_s)_{CRRT}$, $(\hat{t}_s)_{PRRT}$ and $(\hat{t}_s)_{ORRT}$; $s \in \{c, r, e\}$ can be obtained from $(\hat{\bar{Y}}_j)_{CRRT}$, $(\hat{\bar{Y}}_j)_{PRRT}$ and $(\hat{\bar{Y}}_j)_{ORRT}$ respectively by substituting $u = v = 0$.
The simulation results of $PRE_{kr}$; $r = 1, 2$ and $3$ are shown in Table 3 and Table 4 respectively, where
$$V[(\hat{t}_s)_{CRRT}] = \frac{1}{10{,}000} \sum_{i=1}^{10{,}000} \left[(\hat{t}_{si})_{CRRT} - \bar{Y}_i\right]^2.$$
Similarly, $V[(\hat{t}_s)_{PRRT}]$ and $V[(\hat{t}_s)_{ORRT}]$; $s \in \{c, r, e\}$ can be computed.
Further, to assess the impact of the randomization process, the variances of the estimators $(\hat{\bar{Y}}_j)_{CRRT}$, $(\hat{\bar{Y}}_j)_{PRRT}$ and $(\hat{\bar{Y}}_j)_{ORRT}$ are compared with their counterparts obtained in the absence of both randomization and measurement error, as given by
$$PRE_{k1} = \frac{V[(\hat{T}_{cd})_k]}{V[(\hat{\bar{Y}}_C)_k]} \times 100, \quad PRE_{k2} = \frac{V[(\hat{T}_{rd})_k]}{V[(\hat{\bar{Y}}_R)_k]} \times 100 \quad \text{and} \quad PRE_{k3} = \frac{V[(\hat{T}_{ed})_k]}{V[(\hat{\bar{Y}}_E)_k]} \times 100; \quad k = \begin{cases} 1 & \text{for PRRT} \\ 2 & \text{for CRRT} \\ 3 & \text{for ORRT} \end{cases}$$
The estimators $(\hat{T}_{sd})$; $s \in \{c, r, e\}$ can be obtained from $(\hat{\bar{Y}}_j)_{CRRT}$, $(\hat{\bar{Y}}_j)_{PRRT}$ and $(\hat{\bar{Y}}_j)_{ORRT}$; $j \in \{C, R, E\}$ respectively by taking $u = v = 0$ and dealing with $Y$ directly instead of coding the response by randomization. The estimator variance is given by
$$V[(\hat{T}_{sd})] = \frac{1}{10{,}000} \sum_{i=1}^{10{,}000} \left[(\hat{T}_{sdi}) - \bar{Y}_i\right]^2.$$
The simulation results of $PRE_{kr}$; $r = 1, 2$ and $3$ are shown in Table 4 and Table 5 respectively.
Some noteworthy results from Table 2, Table 3, Table 4 and Table 5 are as follows:
Table 2 and Table 4 present the calibration estimator, ratio-type calibration estimator, and exponential-type calibration estimator for the PRRT, CRRT, and ORRT models under measurement error for sensitive mean estimation. These results demonstrate the feasibility of applying calibration estimators in the presence of measurement error. It is also observed that the Percent Relative Efficiency (PRE) exceeds 100 when the proposed calibrated estimator $\hat{T}_{Cme}$ is compared with the Horvitz–Thompson type estimator $\hat{T}_{hme}$, the ratio-type calibrated estimator $\hat{T}_{Rme}$ and the exponential-type calibrated estimator $\hat{T}_{Eme}$ under the CRRT, PRRT and ORRT models. Furthermore, for varying $P$ it can be seen that $PRE_{11} < PRE_{12} < PRE_{13}$ under CRRT, PRRT and ORRT for both choices of $n$. Additionally, the exponential-type calibration estimator exhibits higher efficiency than the ratio-type calibrated estimator when compared within the same calibration framework under the PRRT and ORRT models. In terms of overall efficiency, the proposed ORRT model outperforms the PRRT model.
However, Table 3 and Table 4 clearly demonstrate that the variances of measurement errors have a significant impact under the ORRT, PRRT, and CRRT models. It is observed that the PRE values decrease when calibrated estimators are compared in the presence of measurement error versus their absence.
Table 5 compares the proposed estimators with their direct versions, which do not involve measurement error or randomization. It is evident that all PRE values are below 100 for the ORRT, PRRT, and CRRT models. This suggests that calibration estimators under these models do not achieve higher efficiency than estimators based on direct responses without measurement errors or randomization. Nevertheless, randomization is indispensable in practice, as the sensitive nature of the questions often results in biased responses or non-response when direct questioning is applied.

8. Concluding Remarks

This study confirms that estimating the population mean of a sensitive variable is achievable even in the presence of measurement error. The use of calibration estimators has been effective across the CRRT, PRRT, and ORRT models. Among these, the PRRT and ORRT models, particularly when combined with ratio-type and exponential calibration estimators, yield more efficient estimates than the corresponding CRRT models. Furthermore, the proposed ORRT model consistently outperforms the PRRT model under measurement error conditions. While a decline in precision is observed for all estimators under CRRT, PRRT, and ORRT when measurement error is introduced, compared to estimators without error and randomization, this outcome is consistent with theoretical expectations. Overall, the findings suggest that the proposed ORRT model is the most suitable choice for estimating a quantitative sensitive variable in the presence of measurement error.

Author Contributions

Methodology, P.T.; Software, P.T.; Validation, S.G. and P.T.; Writing—original draft, P.T.; Writing—review & editing, S.G. and F.C.; Supervision, S.G. and F.C.; Funding acquisition, S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We sincerely thank the honorable reviewers for their valuable comments and suggestions, which have significantly improved the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69.
  2. Greenberg, B.G.; Abul-Ela, A.L.A.; Simmons, W.R.; Horvitz, D.G. The unrelated question randomized response model: Theoretical framework. J. Am. Stat. Assoc. 1969, 64, 520–539.
  3. Abernathy, J.R.; Greenberg, B.G.; Horvitz, D.G. Estimates of induced abortion in urban North Carolina. Demography 1970, 7, 19–29.
  4. Goodstadt, M.S.; Gruson, V. The randomized response technique; a test on drug use. J. Am. Stat. Assoc. 1975, 70, 814–818.
  5. Folsom, S.A. The two alternative questions randomized response model for human surveys. J. Am. Stat. Assoc. 1973, 68, 525–530.
  6. Van der Heijden, P.G.M.; Van Gils, G.; Bouts, J.; Hox, J.J. A comparison of randomized response, CASAQ, and direct questioning; eliciting sensitive information in the context of social security fraud. Kwant. Methoden 1965, 59, 15–34.
  7. Greenberg, B.G.; Kubler, R.R.; Horvitz, D.G. Application of the randomized response technique in obtaining quantitative data. J. Am. Stat. Assoc. 1971, 66, 243–250.
  8. Pollock, K.H.; Bek, Y. A comparison of three randomized response models for quantitative data. J. Am. Stat. Assoc. 1976, 71, 356–884.
  9. Eichhorn, B.H.; Hayre, L.S. Scrambled randomized response methods for obtaining sensitive quantitative data. J. Stat. Plan. Inference 1983, 4, 307–316.
  10. Gupta, S.; Gupta, B.; Singh, S. Estimation of the Sensitivity Level of Personal Interview Survey Questions. J. Stat. Plan. Inference 2002, 100, 239–247.
  11. Bar-Lev, S.K.; Bobovitch, E.; Boukai, B. A note on randomized response models for quantitative data. Metrika 2004, 60, 255–260.
  12. Chaudhuri, A.; Mukherjee, R. Randomized Response: Theory and Techniques; Marcel Dekker: New York, NY, USA, 1988.
  13. Diana, G.; Perri, P.F. A class of estimators for quantitative sensitive data. Stat. Pap. 2011, 52, 633–650.
  14. Pal, S. Unbiasedly estimating the total of a stigmatizing variable from a complex survey on permitting options for direct or randomized responses. Stat. Pap. 2008, 49, 157–164.
  15. Arcos, A.; Rueda, M.; Singh, S.A. A generalized approach to randomized response for quantitative variables. Qual. Quant. 2015, 49, 1239–1256.
  16. Khalil, S.; Zhang, Q.; Gupta, S. Mean estimation of sensitive variables under measurement errors using optional RRT models. Commun. Stat. Simul. Comput. 2021, 50, 1417–1426.
  17. Zhang, Q.; Khalil, S.; Gupta, S. Mean Estimation of Sensitive Variables Under Nonresponse and Measurement Errors Using Optional RRT Models. J. Stat. Theory Pract. 2021, 15, 1–15.
  18. Mangat, N.S.; Singh, S. An optional randomised response sampling technique. J. Indian Stat. Assoc. 1994, 32, 71–75.
  19. Huang, K.-C. Estimation for sensitive characteristics using optional randomized response technique. Qual. Quant. 2008, 5, 679–686.
  20. Chaudhuri, A.; Dihidar, K. Estimating means of stigmatizing qualitative and quantitative variables from discretionary responses randomized or direct. Sankhya 2009, 71, 123–136.
  21. Sanaullah, A.; Saleem, I.; Gupta, S.; Hanif, M. Mean estimation with generalized scrambling using two-phase sampling. Commun. Stat.-Simul. Comput. 2020, 51, 5643–5657.
  22. Gregoire, T.G.; Salas, C. Ratio Estimation with Measurement Error in the Auxiliary Variate. Biometrics 2009, 65, 590–598.
  23. Shalabh; Tsai, J.R. Ratio and product methods of estimation of population mean in the presence of correlated measurement error. Commun. Stat. Simul. Comput. 2017, 46, 5566–5593.
  24. Khalil, S.; ul Amin, M.N.; Hanif, M. Estimation of population mean for a sensitive variable in the presence of measurement error. J. Stat. Manag. Syst. 2018, 21, 81–91.
  25. Kumar, S.; Kour, S.P. The joint influence of estimation of sensitive variable under measurement error and non-response using ORRT models. J. Stat. Comput. Simul. 2022, 92, 3583–3604.
  26. Tiwari, K.K.; Bhougal, S.; Kumar, S.; Rather, K.U.I. Using Randomized Response to Estimate the Population Mean of a Sensitive Variable under the Influence of Measurement Error. J. Stat. Theory Pract. 2022, 16, 28.
  27. Horvitz, D.G.; Thompson, D.J. A generalization of sampling without replacement from a finite universe. J. Am. Stat. Assoc. 1952, 47, 663–685.
  28. Deville, J.C.; Sarndal, C.E. Calibration estimators in survey sampling. J. Am. Stat. Assoc. 1992, 87, 376–382.
Table 1. Proposed estimators for sensitive population mean and their variance.

| Model | Estimators | Variance |
|---|---|---|
| PRRT | $(\hat{\bar{Y}}_h)_{PRRT} = \dfrac{\hat{T}_{hme} - (1-P)\bar{S}_2}{P + (1-P)\bar{S}_1}$ | $V[(\hat{\bar{Y}}_h)_{PRRT}] = \dfrac{V(\hat{T}_{hme})_{opt.}}{[P + (1-P)\bar{S}_1]^2}$ |
| PRRT | $(\hat{\bar{Y}}_j)_{PRRT} = \dfrac{\hat{T}_{jme} - (1-P)\bar{S}_2}{P + (1-P)\bar{S}_1}$ | $V[(\hat{\bar{Y}}_j)_{PRRT}] = \dfrac{V(\hat{T}_{jme})_{opt.}}{[P + (1-P)\bar{S}_1]^2}$ |
| CRRT | $(\hat{\bar{Y}}_h)_{CRRT} = \dfrac{\hat{T}_{hme} - \bar{S}_2}{\bar{S}_1}$ | $V[(\hat{\bar{Y}}_h)_{CRRT}] = \dfrac{V(\hat{T}_{hme})_{opt.}}{[\bar{S}_1]^2}$ |
| CRRT | $(\hat{\bar{Y}}_j)_{CRRT} = \dfrac{\hat{T}_{jme} - \bar{S}_2}{\bar{S}_1}$ | $V[(\hat{\bar{Y}}_j)_{CRRT}] = \dfrac{V(\hat{T}_{jme})_{opt.}}{[\bar{S}_1]^2}$ |
| ORRT | $(\hat{\bar{Y}}_h)_{ORRT} = \hat{T}_{hme}$ | $V[(\hat{\bar{Y}}_h)_{ORRT}] = V(\hat{T}_{hme})_{opt.}$ |
| ORRT | $(\hat{\bar{Y}}_j)_{ORRT} = \hat{T}_{jme}$ | $V[(\hat{\bar{Y}}_j)_{ORRT}] = V(\hat{T}_{jme})_{opt.}$ |

where $j = C, R$ and $E$.
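
The transformations in Table 1 can be sketched numerically. The following Python snippet is a minimal illustration only, not the authors' implementation. It assumes that the PRRT numerator reads $\hat{T} - (1-P)\bar{S}_2$ and the denominator $P + (1-P)\bar{S}_1$, and it treats the calibrated estimate of the scrambled-response mean and its optimal variance as already computed inputs. The function and argument names (`t_hat`, `var_t`, `s1_mean`, `s2_mean`) are hypothetical and chosen for illustration.

```python
def prrt_mean(t_hat, p, s1_mean, s2_mean):
    """PRRT row of Table 1: (T_hat - (1 - P) * S2_bar) / (P + (1 - P) * S1_bar)."""
    return (t_hat - (1.0 - p) * s2_mean) / (p + (1.0 - p) * s1_mean)


def prrt_variance(var_t, p, s1_mean):
    """PRRT variance: V(T_hat)_opt / (P + (1 - P) * S1_bar)^2."""
    return var_t / (p + (1.0 - p) * s1_mean) ** 2


def crrt_mean(t_hat, s1_mean, s2_mean):
    """CRRT row of Table 1: (T_hat - S2_bar) / S1_bar."""
    return (t_hat - s2_mean) / s1_mean


def crrt_variance(var_t, s1_mean):
    """CRRT variance: V(T_hat)_opt / S1_bar^2."""
    return var_t / s1_mean ** 2


def orrt_mean(t_hat):
    """ORRT row of Table 1: the estimator equals T_hat itself."""
    return t_hat


def orrt_variance(var_t):
    """ORRT variance: V(T_hat)_opt, with no rescaling."""
    return var_t


if __name__ == "__main__":
    # Illustrative inputs only; these are not values from the paper.
    t_hat, var_t = 12.0, 4.0
    p, s1_mean, s2_mean = 0.4, 1.0, 0.5
    print("PRRT:", prrt_mean(t_hat, p, s1_mean, s2_mean), prrt_variance(var_t, p, s1_mean))
    print("CRRT:", crrt_mean(t_hat, s1_mean, s2_mean), crrt_variance(var_t, s1_mean))
    print("ORRT:", orrt_mean(t_hat), orrt_variance(var_t))
```

Under these assumptions, setting `p = 0` in the PRRT functions reproduces the CRRT expressions, while `p = 1` leaves the calibrated estimate unchanged, which matches the form of the ORRT row.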
Table 2. Simulation results for PRE of proposed calibrated estimators in presence of measurement error under PRRT model and ORRT model.

| P | PRRT, n = 30: $PRE_{11}$ | PRRT, n = 30: $PRE_{12}$ | PRRT, n = 30: $PRE_{13}$ | PRRT, n = 50: $PRE_{11}$ | PRRT, n = 50: $PRE_{12}$ | PRRT, n = 50: $PRE_{13}$ | ORRT, n = 30: $PRE_{11}$ | ORRT, n = 30: $PRE_{12}$ | ORRT, n = 30: $PRE_{13}$ | ORRT, n = 50: $PRE_{11}$ | ORRT, n = 50: $PRE_{12}$ | ORRT, n = 50: $PRE_{13}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 100.99 | 121.40 | 114.88 | 110.04 | 121.80 | 115.04 | 100.46 | 122.69 | 115.52 | 110.05 | 122.75 | 115.59 |
| 0.2 | 101.31 | 206.12 | 292.21 | 134.31 | 267.49 | 382.51 | 196.24 | 210.91 | 490.04 | 197.33 | 259.00 | 11832 |
| 0.3 | 104.61 | 221.89 | 327.49 | 130.21 | 302.26 | 428.59 | 196.84 | 209.96 | 577.73 | 198.42 | 261.48 | 12949 |
| 0.4 | 100.01 | 236.24 | 278.00 | 115.90 | 310.46 | 456.92 | 197.07 | 213.66 | 597.31 | 198.07 | 268.99 | 14618 |
| 0.5 | 100.00 | 240.67 | 402.10 | 100.89 | 310.12 | 467.81 | 197.10 | 218.70 | 609.62 | 198.53 | 276.86 | 17370 |
| 0.6 | 100.01 | 252.70 | 449.29 | 100.57 | 304.99 | 492.58 | 196.45 | 221.57 | 638.42 | 198.69 | 290.69 | 20650 |
| 0.7 | 106.42 | 269.80 | 541.33 | 164.62 | 317.71 | 566.12 | 198.81 | 233.88 | 694.59 | 199.83 | 312.93 | 28052 |
| 0.8 | 129.34 | 286.14 | 661.14 | 180.63 | 334.13 | 697.81 | 201.18 | 248.76 | 787.28 | 202.47 | 354.26 | 43653 |
| 0.9 | 180.23 | 298.98 | 809.73 | 301.00 | 386.81 | 975.38 | 209.18 | 301.49 | 795.05 | 210.67 | 430.47 | 83663 |
Table 3. Simulation results for PRE of proposed calibrated estimators in absence of measurement error under PRRT model and ORRT model.

| P | PRRT, n = 30: $PRE_{11}$ | PRRT, n = 30: $PRE_{12}$ | PRRT, n = 30: $PRE_{13}$ | PRRT, n = 50: $PRE_{11}$ | PRRT, n = 50: $PRE_{12}$ | PRRT, n = 50: $PRE_{13}$ | ORRT, n = 30: $PRE_{11}$ | ORRT, n = 30: $PRE_{12}$ | ORRT, n = 30: $PRE_{13}$ | ORRT, n = 50: $PRE_{11}$ | ORRT, n = 50: $PRE_{12}$ | ORRT, n = 50: $PRE_{13}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 95.03 | 51.12 | 20.00 | 90.46 | 37.22 | 20.04 | 101.11 | 39.89 | 20.14 | 102.97 | 36.80 | 20.09 |
| 0.2 | 105.26 | 39.84 | 20.05 | 114.57 | 36.71 | 20.03 | 100.93 | 39.44 | 20.12 | 103.40 | 36.48 | 20.87 |
| 0.3 | 115.84 | 39.56 | 20.04 | 137.63 | 36.82 | 20.04 | 101.79 | 39.27 | 20.11 | 103.11 | 36.34 | 20.07 |
| 0.4 | 117.76 | 38.87 | 20.04 | 147.41 | 36.98 | 20.04 | 102.24 | 39.19 | 20.10 | 104.38 | 36.43 | 20.07 |
| 0.5 | 117.27 | 38.11 | 20.03 | 148.56 | 37.03 | 20.03 | 102.03 | 38.64 | 20.08 | 104.48 | 35.72 | 20.06 |
| 0.6 | 110.03 | 37.20 | 20.03 | 139.45 | 36.79 | 20.03 | 102.71 | 38.34 | 20.07 | 103.82 | 35.52 | 20.04 |
| 0.7 | 101.14 | 35.84 | 20.02 | 123.95 | 35.75 | 20.02 | 102.79 | 37.45 | 20.05 | 103.96 | 34.81 | 20.03 |
| 0.8 | 75.11 | 34.06 | 20.03 | 101.40 | 34.15 | 20.01 | 96.95 | 36.11 | 20.03 | 97.91 | 33.88 | 20.02 |
| 0.9 | 34.87 | 31.73 | 20.00 | 52.39 | 31.81 | 20.00 | 72.49 | 33.53 | 20.01 | 75.66 | 32.26 | 20.00 |
Table 4. Simulation results for PRE of proposed calibrated estimators in presence of measurement error, in absence of measurement error and randomization under CRRT model.

| $n$ | $PRE_{11}$ | $PRE_{12}$ | $PRE_{13}$ | $PRE_{11}$ | $PRE_{12}$ | $PRE_{13}$ | $PRE_{11}$ | $PRE_{12}$ | $PRE_{13}$ |
|---|---|---|---|---|---|---|---|---|---|
| 30 | 195.86 | 300.54 | 435.38 | 112.94 | 51.52 | 20.00 | 63.73 | 77.28 | 20.01 |
| 50 | 197.56 | 589.51 | 601.33 | 129.64 | 38.07 | 20.06 | 48.81 | 80.81 | 20.03 |
Table 5. Simulation results for PRE of proposed calibrated estimators in absence of measurement error and randomized model with respect to PRRT and ORRT models respectively for $n = 30$.

| P | PRRT: $PRE_{11}$ | PRRT: $PRE_{12}$ | PRRT: $PRE_{13}$ | ORRT: $PRE_{11}$ | ORRT: $PRE_{12}$ | ORRT: $PRE_{13}$ |
|---|---|---|---|---|---|---|
| 0.1 | 30.06 | 20.06 | 10.00 | 64.17 | 30.29 | 20.16 |
| 0.2 | 30.08 | 20.00 | 10.00 | 66.53 | 25.23 | 20.13 |
| 0.3 | 30.10 | 20.00 | 10.01 | 56.98 | 21.27 | 20.11 |
| 0.4 | 30.14 | 20.01 | 10.01 | 37.53 | 17.86 | 20.08 |
| 0.5 | 30.20 | 20.01 | 10.01 | 93.27 | 13.60 | 20.06 |
| 0.6 | 30.31 | 20.02 | 10.02 | 57.19 | 10.17 | 20.04 |
| 0.7 | 30.54 | 20.04 | 10.02 | 14.91 | 26.91 | 20.02 |
| 0.8 | 31.17 | 20.07 | 10.04 | 71.20 | 23.87 | 20.01 |
| 0.9 | 33.30 | 20.16 | 10.05 | 28.85 | 21.42 | 20.00 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gupta, S.; Trisandhya, P.; Coolen, F. Some Calibration Estimators of the Mean of a Sensitive Variable Under Measurement Error. Mathematics 2025, 13, 2532. https://doi.org/10.3390/math13152532

Chicago/Turabian Style

Gupta, Sat, Pidugu Trisandhya, and Frank Coolen. 2025. "Some Calibration Estimators of the Mean of a Sensitive Variable Under Measurement Error" Mathematics 13, no. 15: 2532. https://doi.org/10.3390/math13152532
