1. Introduction
When the subject of a survey is sensitive in nature (potentially embarrassing, shameful, or even illegal), respondents may not answer truthfully. Warner (1965) and Greenberg et al. (1969) pioneered models that provided respondents with a means of hiding their true responses to such sensitive questions, thereby removing their incentive to lie [1,2]. While these first models were binary in nature (designed for questions with yes/no responses), some sensitive questions are quantitative. Misrepresentation in responses to quantitative questions is less black and white, but it is clear that social desirability bias (SDB) affects them too and, as Lanke (2017) described, causes respondents "to underreport socially undesirable attributes… and to over report more desirable attributes" [3]. Both Warner (1971) and Greenberg et al. (1971) therefore followed up their models with new models applicable to sensitive questions with quantitative responses [4,5].
Following Warner’s and Greenberg’s inventions, many new RRT models were introduced, generally adding ever greater levels of complexity. For instance, Pollock and Bek (1976), Eichhorn and Hayre (1983), and Diana and Perri (2011) all proposed models that included additive and/or multiplicative scrambling [6,7,8]. Perri (2008) combined Warner and Greenberg features into a single blank card model [9].
All of this complexity was intended to foster greater levels of scrambling, and therefore to encourage more truthful RRT responses, but it came at a significant price: the potential for error. A flurry of research involving measurement error (ME) followed. Blattman et al. (2016) developed a survey validation technique that uses qualitative work to check for measurement error in potentially sensitive behaviors [10]. Sharma and Singh (2015) proposed the use of auxiliary information to improve efficiency, assuming that non-response and measurement error are present in both the study and auxiliary variables [11]. Khalil et al. (2018) proposed a generalized estimator in the presence of measurement error [12]. Makhdum et al. analyzed scenarios where non-response and measurement error are simultaneously present, and Singh and Vishwakarma (2019) proposed a method to measure the combined effect of measurement error and non-response when auxiliary information is used [13,14]. Priyanka et al. (2023) investigated the impact of measurement error in RRT successive sampling [15]. Beyond merely accounting for ME, Audu et al. (2020) proposed a class of estimators with superior efficiency when measurement errors are present [16].
In Section 2 of this study, we review the Mixture Optional Enhanced Trust (MOET) model proposed by Parker et al. (2024), as well as the ratio estimator for the MOET model proposed by Gupta et al. (2024); these will serve as the basis of this study [17,18]. Then, in Section 3, we derive basic and ratio estimators that reflect the impact of measurement error. In Section 4, recognizing that ME reduces efficiency, we study (1) the circumstances under which the ME resulting from the collection of auxiliary information is so large that it undermines the benefit of collecting that information, and (2) the circumstances under which ME is so large that it undermines the RRT's overall benefit (the elimination of bias).
Then, in Section 5, we turn our attention to an aspect of auxiliary information that has not been adequately explored. It is well known that auxiliary information can improve efficiency. However, auxiliary information comes at a cost: its presence may reduce privacy. Indeed, if the auxiliary information (represented by $X$) were perfectly correlated with the response to the sensitive question ($Y$), then knowledge of $X$ would lead directly to knowledge of $Y$. We explore this dynamic, and we also recognize that while auxiliary information reduces privacy, measurement error inadvertently increases it. In Section 6, we provide simulations that validate the estimators developed in Section 3 and explore the behavior of these estimators.
2. MOET Model (2024)
Here, we present the model that will serve as the basis of our study. The Mixture Optional Enhanced Trust (MOET) model was first proposed by Parker et al. (2024) [17] and incorporates several recently innovated RRT features: mixture, optionality, and enhanced trust. Figure 1 provides a diagram of the model. In this section, we discuss certain common RRT manipulations that could trigger measurement error (ME) in RRT models like this one. Then, we discuss the basic and ratio mean estimators for this model, as proposed in earlier papers.
In the MOET model decision tree, two of the decisions (including the one involving α) must be made by a random process whose outcome is unknown to the researcher; the respondent must therefore, in some sense, perform the randomization themselves. This might be achieved by the respondent shuffling and picking a card from a deck. We mention this because, although the process may be very simple, it nonetheless presents an opportunity for confusion and error. Note that ME includes any unintentional error (including confusion) that results in a recorded response differing from the respondent's intended response. Additionally, some respondents will need to scramble their true responses by addition and/or multiplication ($S$ and/or $T$); again, these calculations must be performed by the respondent. There are many ways to keep these calculations simple (ensuring that $S$ and $T$ take on only positive whole-number values, providing a calculator or charts of computations to the respondent, etc.), but the fact remains that calculations performed by the respondent present an opportunity for ME.
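To make the respondent-side mechanics concrete, the following R sketch simulates a single scrambled response. The card values, the confusion probability p_confusion, and the scrambling form $Z = TY + S$ are illustrative assumptions, not the exact MOET specification.

```r
# Illustrative sketch of the respondent-side arithmetic (not the exact MOET
# protocol): a true response y is scrambled multiplicatively and additively,
# and occasional confusion introduces measurement error.
set.seed(1)

scrambled_response <- function(y, p_confusion = 0.05, err_sd = 1) {
  t_card <- sample(1:3, 1)   # multiplicative scrambling card (positive whole number)
  s_card <- sample(1:5, 1)   # additive scrambling card (positive whole number)
  z <- t_card * y + s_card   # calculation performed by the respondent
  if (runif(1) < p_confusion) {
    z <- z + rnorm(1, 0, err_sd)   # slip in calculation or recording -> ME
  }
  z
}

scrambled_response(y = 10)
```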
MOET Parker Model Review [17]
We now review the MOET model estimators. Detailed below are the basic mean estimator that Parker et al. (2024) proposed based on a split-sample approach and an expression for the model's efficiency (MSE), which will serve as a basis for comparison later in this study [17]. To distinguish it from the ratio estimator (which will also be considered in this study), we call this mean estimator the basic mean estimator (or simply the basic estimator); the basic estimator and its mean square error (MSE) are given in Equations (2) and (3).
where
In Equations (2)–(6), the following symbols are used:
$Y$: the true response to the sensitive question. This random variable has mean $\mu_Y$ and variance $\sigma_Y^2$.
$Z_i$: the response collected from the respondent in the $i$th sub-sample, $i = 1, 2$. This random variable has mean $\mu_{Z_i}$ and variance $\sigma_{Z_i}^2$.
$S$: an additive scrambling variable with mean $\mu_S$ and variance $\sigma_S^2$.
$T$: a multiplicative scrambling variable with mean $\mu_T$ and variance $\sigma_T^2$; $T$ is independent of $S$ and $Y$.
$U$: the response to the unrelated question, with mean $\mu_U$ and variance $\sigma_U^2$.
$n$: the sample size. In split sampling, $n$ is split into $n_1$ and $n_2$, where $n = n_1 + n_2$.
$p_i$: the probability that an individual who has been assigned to the Greenberg sub-model within sub-sample $i$ is assigned the sensitive question, as opposed to the unrelated question.
The trust parameter: the probability that a respondent will trust the RRT methodology without additional scrambling.
$W$: the sensitivity level of the sensitive question; that is, a proportion $1 - W$ of the respondents do not consider the question sensitive and are willing to provide true responses without scrambling.
$X$: the auxiliary variable with known mean $\mu_X$ and known variance $\sigma_X^2$. This variable is known to have a strong positive correlation with $Y$.
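As an illustration of this notation, the R sketch below draws the principal random variables under assumed distributional families (normality for $Y$ and $X$; small positive whole numbers for $S$ and $T$); the parameter values mirror the baseline scenario used later in Section 6, and the correlation value is an assumption.

```r
# Illustrative draws for the variables defined above (assumed families).
set.seed(42)
n   <- 500                                  # sample size
Y   <- rnorm(n, mean = 10, sd = 5)          # true sensitive response
S   <- sample(1:5, n, replace = TRUE)       # additive scrambling variable
T_  <- sample(1:3, n, replace = TRUE)       # multiplicative scrambling variable
rho <- 0.8                                  # assumed correlation of X with Y
X   <- 10 + rho * (Y - 10) + rnorm(n, sd = 5 * sqrt(1 - rho^2))  # auxiliary variable
c(mean(X), sd(X), cor(X, Y))                # approximately 10, 5, and rho
```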
The privacy provided by the MOET model is given by
where
Gupta et al. further proposed a ratio estimator for the MOET model, consistent with the standard ratio estimator formulation as presented by Thompson (2012) [18,19]. The estimator and an expression for its efficiency (MSE) are given in Equations (5) and (6). These can be used when high-quality auxiliary information is available.
4. Impacts of Measurement Error
We have discussed the fact that RRT administrative complexities can heighten the risk of measurement error. In Section 4.1, we study the magnitude of measurement error's impact on estimation. Then, in Section 4.2, we determine how significant ME would have to be to offset the value of using RRT in the first place.
4.1. Impact of Measurement Error on Estimator Efficiency
Measurement error will, of course, be more impactful on MSE when it is large. In Figure 2, we consider a scenario that is consistent with other RRT papers (see, for example, the scenarios assumed in Section 6 of Parker et al. (2024)), but with the addition of measurement error [17]. We consider both the error in measuring the study variable $Y$ and the error in measuring the auxiliary variable $X$, each represented by its own random variable.
Figure 2 shows the MSE of the basic and ratio estimators across possible levels of measurement error, as represented by their standard deviations (each ranging from 0 to 2), in a standard scenario represented by the values listed immediately following the figure. Note that the measurement-error dimension for $X$ is included in the left-side graph to enable direct visual comparison between the two graphs, even though the basic estimator uses no auxiliary information and its MSE therefore does not vary along that dimension. The value of 2 was chosen as the upper limit for the standard deviation (see Figure 2) to represent a high measurement error (a standard deviation of 2 implies a measurement error equal to 20% of the mean response value).
We make several observations. First, we see, not surprisingly, that as measurement error rises, estimator efficiency declines (MSE increases). We also see that the rate of increase in MSE for the ratio estimator (which is subject to two kinds of measurement error) is steeper than that of the basic estimator. The highest MSE shown for the basic estimator (corresponding to a measurement-error standard deviation of 2) is 13.3% greater than the MSE observed when measurement error is not accounted for. This can be thought of as the extent to which MSE would be under-represented if measurement error were in fact at that level but was not taken into account. In the right-hand graph of Figure 2, we see that the MSE of the ratio estimator rises with the measurement error in $Y$ and with the measurement error in $X$; that is, MSE increases when there is significant error in the measurement of either variable. When there is high error in measuring both (standard deviations of 2 for each), the MSE of the ratio estimator is 20.6% greater than the MSE observed when measurement error is not accounted for. But even with maximum measurement error, the MSE of the ratio estimator remains smaller than the MSE of the basic estimator in this scenario, where the correlation between $X$ and $Y$ is strong. The gain in efficiency realized by making use of auxiliary data is greater than the loss in efficiency attributable to the errors in measuring those data. While Figure 2 represents only one particular scenario, this key result can be generalized by solving the following inequality:
A comparison between Equations (13) and (22) makes it immediately clear that this inequality can be reduced to
Under the standard assumption that the measurement errors are mean-zero and uncorrelated with the study variables, the expression can then be simplified and restated as follows:
This condition is represented by the shaded region in Figure 3.
This figure shows us, for example, that when the correlation between $X$ and $Y$ is strong, estimation will benefit from auxiliary information provided that the standard deviation attributable to ME is less than roughly 45% of the standard deviation of the auxiliary variable itself. Equation (24) does not impose a restrictive condition, as the natural variability of the auxiliary variable will generally be much larger than the variability introduced by ME. In short, the specter of measurement error will only infrequently discourage the use of well-correlated auxiliary information.
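The same conclusion can also be checked by brute force. The R sketch below, under simplified assumptions (a directly observed study variable and a textbook ratio estimator with known $\mu_X$, rather than the full MOET estimators), compares simulated MSEs as the measurement error on $X$ grows; the auxiliary variable's advantage shrinks and, at extreme error levels, disappears, mirroring the shaded-region condition of Figure 3.

```r
# Brute-force check: does auxiliary information still help as ME on X grows?
set.seed(7)
sim_mse <- function(sd_me_x, reps = 5000, n = 200, mu = 10, sdev = 5, rho = 0.8) {
  basic <- ratio <- numeric(reps)
  for (r in 1:reps) {
    Y     <- rnorm(n, mu, sdev)
    X     <- mu + rho * (Y - mu) + rnorm(n, sd = sdev * sqrt(1 - rho^2))
    X_obs <- X + rnorm(n, sd = sd_me_x)       # measurement error on X
    basic[r] <- mean(Y)                       # basic (mean) estimator
    ratio[r] <- mean(Y) * mu / mean(X_obs)    # textbook ratio estimator
  }
  c(basic = mean((basic - mu)^2), ratio = mean((ratio - mu)^2))
}
sapply(c(0, 1, 2, 4), sim_mse)   # columns: increasing ME on X
```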
4.2. Measurement Error Impact on RRT Objective: Inducing Truthfulness
In this subsection, we study whether the benefit of RRT (removal of lying bias) outweighs its cost (addition of measurement error). Because RRT methods often require the respondent to consider whether they want to answer the sensitive question directly, to scramble their true response, to answer an unrelated question, to opt for additional levels of scrambling, and so on, the administration of RRT can result in elevated levels of measurement error.
For this reason, we now turn our focus to an important question: is there a point at which measurement error becomes so significant that it nullifies the RRT model’s ability to secure truthful responses efficiently? We develop a model to study this issue. The model relies on several key assumptions. The reasonableness of these assumptions will be discussed at the end of this subsection.
Our model is constructed as follows:
We define a parameter LB (lying bias), which represents the extent to which a group's mean response will be affected by untruthfulness absent some means of mitigation like RRT. As some questions are more sensitive than others, we consider sensitive questions with 1%, 5%, and 10% lying bias. For example, if the mean group response to a sensitive question has a +5% lying bias and the true mean response is 10, then a direct survey employing no bias-reduction tactics would see an average response of 10.5.
We model untruthfulness as a normal random variable whose mean equals the lying-bias adjustment and whose standard deviation is half of that amount. We choose this standard deviation because it places the lie-adjusted mean precisely two standard deviations from the true mean $\mu_Y$, so that approximately 97.5% of lie-adjusted responses are adjusted in the same direction as the assumed bias (since $\Phi(2) \approx 0.977$).
We define a random variable reflecting the responses that would be yielded by a survey soliciting direct responses to the sensitive question. These responses equal the truthful responses plus the impact of lying. Specifically,
In various scenarios representing different levels of measurement error and lying bias, we want to calculate the probability that our basic MOET estimator outperforms estimation based on a direct survey (that will yield responses with some level of untruthfulness, but will not face material measurement error). Specifically, we calculate the probability that the MOET estimator will result in an estimate closer to the true value than the direct survey estimate will:
Using basic inequality identities, expression (28) can be represented as
which can be rewritten as
To calculate the probabilities in the expression above, we need the distributions of the two deviation terms. Recalling the definitions of the two estimators, we can write

Before concerning ourselves with the measurement-error terms, we first find the distributions of the two underlying estimators. As both are estimates of the quantity $\mu_Y$ based on the same set of sampled values, they are correlated. Nonetheless, their sum and difference have normal distributions, as represented below:

The measurement-error terms are normal, independent of each other, and independent of both estimators. We can therefore find the distributions shown in Equations (31) and (32). We conclude that
Written in more simplified terms,
The pair of random variables defined above is bivariate normal, so
where
With this joint distribution specified, Equation (28), which represents the probability that the basic estimator results in an estimate closer to the true value of $\mu_Y$ than a direct survey would, can be studied.
Note that a similar theoretical expression to Equation (28), representing the probability that the ratio estimator would result in an estimate closer to the true value of $\mu_Y$ than a direct survey would, could not be derived, because the key quantity equals the ratio of two correlated normal variables, which has no known closed-form distribution.
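Although no closed form exists for the ratio-estimator version, both probabilities can be estimated empirically. The R sketch below does this for a simplified stand-in for the MOET estimator (an unbiased mean subject only to additive measurement error, omitting the scrambling variance) against a direct survey with lying bias. The lie distribution follows the construction above (mean equal to the bias adjustment, standard deviation half of it); the sample size and other parameters are illustrative assumptions.

```r
# Empirical P(RRT-style estimate closer to mu_Y than the direct-survey estimate).
set.seed(99)
p_closer <- function(LB = 0.10, sd_me = 1, n = 250, mu = 10, sdev = 5,
                     reps = 10000) {
  wins <- 0
  for (r in 1:reps) {
    Y <- rnorm(n, mu, sdev)
    # Direct survey: truth plus a lie with mean LB*mu and sd LB*mu/2
    lie        <- rnorm(n, mean = LB * mu, sd = LB * mu / 2)
    est_direct <- mean(Y + lie)
    # RRT-style survey: no lying bias, but additive measurement error
    est_rrt <- mean(Y + rnorm(n, mean = 0, sd = sd_me))
    wins    <- wins + (abs(est_rrt - mu) < abs(est_direct - mu))
  }
  wins / reps
}
p_closer(LB = 0.10, sd_me = 0)   # near 1 when LB is material and ME is absent
```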
Figure 4 shows the probability that the MOET estimate will be closer to the true value $\mu_Y$ than the direct-survey estimate is, over a range of measurement errors, in a standard scenario represented by the values listed immediately following the figure. For illustrative purposes, we assume that the direct survey, which does not involve the administrative complexities inherent to RRT models, is not subject to measurement error. However, the direct survey will be affected by untruthfulness (lying bias, LB). In contrast, the RRT model will eliminate untruthfulness but will be subject to measurement error. The probabilities represented in the figure are based on theoretical calculations that use empirical estimates for correlations; the calculated theoretical probabilities match empirical simulations closely.
As would be expected, when higher levels of untruthfulness are present and measurement error is low, the RRT-based estimate is dramatically more reliable than the direct-survey estimate. For example, when there is no measurement error at all and respondents overreport their true responses by 10% (LB = +10%), the MOET estimator yields an estimate closer to the true mean response than the direct survey does with 86% likelihood. Also as expected, the superiority of the MOET estimator declines as measurement error increases. But the most important observation we make is that the probability of MOET superiority deteriorates only gradually as ME rises. Therefore, for a given level of LB and sample size, small levels of ME do not excessively undermine MOET estimation. For example, at the baseline sample size, the MOET estimate remains superior to the direct estimate (closer to the true value with more than 50% likelihood) except in implausible scenarios where measurement error is extreme.
While our model is specific both in terms of assumptions and in terms of scenario, its conclusions are strong, as seen in Section 6. When LB is material and sample sizes are adequate, plausible levels of ME do not reduce MOET performance so drastically that its benefit (estimating truthful responses accurately) is thwarted.
5. Measurement Error Accentuates Privacy
Privacy is an absolutely critical element of all RRT methodologies. Lanke (1976) intimated that the most important quality of an RRT model is the extent to which it "protect[s] the privacy of interviewees" [20]. Indeed, it is the fact that the respondent's true response to a sensitive question is scrambled (as in Warner-type quantitative RRT models) or hidden among similar-looking answers to nonsensitive questions (as in Greenberg-type quantitative RRT models) that gives the respondent the confidence to respond truthfully, free from possible shame, embarrassment, or even legal repercussions. But the collection of auxiliary information may serve to undermine privacy, unmasking the respondent's identity and revealing their true response to the sensitive question, thereby undercutting the very foundations of RRT. Arnab and Dorffner (2006) grappled with this issue, noting that "most surveys are complex" and that usually "information of more than one character is collected at a time. Some of them are of a confidential nature while the others are not" [21]. The collection of additional auxiliary information can materially undermine privacy; at the extreme, if the auxiliary information were perfectly correlated with the response to the sensitive question, knowledge of $X$ would immediately lead to knowledge of $Y$.
Parker et al. (2024) pointed out that when auxiliary information ($X$) is collected, privacy can be represented as follows [17]:

because the privacy-loss term is bounded below by 0, and the remaining quantities are defined as follows:
It follows that privacy loss due to auxiliary information is
Using Yan et al.'s (2008) definition of privacy, privacy in the presence of measurement error would be as follows [22]:
Recalling that the measurement error is uncorrelated with the true response and the scrambling variables, and that its mean is 0, this expression can be simplified to

and can be further reduced to
Equation (44) makes it clear that privacy, surprisingly, improves as a result of measurement error. This is because measurement error inadvertently results in extra response scrambling.
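A quick numerical illustration of this effect, assuming Yan et al.'s privacy measure $E[(Z-Y)^2]$ and an illustrative scrambling $Z = TY + S$: adding an independent, mean-zero measurement error inflates the measure by exactly its variance.

```r
# Privacy as E[(Z - Y)^2], with and without measurement error.
set.seed(3)
n   <- 1e6
Y   <- rnorm(n, 10, 5)
S   <- sample(1:5, n, replace = TRUE)          # additive scrambling
T_  <- sample(1:3, n, replace = TRUE)          # multiplicative scrambling
Z   <- T_ * Y + S                              # scrambled response
err <- rnorm(n, 0, 2)                          # measurement error, mean 0

mean((Z - Y)^2)          # privacy without ME
mean((Z + err - Y)^2)    # privacy with ME: larger by ~var(err) = 4
```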
When auxiliary information that reduces privacy is collected, privacy can be represented as follows:
Note that auxiliary information will reduce the privacy resulting from RRT perturbation, but not the privacy resulting from measurement error. With this in mind, the privacy loss that occurs when ME is present will be the same as when ME is not present:
For the MOET model, Parker et al. calculated privacy as

where the superscript in the above expression reminds us that this measure has been adjusted to reflect the fact that optionality does not undermine privacy for the proportion of respondents who do not consider the sensitive question to be sensitive to them, as per Gupta et al. (2002) [23].
In Figure 5, we show the privacy that the MOET model provides when auxiliary information is collected, across a range of ME and at three different levels of privacy reduction due to auxiliary information, in two reasonable scenarios, as represented by the values listed immediately following the figure.
The graph on the left represents the standard baseline scenario underlying the other graphs and figures throughout this study. In this standard scenario, privacy is higher by 21%, 32%, and 64% at the maximum assumed ME than at zero ME, at the three assumed levels of privacy reduction. The graph on the right represents another plausible scenario, in which the natural variability of the responses is smaller. In this scenario, we adjusted the scrambling parameters accordingly, as is standard, and we set the unrelated-question parameters to match, because the parameters of the unrelated question should be similar to those of the sensitive question. In this scenario, privacy is higher by 81%, 122%, and 234% at the maximum assumed ME than at zero ME, at the three assumed levels of privacy reduction. All of this is to say that ME provides an additional source of privacy to RRT models, especially when baseline privacy is low.
ME decreases efficiency (increases MSE), as per Equations (13) and (22), but we have seen that ME also inadvertently increases privacy, as per Equations (44) and (45). That is, the same phenomenon (measurement error) causes one aspect of model quality (efficiency) to deteriorate while causing the other (privacy) to improve. Singh et al. (2020) and other statisticians have faced this same problem of assessing overall model value when the "level of [privacy] protection" and "efficient estimation" are competing considerations [24]. Gupta et al. (2018) offered a solution to this dilemma when they proposed a unified measure (UM), which enables the quantification of overall model quality, taking both efficiency and privacy into account [25]:
The superscript indicates that this measure has been adjusted to reflect Gupta et al.'s (2002) assertion that optionality does not reduce privacy for the proportion of respondents who do not consider the question to be sensitive [23]. Small values of UM reflect superior model performance. In Figure 6, we study UM across a range of ME values under three different privacy-reduction assumptions in a standard scenario represented by the values listed immediately following the figure. We assume measurement-error standard deviations ranging from 0 to 2; results are not sensitive to this assumption.
In our illustrative scenario, the ME impact on privacy (the denominator of UM) overpowers the ME impact on efficiency (the numerator), so UM improves modestly as ME rises. This result can be generalized by calculating the relative change in UM when ME is present:
But when the sample size is large, Equation (50) can be reduced to

Inspection of this expression makes it clear that the change in UM will be negative when ME is introduced. Since small UM values indicate superior performance, this means that UM will in fact decrease (modestly) with larger ME if the sample size is large. Such a reduction will be most significant when the adjusted privacy is small. The major conclusion of this analysis is not that we expect UM gains from ME, but simply that we do not expect material UM deterioration as a result of ME.
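As a rough numerical check, assuming the unified measure takes the form UM = MSE/privacy (small is better) and that ME adds roughly its variance to privacy but only its variance divided by $n$ to the MSE of a mean estimator, the sketch below shows the modest negative relative change in UM for a large sample. The baseline MSE and privacy values are placeholders, not values from the study.

```r
# Relative change in UM (assumed form: MSE / privacy) when ME is introduced.
um_change <- function(sd_me, n = 500, mse0 = 0.15, priv0 = 30) {
  mse  <- mse0  + sd_me^2 / n          # ME inflates MSE by ~var(ME)/n
  priv <- priv0 + sd_me^2              # ME inflates privacy by var(ME)
  (mse / priv) / (mse0 / priv0) - 1    # negative = UM improves
}
round(sapply(c(0.5, 1, 2), um_change), 4)   # modest improvement as ME rises
```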
6. Simulations
We show two tables of simulated values in this section. All tabular values represent model output that reflects the impact of measurement error on the MOET quantitative RRT model, and all simulations assume the same base parameter values, with minor deviations as noted. In the interest of full disclosure, so that the values in the tables can be independently reproduced, scenario values are listed immediately preceding the tables. In both tables, the means of $Y$ and $X$ are set equal to 10 for convenience, as it is only their values relative to their standard deviations that matter. The standard deviations of $Y$ and $X$ are set equal to half of their means (5), consistent with other recent RRT studies such as Parker et al. (2024) [17]. The mean and variance of the unrelated-question variable are set equal to those of $Y$ because it is important that the unrelated question behaves similarly to the sensitive question. The means of the additive and multiplicative scrambling variables are standard choices, and the sub-model assignment probabilities are chosen because they lead to high model efficiency, as per Parker et al. (2024) [17]. When not otherwise specified, the remaining model parameters are set to moderate values. Each line of each table can be thought of as a unique scenario, based on a particular set of model parameters, and each scenario is run 10,000 times in R software (version 4.3.1) to show the close match between simulated and theoretical values.
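A condensed sketch of one such scenario run is below. For brevity it replaces the full MOET unscrambling step with a simplified additive-error mean estimator, so the theoretical MSE shown is $(\sigma_Y^2 + \sigma_{ME}^2)/n$; the full study instead uses the estimators of Equations (2)–(6).

```r
# One scenario: 10,000 iterations, empirical vs. theoretical values
# (simplified additive-error estimator in place of the full MOET estimator).
set.seed(2024)
run_scenario <- function(sd_me, n = 500, mu = 10, sdev = 5, iters = 10000) {
  est <- replicate(iters, {
    Y <- rnorm(n, mu, sdev)                # true responses
    mean(Y + rnorm(n, 0, sd_me))           # observed with measurement error
  })
  c(mean_E = mean(est),                    # empirical mean (should be ~ mu)
    MSE_E  = mean((est - mu)^2),           # empirical MSE
    MSE_T  = (sdev^2 + sd_me^2) / n)       # theoretical MSE
}
round(run_scenario(sd_me = 1), 4)
```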
Table 1 is provided for two reasons: first, to exhibit the close match between theoretical and empirical values (denoted by subscripts T and E, respectively); and second, to show the behavior of the basic estimator and the ratio estimator across a range of assumed measurement errors (standard deviations ranging from 0 to 2). We also show results across values of the correlation between the auxiliary information ($X$) and the true response to the sensitive question ($Y$), because the ratio estimator's performance, relative to the basic estimator's, depends heavily on that correlation. We do not show results varying by the remaining design parameters (including α) because, as noted in Section 3.1 and Section 3.2 (see Equations (13) and (22)), the impact of measurement error does not depend on these quantities.
Table 1 is split into sections (A) and (B). Table 1A shows scenarios where the standard deviations of the errors arising from the measurement of $Y$ and of $X$ rise in lockstep. Table 1B shows scenarios where the measurement error in $X$ remains relatively low regardless of the level of measurement error in $Y$. This represents the idea that the collection of RRT data, which is by nature complex, may result in more measurement error than the collection of auxiliary information.
It is clear throughout Table 1 that theoretical and empirical values match closely. For example, in the third row of the table, we see that the empirical mean of the basic estimator, 10.0007, is close to the theoretical value of 10.0000. Similarly, the theoretical and empirical MSEs for the basic estimator are 0.1884 and 0.1881. For the ratio estimator, the theoretical and empirical means are 10.0053 and 10.0051, and the theoretical and empirical MSEs are 0.1454 and 0.1464. In all cases, theoretical and empirical values match closely, implying that the theoretical results are validated by the simulations.
Analytically, we make several additional observations. We note that, as expected, when measurement error increases, MSE rises (i.e., efficiency declines). However, we also note that the rate of decline is greater for the ratio estimator than for the basic estimator, even in Table 1B, where the measurement error in $X$ is held constant. But because the ratio estimator is naturally more efficient when auxiliary data are of high quality, it generally remains more efficient than the basic estimator even when high levels of measurement error are encountered. The only exceptions arise when the correlation between $X$ and $Y$ is low. This consequence of low correlation was anticipated by Equation (25), as demonstrated in Figure 3. But in cases where correlation is low, the ratio estimator should typically not be used in any event.
For each grouping of simulations (for example, the first five rows of Table 1A), it is easy to think of the successive lines as representing increasing levels of measurement error. However, it is more important to conceptualize this information in reverse. That is, if the true measurement error is in reality at the highest assumed level (as in line 5) but is not correctly recognized and accounted for (so that values are calculated as if measurement error were absent, as in line 1), then the true level of MSE will be significantly understated. For example, the true MSE when using the ratio estimator would be 23.5% higher than it would appear to be if the measurement error were not taken into account (0.1696 versus 0.1373).
In Table 2 (below), we study whether the benefit of RRT (removal of lying bias) outweighs its cost (addition of measurement error). To do this, we simulate two surveys: a direct survey, which implements neither RRT nor any other SDB-reduction mechanism, and an MOET RRT-based survey. We know that, with RRT absent, the direct survey will be subject to lying bias, but it will be simple to administer and will therefore provoke very little measurement error. In contrast, an RRT survey is administratively complex and will provoke significant measurement error, but very little inaccuracy due to lying bias. Based on 10,000 iterations, we estimate the probability that the RRT survey results in a closer estimate of the mean response to the sensitive question by finding the proportion of iterations in which the RRT estimates are closer to the true values than the direct-survey estimates are. We compare results across three lying-bias levels (2%, 5%, 10%) and five levels of measurement error (0, 0.5, 1.0, 1.5, 2.0). We also study results across three different sample sizes (50, 250, 500). Note that, as discussed in Section 4.2, the quantities underlying the ratio estimator do not follow known distributions; therefore, no theoretical expression for its probability of superiority could be derived.
For the basic estimator, the probability that the MOET RRT model improves estimation is calculated on both a theoretical and an empirical basis. Theoretical and empirical results match closely, as seen in the eighth and ninth columns of the table, a fact which implies that the theoretical probability expressions shown in Equations (28)–(30) are correct and that the assumption of bivariate normality represented in Equation (39) is justified. Because the distribution underlying the ratio estimator could not be identified, only empirical estimates of this probability were calculated. As expected, in all cases, the ratio estimator's performance was similar to, but superior to, the basic estimator's performance.
We note that there are three factors represented in the table that increase the probability that the basic and ratio estimators will produce better estimates than the direct survey: large sample sizes, high levels of lying bias, and low levels of measurement error. Indeed, when the sample size is small ($n = 50$), lying bias is low (LB = 2%), and measurement error is at its highest assumed level, it is only 29% likely that the MOET estimate will be closer to $\mu_Y$ than the direct-survey estimate is. But when the sample size is large ($n = 500$), lying bias is high (LB = 10%), and measurement error is absent, the probability that the MOET estimate is closer to $\mu_Y$ is 95%. Importantly for this study, measurement error is the least impactful of the three factors.
The tabular findings imply that the MOET model should not be used when the sample size is low and LB is expected to be small. However, the table also implies that MOET should definitely be used when the sample size is adequate and LB is expected to be significant (precisely the circumstance for which RRT models were developed). The table further indicates the important conclusion that, while measurement error will reduce MOET performance, reasonable levels of measurement error should not affect a researcher's decision to use or not use the model.
7. Conclusions
Measurement error is particularly important to RRT models because the administration of surveys based on these models is often complicated and can therefore lead to excess measurement error. In this study, we developed basic and ratio estimators that reflect measurement error for the MOET RRT model recently proposed by Parker et al. (2024) [17]. Using them, we identified circumstances where a failure to reflect ME could result in understating MSE by more than 20%, confirming our expectation that reflecting measurement error can be important. The accuracy of our new estimators was validated through simulation. Researchers should take standard precautions to avoid excessive measurement error, such as providing simple, clear instructions to respondents, using only positive whole-number values for scrambling variables, and providing a calculator or charts of computations. Researchers would also benefit from using the revised estimators proposed in this study to better quantify the efficiency of their estimations.
We further considered two important questions involving measurement error. The first involved determining if and when the measurement error associated with the collection of auxiliary data is so high that the data should not be collected. Equation (25) showed us that only in unusual circumstances (such as when the correlation of the auxiliary information with the sensitive question is low and the variance introduced by collecting auxiliary information, i.e., measurement error, is high relative to the variance inherent in the auxiliary information) should auxiliary information not be collected. Of course, low correlation of auxiliary information with the sensitive question is a circumstance in which ratio estimation is unlikely to be effective regardless of measurement error. Researchers should use Equation (25) to verify, according to a priori estimates of variance and correlation, that auxiliary information is worth collecting in their study.
Second, we considered whether measurement error might be so large that it undermines the MOET model's ability to extract truthful answers to sensitive questions. Toward this objective, we derived relationships that allowed us to calculate the probability that an MOET estimate of the mean response to the sensitive question would be closer to the true mean than an estimate based on a direct survey (which would be biased by untruthfulness). We concluded that the MOET model yields superior estimates in spite of measurement error, provided that lying bias is significant and the sample size is adequate. Finally, we recognized that while auxiliary information improves efficiency, it comes at a cost: it reduces privacy.