A Probability Proportional to Size Estimation of a Rare Sensitive Attribute Using a Partial Randomized Response Model with Poisson Distribution

Lee, Gi-Sung; Hong, Ki-Hak; Son, Chang-Kyoon

doi:10.3390/math12020196


by Gi-Sung Lee ¹, Ki-Hak Hong ² and Chang-Kyoon Son ³,*

¹ Department of Children Welfare, Woosuk University, Wanju 55338, Republic of Korea
² Department of Computer Science, Dongshin University, Naju 58245, Republic of Korea
³ Department of Applied Statistics, Dongguk University, Gyeongju 38066, Republic of Korea
* Author to whom correspondence should be addressed.

Mathematics 2024, 12(2), 196; https://doi.org/10.3390/math12020196

Submission received: 29 November 2023 / Revised: 23 December 2023 / Accepted: 5 January 2024 / Published: 7 January 2024

(This article belongs to the Special Issue Uncertainty Quantification: Latest Advances and Applications)

Abstract

In this paper, we propose applying the probability proportional to size (PPS) sampling method to a partial randomized response model based on the Poisson distribution in order to efficiently estimate a rare sensitive attribute when the population is composed of several different and sensitive clusters. We obtain estimators of the rare sensitive attribute, together with their variances and variance estimators, under PPS sampling and under two-stage equal probability sampling. We then compare the efficiency of the estimator obtained via PPS sampling with replacement with that of the estimator obtained via two-stage equal probability sampling with replacement. The comparison confirms that the estimator obtained via PPS sampling with replacement is more efficient than the one obtained via two-stage equal probability sampling with replacement when the cluster sizes differ.

Keywords:

Poisson distribution; partial randomized response model; rare sensitive attribute; cluster sampling; probability proportional to size (PPS) sampling

MSC:

62D05

1. Introduction

In a socially and personally very sensitive survey, if you directly ask a question to the respondents, they tend to refuse to answer or give a false answer. To solve this problem, ref. [1] proposed a randomized response model (RRM) that could obtain sensitive information while protecting the identity or confidentiality of the respondent through an indirect response using a randomization device. Since then, many researchers have suggested various randomized response models to improve the quality of estimation.

Subsequently, refs. [2,3,4] organized, summarized and systematized the randomized response models, ref. [5] applied two-stage cluster sampling to a randomized response model, and ref. [6] sought to improve the practicality of the randomized response model by suggesting a randomized response model using PPS sampling. Meanwhile, the authors of [7] suggested an unrelated question randomized response method to estimate the mean number of participants with a rare sensitive attribute using the Poisson distribution. Examples of rare sensitive attributes include the proportion of people with AIDS who have persistent relationships with strangers, the proportion of people who witnessed murders, and the number of girls raped by their own fathers; examples of rare unrelated attributes include the proportion of people born exactly at 12 o’clock, the proportion of babies born blind, and the proportion of triplets delivered by women. Refs. [8,9] suggested stratified two-stage randomized response models for estimating a rare sensitive attribute under the Poisson distribution.

Furthermore, ref. [10] proposed a partial randomized response model using Poisson distribution, providing an alternative approach to estimating rare sensitive attributes through simple random estimation and stratified estimation. Their model demonstrated higher efficiency compared to Suman and Singh’s model. However, this research also faces limitations when applied to actual surveys if the population is clustered. Therefore, when the population is clustered, it is expected that applying Narjis and Shabbir’s model, which is more efficient than Suman and Singh’s model, could offer a practical solution for estimating rare sensitive attributes in real surveys.

In this study, we propose a method for estimating rare sensitive attributes when the survey question is highly sensitive and the population is composed of clusters of varying sizes. We apply the probability proportional to size (PPS) sampling method, which assigns sampling probabilities in proportion to the size of the clusters, to the partial randomized response model of [10]. In Section 2, we first introduce the partial randomized response model and propose estimation methods using PPS sampling with replacement, PPS sampling without replacement, and two-stage equal probability sampling. In Section 3, we compare the efficiency of the estimation methods, and finally, in Section 4, we present the conclusions and implications of the study.

2. PPS Estimation for a Rare Sensitive Attribute by Partial Randomized Response Model

Section 2 deals with estimating a rare sensitive attribute when the survey questions are very sensitive and the population is composed of N clusters, each containing

M_{i} (i = 1, 2, \dots, N)

sub-units. A two-stage selection method is used: n clusters are selected from the population with PPS or with equal probability, and then

m_{i} (i = 1, 2, \dots, n)

survey units are selected through simple random sampling in each selected cluster. The partial randomized response model using the Poisson distribution proposed by [10] is then applied to the selected units.

In Section 2.1, we review Narjis and Shabbir’s partial randomized response model, and in Section 2.2 we consider selecting the clusters via PPS sampling with replacement. Selection of clusters by PPS sampling without replacement is considered in Section 2.3, and selection by equal probability sampling is examined in Section 2.4.

2.1. Narjis and Shabbir’s Partial Randomized Response Model

In the partial randomized response model, a sample of size n is selected via simple random sampling with replacement from the population. Each individual in the sample uses two randomization devices

(R_{1}, R_{2})

and is requested to report his/her response according to the following outcomes of the devices.

The first-stage randomization device

R_{1}

consists of the following statements:

(1): I have the sensitive attribute A, with probability $T$.
(2): Go to the randomization device $R_{2}$, with probability $1 - T$.

The second-stage randomization device

R_{2}

consists of the following statements:

(1): I have the sensitive attribute A.
(2): Forced to say No.
(3): Draw one more card.

These statements appear with probabilities $P_{1}$, $P_{2}$ and $P_{3}$, respectively, where $\sum_{i = 1}^{3} P_{i} = 1$.

If statement (3) appears on the respondent’s card, the respondent draws a second card without replacing the first. If statement (3) reappears in the second draw, the respondent is asked to report his/her actual status. The respondent should answer “Yes” (“No”) if his/her actual status matches (does not match) the statement on the card.

The probability of getting a “Yes” from the respondent is given by:

l_{0} = T π + (1 - T) [P_{1} π (1 + P_{3} \frac{k}{k - 1}) + P_{3}^{2} \frac{k}{k - 1} π]

(1)

where k is the total number of cards in the randomization device

R_{2}

.
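As a quick numerical check of Equation (1), the sketch below evaluates the “Yes” probability and confirms that it factors as $π$ times the device constant that appears in the denominators of the estimators that follow. The function names and the design values are illustrative assumptions, not part of the original model description.

```python
def yes_probability(pi, T, P1, P3, k):
    """Probability of a "Yes" answer as written in Equation (1)."""
    ratio = k / (k - 1)  # factor k / (k - 1) appearing in Equation (1)
    second_stage = P1 * pi * (1 + P3 * ratio) + P3**2 * ratio * pi
    return T * pi + (1 - T) * second_stage


def device_factor(T, P1, P3, k):
    """Constant multiplying pi once Equation (1) is factored."""
    return T + (1 - T) * (P1 + P3 * (k / (k - 1)) * (P1 + P3))


# Assumed design values, chosen only for illustration.
pi, T, P1, P3, k = 0.01, 0.4, 0.5, 0.3, 15
l0 = yes_probability(pi, T, P1, P3, k)
factored = pi * device_factor(T, P1, P3, k)
print(l0, factored)
```

The two printed numbers agree, which is why the same bracketed constant can be divided out in Equation (3).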

Assuming that

n \to \infty

and

l_{0} \to 0

such that

n l_{0} = λ_{0}

(finite), and writing $λ = n π$, Equation (1) can be rewritten as

λ_{0} = T λ + (1 - T) [P_{1} λ (1 + P_{3} \frac{k}{k - 1}) + P_{3}^{2} \frac{k}{k - 1} λ]

(2)

Let

y_{1}, y_{2}, \dots, y_{n}

be a random sample of n observations from the Poisson distribution with parameter

λ_{0}

.

The maximum-likelihood estimator of

λ

is given by:

{\hat{λ}}_{p} = \frac{\frac{1}{n} \sum_{j = 1}^{n} y_{j}}{T + (1 - T) [P_{1} + P_{3} (\frac{k}{k - 1}) (P_{1} + P_{3})]}

(3)

The variance of the estimator

{\hat{λ}}_{p}

is given by:

V ({\hat{λ}}_{p}) = \frac{λ}{n [T + (1 - T) \{P_{1} + P_{3} (\frac{k}{k - 1}) (P_{1} + P_{3})\}]}

(4)
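A minimal sketch of Equations (3) and (4) is given below. It assumes the responses are recorded as a list of counts; the function name, the sample counts and the device settings are hypothetical, and the variance is estimated by plugging ${\hat{λ}}_{p}$ into Equation (4), which is an assumption of this sketch rather than a formula stated above.

```python
def estimate_lambda(counts, T, P1, P3, k):
    """Return (lambda_hat, plug-in variance) following Equations (3) and (4)."""
    n = len(counts)
    d = T + (1 - T) * (P1 + P3 * (k / (k - 1)) * (P1 + P3))
    lam_hat = (sum(counts) / n) / d   # Equation (3)
    var_hat = lam_hat / (n * d)       # Equation (4) with lambda replaced by lambda_hat
    return lam_hat, var_hat


# Hypothetical responses from n = 10 respondents.
counts = [0, 1, 0, 0, 2, 0, 1, 0, 0, 1]
lam_hat, var_hat = estimate_lambda(counts, T=0.5, P1=0.4, P3=0.4, k=11)
print(lam_hat, var_hat)
```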

2.2. Estimation by PPS When PSUs Are Selected with Replacement

Suppose n primary sampling units (PSUs) of size

M_{i} (i = 1, 2, \dots, n)

have been selected from the population of N clusters with selection probability

φ_{i}

with replacement and the secondary sampling units (SSUs) of

m_{i} (i = 1, 2, \dots, n)

size are selected from each chosen primary unit using simple random sampling with replacement (SRSWR). We apply this two-stage sampling procedure to Narjis and Shabbir’s partial randomized response model to estimate a rare sensitive attribute. Each person selected via the two-stage sampling procedure is requested to answer “Yes” or “No” using Narjis and Shabbir’s randomization devices, shown in Table 1 (first-stage device) and Table 2 (second-stage device) for the ith cluster.

If Question 3 in randomization device

R_{2 i}

appears on the respondent’s card, the respondent draws one more card from

R_{2 i}

without replacing the first card. If Question 3 reappears in the second draw, the respondent is asked to answer “Yes” or “No” according to his/her true status on the sensitive question.

In the first- and second-stage randomization devices,

T_{i}

is the selection probability of a rare sensitive question in randomization device

R_{1 i}

for the ith cluster,

π_{i}

is the population proportion of a rare sensitive attribute for the ith cluster, and

P_{i 1}

is the selection probability of a rare sensitive question in randomization device

R_{2 i}

for the ith cluster. And

P_{i 2}

is the selection probability of the forced answer “No” in randomization device

R_{2 i}

,

P_{i 3}

is the selection probability of the statement “Draw one more card” in randomization device

R_{2 i}

for the ith cluster, and

k_{i}

is the number of cards in the card deck of randomization device

R_{2 i}

for the ith cluster.

The probability of answering “Yes” from the respondent in cluster i is given by

l_{i 0} = T_{i} π_{i} + (1 - T_{i}) [P_{i 1} π_{i} (1 + P_{i 3} \frac{k_{i}}{k_{i} - 1}) + P_{i 3}^{2} \frac{k_{i}}{k_{i} - 1} π_{i}]

(5)

To clarify the response process, Figure 1 presents a flow chart for the probability of answering “Yes” in the ith cluster.

Since the attribute

A_{i}

in cluster i is very rare in the population, if we assume

m_{i} \to \infty

and

l_{i 0} \to 0

, then

m_{i} l_{i 0} = λ_{i 0}

(finite).

Let

y_{i 1}, y_{i 2}, \dots, y_{i m_{i}}

be a random sample of

m_{i}

observations from the Poisson distribution with parameter

λ_{i 0}

in cluster i, then the estimator

{\hat{λ}}_{i}

of

λ_{i}

, the parameter of a rare sensitive attribute of cluster i, is given by

{\hat{λ}}_{i} = \frac{\frac{1}{m_{i}} \sum_{j = 1}^{m_{i}} y_{i j}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]}

(6)

When respondents are selected via simple random sampling with replacement from the ith cluster, which was itself selected with replacement using sampling probability

φ_{i}

, the estimator

{\hat{λ}}_{p p z w r}

of

λ

, the parameter of a rare sensitive attribute, is given by:

{\hat{λ}}_{p p z w r} = \frac{1}{n M_{0}} \sum_{i = 1}^{n} \frac{M_{i} {\hat{λ}}_{i}}{φ_{i}}

(7)

where

M_{0} = \sum_{i = 1}^{N} M_{i}

.
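Equation (7) can be sketched as follows. The helper name and the sample values are assumptions made for illustration: two PSUs drawn from a hypothetical population with $M_{0} = 10000$, each with selection probability 0.25.

```python
def ppzwr_estimate(M, lam_hat, phi, M0):
    """Equation (7): each list holds one entry per drawn PSU (repeats allowed)."""
    n = len(M)
    total = sum(Mi * li / pi for Mi, li, pi in zip(M, lam_hat, phi))
    return total / (n * M0)


# Hypothetical sample of n = 2 PSUs.
M = [2000, 4000]        # sizes of the drawn clusters
lam_hat = [1.0, 2.0]    # cluster-level estimates from Equation (6)
phi = [0.25, 0.25]      # selection probabilities of the drawn clusters
estimate = ppzwr_estimate(M, lam_hat, phi, M0=10000)
print(estimate)
```

Because the clusters are drawn with replacement, a cluster selected twice simply contributes two entries to each list.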

Theorem 1.

The estimator

{\hat{λ}}_{p p z w r}

is an unbiased estimator of the parameter λ.

Proof.

Since

y_{i j} \sim i i d P o (λ_{i 0})

for each cluster and

λ_{i 0} = T_{i} λ_{i} + (1 - T_{i}) [P_{i 1} λ_{i} (1 + P_{i 3} \frac{k_{i}}{k_{i} - 1}) + P_{i 3}^{2} \frac{k_{i}}{k_{i} - 1} λ_{i}] .

We have

\begin{matrix} E_{1} E_{2} ({\hat{λ}}_{p p z w r}) & = E_{1} E_{2} [\frac{1}{n M_{0}} \sum_{i = 1}^{n} \frac{M_{i} {\hat{λ}}_{i}}{φ_{i}}] \\ = E_{1} [\frac{1}{n M_{0}} \sum_{i = 1}^{n} \frac{M_{i} E_{2} ({\hat{λ}}_{i})}{φ_{i}}], \end{matrix}

where

\begin{matrix} E_{2} ({\hat{λ}}_{i}) & = E_{2} [\frac{\frac{1}{m_{i}} \sum_{j = 1}^{m_{i}} y_{i j}}{T_{i} + (1 - T_{i}) (P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3}))}] \\ = \frac{λ_{i 0}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]} \\ = λ_{i}, \end{matrix}

we can obtain

\begin{matrix} E_{1} E_{2} ({\hat{λ}}_{p p z w r}) & = E_{1} [\frac{1}{n M_{0}} \sum_{i = 1}^{n} \frac{M_{i} λ_{i}}{φ_{i}}] \\ & = \frac{1}{M_{0}} \sum_{i = 1}^{N} φ_{i} \frac{M_{i} λ_{i}}{φ_{i}} \\ & = \frac{1}{M_{0}} \sum_{i = 1}^{N} M_{i} λ_{i} = λ . \end{matrix}

□

Theorem 2.

The variance of

{\hat{λ}}_{p p z w r}

is given by

\begin{matrix} V ({\hat{λ}}_{p p z w r}) & = \frac{1}{n M_{0}^{2}} \sum_{i = 1}^{N} φ_{i} {[\frac{M_{i} λ_{i}}{φ_{i}} - M_{0} λ]}^{2} \\ & + \frac{1}{n M_{0}^{2}} \sum_{i = 1}^{N} \frac{M_{i}^{2}}{m_{i} φ_{i}} \frac{λ_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]} \end{matrix}

(8)

Proof.

By [11], we have

V ({\hat{λ}}_{p p z w r}) = V_{1} E_{2} ({\hat{λ}}_{p p z w r}) + E_{1} V_{2} ({\hat{λ}}_{p p z w r}),

where

\begin{matrix} V_{1} E_{2} ({\hat{λ}}_{p p z w r}) & = V_{1} E_{2} [\frac{1}{n M_{0}} \sum_{i = 1}^{n} \frac{M_{i} {\hat{λ}}_{i}}{φ_{i}}] \\ = V_{1} [\frac{1}{n M_{0}} \sum_{i = 1}^{n} \frac{M_{i} λ_{i}}{φ_{i}}] \\ = \frac{1}{n M_{0}^{2}} \sum_{i = 1}^{N} φ_{i} {[\frac{M_{i} λ_{i}}{φ_{i}} - M_{0} λ]}^{2} \end{matrix}

and

\begin{matrix} E_{1} V_{2} ({\hat{λ}}_{p p z w r}) & = E_{1} V_{2} [\frac{1}{n M_{0}} \sum_{i = 1}^{n} \frac{M_{i} {\hat{λ}}_{i}}{φ_{i}}] \\ = E_{1} [\frac{1}{{(n M_{0})}^{2}} \sum_{i = 1}^{n} \frac{M_{i}^{2}}{φ_{i}^{2}} V_{2} (\frac{\frac{1}{m_{i}} \sum_{j = 1}^{m_{i}} y_{i j}}{T_{i} + (1 - T_{i}) \{P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})\}})] \\ = E_{1} [\frac{1}{{(n M_{0})}^{2}} \sum_{i = 1}^{n} \frac{M_{i}^{2}}{φ_{i}^{2}} \frac{\frac{1}{m_{i}^{2}} \sum_{j = 1}^{m_{i}} V_{2} (y_{i j})}{{\{T_{i} + (1 - T_{i}) (P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3}))\}}^{2}}] . \end{matrix}

Because

y_{i j} \sim i i d P o (λ_{i 0})

, we have

\begin{matrix} E_{1} V_{2} ({\hat{λ}}_{p p z w r}) & = E_{1} [\frac{1}{{(n M_{0})}^{2}} \sum_{i = 1}^{n} \frac{M_{i}^{2}}{φ_{i}^{2}} \frac{\frac{1}{m_{i}^{2}} \sum_{j = 1}^{m_{i}} λ_{i 0}}{{\{T_{i} + (1 - T_{i}) (P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3}))\}}^{2}}] \\ = E_{1} [\frac{1}{{(n M_{0})}^{2}} \sum_{i = 1}^{n} \frac{M_{i}^{2}}{φ_{i}^{2} m_{i}} \frac{λ_{i 0}}{{\{T_{i} + (1 - T_{i}) (P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3}))\}}^{2}}] \\ = E_{1} [\frac{1}{{(n M_{0})}^{2}} \sum_{i = 1}^{n} \frac{M_{i}^{2}}{φ_{i}^{2} m_{i}} \frac{λ_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]}] \\ = \frac{1}{n M_{0}^{2}} \sum_{i = 1}^{N} \frac{M_{i}^{2}}{φ_{i}} \frac{1}{m_{i}} \frac{λ_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]} . \end{matrix}

Thus, we determine the variance of

{\hat{λ}}_{p p z w r}

as shown in (8). □

Also, the estimator of

V ({\hat{λ}}_{p p z w r})

is given by

\hat{V} ({\hat{λ}}_{p p z w r}) = \frac{1}{n (n - 1) M_{0}^{2}} \sum_{i = 1}^{n} {(\frac{M_{i} {\hat{λ}}_{i}}{φ_{i}} - M_{0} {\hat{λ}}_{p p z w r})}^{2} .

(9)

On the other hand, when the sampling probabilities of n PSUs are proportional to each cluster size

M_{i}

, then

φ_{i} = M_{i} / M_{0}

, which is called PPS sampling. When a sample of n PSUs is selected via PPS sampling with replacement and

m_{i}

SSUs are selected using simple random sampling with replacement from each PSU, the estimator

{\hat{λ}}_{p p s w r}

of

λ

is as follows

{\hat{λ}}_{p p s w r} = \frac{1}{n} \sum_{i = 1}^{n} {\hat{λ}}_{i} .

(10)

And the variance of

{\hat{λ}}_{p p s w r}

and its estimator are, respectively,

\begin{matrix} V ({\hat{λ}}_{p p s w r}) & = \frac{1}{n M_{0}} \sum_{i = 1}^{N} M_{i} {(λ_{i} - λ)}^{2} \\ + \frac{1}{n M_{0}} \sum_{i = 1}^{N} \frac{M_{i}}{m_{i}} \frac{λ_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]}, \end{matrix}

(11)

and

\hat{V} ({\hat{λ}}_{p p s w r}) = \frac{1}{n (n - 1)} \sum_{i = 1}^{n} {({\hat{λ}}_{i} - {\hat{λ}}_{p p s w r})}^{2} .

(12)
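The reduction from Equation (8) to Equation (11) can be checked numerically. The sketch below codes both variance formulas and evaluates them with $φ_{i} = M_{i} / M_{0}$. It takes $λ$ to be the size-weighted mean of the $λ_{i}$, which is what the final step of the proof of Theorem 1 uses, and writes $D_{i}$ for the device constant in the denominator of Equation (6). All numerical inputs and function names are assumptions for illustration.

```python
def var_ppzwr(M, m, lam, phi, D, n):
    """Equation (8), with sums over all N clusters of the population."""
    M0 = sum(M)
    lam_bar = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    between = sum(p * (Mi * li / p - M0 * lam_bar) ** 2
                  for Mi, li, p in zip(M, lam, phi))
    within = sum(Mi**2 / (mi * p) * li / Di
                 for Mi, mi, li, p, Di in zip(M, m, lam, phi, D))
    return (between + within) / (n * M0**2)


def var_ppswr(M, m, lam, D, n):
    """Equation (11), the PPS special case."""
    M0 = sum(M)
    lam_bar = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    between = sum(Mi * (li - lam_bar) ** 2 for Mi, li in zip(M, lam))
    within = sum(Mi / mi * li / Di for Mi, mi, li, Di in zip(M, m, lam, D))
    return (between + within) / (n * M0)


# Assumed population of N = 4 clusters and device constants D_i.
M = [1000, 2000, 3000, 4000]
m = [100, 200, 300, 400]
lam = [0.5, 1.0, 1.5, 2.0]
D = [0.7, 0.7, 0.7, 0.7]
phi_pps = [Mi / sum(M) for Mi in M]

v_general = var_ppzwr(M, m, lam, phi_pps, D, n=2)
v_pps = var_ppswr(M, m, lam, D, n=2)
print(v_general, v_pps)
```

The two values coincide up to floating-point rounding, as expected when the selection probabilities are proportional to cluster size.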

2.3. Estimation by PPS When PSUs Are Selected without Replacement

Suppose n PSUs of size

M_{i} (i = 1, 2, \dots, n)

have been selected from the population of N clusters with selection probability

ϕ_{i}

without replacement and the SSUs of size

m_{i}

are selected from each chosen primary unit via SRSWR. We apply the two-stage sampling procedure to Narjis and Shabbir’s partial randomized response model to estimate a rare sensitive attribute.

The estimator

{\hat{λ}}_{p p s w o r}

of

λ

, the parameter of a rare sensitive attribute obtained using the above sampling procedure is given by

{\hat{λ}}_{p p s w o r} = \frac{1}{M_{0}} \sum_{i = 1}^{n} \frac{M_{i} {\hat{λ}}_{i}}{ϕ_{i}} .

(13)

where

ϕ_{i}

is the inclusion probability of survey unit i.
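A short sketch of Equation (13) follows. It assumes a design whose inclusion probabilities are proportional to size, $ϕ_{i} = n M_{i} / M_{0}$, which is a common choice but is not specified in the text; the sample values and the function name are likewise hypothetical.

```python
def ppswor_estimate(M, lam_hat, incl_prob, M0):
    """Equation (13): Horvitz-Thompson-type estimator over the n distinct sampled PSUs."""
    return sum(Mi * li / p for Mi, li, p in zip(M, lam_hat, incl_prob)) / M0


M0, n = 10000, 2
M = [2000, 4000]                       # sizes of the two sampled clusters
lam_hat = [1.0, 2.0]                   # cluster-level estimates from Equation (6)
incl_prob = [n * Mi / M0 for Mi in M]  # assumed inclusion probabilities: 0.4 and 0.8
estimate = ppswor_estimate(M, lam_hat, incl_prob, M0)
print(estimate)
```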

And the variance of

{\hat{λ}}_{p p s w o r}

is given by:

\begin{matrix} V ({\hat{λ}}_{p p s w o r}) & = \frac{1}{M_{0}^{2}} \sum_{i = 1}^{N} \sum_{j > i}^{N} (ϕ_{i} ϕ_{j} - ϕ_{i j}) {[\frac{M_{i} λ_{i}}{ϕ_{i}} - \frac{M_{j} λ_{j}}{ϕ_{j}}]}^{2} \\ + \frac{1}{M_{0}^{2}} \sum_{i = 1}^{N} \frac{M_{i}^{2}}{m_{i} ϕ_{i}} \frac{λ_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]}, \end{matrix}

(14)

where

ϕ_{i j}

is the joint inclusion probability of survey units i and j.

Also, the estimator of

V ({\hat{λ}}_{p p s w o r})

is given by

\begin{matrix} \hat{V} ({\hat{λ}}_{p p s w o r}) & = \frac{1}{M_{0}^{2}} \sum_{i = 1}^{n} \sum_{j > i}^{n} (\frac{ϕ_{i} ϕ_{j} - ϕ_{i j}}{ϕ_{i j}}) {(\frac{M_{i} {\hat{λ}}_{i}}{ϕ_{i}} - \frac{M_{j} {\hat{λ}}_{j}}{ϕ_{j}})}^{2} \\ & + \frac{1}{M_{0}^{2}} \sum_{i = 1}^{n} \frac{M_{i}^{2}}{ϕ_{i} (m_{i} - 1)} \frac{{\hat{λ}}_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]} \end{matrix}

(15)

2.4. Estimation via Two-Stage Equal Probability Sampling

Suppose n PSUs of size

M_{i} (i = 1, 2, \dots, n)

have been selected from the population of N clusters by SRSWR and the SSUs of size

m_{i}

are selected again from each chosen PSU via SRSWR. We consider the two-stage equal probability sampling procedure for Narjis and Shabbir’s partial randomized response model for estimating a rare sensitive attribute. The estimator

{\hat{λ}}_{w r}

of

λ

, the parameter of a rare sensitive attribute, obtained using the above procedure is given by

{\hat{λ}}_{w r} = \frac{1}{n \bar{M}} \sum_{i = 1}^{n} M_{i} {\hat{λ}}_{i},

(16)

where

\bar{M} = M_{0} / N

.
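Equation (16) is the special case of Equation (7) with $φ_{i} = 1 / N$. The sketch below, using hypothetical sample values and function names, evaluates both forms to illustrate this.

```python
def wr_estimate(M, lam_hat, M_bar):
    """Equation (16): two-stage equal probability estimator."""
    n = len(M)
    return sum(Mi * li for Mi, li in zip(M, lam_hat)) / (n * M_bar)


def ppzwr_estimate(M, lam_hat, phi, M0):
    """Equation (7), included for comparison."""
    n = len(M)
    return sum(Mi * li / p for Mi, li, p in zip(M, lam_hat, phi)) / (n * M0)


N, M0 = 4, 10000          # assumed population: 4 clusters, 10000 sub-units
M = [1000, 3000]          # sizes of the n = 2 sampled clusters
lam_hat = [0.6, 1.4]      # hypothetical cluster-level estimates
est_wr = wr_estimate(M, lam_hat, M_bar=M0 / N)
est_general = ppzwr_estimate(M, lam_hat, phi=[1 / N] * len(M), M0=M0)
print(est_wr, est_general)
```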

The variance of

{\hat{λ}}_{w r}

and its estimator are, respectively,

\begin{matrix} V ({\hat{λ}}_{w r}) & = \frac{1}{n {\bar{M}}^{2}} \frac{1}{(N - 1)} \sum_{i = 1}^{N} {(M_{i} λ_{i} - \bar{M} λ)}^{2} \\ & + \frac{1}{n N {\bar{M}}^{2}} \sum_{i = 1}^{N} \frac{M_{i}^{2}}{m_{i}} \frac{λ_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]}, \end{matrix}

(17)

and

\hat{V} ({\hat{λ}}_{w r}) = \frac{1}{n (n - 1)} \sum_{i = 1}^{n} {(\frac{M_{i} {\hat{λ}}_{i}}{\bar{M}} - {\hat{λ}}_{w r})}^{2},

(18)

where

\bar{M} = M_{0} / N

.

3. Efficiency Comparisons for the PPS vs. Equal Probability Sampling

Narjis and Shabbir’s model was developed under the assumption of simple random sampling and stratified random sampling, and its efficiency was compared with that of the estimators of [9]. It would therefore be reasonable to compare that existing estimator with the estimator proposed in this paper using Narjis and Shabbir’s model. However, the increase in variance under cluster sampling relative to simple random sampling or stratified sampling is already covered in standard sampling textbooks. In this paper, therefore, for a population consisting of N clusters, we compare the PPS with replacement estimator with the two-stage equal probability estimator.

Now, the difference between the variance (17) of two-stage equal probability sampling and the variance (11) of PPS with replacement sampling is given as follows under

N - 1 \approx N

\begin{matrix} V ({\hat{λ}}_{w r}) - V ({\hat{λ}}_{p p s w r}) & = \frac{1}{n N {\bar{M}}^{2}} [\sum_{i = 1}^{N} {(M_{i} - \bar{M})}^{2} λ_{i}^{2} + \bar{M} \{\sum_{i = 1}^{N} (M_{i} - \bar{M}) (λ_{i}^{2} - λ^{2})\} \\ + \sum_{i = 1}^{N} \frac{{(M_{i} - \bar{M})}^{2}}{m_{i}} \frac{λ_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]} \\ + \bar{M} \sum_{i = 1}^{N} \frac{(M_{i} - \bar{M})}{m_{i}} \frac{λ_{i}}{T_{i} + (1 - T_{i}) [P_{i 1} + P_{i 3} (\frac{k_{i}}{k_{i} - 1}) (P_{i 1} + P_{i 3})]}] . \end{matrix}

(19)

In (19), if

M_{i} = \bar{M} = M_{0} / N

then

V ({\hat{λ}}_{w r}) = V ({\hat{λ}}_{p p s w r})

. In other words, if the cluster sizes are equal, the selection probability of PPS sampling with replacement becomes

1 / N

and is equal to that of two-stage equal probability sampling with replacement. Hence, they have the same efficiency.

If the cluster sizes

M_{i}

are unequal, the quantity

\sum_{i = 1}^{N} {(M_{i} - \bar{M})}^{2} λ_{i}^{2}

in the first term on the right-hand side of (19) becomes large, whereas the quantity

\sum_{i = 1}^{N} (M_{i} - \bar{M}) (λ_{i}^{2} - λ^{2})

in the second term on the right-hand side of (19) remains relatively small. Hence, the estimation using PPS sampling with replacement is more efficient than that using two-stage equal probability sampling with replacement.
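Equation (19) can be verified numerically. The sketch below computes $V ({\hat{λ}}_{w r}) - V ({\hat{λ}}_{p p s w r})$ directly, replacing $N - 1$ by $N$ and writing the within-cluster term of $V ({\hat{λ}}_{w r})$ with the factor $1 / (n N {\bar{M}}^{2})$ that Equation (19) implies, and compares the result with the right-hand side of (19). The population values, the device constants $D_{i}$ (the denominator of Equation (6)) and all function names are assumptions for illustration.

```python
def variance_difference(M, m, lam, D, n):
    """V(wr) - V(ppswr) computed term by term, with N - 1 replaced by N."""
    N, M0 = len(M), sum(M)
    M_bar = M0 / N
    lam_bar = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    g = [li / Di for li, Di in zip(lam, D)]   # lambda_i / D_i

    v_wr = (sum((Mi * li - M_bar * lam_bar) ** 2 for Mi, li in zip(M, lam))
            + sum(Mi**2 / mi * gi for Mi, mi, gi in zip(M, m, g))) / (n * N * M_bar**2)
    v_pps = (sum(Mi * (li - lam_bar) ** 2 for Mi, li in zip(M, lam))
             + sum(Mi / mi * gi for Mi, mi, gi in zip(M, m, g))) / (n * M0)
    return v_wr - v_pps


def equation_19(M, m, lam, D, n):
    """Right-hand side of Equation (19)."""
    N, M0 = len(M), sum(M)
    M_bar = M0 / N
    lam_bar = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    g = [li / Di for li, Di in zip(lam, D)]
    dev = [Mi - M_bar for Mi in M]            # M_i - M_bar
    terms = (sum(d**2 * li**2 for d, li in zip(dev, lam))
             + M_bar * sum(d * (li**2 - lam_bar**2) for d, li in zip(dev, lam))
             + sum(d**2 / mi * gi for d, mi, gi in zip(dev, m, g))
             + M_bar * sum(d / mi * gi for d, mi, gi in zip(dev, m, g)))
    return terms / (n * N * M_bar**2)


M, m = [1000, 2000, 3000, 4000], [100, 200, 300, 400]
lam, D = [0.5, 1.0, 1.5, 2.0], [0.7] * 4
direct = variance_difference(M, m, lam, D, n=2)
closed = equation_19(M, m, lam, D, n=2)
equal_sizes = variance_difference([2500] * 4, m, lam, D, n=2)
print(direct, closed, equal_sizes)
```

With these inputs the direct difference and the closed form agree, and the difference vanishes (up to rounding) when all clusters have the same size.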

We summarize the relationships among the estimators in a cluster sampling design in Table 3.

Now, we compare efficiency by calculating, in a numerical example, the relative efficiencies (RE) between the different sampling methods, namely unequal probability sampling with replacement (:ppzwr), PPS sampling with replacement (:ppswr) and two-stage equal probability sampling with replacement (:wr), for varying parameter combinations.

R E_{1} = \frac{V ({\hat{λ}}_{w r})}{V ({\hat{λ}}_{p p z w r})}, R E_{2} = \frac{V ({\hat{λ}}_{p p z w r})}{V ({\hat{λ}}_{p p s w r})}, R E_{3} = \frac{V ({\hat{λ}}_{w r})}{V ({\hat{λ}}_{p p s w r})} .

(20)

A value of

R E_{1}

greater than one means that unequal probability sampling with replacement (:ppzwr) is more efficient than two-stage equal probability sampling with replacement (:wr), a value of

R E_{2}

greater than one means that PPS sampling with replacement (:ppswr) is more efficient than unequal probability sampling with replacement (:ppzwr), and a value of

R E_{3}

greater than one means that PPS sampling with replacement (:ppswr) is more efficient than two-stage equal probability sampling with replacement (:wr).

In calculating REs, we set parameters for ith cluster

(i = 1, 2, 3, 4)

as follows.

$M_{0} = 10000$ ; $M_{1} = 1000$ ; $M_{2} = 2000$ ; $M_{3} = 3000$ ; $M_{4} = 4000$ ,
$m_{0} = 1000$ ; $m_{1} = 100$ ; $m_{2} = 200$ ; $m_{3} = 300$ ; $m_{4} = 400$ ,
$λ = 1.25, 1.5, 2.0, 2.25$ ;
$λ_{1} = 0.5$ , $λ_{2} = 1.0$ , $λ_{3} = 1.5$ , $λ_{4} = 2.0$ ;
$k_{1} = k_{2} = k_{3} = k_{4} = 15, 75$ ;
$P_{i 1}$ , $P_{i 2} = \frac{1 - P_{i 1}}{3}$ , $P_{i 3} = 1 - P_{i 1} - P_{i 2}$ .

We also assume the selection probabilities for ith cluster as follows.

$T_{1} = T_{2} = T_{3} = T_{4}$ ;
$P_{11} = P_{12} = P_{13} = P_{21} = P_{22} = P_{23} = P_{31} = P_{32} = P_{33} = P_{41} = P_{42} = P_{43}$ ,

varying from 0.2 to 0.8 by 0.2.
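The following sketch evaluates $R E_{3} = V ({\hat{λ}}_{w r}) / V ({\hat{λ}}_{p p s w r})$ over the grid of device settings listed above, using Equations (11) and (17). It is a rough illustration under stated assumptions, not a reproduction of the tabulated values: $λ$ is taken as the size-weighted mean of the $λ_{i}$, the within-cluster term of (17) carries the factor $1 / (n N {\bar{M}}^{2})$ implied by Equation (19), $n = 2$ is arbitrary (it cancels in the ratio), and the function names are hypothetical.

```python
from itertools import product

M = [1000, 2000, 3000, 4000]
m = [100, 200, 300, 400]
lam = [0.5, 1.0, 1.5, 2.0]


def device_factor(T, P1, k):
    """Denominator of Equation (6) with P2 = (1 - P1) / 3 and P3 = 1 - P1 - P2."""
    P2 = (1 - P1) / 3
    P3 = 1 - P1 - P2
    return T + (1 - T) * (P1 + P3 * (k / (k - 1)) * (P1 + P3))


def re3(T, P1, k, n=2):
    """Ratio V(wr) / V(ppswr) for device settings common to every cluster."""
    N, M0 = len(M), sum(M)
    M_bar = M0 / N
    lam_bar = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    D = device_factor(T, P1, k)
    v_wr = (sum((Mi * li - M_bar * lam_bar) ** 2 for Mi, li in zip(M, lam))
            / (n * M_bar**2 * (N - 1))
            + sum(Mi**2 / mi * li / D for Mi, mi, li in zip(M, m, lam))
            / (n * N * M_bar**2))
    v_pps = (sum(Mi * (li - lam_bar) ** 2 for Mi, li in zip(M, lam))
             + sum(Mi / mi * li / D for Mi, mi, li in zip(M, m, lam))) / (n * M0)
    return v_wr / v_pps


# Grid over T, P1 in {0.2, 0.4, 0.6, 0.8} and k in {15, 75}.
grid = list(product([0.2, 0.4, 0.6, 0.8], [0.2, 0.4, 0.6, 0.8], [15, 75]))
results = {(T, P1, k): re3(T, P1, k) for T, P1, k in grid}
for key, value in sorted(results.items()):
    print(key, round(value, 4))
```

For these inputs every ratio comes out above one, in line with the conclusion that PPS sampling with replacement is the more efficient design when cluster sizes are unequal. The selection probabilities used for the :ppzwr design are not stated in the text, so $R E_{1}$ and $R E_{2}$ are not computed here.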

In order to compare the efficiency of the proposed estimators from numerical examples, we summarized the relative efficiencies according to various parameter values with their mean values.

From Table 4, it can be seen that for all the parametric combinations, the mean values of

R E_{1}

are greater than one, which indicates that the unequal probability sampling with replacement estimator

{\hat{λ}}_{p p z w r}

is more efficient than the two-stage estimator,

{\hat{λ}}_{w r}

. The gain is largest when the sensitive attribute value

λ

is small; in contrast, as

λ

increases, the relative efficiency of

{\hat{λ}}_{p p z w r}

decreases. In addition,

R E_{1}

changes only marginally with

k_{i}

, and it increases slightly as the selection probability

T_{i}

increases.

As shown in Table 5, the probability proportional to size estimator,

{\hat{λ}}_{p p s w r}

, is more efficient than the unequal probability sampling with replacement estimator,

{\hat{λ}}_{p p z w r}

. Its relative efficiency tends to increase as the sensitive attribute value

λ

increases and, in contrast, to decrease as

λ

decreases.

As shown in Table 6, the probability proportional to size estimator,

{\hat{λ}}_{p p s w r}

, is more efficient than the two-stage sampling with replacement estimator,

{\hat{λ}}_{w r}

. Its relative efficiency tends to increase as the sensitive attribute value

λ

decreases and, in contrast, to decrease as

λ

increases.

In summary, an examination of the efficiency of a partial randomized response model for rare sensitive attributes based on a cluster sampling design with numerical examples shows the following trends:

(1): Between $p p z w r$ and $w r$ , efficiency decreases as a rare sensitive attribute $λ$ increases (refer to Table 4).
(2): Between $p p s w r$ and $p p z w r$ , efficiency increases as $λ$ increases, and efficiency is relatively low at specific values of $λ$ (refer to Table 5).
(3): Between $p p s w r$ and $w r$ , efficiency increases as $λ$ decreases, similar to the relation between $p p s w r$ and $p p z w r$ , where efficiency sharply increases at specific values of $λ$ (refer to Table 6).
(4): The number of cards $k_{i}$ does not significantly impact efficiency.

4. Conclusions

In this paper, when the population is composed of several different and sensitive clusters, we suggest a randomized method for efficiently estimating a rare sensitive attribute by applying the PPS sampling method to the partial randomized response model of [10]. And by applying PPS sampling and two-stage equal probability sampling, estimators for a rare and sensitive attribute and its variance and variance estimates are obtained. We compare the efficiency between the estimators of the rare sensitive attribute, one obtained using the PPS with replacement sampling method and the other obtained using the two-stage equal probability sampling with replacement method when the cluster sizes are different. As a result, it was confirmed that the estimation obtained using the PPS sampling with replacement is more efficient than the estimation obtained based on the two-stage equal probability sampling with replacement when the cluster sizes are different from each other.

Author Contributions

Conceptualization, G.-S.L.; methodology, C.-K.S.; writing—original draft preparation, K.-H.H.; writing—review and editing, C.-K.S.; project administration and funding acquisition, G.-S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by Woosuk University.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We would like to thank the anonymous reviewers for their very careful reading and valuable comments/suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69.
2. Fox, J.A.; Tracy, P.E. Randomized Response: A Method for Sensitive Surveys; Sage Publications: Newbury Park, CA, USA, 1986.
3. Chaudhuri, A.; Mukerjee, R. Randomized Response: Theory and Techniques; Marcel Dekker, Inc.: New York, NY, USA, 1988.
4. Ryu, J.B.; Hong, K.H.; Lee, G.S. Randomized Response Model; Freedom Academy: Seoul, Republic of Korea, 1993.
5. Lee, G.S.; Hong, K.H. Randomized response model by two-stage cluster sampling. Korean Commun. Stat. 1998, 5, 99–105.
6. Lee, G.S. A Study on the Randomized Response Technique by PPS Sampling. Korean J. Appl. Stat. 2006, 19, 69–80.
7. Land, M.; Singh, S.; Sedory, S.A. Estimation of a rare sensitive attribute using Poisson distribution. Statistics 2012, 46, 351–360.
8. Lee, G.S.; Hong, K.H.; Son, C.K. A stratified two-stage unrelated randomized response model for estimating a rare sensitive attribute based on the Poisson distribution. J. Stat. Theory Pract. 2016, 10, 239–262.
9. Suman, S.; Singh, G.N. An ameliorated stratified two-stage randomized response model for estimating the rare sensitive parameter under Poisson distribution. Statistics 2019, 53, 395–416.
10. Narjis, G.; Shabbir, J. An efficient partial randomized response model for estimating a rare sensitive attribute using Poisson distribution. Commun. Stat. Theory Methods 2021, 50, 1–17.
11. Cochran, W.G. Sampling Techniques, 3rd ed.; John Wiley and Sons: New York, NY, USA, 1977.

Figure 1. Response flow using partial randomization device for the ith cluster.

Table 1. First stage randomization device $R_{1 i}$.

	Question	Selection Probability
Question 1	Do you have a rare sensitive attribute $A_{i}$ ?	$T_{i}$
Question 2	Go to randomization device $R_{2 i}$ .	$1 - T_{i}$

Table 2. Second stage randomization device $R_{2 i}$.

	Question	Selection Probability
Question 1	Do you have a rare sensitive attribute $A_{i}$ ?	$P_{i 1}$
Question 2	Answer “No”.	$P_{i 2}$
Question 3	Draw one more card	$P_{i 3}$

Table 3. The relationship between different estimators for cluster sampling.

	$φ_{i} = M_{i} / M_{0}$	$M_{i} = \bar{M} = M_{0} / N$
${\hat{λ}}_{p p z w r}$	${\hat{λ}}_{p p z w r}$ = ${\hat{λ}}_{p p s w r}$
${\hat{λ}}_{p p s w r}$		${\hat{λ}}_{p p s w r}$ = ${\hat{λ}}_{w r}$
${\hat{λ}}_{p p s w o r}$
${\hat{λ}}_{w r}$

Table 4. The mean values of $R E_{1}$ for ${\hat{λ}}_{p p z w r}$ vs. ${\hat{λ}}_{w r}$.

			$k_{i} = 15$				$k_{i} = 75$
			$T_{i}$				$T_{i}$
$λ$	$λ_{i}$	$P_{i}$	0.2	0.4	0.6	0.8	0.2	0.4	0.6	0.8
1.25	0.5	0.2	5.1216	5.1218	5.1219	5.122	5.1216	5.1217	5.1219	5.122
	1	0.4	5.1218	5.1219	5.1219	5.122	5.1218	5.1218	5.1219	5.122
	1.5	0.6	5.1219	5.1219	5.122	5.122	5.1219	5.1219	5.122	5.122
	2	0.8	5.122	5.122	5.122	5.122	5.122	5.122	5.122	5.122
1.5	0.5	0.2	2.5931	2.5931	2.5931	2.5931	2.5931	2.5932	2.5932	2.5932
	1	0.4	2.5931	2.5931	2.5931	2.5931	2.5932	2.5932	2.5932	2.5932
	1.5	0.6	2.5931	2.5931	2.5931	2.5931	2.5932	2.5932	2.5932	2.5932
	2	0.8	2.5931	2.5931	2.5931	2.5931	2.5932	2.5932	2.5932	2.5932
2	0.5	0.2	1.2366	1.2366	1.2366	1.2366	1.2367	1.2367	1.2367	1.2367
	1	0.4	1.2366	1.2366	1.2366	1.2366	1.2367	1.2367	1.2367	1.2367
	1.5	0.6	1.2366	1.2366	1.2366	1.2366	1.2367	1.2367	1.2367	1.2367
	2	0.8	1.2366	1.2366	1.2366	1.2366	1.2367	1.2367	1.2367	1.2367
2.25	0.5	0.2	1.0524	1.0524	1.0524	1.0524	1.0525	1.0525	1.0524	1.0524
	1	0.4	1.0524	1.0524	1.0524	1.0524	1.0525	1.0524	1.0524	1.0524
	1.5	0.6	1.0524	1.0524	1.0524	1.0524	1.0524	1.0524	1.0524	1.0524
	2	0.8	1.0524	1.0524	1.0524	1.0524	1.0524	1.0524	1.0524	1.0524

Table 5. The mean values of $R E_{2}$ for ${\hat{λ}}_{p p s w r}$ vs. ${\hat{λ}}_{p p z w r}$.

			$k_{i} = 15$				$k_{i} = 75$
			$T_{i}$				$T_{i}$
$λ$	$λ_{i}$	$P_{i}$	0.2	0.4	0.6	0.8	0.2	0.4	0.6	0.8
1.25	0.5	0.2	1.1104	1.1106	1.1107	1.1109	1.1102	1.1104	1.1106	1.1107
	1	0.4	1.1106	1.1107	1.1108	1.1109	1.1105	1.1106	1.1107	1.1108
	1.5	0.6	1.1108	1.1109	1.1109	1.111	1.1106	1.1107	1.1108	1.1108
	2	0.8	1.1109	1.1109	1.111	1.111	1.1108	1.1108	1.1108	1.1108
1.5	0.5	0.2	2.6033	2.604	2.6045	2.605	2.6027	2.6034	2.604	2.6045
	1	0.4	2.6041	2.6045	2.6048	2.6051	2.6036	2.604	2.6043	2.6046
	1.5	0.6	2.6047	2.6049	2.6051	2.6052	2.6042	2.6044	2.6046	2.6047
	2	0.8	2.6051	2.6052	2.6052	2.6053	2.6046	2.6047	2.6047	2.6048
2	0.5	0.2	3.2936	3.2941	3.2944	3.2947	3.2932	3.2937	3.2941	3.2944
	1	0.4	3.2942	3.2944	3.2946	3.2948	3.2938	3.2941	3.2943	3.2945
	1.5	0.6	3.2946	3.2947	3.2948	3.2949	3.2942	3.2943	3.2945	3.2946
	2	0.8	3.2948	3.2948	3.2949	3.2949	3.2945	3.2945	3.2946	3.2946
2.25	0.5	0.2	2.876	2.8762	2.8764	2.8766	2.8758	2.876	2.8762	2.8764
	1	0.4	2.8763	2.8764	2.8765	2.8766	2.8761	2.8762	2.8763	2.8764
	1.5	0.6	2.8765	2.8765	2.8766	2.8767	2.8763	2.8764	2.8764	2.8765
	2	0.8	2.8766	2.8766	2.8767	2.8767	2.8764	2.8765	2.8765	2.8765

Table 6. The mean values of $RE_{3}$ for $λ_{ppswr}$ vs. $λ_{wr}$.

			$k_{i} = 15$				$k_{i} = 75$
			$T_{i}$				$T_{i}$
$λ$	$λ_{i}$	$P_{i}$	0.2	0.4	0.6	0.8	0.2	0.4	0.6	0.8
1.25	0.5	0.2	5.6869	5.6881	5.6891	5.6899	5.6859	5.6872	5.6883	5.6891
	1	0.4	5.6884	5.6891	5.6896	5.6901	5.6875	5.6882	5.6888	5.6894
	1.5	0.6	5.6894	5.6897	5.69	5.6903	5.6886	5.689	5.6893	5.6896
	2	0.8	5.6901	5.6902	5.6903	5.6905	5.6893	5.6895	5.6896	5.6897
1.5	0.5	0.2	6.7506	6.7524	6.7538	6.7551	6.7491	6.751	6.7526	6.7539
	1	0.4	6.7529	6.7538	6.7547	6.7554	6.7515	6.7525	6.7535	6.7543
	1.5	0.6	6.7544	6.7548	6.7553	6.7557	6.7531	6.7536	6.7541	6.7546
	2	0.8	6.7554	6.7556	6.7557	6.7559	6.7542	6.7544	6.7546	6.7548
2	0.5	0.2	4.0731	4.0736	4.074	4.0744	4.0726	4.0732	4.0737	4.0741
	1	0.4	4.0737	4.074	4.0742	4.0745	4.0733	4.0737	4.0739	4.0742
	1.5	0.6	4.0742	4.0743	4.0744	4.0745	4.0738	4.074	4.0741	4.0743
	2	0.8	4.0745	4.0745	4.0746	4.0746	4.0741	4.0742	4.0743	4.0743
2.25	0.5	0.2	3.0268	3.027	3.0272	3.0274	3.0266	3.0269	3.0271	3.0273
	1	0.4	3.0271	3.0272	3.0273	3.0274	3.0269	3.0271	3.0272	3.0273
	1.5	0.6	3.0273	3.0273	3.0274	3.0275	3.0271	3.0272	3.0273	3.0273
	2	0.8	3.0274	3.0274	3.0275	3.0275	3.0273	3.0273	3.0273	3.0274
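
The three tables can be cross-checked against one another. Assuming each relative efficiency is a ratio of estimator variances (an assumption here; the definitions are given earlier in the paper), $RE_{3}$ should be close to the product $RE_{1} \times RE_{2}$, since the variance of $λ_{ppzwr}$ cancels. Because the tabulated entries are means, the identity is expected to hold only approximately. The following Python sketch checks this for the first row of each $λ$ block ($λ_{i} = 0.5$, $P_{i} = 0.2$, $T_{i} = 0.2$, $k_{i} = 15$); the function name `relative_gaps` is illustrative and not from the paper.

```python
# Values copied from Tables 4-6: first row of each lambda block,
# first data column (lambda_i = 0.5, P_i = 0.2, T_i = 0.2, k_i = 15).
lambdas = [1.25, 1.5, 2.0, 2.25]
re1 = [5.1216, 2.5931, 1.2366, 1.0524]  # Table 4: ppzwr vs. wr
re2 = [1.1104, 2.6033, 3.2936, 2.8760]  # Table 5: ppswr vs. ppzwr
re3 = [5.6869, 6.7506, 4.0731, 3.0268]  # Table 6: ppswr vs. wr


def relative_gaps(re1, re2, re3):
    """Relative gap between RE1 * RE2 and RE3 for each table entry.

    Assumes (hypothetically) that every RE is a variance ratio, so that
    RE3 = RE1 * RE2 would hold exactly for unaveraged values.
    """
    return [abs(a * b - c) / c for a, b, c in zip(re1, re2, re3)]


gaps = relative_gaps(re1, re2, re3)
for lam, a, b, c, g in zip(lambdas, re1, re2, re3, gaps):
    print(f"lambda={lam}: RE1*RE2={a * b:.4f}, RE3={c:.4f}, relative gap={g:.1e}")
```

For these entries the product agrees with the tabulated $RE_{3}$ to within about one part in ten thousand, which suggests the three tables are mutually consistent under the assumed definition.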

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
