Article

A Probability Proportional to Size Estimation of a Rare Sensitive Attribute Using a Partial Randomized Response Model with Poisson Distribution

1 Department of Children Welfare, Woosuk University, Wanju 55338, Republic of Korea
2 Department of Computer Science, Dongshin University, Naju 58245, Republic of Korea
3 Department of Applied Statistics, Dongguk University, Gyeongju 38066, Republic of Korea
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(2), 196; https://doi.org/10.3390/math12020196
Submission received: 29 November 2023 / Revised: 23 December 2023 / Accepted: 5 January 2024 / Published: 7 January 2024
(This article belongs to the Special Issue Uncertainty Quantification: Latest Advances and Applications)

Abstract

In this paper, we propose applying the probability proportional to size (PPS) sampling method to a partial randomized response model based on the Poisson distribution in order to efficiently estimate a rare sensitive attribute when the population is composed of several clusters of different sizes and the survey question is sensitive. We obtain estimators for the rare sensitive attribute, together with their variances and variance estimators, under PPS sampling and under two-stage equal probability sampling. We then compare the efficiency of the estimator obtained via PPS sampling with replacement with that of the estimator obtained via two-stage equal probability sampling with replacement. The comparison confirms that the estimator obtained via PPS sampling with replacement is more efficient than the estimator obtained via two-stage equal probability sampling with replacement when the cluster sizes are different.

1. Introduction

In surveys on socially or personally very sensitive topics, respondents who are asked a question directly tend to refuse to answer or to give a false answer. To solve this problem, ref. [1] proposed a randomized response model (RRM) that can obtain sensitive information while protecting the identity or confidentiality of the respondent through an indirect response using a randomization device. Since then, many researchers have suggested various randomized response models to improve the quality of estimation.
Subsequently, refs. [2,3,4] organized, summarized and systematized the randomized response models, ref. [5] applied two-stage cluster sampling to a randomized response model, and ref. [6] sought to improve the practicality of the randomized response model by suggesting a randomized response model using PPS sampling. Meanwhile, the authors of [7] suggested an unrelated question randomized response method to estimate the mean number of participants with a rare sensitive attribute using the Poisson distribution. Examples of rare sensitive attributes include the proportion of people with AIDS who have persistent relationships with strangers, the proportion of people who have witnessed murders, and the number of girls raped by their own fathers. Examples of rare unrelated attributes include the proportion of people born exactly at 12 o’clock, the proportion of babies born blind, and the proportion of triplets delivered by women. Refs. [8,9] suggested stratified two-stage randomized response models for estimating a rare sensitive attribute under the Poisson distribution.
Furthermore, ref. [10] proposed a partial randomized response model using Poisson distribution, providing an alternative approach to estimating rare sensitive attributes through simple random estimation and stratified estimation. Their model demonstrated higher efficiency compared to Suman and Singh’s model. However, this research also faces limitations when applied to actual surveys if the population is clustered. Therefore, when the population is clustered, it is expected that applying Narjis and Shabbir’s model, which is more efficient than Suman and Singh’s model, could offer a practical solution for estimating rare sensitive attributes in real surveys.
In this study, we propose a method for estimating rare sensitive attributes when the survey question is highly sensitive and the population is composed of clusters of varying sizes. We apply the probability proportional to size (PPS) sampling method, which assigns sampling probabilities in proportion to the cluster sizes, to the partial randomized response model of [10]. In Section 2, we first introduce the partial randomized response model and then propose estimation methods using PPS sampling with replacement, PPS sampling without replacement, and two-stage equal probability sampling. In Section 3, we compare the efficiency of the estimation methods, and in Section 4, we present the conclusions and implications of the study.

2. PPS Estimation for a Rare Sensitive Attribute by Partial Randomized Response Model

In this section, we consider a survey with very sensitive questions in which the population is composed of $N$ clusters, each containing $M_i$ $(i = 1, 2, \ldots, N)$ sub-units. A two-stage selection method is used: $n$ clusters are selected from the population with PPS or with equal probability, and then $m_i$ $(i = 1, 2, \ldots, n)$ survey units are selected through simple random sampling within each selected cluster. This design is applied to the partial randomized response model using the Poisson distribution proposed by [10] in order to estimate a rare sensitive attribute.
In Section 2.1, we review Narjis and Shabbir’s partial randomized response model. Section 2.2 considers clusters selected via PPS sampling with replacement, Section 2.3 considers clusters selected via PPS sampling without replacement, and Section 2.4 examines clusters selected via equal probability sampling.

2.1. Narjis and Shabbir’s Partial Randomized Response Model

In the partial randomized response model, a sample of size $n$ is selected from the population via simple random sampling with replacement. Each selected individual is given two randomization devices $(R_1, R_2)$ and is asked to report his/her response according to the outcomes of the devices.
The first-stage randomization device $R_1$ consists of the following statements:
(1) I have the sensitive attribute $A$, with probability $T$.
(2) Go to the randomization device $R_2$, with probability $1 - T$.
The second-stage randomization device $R_2$ consists of the following statements:
(1) I have the sensitive attribute $A$.
(2) Forced to say “No”.
(3) Draw one more card.
These statements are selected with probabilities $P_1$, $P_2$ and $P_3$, respectively, where $\sum_{i=1}^{3} P_i = 1$.
If statement (3) appears on the respondent’s card, the respondent draws again without replacing the card. If statement (3) reappears in the second draw, the respondent is asked to report his/her actual status. The respondent answers “Yes” (or “No”) if his/her actual status matches (does not match) the statement on the card.
The probability of getting a “Yes” from the respondent is given by:
$$ l_0 = T\pi + (1 - T)\left[P_1 \pi\left(1 + P_3\frac{k}{k - 1}\right) + P_3^2\frac{k}{k - 1}\pi\right] \tag{1} $$
where $k$ is the total number of cards in the randomization device $R_2$.
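As an arithmetic check on Equation (1), the following Python sketch evaluates the “Yes” probability term by term and confirms that it factors as $\pi$ times the device constant that appears in the denominator of the estimator given afterwards. The function names and the numerical values are illustrative assumptions and are not taken from the paper.

```python
def yes_probability(pi, T, P1, P3, k):
    """Probability of a "Yes" answer, following Equation (1) term by term."""
    r = k / (k - 1)  # card-ratio factor k / (k - 1)
    second_stage = P1 * pi * (1 + P3 * r) + P3 ** 2 * r * pi
    return T * pi + (1 - T) * second_stage


def rr_denominator(T, P1, P3, k):
    """Device constant T + (1 - T) * [P1 + P3 * k/(k-1) * (P1 + P3)]."""
    return T + (1 - T) * (P1 + P3 * k / (k - 1) * (P1 + P3))


# Illustrative (hypothetical) values: P2 = 1 - P1 - P3 = 0.3.
pi, T, P1, P3, k = 0.01, 0.4, 0.2, 0.5, 15
l0 = yes_probability(pi, T, P1, P3, k)
factored = pi * rr_denominator(T, P1, P3, k)
print(l0, factored)
```

Because the “Yes” probability is linear in $\pi$, dividing the observed rate by the device constant recovers the sensitive parameter, which is the idea behind the estimator.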
Assuming that $n \to \infty$ and $l_0 \to 0$ such that $n l_0 = \lambda_0$ (finite), Equation (1) can be rewritten as
$$ \lambda_0 = T\lambda + (1 - T)\left[P_1 \lambda\left(1 + P_3\frac{k}{k - 1}\right) + P_3^2\frac{k}{k - 1}\lambda\right] \tag{2} $$
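Let $y_1, y_2, \ldots, y_n$ be a random sample of $n$ observations from the Poisson distribution with parameter $\lambda_0$.
The maximum-likelihood estimator of $\lambda$ is given by:
$$ \hat{\lambda}_p = \frac{\frac{1}{n}\sum_{i=1}^{n} y_i}{T + (1 - T)\left[P_1 + P_3\frac{k}{k - 1}(P_1 + P_3)\right]} \tag{3} $$
The variance of the estimator $\hat{\lambda}_p$ is given by:
$$ V(\hat{\lambda}_p) = \frac{\lambda}{n\left\{T + (1 - T)\left[P_1 + P_3\frac{k}{k - 1}(P_1 + P_3)\right]\right\}} \tag{4} $$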
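A minimal Python sketch of the estimator and its variance may make the two formulas above concrete. It assumes the responses are supplied as a list of Poisson counts `y`; the function names and the sample values are hypothetical.

```python
def rr_denominator(T, P1, P3, k):
    """Device constant T + (1 - T) * [P1 + P3 * k/(k-1) * (P1 + P3)]."""
    return T + (1 - T) * (P1 + P3 * k / (k - 1) * (P1 + P3))


def estimate_lambda(y, T, P1, P3, k):
    """Sample mean of the observed counts divided by the device constant."""
    return (sum(y) / len(y)) / rr_denominator(T, P1, P3, k)


def variance_lambda(lam, n, T, P1, P3, k):
    """Variance lam / (n * device constant) of the estimator."""
    return lam / (n * rr_denominator(T, P1, P3, k))


# Hypothetical observations and device settings, for illustration only.
y = [1, 0, 2, 1]
lam_hat = estimate_lambda(y, T=0.4, P1=0.2, P3=0.5, k=15)
var_lam = variance_lambda(1.5, n=100, T=0.4, P1=0.2, P3=0.5, k=15)
print(lam_hat, var_lam)
```

The variance shrinks as the device constant grows, so designs with a larger $T$ (more direct questioning) are more efficient but offer less privacy protection.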

2.2. Estimation by PPS When PSUs Are Selected with Replacement

Suppose $n$ primary sampling units (PSUs) of size $M_i$ $(i = 1, 2, \ldots, n)$ are selected with replacement from the population of $N$ clusters with selection probability $\varphi_i$, and secondary sampling units (SSUs) of size $m_i$ $(i = 1, 2, \ldots, n)$ are selected from each chosen PSU using simple random sampling with replacement (SRSWR). We apply this two-stage sampling procedure to Narjis and Shabbir’s partial randomized response model to estimate a rare sensitive attribute. Each person selected via the two-stage sampling procedure is asked to answer “Yes” or “No” using Narjis and Shabbir’s randomization devices for the $i$th cluster, which are shown in Table 1 (first-stage device) and Table 2 (second-stage device).
If Question 3 in randomization device $R_{2i}$ appears on the respondent’s card, the respondent draws another card from $R_{2i}$ without replacing the first one. If Question 3 reappears in the second draw, the respondent is asked to report “Yes” or “No” according to his/her true response to the sensitive question.
In the first- and second-stage randomization devices, $T_i$ is the selection probability of the rare sensitive question in randomization device $R_{1i}$ for the $i$th cluster, $\pi_i$ is the population proportion of the rare sensitive attribute in the $i$th cluster, and $P_{i1}$ is the selection probability of the rare sensitive question in randomization device $R_{2i}$ for the $i$th cluster. Furthermore, $P_{i2}$ is the selection probability of the forced answer “No” in $R_{2i}$, $P_{i3}$ is the selection probability of the statement “Draw one more card” in $R_{2i}$, and $k_i$ is the number of cards in the deck of $R_{2i}$ for the $i$th cluster.
The probability of answering “Yes” from the respondent in cluster $i$ is given by
$$ l_{i0} = T_i\pi_i + (1 - T_i)\left[P_{i1}\pi_i\left(1 + P_{i3}\frac{k_i}{k_i - 1}\right) + P_{i3}^2\frac{k_i}{k_i - 1}\pi_i\right] \tag{5} $$
To clarify the response process, we presented a flow chart for the probability of answering “Yes” for ith cluster in Figure 1.
Since the attribute $A_i$ in cluster $i$ is very rare in the population, we assume that $m_i \to \infty$ and $l_{i0} \to 0$ such that $m_i l_{i0} = \lambda_{i0}$ (finite).
Let $y_{i1}, y_{i2}, \ldots, y_{im_i}$ be a random sample of $m_i$ observations from the Poisson distribution with parameter $\lambda_{i0}$ in cluster $i$. Then the estimator $\hat{\lambda}_i$ of $\lambda_i$, the parameter of a rare sensitive attribute of cluster $i$, is given by
$$ \hat{\lambda}_i = \frac{\frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]} \tag{6} $$
When respondents are selected via simple random sampling with replacement from the $i$th cluster, which was itself selected with replacement using sampling probability $\varphi_i$, the estimator $\hat{\lambda}_{ppzwr}$ of $\lambda$, the parameter of a rare sensitive attribute, is given by:
$$ \hat{\lambda}_{ppzwr} = \frac{1}{nM_0}\sum_{i=1}^{n}\frac{M_i\hat{\lambda}_i}{\varphi_i} \tag{7} $$
where $M_0 = \sum_{i=1}^{N} M_i$.
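The weighting in the estimator above can be sketched in a few lines of Python. The inputs are the sizes, cluster-level estimates and selection probabilities of the sampled clusters, listed in the order drawn; the function name and the numbers are hypothetical.

```python
def ppz_wr_estimate(M_s, lam_hat_s, phi_s, M0):
    """Unequal-probability with-replacement estimator.

    M_s, lam_hat_s, phi_s: size, cluster estimate and selection probability
    of each sampled cluster (repeats allowed, since sampling is with replacement).
    M0: total number of sub-units in the population.
    """
    n = len(M_s)
    weighted = sum(Mi * li / ph for Mi, li, ph in zip(M_s, lam_hat_s, phi_s))
    return weighted / (n * M0)


# Hypothetical sample of n = 2 clusters from a population with M0 = 10,000.
est = ppz_wr_estimate(M_s=[1000, 3000], lam_hat_s=[0.6, 1.4],
                      phi_s=[0.1, 0.3], M0=10000)
print(est)
```

In this example the selection probabilities happen to equal $M_i / M_0$, so the estimate coincides with the simple average of the two cluster-level estimates.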
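Theorem 1.
The estimator $\hat{\lambda}_{ppzwr}$ is an unbiased estimator of the parameter $\lambda$.
Proof. 
Since $y_{ij} \overset{iid}{\sim} Po(\lambda_{i0})$ for each cluster and
$$ \lambda_{i0} = T_i\lambda_i + (1 - T_i)\left[P_{i1}\lambda_i\left(1 + P_{i3}\frac{k_i}{k_i - 1}\right) + P_{i3}^2\frac{k_i}{k_i - 1}\lambda_i\right], $$
we have
$$ E_1 E_2(\hat{\lambda}_{ppzwr}) = E_1 E_2\left[\frac{1}{nM_0}\sum_{i=1}^{n}\frac{M_i\hat{\lambda}_i}{\varphi_i}\right] = E_1\left[\frac{1}{nM_0}\sum_{i=1}^{n}\frac{M_i E_2(\hat{\lambda}_i)}{\varphi_i}\right], $$
where
$$ E_2(\hat{\lambda}_i) = E_2\left[\frac{\frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]}\right] = \frac{\lambda_{i0}}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]} = \lambda_i. $$
Since each of the $n$ draws selects cluster $i$ with probability $\varphi_i$, we obtain
$$ E_1 E_2(\hat{\lambda}_{ppzwr}) = E_1\left[\frac{1}{nM_0}\sum_{i=1}^{n}\frac{M_i\lambda_i}{\varphi_i}\right] = \frac{1}{nM_0}\, n\sum_{i=1}^{N}\varphi_i\frac{M_i\lambda_i}{\varphi_i} = \frac{1}{M_0}\sum_{i=1}^{N} M_i\lambda_i = \lambda. $$ □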
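The unbiasedness argument can be checked numerically without simulation. The sketch below takes the first-stage expectation exactly for a single draw, with each cluster-level estimate replaced by its second-stage expectation $\lambda_i$, and shows that the result does not depend on the selection probabilities. It assumes the population parameter is the size-weighted mean $\lambda = \sum_i M_i\lambda_i / M_0$; the function name is hypothetical, while the cluster values are the ones listed in the numerical example of Section 3.

```python
def expected_ppz_wr(M, lam, phi):
    """Exact first-stage expectation of the estimator for one draw (n = 1),
    with each cluster estimate replaced by its expectation lam_i."""
    M0 = sum(M)
    return sum(ph * (Mi * li / ph) for Mi, li, ph in zip(M, lam, phi)) / M0


M = [1000, 2000, 3000, 4000]
lam = [0.5, 1.0, 1.5, 2.0]
lam_pop = sum(Mi * li for Mi, li in zip(M, lam)) / sum(M)  # size-weighted mean

e_pps = expected_ppz_wr(M, lam, phi=[0.1, 0.2, 0.3, 0.4])    # phi_i = M_i / M0
e_other = expected_ppz_wr(M, lam, phi=[0.4, 0.3, 0.2, 0.1])  # arbitrary phi_i
print(lam_pop, e_pps, e_other)
```

The selection probabilities cancel inside the sum, which is why any valid set of $\varphi_i$ gives the same expectation; the choice of $\varphi_i$ affects only the variance.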
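Theorem 2.
The variance of $\hat{\lambda}_{ppzwr}$ is given by
$$ V(\hat{\lambda}_{ppzwr}) = \frac{1}{nM_0^2}\sum_{i=1}^{N}\varphi_i\left(\frac{M_i\lambda_i}{\varphi_i} - M_0\lambda\right)^2 + \frac{1}{nM_0^2}\sum_{i=1}^{N}\frac{M_i^2}{m_i\varphi_i}\,\frac{\lambda_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]} \tag{8} $$
Proof. 
By [11], we have
$$ V(\hat{\lambda}_{ppzwr}) = V_1 E_2(\hat{\lambda}_{ppzwr}) + E_1 V_2(\hat{\lambda}_{ppzwr}), $$
where
$$ V_1 E_2(\hat{\lambda}_{ppzwr}) = V_1 E_2\left[\frac{1}{nM_0}\sum_{i=1}^{n}\frac{M_i\hat{\lambda}_i}{\varphi_i}\right] = V_1\left[\frac{1}{nM_0}\sum_{i=1}^{n}\frac{M_i\lambda_i}{\varphi_i}\right] = \frac{1}{nM_0^2}\sum_{i=1}^{N}\varphi_i\left(\frac{M_i\lambda_i}{\varphi_i} - M_0\lambda\right)^2 $$
and
$$ \begin{aligned} E_1 V_2(\hat{\lambda}_{ppzwr}) &= E_1 V_2\left[\frac{1}{nM_0}\sum_{i=1}^{n}\frac{M_i\hat{\lambda}_i}{\varphi_i}\right] \\ &= E_1\left[\frac{1}{(nM_0)^2}\sum_{i=1}^{n}\frac{M_i^2}{\varphi_i^2}\, V_2\left(\frac{\frac{1}{m_i}\sum_{j=1}^{m_i} y_{ij}}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]}\right)\right] \\ &= E_1\left[\frac{1}{(nM_0)^2}\sum_{i=1}^{n}\frac{M_i^2}{\varphi_i^2}\,\frac{\frac{1}{m_i^2}\sum_{j=1}^{m_i} V_2(y_{ij})}{\left\{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]\right\}^2}\right]. \end{aligned} $$
Because $y_{ij} \overset{iid}{\sim} Po(\lambda_{i0})$, we have
$$ \begin{aligned} E_1 V_2(\hat{\lambda}_{ppzwr}) &= E_1\left[\frac{1}{(nM_0)^2}\sum_{i=1}^{n}\frac{M_i^2}{\varphi_i^2}\,\frac{\frac{1}{m_i^2}\sum_{j=1}^{m_i}\lambda_{i0}}{\left\{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]\right\}^2}\right] \\ &= E_1\left[\frac{1}{(nM_0)^2}\sum_{i=1}^{n}\frac{M_i^2}{\varphi_i^2 m_i}\,\frac{\lambda_{i0}}{\left\{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]\right\}^2}\right] \\ &= E_1\left[\frac{1}{(nM_0)^2}\sum_{i=1}^{n}\frac{M_i^2}{\varphi_i^2 m_i}\,\frac{\lambda_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]}\right] \\ &= \frac{1}{nM_0^2}\sum_{i=1}^{N}\frac{M_i^2}{\varphi_i}\,\frac{1}{m_i}\,\frac{\lambda_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]}. \end{aligned} $$
Thus, we determine the variance of $\hat{\lambda}_{ppzwr}$ as shown in (8). □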
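The two components of the variance in (8), the between-cluster part $V_1E_2$ and the within-cluster part $E_1V_2$, can be evaluated as follows. This is a sketch under stated assumptions: the device constants are passed in as a list `D`, the population parameter is taken as $\lambda = \sum_i M_i\lambda_i / M_0$, and the values of $n$, $T_i$, $P_{i1}$ and $P_{i3}$ are chosen only for illustration.

```python
def rr_denominator(T, P1, P3, k):
    """Device constant T + (1 - T) * [P1 + P3 * k/(k-1) * (P1 + P3)]."""
    return T + (1 - T) * (P1 + P3 * k / (k - 1) * (P1 + P3))


def var_ppz_wr(M, m, lam, phi, D, n):
    """Variance (8): between-cluster term plus within-cluster term."""
    M0 = sum(M)
    lam_pop = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    between = sum(ph * (Mi * li / ph - M0 * lam_pop) ** 2
                  for Mi, li, ph in zip(M, lam, phi))
    within = sum(Mi ** 2 / (mi * ph) * li / Di
                 for Mi, mi, li, ph, Di in zip(M, m, lam, phi, D))
    return (between + within) / (n * M0 ** 2)


M = [1000, 2000, 3000, 4000]
m = [100, 200, 300, 400]
lam = [0.5, 1.0, 1.5, 2.0]
phi = [Mi / sum(M) for Mi in M]                    # PPS choice, for illustration
D = [rr_denominator(0.4, 0.2, 0.5, 15)] * len(M)  # same device in every cluster
v8 = var_ppz_wr(M, m, lam, phi, D, n=2)
print(v8)
```

With these inputs the between-cluster term is 0.125 and the within-cluster term is about 0.0034, so most of the variance comes from differences between clusters.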
Also, the estimator of $V(\hat{\lambda}_{ppzwr})$ is given by
$$ \hat{V}(\hat{\lambda}_{ppzwr}) = \frac{1}{n(n - 1)M_0^2}\sum_{i=1}^{n}\left(\frac{M_i\hat{\lambda}_i}{\varphi_i} - M_0\hat{\lambda}_{ppzwr}\right)^2. \tag{9} $$
On the other hand, when the sampling probabilities of the $n$ PSUs are proportional to the cluster sizes $M_i$, that is, $\varphi_i = M_i / M_0$, the design is called PPS sampling. When a sample of $n$ PSUs is selected via PPS sampling with replacement and $m_i$ SSUs are selected using simple random sampling with replacement from each PSU, the estimator $\hat{\lambda}_{ppswr}$ of $\lambda$ is as follows
$$ \hat{\lambda}_{ppswr} = \frac{1}{n}\sum_{i=1}^{n}\hat{\lambda}_i. \tag{10} $$
The variance of $\hat{\lambda}_{ppswr}$ and its estimator are, respectively,
$$ V(\hat{\lambda}_{ppswr}) = \frac{1}{nM_0}\sum_{i=1}^{N} M_i(\lambda_i - \lambda)^2 + \frac{1}{nM_0}\sum_{i=1}^{N}\frac{M_i}{m_i}\,\frac{\lambda_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]}, \tag{11} $$
and
$$ \hat{V}(\hat{\lambda}_{ppswr}) = \frac{1}{n(n - 1)}\sum_{i=1}^{n}\left(\hat{\lambda}_i - \hat{\lambda}_{ppswr}\right)^2. \tag{12} $$
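For the PPS special case $\varphi_i = M_i / M_0$, the estimator reduces to a plain average of the cluster-level estimates. The sketch below computes that average, the variance in the form of (11), and a variance estimate written as the usual with-replacement estimator, that is, the mean squared deviation of the $\hat{\lambda}_i$ divided by $n - 1$ and then by $n$. Function names, the sample values and the device settings are hypothetical.

```python
def rr_denominator(T, P1, P3, k):
    """Device constant T + (1 - T) * [P1 + P3 * k/(k-1) * (P1 + P3)]."""
    return T + (1 - T) * (P1 + P3 * k / (k - 1) * (P1 + P3))


def pps_wr_estimate(lam_hat_s):
    """Average of the cluster-level estimates of the sampled clusters."""
    return sum(lam_hat_s) / len(lam_hat_s)


def pps_wr_variance(M, m, lam, D, n):
    """Variance under PPS with replacement (between + within terms)."""
    M0 = sum(M)
    lam_pop = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    between = sum(Mi * (li - lam_pop) ** 2 for Mi, li in zip(M, lam))
    within = sum(Mi / mi * li / Di for Mi, mi, li, Di in zip(M, m, lam, D))
    return (between + within) / (n * M0)


def pps_wr_variance_estimate(lam_hat_s):
    """Usual with-replacement variance estimator from the sampled clusters."""
    n = len(lam_hat_s)
    est = pps_wr_estimate(lam_hat_s)
    return sum((li - est) ** 2 for li in lam_hat_s) / (n * (n - 1))


lam_hat_s = [0.6, 1.4, 1.0]                     # hypothetical cluster estimates
est = pps_wr_estimate(lam_hat_s)
v_hat = pps_wr_variance_estimate(lam_hat_s)

M, m, lam = [1000, 2000, 3000, 4000], [100, 200, 300, 400], [0.5, 1.0, 1.5, 2.0]
D = [rr_denominator(0.4, 0.2, 0.5, 15)] * 4
v11 = pps_wr_variance(M, m, lam, D, n=2)
print(est, v_hat, v11)
```

Note that the variance estimate needs only the cluster-level estimates, not the cluster sizes, which is one practical attraction of PPS sampling with replacement.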

2.3. Estimation by PPS When PSUs Are Selected without Replacement

Suppose $n$ PSUs of size $M_i$ $(i = 1, 2, \ldots, n)$ are selected without replacement from the population of $N$ clusters with selection probability $\phi_i$, and SSUs of size $m_i$ are selected from each chosen PSU via SRSWR. We apply this two-stage sampling procedure to Narjis and Shabbir’s randomized response technique (RRT) to estimate a rare sensitive attribute.
The estimator $\hat{\lambda}_{ppswor}$ of $\lambda$, the parameter of a rare sensitive attribute, obtained using the above sampling procedure is given by
$$ \hat{\lambda}_{ppswor} = \frac{1}{M_0}\sum_{i=1}^{n}\frac{M_i\hat{\lambda}_i}{\phi_i}, \tag{13} $$
where $\phi_i$ is the inclusion probability of survey unit $i$.
The variance of $\hat{\lambda}_{ppswor}$ is given by:
$$ V(\hat{\lambda}_{ppswor}) = \frac{1}{M_0^2}\sum_{i=1}^{N}\sum_{j>i}^{N}(\phi_i\phi_j - \phi_{ij})\left(\frac{M_i\lambda_i}{\phi_i} - \frac{M_j\lambda_j}{\phi_j}\right)^2 + \frac{1}{M_0^2}\sum_{i=1}^{N}\frac{M_i^2}{m_i\phi_i}\,\frac{\lambda_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]}, \tag{14} $$
where $\phi_{ij}$ is the joint inclusion probability of survey units $i$ and $j$.
Also, the estimator of $V(\hat{\lambda}_{ppswor})$ is given by
$$ \hat{V}(\hat{\lambda}_{ppswor}) = \frac{1}{M_0^2}\sum_{i=1}^{n}\sum_{j>i}^{n}\frac{\phi_i\phi_j - \phi_{ij}}{\phi_{ij}}\left(\frac{M_i\hat{\lambda}_i}{\phi_i} - \frac{M_j\hat{\lambda}_j}{\phi_j}\right)^2 + \frac{1}{M_0^2}\sum_{i=1}^{n}\frac{M_i^2}{\phi_i(m_i - 1)}\,\frac{\hat{\lambda}_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]} \tag{15} $$
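The without-replacement case needs first- and second-order inclusion probabilities as inputs. The sketch below evaluates the estimator and the variance (between-cluster pairwise term plus within-cluster term) for a toy population of three clusters in which every pair of clusters is equally likely to be drawn, so that each inclusion probability is 2/3 and each joint inclusion probability is 1/3. The toy population, the function names and the choice of device constant 1 (direct questioning, $T_i = 1$) are assumptions made only to keep the arithmetic easy to follow.

```python
def ht_estimate(M_s, lam_hat_s, phi_s, M0):
    """Inverse-inclusion-probability weighted estimator over sampled clusters."""
    return sum(Mi * li / ph for Mi, li, ph in zip(M_s, lam_hat_s, phi_s)) / M0


def ppswor_variance(M, m, lam, D, phi, phi_joint):
    """Pairwise between-cluster term plus within-cluster term, over M0^2."""
    N, M0 = len(M), sum(M)
    z = [Mi * li / ph for Mi, li, ph in zip(M, lam, phi)]
    between = sum((phi[i] * phi[j] - phi_joint[i][j]) * (z[i] - z[j]) ** 2
                  for i in range(N) for j in range(i + 1, N))
    within = sum(Mi ** 2 / (mi * ph) * li / Di
                 for Mi, mi, li, Di, ph in zip(M, m, lam, D, phi))
    return (between + within) / M0 ** 2


# Toy population (hypothetical): N = 3 clusters, n = 2 drawn without replacement.
M, m, lam = [10, 20, 30], [5, 5, 5], [1.0, 2.0, 3.0]
D = [1.0, 1.0, 1.0]                      # device constant 1, i.e. T_i = 1
phi = [2 / 3] * 3                        # inclusion probabilities
phi_joint = [[0, 1 / 3, 1 / 3], [1 / 3, 0, 1 / 3], [1 / 3, 1 / 3, 0]]

v_wor = ppswor_variance(M, m, lam, D, phi, phi_joint)
est = ht_estimate([10, 30], [1.2, 2.7], [2 / 3, 2 / 3], M0=sum(M))
print(v_wor, est)
```

For genuinely unequal inclusion probabilities the joint probabilities depend on the specific without-replacement selection scheme, which is why they are treated as inputs here rather than computed.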

2.4. Estimation via Two-Stage Equal Probability Sampling

Suppose $n$ PSUs of size $M_i$ $(i = 1, 2, \ldots, n)$ are selected from the population of $N$ clusters by SRSWR, and SSUs of size $m_i$ are then selected from each chosen PSU, again via SRSWR. We consider this two-stage equal probability sampling procedure for Narjis and Shabbir’s RRT for estimating a rare sensitive attribute. The estimator $\hat{\lambda}_{wr}$ of $\lambda$, the parameter of a rare sensitive attribute, obtained using the above procedure is given by
$$ \hat{\lambda}_{wr} = \frac{1}{n\bar{M}}\sum_{i=1}^{n} M_i\hat{\lambda}_i, \tag{16} $$
where $\bar{M} = M_0 / N$.
The variance of $\hat{\lambda}_{wr}$ and its estimator are, respectively,
$$ V(\hat{\lambda}_{wr}) = \frac{1}{n\bar{M}^2}\,\frac{1}{N - 1}\sum_{i=1}^{N}(M_i\lambda_i - \bar{M}\lambda)^2 + \frac{1}{nN\bar{M}^2}\sum_{i=1}^{N}\frac{M_i^2}{m_i}\,\frac{\lambda_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]}, \tag{17} $$
and
$$ \hat{V}(\hat{\lambda}_{wr}) = \frac{1}{n(n - 1)}\sum_{i=1}^{n}\left(\frac{N M_i\hat{\lambda}_i}{M_0} - \hat{\lambda}_{wr}\right)^2. \tag{18} $$
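A short Python sketch of the two-stage equal probability estimator and its variance follows. One assumption should be read carefully: the within-cluster term is averaged over the $N$ clusters (a factor $1/N$), which is what the equal-probability first-stage expectation yields and what the efficiency comparison in Section 3 uses. Function names, $n$ and the device settings are hypothetical.

```python
def rr_denominator(T, P1, P3, k):
    """Device constant T + (1 - T) * [P1 + P3 * k/(k-1) * (P1 + P3)]."""
    return T + (1 - T) * (P1 + P3 * k / (k - 1) * (P1 + P3))


def wr_estimate(M_s, lam_hat_s, M_bar):
    """Equal-probability two-stage estimator over the sampled clusters."""
    n = len(M_s)
    return sum(Mi * li for Mi, li in zip(M_s, lam_hat_s)) / (n * M_bar)


def wr_variance(M, m, lam, D, n):
    """Between term with divisor N - 1; within term averaged over N clusters
    (the 1/N factor is an assumption stated in the lead-in)."""
    N, M0 = len(M), sum(M)
    M_bar = M0 / N
    lam_pop = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    between = sum((Mi * li - M_bar * lam_pop) ** 2
                  for Mi, li in zip(M, lam)) / (N - 1)
    within = sum(Mi ** 2 / mi * li / Di
                 for Mi, mi, li, Di in zip(M, m, lam, D)) / N
    return (between + within) / (n * M_bar ** 2)


M, m, lam = [1000, 2000, 3000, 4000], [100, 200, 300, 400], [0.5, 1.0, 1.5, 2.0]
D = [rr_denominator(0.4, 0.2, 0.5, 15)] * 4
v_wr = wr_variance(M, m, lam, D, n=2)
est = wr_estimate([1000, 4000], [0.5, 2.0], M_bar=2500)
print(v_wr, est)
```

With the same inputs, this variance (about 0.864) is several times larger than the PPS with-replacement variance, because equal-probability selection ignores the large spread in cluster sizes.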

3. Efficiency Comparisons for the PPS vs. Equal Probability Sampling

Narjis and Shabbir’s RRT model was developed under the assumptions of simple random sampling and stratified random sampling, and its efficiency was compared with that of the estimators of [9]. It would therefore be reasonable to compare the existing estimator with the estimator proposed in this paper using Narjis and Shabbir’s model. However, the increase in variance under cluster sampling relative to simple random sampling or stratified sampling is already covered in standard sampling textbooks. In this paper, therefore, when the population consists of $N$ clusters, we compare the PPS with replacement estimator and the two-stage equal probability estimator.
Now, the difference between the variance (17) of two-stage equal probability sampling and the variance (11) of PPS with replacement sampling is given as follows under $N - 1 \approx N$:
$$ V(\hat{\lambda}_{wr}) - V(\hat{\lambda}_{ppswr}) = \frac{1}{nN\bar{M}^2}\left[\sum_{i=1}^{N}(M_i - \bar{M})^2\lambda_i^2 + \bar{M}\sum_{i=1}^{N}(M_i - \bar{M})(\lambda_i^2 - \lambda^2) + \sum_{i=1}^{N}\frac{(M_i - \bar{M})^2}{m_i}\,\frac{\lambda_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]} + \bar{M}\sum_{i=1}^{N}\frac{(M_i - \bar{M})}{m_i}\,\frac{\lambda_i}{T_i + (1 - T_i)\left[P_{i1} + P_{i3}\frac{k_i}{k_i - 1}(P_{i1} + P_{i3})\right]}\right]. \tag{19} $$
In (19), if $M_i = \bar{M} = M_0 / N$, then $V(\hat{\lambda}_{wr}) = V(\hat{\lambda}_{ppswr})$. In other words, if the cluster sizes are equal, the selection probability of PPS sampling with replacement becomes $1/N$ and is equal to that of two-stage equal probability sampling with replacement. Hence, they have the same efficiency.
If the cluster sizes $M_i$ are unequal, the first term on the right-hand side of (19), $\sum_{i=1}^{N}(M_i - \bar{M})^2\lambda_i^2$, increases substantially, whereas the second term, $\sum_{i=1}^{N}(M_i - \bar{M})(\lambda_i^2 - \lambda^2)$, remains relatively small. Hence, estimation using PPS sampling with replacement is more efficient than estimation using two-stage equal probability sampling with replacement.
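The decomposition in (19) can be verified numerically. The sketch below computes the variance difference in two ways: directly, as the two-stage equal probability variance (with $N$ in place of $N - 1$ and the within-cluster term averaged over $N$ clusters) minus the PPS with-replacement variance, and via the four bracketed sums. It assumes $\lambda = \sum_i M_i\lambda_i / M_0$; the function names, $n$ and the device settings are illustrative, and the cluster values are those listed in the numerical example.

```python
def rr_denominator(T, P1, P3, k):
    """Device constant T + (1 - T) * [P1 + P3 * k/(k-1) * (P1 + P3)]."""
    return T + (1 - T) * (P1 + P3 * k / (k - 1) * (P1 + P3))


def difference_direct(M, m, lam, D, n):
    """Equal-probability variance (N - 1 replaced by N) minus PPS variance."""
    N, M0 = len(M), sum(M)
    M_bar = M0 / N
    lam_pop = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    v_wr = (sum((Mi * li - M_bar * lam_pop) ** 2 for Mi, li in zip(M, lam))
            + sum(Mi ** 2 / mi * li / Di for Mi, mi, li, Di in zip(M, m, lam, D))
            ) / (n * N * M_bar ** 2)
    v_pps = (sum(Mi * (li - lam_pop) ** 2 for Mi, li in zip(M, lam))
             + sum(Mi / mi * li / Di for Mi, mi, li, Di in zip(M, m, lam, D))
             ) / (n * M0)
    return v_wr - v_pps


def difference_formula(M, m, lam, D, n):
    """The four sums inside the bracket of (19), scaled by 1 / (n N M_bar^2)."""
    N, M0 = len(M), sum(M)
    M_bar = M0 / N
    lam_pop = sum(Mi * li for Mi, li in zip(M, lam)) / M0
    t1 = sum((Mi - M_bar) ** 2 * li ** 2 for Mi, li in zip(M, lam))
    t2 = M_bar * sum((Mi - M_bar) * (li ** 2 - lam_pop ** 2) for Mi, li in zip(M, lam))
    t3 = sum((Mi - M_bar) ** 2 / mi * li / Di for Mi, mi, li, Di in zip(M, m, lam, D))
    t4 = M_bar * sum((Mi - M_bar) / mi * li / Di for Mi, mi, li, Di in zip(M, m, lam, D))
    return (t1 + t2 + t3 + t4) / (n * N * M_bar ** 2)


M, m, lam = [1000, 2000, 3000, 4000], [100, 200, 300, 400], [0.5, 1.0, 1.5, 2.0]
D = [rr_denominator(0.4, 0.2, 0.5, 15)] * 4
d_direct = difference_direct(M, m, lam, D, n=2)
d_formula = difference_formula(M, m, lam, D, n=2)
d_equal = difference_formula([2500] * 4, m, lam, D, n=2)  # equal cluster sizes
print(d_direct, d_formula, d_equal)
```

The two routes agree, the difference is positive for these unequal cluster sizes, and it vanishes when every cluster has the same size, matching the discussion above.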
Table 3 summarizes the relationships among the estimators in a cluster sampling design.
Now, we compare the efficiency by calculating the relative efficiencies (RE) between the sampling methods, namely unequal probability sampling with replacement (ppzwr), PPS sampling with replacement (ppswr) and two-stage equal probability sampling with replacement (wr), for varying parameter combinations in a numerical example.
$$ RE_1 = \frac{V(\hat{\lambda}_{wr})}{V(\hat{\lambda}_{ppzwr})}, \quad RE_2 = \frac{V(\hat{\lambda}_{ppzwr})}{V(\hat{\lambda}_{ppswr})}, \quad RE_3 = \frac{V(\hat{\lambda}_{wr})}{V(\hat{\lambda}_{ppswr})}. \tag{20} $$
A value of $RE_1$ greater than one means that unequal probability sampling with replacement (ppzwr) is more efficient than two-stage equal probability sampling with replacement (wr), a value of $RE_2$ greater than one means that PPS sampling with replacement (ppswr) is more efficient than unequal probability sampling with replacement (ppzwr), and a value of $RE_3$ greater than one means that PPS sampling with replacement (ppswr) is more efficient than two-stage equal probability sampling with replacement (wr).
In calculating the REs, we set the parameters for the $i$th cluster $(i = 1, 2, 3, 4)$ as follows.
  • $M_0 = 10{,}000$; $M_1 = 1000$; $M_2 = 2000$; $M_3 = 3000$; $M_4 = 4000$,
  • $m_0 = 1000$; $m_1 = 100$; $m_2 = 200$; $m_3 = 300$; $m_4 = 400$,
  • $\lambda = 1.25, 1.5, 2.0, 2.25$;
  • $\lambda_1 = 0.5$, $\lambda_2 = 1.0$, $\lambda_3 = 1.5$, $\lambda_4 = 2.0$;
  • $k_1 = k_2 = k_3 = k_4 = 15, 75$;
  • $P_{i1}$, $P_{i2} = \frac{1 - P_{i1}}{3}$, $P_{i3} = 1 - P_{i1} - P_{i2}$.
We also assume the selection probabilities for the $i$th cluster as follows.
  • $T_1 = T_2 = T_3 = T_4$;
  • $P_{11} = P_{12} = P_{13} = P_{21} = P_{22} = P_{23} = P_{31} = P_{32} = P_{33} = P_{41} = P_{42} = P_{43}$,
varying from 0.2 to 0.8 by 0.2.
To compare the efficiency of the proposed estimators in the numerical example, we summarize the relative efficiencies for the various parameter values by their mean values.
From Table 4, it can be seen that, for all the parametric combinations, the mean values of $RE_1$ are greater than one, which indicates that the unequal probability sampling with replacement estimator $\hat{\lambda}_{ppzwr}$ is more efficient than the two-stage estimator $\hat{\lambda}_{wr}$. The gain is largest when the sensitive attribute value $\lambda$ is small (about 5.12 at $\lambda = 1.25$) and decreases as $\lambda$ increases (about 1.05 at $\lambda = 2.25$). In addition, for both $k_i = 15$ and $k_i = 75$, $RE_1$ increases only marginally, if at all, as the selection probability $T_i$ increases.
As shown in Table 5, the probability proportional to size estimator, $\hat{\lambda}_{ppswr}$, is more efficient than the unequal probability sampling with replacement estimator, $\hat{\lambda}_{ppzwr}$. The gain tends to increase as the sensitive attribute value $\lambda$ increases and, in contrast, to decrease as $\lambda$ decreases: the mean $RE_2$ is about 1.11 at $\lambda = 1.25$, 2.60 at $\lambda = 1.5$, 3.29 at $\lambda = 2.0$ and 2.88 at $\lambda = 2.25$.
As shown in Table 6, the probability proportional to size estimator, $\hat{\lambda}_{ppswr}$, is more efficient than the two-stage sampling with replacement estimator, $\hat{\lambda}_{wr}$. The gain tends to increase as the sensitive attribute value $\lambda$ decreases and, in contrast, to decrease as $\lambda$ increases: the mean $RE_3$ is about 5.69 at $\lambda = 1.25$, 6.75 at $\lambda = 1.5$, 4.07 at $\lambda = 2.0$ and 3.03 at $\lambda = 2.25$.
In summary, an examination of the efficiency of a partial randomized response model for rare sensitive attributes based on a cluster sampling design with numerical examples shows the following trends:
(1) Between $ppzwr$ and $wr$, efficiency decreases as the rare sensitive attribute $\lambda$ increases (refer to Table 4).
(2) Between $ppswr$ and $ppzwr$, efficiency increases as $\lambda$ increases, and efficiency is relatively low at specific values of $\lambda$ (refer to Table 5).
(3) Between $ppswr$ and $wr$, efficiency increases as $\lambda$ decreases, similar to the relation between $ppswr$ and $ppzwr$, where efficiency sharply increases at specific values of $\lambda$ (refer to Table 6).
(4) The number of cards $k_i$ does not significantly impact efficiency.

4. Conclusions

In this paper, for a population composed of several clusters of different sizes and a sensitive survey question, we suggest a method for efficiently estimating a rare sensitive attribute by applying the PPS sampling method to the partial randomized response model of [10]. By applying PPS sampling and two-stage equal probability sampling, we obtain estimators for the rare sensitive attribute, together with their variances and variance estimators. We compare the efficiency of the estimator obtained using PPS sampling with replacement with that of the estimator obtained using two-stage equal probability sampling with replacement when the cluster sizes are different. The comparison confirms that the estimator obtained using PPS sampling with replacement is more efficient than the estimator obtained using two-stage equal probability sampling with replacement when the cluster sizes differ from each other.

Author Contributions

Conceptualization, G.-S.L.; methodology, C.-K.S.; writing—original draft preparation, K.-H.H.; writing—review and editing, C.-K.S.; project administration and funding acquisition, G.-S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by Woosuk University.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We would like to thank the anonymous reviewers for their very careful reading and valuable comments/suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Warner, S.L. Randomized response: A survey technique for eliminating evasive answer bias. J. Am. Stat. Assoc. 1965, 60, 63–69.
  2. Fox, J.A.; Tracy, P.E. Randomized Response: A Method for Sensitive Surveys; Sage Publications: Newbury Park, CA, USA, 1986.
  3. Chaudhuri, A.; Mukerjee, R. Randomized Response: Theory and Techniques; Marcel Dekker, Inc.: New York, NY, USA, 1988.
  4. Ryu, J.B.; Hong, K.H.; Lee, G.S. Randomized Response Model; Freedom Academy: Seoul, Republic of Korea, 1993.
  5. Lee, G.S.; Hong, K.H. Randomized response model by two-stage cluster sampling. Korean Commun. Stat. 1998, 5, 99–105.
  6. Lee, G.S. A Study on the Randomized Response Technique by PPS Sampling. Korean J. Appl. Stat. 2006, 19, 69–80.
  7. Land, M.; Singh, S.; Sedory, S.A. Estimation of a rare sensitive attribute using Poisson distribution. Statistics 2012, 46, 351–360.
  8. Lee, G.S.; Hong, K.H.; Son, C.K. A stratified two-stage unrelated randomized response model for estimating a rare sensitive attribute based on the Poisson distribution. J. Stat. Theory Pract. 2016, 10, 239–262.
  9. Suman, S.; Singh, G.N. An ameliorated stratified two-stage randomized response model for estimating the rare sensitive parameter under Poisson distribution. Statistics 2019, 53, 395–416.
  10. Narjis, G.; Shabbir, J. An efficient partial randomized response model for estimating a rare sensitive attribute using Poisson distribution. Commun. Stat. Theory Methods 2021, 50, 1–17.
  11. Cochran, W.G. Sampling Techniques, 3rd ed.; John Wiley and Sons: New York, NY, USA, 1977.
Figure 1. Response flow using partial randomization device for the ith cluster.
Table 1. First-stage randomization device $R_{1i}$.
| Question | | Selection Probability |
|---|---|---|
| Question 1 | Do you have a rare sensitive attribute $A_i$? | $T_i$ |
| Question 2 | Go to randomization device $R_{2i}$. | $1 - T_i$ |
Table 2. Second-stage randomization device $R_{2i}$.
| Question | | Selection Probability |
|---|---|---|
| Question 1 | Do you have a rare sensitive attribute $A_i$? | $P_{i1}$ |
| Question 2 | Answer “No”. | $P_{i2}$ |
| Question 3 | Draw one more card. | $P_{i3}$ |
Table 3. The relationship between different estimators for cluster sampling.
| | $P_i = M_i / M_0$ | $M_i = \bar{M} = M_0 / N$ |
|---|---|---|
| $\hat{\lambda}_{ppzwr}$ | $\hat{\lambda}_{ppzwr} = \hat{\lambda}_{ppswr}$ | |
| $\hat{\lambda}_{ppswr}$ | | $\hat{\lambda}_{ppswr} = \hat{\lambda}_{wr}$ |
| $\hat{\lambda}_{ppswor}$ | | |
| $\hat{\lambda}_{wr}$ | | |
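Table 4. The mean values of $RE_1$ for $\lambda_{ppzwr}$ vs. $\lambda_{wr}$.
| $\lambda$ | $\lambda_i$ | $P_i$ | $k_i = 15$, $T_i = 0.2$ | $k_i = 15$, $T_i = 0.4$ | $k_i = 15$, $T_i = 0.6$ | $k_i = 15$, $T_i = 0.8$ | $k_i = 75$, $T_i = 0.2$ | $k_i = 75$, $T_i = 0.4$ | $k_i = 75$, $T_i = 0.6$ | $k_i = 75$, $T_i = 0.8$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.25 | 0.5 | 0.2 | 5.1216 | 5.1218 | 5.1219 | 5.122 | 5.1216 | 5.1217 | 5.1219 | 5.122 |
| | 1 | 0.4 | 5.1218 | 5.1219 | 5.1219 | 5.122 | 5.1218 | 5.1218 | 5.1219 | 5.122 |
| | 1.5 | 0.6 | 5.1219 | 5.1219 | 5.122 | 5.122 | 5.1219 | 5.1219 | 5.122 | 5.122 |
| | 2 | 0.8 | 5.122 | 5.122 | 5.122 | 5.122 | 5.122 | 5.122 | 5.122 | 5.122 |
| 1.5 | 0.5 | 0.2 | 2.5931 | 2.5931 | 2.5931 | 2.5931 | 2.5931 | 2.5932 | 2.5932 | 2.5932 |
| | 1 | 0.4 | 2.5931 | 2.5931 | 2.5931 | 2.5931 | 2.5932 | 2.5932 | 2.5932 | 2.5932 |
| | 1.5 | 0.6 | 2.5931 | 2.5931 | 2.5931 | 2.5931 | 2.5932 | 2.5932 | 2.5932 | 2.5932 |
| | 2 | 0.8 | 2.5931 | 2.5931 | 2.5931 | 2.5931 | 2.5932 | 2.5932 | 2.5932 | 2.5932 |
| 2 | 0.5 | 0.2 | 1.2366 | 1.2366 | 1.2366 | 1.2366 | 1.2367 | 1.2367 | 1.2367 | 1.2367 |
| | 1 | 0.4 | 1.2366 | 1.2366 | 1.2366 | 1.2366 | 1.2367 | 1.2367 | 1.2367 | 1.2367 |
| | 1.5 | 0.6 | 1.2366 | 1.2366 | 1.2366 | 1.2366 | 1.2367 | 1.2367 | 1.2367 | 1.2367 |
| | 2 | 0.8 | 1.2366 | 1.2366 | 1.2366 | 1.2366 | 1.2367 | 1.2367 | 1.2367 | 1.2367 |
| 2.25 | 0.5 | 0.2 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0525 | 1.0525 | 1.0524 | 1.0524 |
| | 1 | 0.4 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0525 | 1.0524 | 1.0524 | 1.0524 |
| | 1.5 | 0.6 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0524 |
| | 2 | 0.8 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0524 | 1.0524 |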
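Table 5. The mean values of $RE_2$ for $\lambda_{ppswr}$ vs. $\lambda_{ppzwr}$.
| $\lambda$ | $\lambda_i$ | $P_i$ | $k_i = 15$, $T_i = 0.2$ | $k_i = 15$, $T_i = 0.4$ | $k_i = 15$, $T_i = 0.6$ | $k_i = 15$, $T_i = 0.8$ | $k_i = 75$, $T_i = 0.2$ | $k_i = 75$, $T_i = 0.4$ | $k_i = 75$, $T_i = 0.6$ | $k_i = 75$, $T_i = 0.8$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.25 | 0.5 | 0.2 | 1.1104 | 1.1106 | 1.1107 | 1.1109 | 1.1102 | 1.1104 | 1.1106 | 1.1107 |
| | 1 | 0.4 | 1.1106 | 1.1107 | 1.1108 | 1.1109 | 1.1105 | 1.1106 | 1.1107 | 1.1108 |
| | 1.5 | 0.6 | 1.1108 | 1.1109 | 1.1109 | 1.111 | 1.1106 | 1.1107 | 1.1108 | 1.1108 |
| | 2 | 0.8 | 1.1109 | 1.1109 | 1.111 | 1.111 | 1.1108 | 1.1108 | 1.1108 | 1.1108 |
| 1.5 | 0.5 | 0.2 | 2.6033 | 2.604 | 2.6045 | 2.605 | 2.6027 | 2.6034 | 2.604 | 2.6045 |
| | 1 | 0.4 | 2.6041 | 2.6045 | 2.6048 | 2.6051 | 2.6036 | 2.604 | 2.6043 | 2.6046 |
| | 1.5 | 0.6 | 2.6047 | 2.6049 | 2.6051 | 2.6052 | 2.6042 | 2.6044 | 2.6046 | 2.6047 |
| | 2 | 0.8 | 2.6051 | 2.6052 | 2.6052 | 2.6053 | 2.6046 | 2.6047 | 2.6047 | 2.6048 |
| 2 | 0.5 | 0.2 | 3.2936 | 3.2941 | 3.2944 | 3.2947 | 3.2932 | 3.2937 | 3.2941 | 3.2944 |
| | 1 | 0.4 | 3.2942 | 3.2944 | 3.2946 | 3.2948 | 3.2938 | 3.2941 | 3.2943 | 3.2945 |
| | 1.5 | 0.6 | 3.2946 | 3.2947 | 3.2948 | 3.2949 | 3.2942 | 3.2943 | 3.2945 | 3.2946 |
| | 2 | 0.8 | 3.2948 | 3.2948 | 3.2949 | 3.2949 | 3.2945 | 3.2945 | 3.2946 | 3.2946 |
| 2.25 | 0.5 | 0.2 | 2.876 | 2.8762 | 2.8764 | 2.8766 | 2.8758 | 2.876 | 2.8762 | 2.8764 |
| | 1 | 0.4 | 2.8763 | 2.8764 | 2.8765 | 2.8766 | 2.8761 | 2.8762 | 2.8763 | 2.8764 |
| | 1.5 | 0.6 | 2.8765 | 2.8765 | 2.8766 | 2.8767 | 2.8763 | 2.8764 | 2.8764 | 2.8765 |
| | 2 | 0.8 | 2.8766 | 2.8766 | 2.8767 | 2.8767 | 2.8764 | 2.8765 | 2.8765 | 2.8765 |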
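Table 6. The mean values of $RE_3$ for $\lambda_{ppswr}$ vs. $\lambda_{wr}$.
| $\lambda$ | $\lambda_i$ | $P_i$ | $k_i = 15$, $T_i = 0.2$ | $k_i = 15$, $T_i = 0.4$ | $k_i = 15$, $T_i = 0.6$ | $k_i = 15$, $T_i = 0.8$ | $k_i = 75$, $T_i = 0.2$ | $k_i = 75$, $T_i = 0.4$ | $k_i = 75$, $T_i = 0.6$ | $k_i = 75$, $T_i = 0.8$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.25 | 0.5 | 0.2 | 5.6869 | 5.6881 | 5.6891 | 5.6899 | 5.6859 | 5.6872 | 5.6883 | 5.6891 |
| | 1 | 0.4 | 5.6884 | 5.6891 | 5.6896 | 5.6901 | 5.6875 | 5.6882 | 5.6888 | 5.6894 |
| | 1.5 | 0.6 | 5.6894 | 5.6897 | 5.69 | 5.6903 | 5.6886 | 5.689 | 5.6893 | 5.6896 |
| | 2 | 0.8 | 5.6901 | 5.6902 | 5.6903 | 5.6905 | 5.6893 | 5.6895 | 5.6896 | 5.6897 |
| 1.5 | 0.5 | 0.2 | 6.7506 | 6.7524 | 6.7538 | 6.7551 | 6.7491 | 6.751 | 6.7526 | 6.7539 |
| | 1 | 0.4 | 6.7529 | 6.7538 | 6.7547 | 6.7554 | 6.7515 | 6.7525 | 6.7535 | 6.7543 |
| | 1.5 | 0.6 | 6.7544 | 6.7548 | 6.7553 | 6.7557 | 6.7531 | 6.7536 | 6.7541 | 6.7546 |
| | 2 | 0.8 | 6.7554 | 6.7556 | 6.7557 | 6.7559 | 6.7542 | 6.7544 | 6.7546 | 6.7548 |
| 2 | 0.5 | 0.2 | 4.0731 | 4.0736 | 4.074 | 4.0744 | 4.0726 | 4.0732 | 4.0737 | 4.0741 |
| | 1 | 0.4 | 4.0737 | 4.074 | 4.0742 | 4.0745 | 4.0733 | 4.0737 | 4.0739 | 4.0742 |
| | 1.5 | 0.6 | 4.0742 | 4.0743 | 4.0744 | 4.0745 | 4.0738 | 4.074 | 4.0741 | 4.0743 |
| | 2 | 0.8 | 4.0745 | 4.0745 | 4.0746 | 4.0746 | 4.0741 | 4.0742 | 4.0743 | 4.0743 |
| 2.25 | 0.5 | 0.2 | 3.0268 | 3.027 | 3.0272 | 3.0274 | 3.0266 | 3.0269 | 3.0271 | 3.0273 |
| | 1 | 0.4 | 3.0271 | 3.0272 | 3.0273 | 3.0274 | 3.0269 | 3.0271 | 3.0272 | 3.0273 |
| | 1.5 | 0.6 | 3.0273 | 3.0273 | 3.0274 | 3.0275 | 3.0271 | 3.0272 | 3.0273 | 3.0273 |
| | 2 | 0.8 | 3.0274 | 3.0274 | 3.0275 | 3.0275 | 3.0273 | 3.0273 | 3.0273 | 3.0274 |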