1. Introduction and Literature Review
In an exploratory survey, we encounter certain errors that are not related to sampling. These non-sampling errors include inaccuracies in observations due to wrong reporting, recording, tabulating or processing of data, or failure to measure some units of the sample. The measurements obtained on the units for estimating the study characteristic are rarely accurate. Measurement errors or observational errors, which refer to the disparity between the observed values and the true values of the study characteristic, are common in survey sampling. For instance, many families in some countries do not register the birth of a child; hence, no birth certificate is issued. Since the birth was not registered, a respondent who is part of the sample is likely to report an approximate age rather than the true age.
Moreover, even when a variable is clearly specified, it occasionally happens that observations can only be made on closely comparable alternatives known as proxies. As an illustration, if we want to learn about someone's financial situation but they are unwilling to answer, we can still gather related information by changing the inquiry. For example, we could ask about their educational background rather than asking explicitly about their financial situation. This will only be an approximation, though, as a high level of education does not automatically translate into a successful career. The issue of measurement error has been covered by numerous authors, such as [1,2,3,4,5,6,7,8].
Besides this, some other factors may contribute to non-sampling errors. It is possible that the right respondent was approached for the query but could not provide the needed details. This kind of non-sampling error is termed non-response, and it occurs frequently in survey sampling. Non-response might occur for any of the following reasons: the respondent may have been absent at the time of the survey, may have refused to participate, or may have been unable to remember the correct response. The phenomenon of non-response has been studied by authors including [9,10,11,12,13,14,15,16,17,18,19]. Several researchers, such as [20,21,22,23,24], have recently studied measurement error and non-response jointly.
In this paper, we propose a new calibration estimator for the mean of a stratified population in the presence of both non-response and measurement error, and we examine the consequences of measurement error and non-response on the study variable. New calibration weights are obtained by minimizing a chi-square-type distance function subject to calibration constraints based on the auxiliary information. A simulation analysis is presented to support the effectiveness of the proposed calibration estimator, and its performance is compared with that of some existing estimators.
2. Sampling Design, Procedures and Notations
Suppose a finite population consists of $N$ units and is divided into $k$ homogeneous sub-groups (called strata) such that the $h$th stratum comprises $N_h$ units and $\sum_{h=1}^{k} N_h = N$. Let $Y$ and $X$ be the study and auxiliary variables assuming values $Y_{hi}$ and $X_{hi}$, respectively, on the $i$th unit in the $h$th stratum $(i = 1, 2, \ldots, N_h;\ h = 1, 2, \ldots, k)$. Let $\bar{Y}$ and $\bar{X}$ be the population means of the variables $Y$ and $X$, respectively. Let $\bar{Y}_h$ and $\bar{X}_h$ be the respective population means of the variables $Y$ and $X$ in the $h$th stratum. We choose a sample of $n_h$ units from the $h$th stratum using the simple random sampling without replacement (SRSWOR) method and assume that $n_{h1}$ units respond and $n_{h2}$ units do not respond on the study variable $Y$. It is further assumed that the auxiliary variable $X$ is free from non-response. We then select a sub-sample of $r_h = n_{h2}/L_h$ units from the $n_{h2}$ non-responding units in the $h$th stratum. Here, $L_h$ $(L_h > 1)$ indicates the inverse sampling rate.
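Let $(x^*_{hi}, y^*_{hi})$ and $(X_{hi}, Y_{hi})$ be the observed and actual values of the variables $(X, Y)$ on the $i$th unit in the $h$th stratum. The measurement errors $U_{hi}$ and $V_{hi}$ can be defined as

$$U_{hi} = y^*_{hi} - Y_{hi}, \qquad V_{hi} = x^*_{hi} - X_{hi}.$$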
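Let $S^2_{Yh}$ and $S^2_{Xh}$ be the population variances of $Y$ and $X$ in the $h$th stratum. Let $S^2_{Yh(2)}$ and $S^2_{Xh(2)}$ be the population variances of $Y$ and $X$ for the non-responding group in the $h$th stratum. Let $S^2_{Uh}$ and $S^2_{Vh}$ be the population variances related to the measurement errors $U$ and $V$ in the $h$th stratum. Let $S^2_{Uh(2)}$ and $S^2_{Vh(2)}$ be the population variances related to the measurement errors $U$ and $V$ for the non-responding group in the $h$th stratum. Let $C_{Yh}$ and $C_{Xh}$ be the coefficients of variation of the variables $Y$ and $X$ in the $h$th stratum. Let $C_{Yh(2)}$ and $C_{Xh(2)}$ be the coefficients of variation of the variables $Y$ and $X$ for the non-responding group in the $h$th stratum. The mathematical interpretations of the notations are given below:

$$\bar{Y}_h = \frac{1}{N_h}\sum_{i=1}^{N_h} Y_{hi}, \qquad \bar{Y} = \sum_{h=1}^{k} \frac{N_h}{N}\,\bar{Y}_h, \qquad S^2_{Yh} = \frac{1}{N_h - 1}\sum_{i=1}^{N_h}\left(Y_{hi} - \bar{Y}_h\right)^2, \qquad C_{Yh} = \frac{S_{Yh}}{\bar{Y}_h},$$

with analogous definitions for $X$, $U$ and $V$. The quantities carrying the subscript $(2)$ are defined in the same way over the $N_{h2}$ units of the non-responding group, where $N_{h2}$ is the number of non-responding units in the $h$th stratum.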
3. Existing Estimators
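In stratified random sampling, the ref. [15] estimator for the population mean $\bar{Y}$ under measurement error and non-response is given as

$$T_{HH} = \sum_{h=1}^{k} W_h\,\bar{y}^*_h,$$

where $\bar{y}^*_h = \dfrac{n_{h1}\,\bar{y}^*_{h1} + n_{h2}\,\bar{y}^*_{h2r}}{n_h}$. $\bar{y}^*_{h1}$ and $\bar{y}^*_{h2r}$ are, respectively, the means based on $n_{h1}$ and $r_h$ units under the study variable $Y$.

The expression for the variance (VAR) of the estimator $T_{HH}$ is represented as

$$VAR(T_{HH}) = \sum_{h=1}^{k} W_h^2\left[\frac{1 - f_h}{n_h}\left(S^2_{Yh} + S^2_{Uh}\right) + \frac{(L_h - 1)\,W_{h2}}{n_h}\left(S^2_{Yh(2)} + S^2_{Uh(2)}\right)\right],$$

where $W_h = N_h/N$, $f_h = n_h/N_h$ and $W_{h2} = N_{h2}/N_h$. $W_{h2}$ is the non-response rate in the $h$th stratum.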
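As a rough illustration of the sub-sampling idea behind the ref. [15] estimator, the following Python sketch combines the respondent mean and the mean of the sub-sampled non-respondents with weights proportional to the number of respondents and non-respondents, and then forms the stratum-weighted estimate. The function names and the toy numbers are hypothetical and are not taken from the paper; measurement error is not modelled here.

```python
import statistics


def hansen_hurwitz_mean(respondent_values, subsample_values, n_nonrespondents):
    """Stratum mean under sub-sampling of non-respondents.

    respondent_values : values from the n_h1 responding units
    subsample_values  : values from the r_h sub-sampled non-respondents
    n_nonrespondents  : n_h2, the number of non-responding units in the sample
    """
    n_h1 = len(respondent_values)
    n_h = n_h1 + n_nonrespondents
    mean_respondents = statistics.fmean(respondent_values) if n_h1 else 0.0
    mean_subsample = statistics.fmean(subsample_values) if subsample_values else 0.0
    return (n_h1 * mean_respondents + n_nonrespondents * mean_subsample) / n_h


def stratified_estimate(stratum_weights, stratum_means):
    """Weighted sum of stratum means with weights W_h = N_h / N."""
    return sum(w * m for w, m in zip(stratum_weights, stratum_means))


# Toy example (hypothetical data): 3 respondents, 2 non-respondents, 1 sub-sampled.
stratum_mean = hansen_hurwitz_mean([10.0, 12.0, 14.0], [20.0], n_nonrespondents=2)
overall = stratified_estimate([0.25, 0.75], [stratum_mean, 10.0])
print(stratum_mean, overall)
```

The sub-sample mean stands in for all non-respondents in the sample, which is why it receives the full weight of the non-responding units rather than the weight of the sub-sample alone.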
The separate ratio-type estimator of the population mean
in stratified random sampling under measurement error and non-response is given by
where
and
.
and
are, respectively, the means based on
and
units under the auxiliary variable
.
The expression for the mean square error (
MSE) of the estimator
is given as follows:
where
.
is the population correlation coefficient between
and
in the
stratum.
is the population correlation coefficient between
and
for the non-response group in the
stratum.
Ref. [
24] has proposed a new estimator of the population mean
under stratified random sampling in the presence of measurement error and non-response as
where
and
is an arbitrarily chosen scalar.
The following are the approximate expressions for the bias and
MSE of the estimator
up to the first order of approximation:
where
and
The expression for the minimum
MSE of the estimator
at the optimum value of
is given as follows:
4. Proposed Calibration Estimator
The ref. [
15] estimator of the population total
under stratified random sampling in the presence of measurement error and non-response is given as
where
and
are the design weights. Further,
and
are the inclusion probabilities.
Now, we suggest a novel calibration estimator for the population total
T in stratified random sampling under measurement error and non-response as follows:
where
is the calibrated weight for the
non-responding unit in the
stratum, which minimizes the chi-square-type distance function
subject to the calibration constraints
Here,
is the tuning parameter associated with the
non-responding unit in the
stratum. One can derive a new calibration weight solution by minimizing the chi-square-type distance function subject to the given calibration constraint. Thus, the Lagrange function can be written as
where
is the Lagrange multiplier.
Differentiating the
with respect to
, we get
Hence, substituting the value of
from Equation (16) into Equation (14), we have
Putting the value of
from Equation (17) into Equation (16) by assuming
, we get
Thus, by placing the value of
from Equation (18) into Equation (12), the estimator
becomes
Substituting the values of design weights
and
, the calibration estimator
of the population total T is obtained as
where
,
and
.
Now, we define the calibration estimator for the mean of the stratified population,
under the presence of non-response and measurement error as
Here,
is the calibrated weight for the
stratum that has to be chosen to minimize the chi-square-type distance function
subject to the calibration constraint
where
is the tuning parameter for the
stratum.
Let us define the Lagrange function as
where
is the Lagrange multiplier.
Differentiating Equation (24) with respect to
and equalizing the derivative to zero, we get
Substituting the value of
from Equation (25) into Equation (23), we get
as
Considering
and value of
given in Equation (26), Equation (25) reduces to
Substituting the value of
from Equation (27) into Equation (21), the calibration estimator
of the population mean
becomes
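The Lagrange-multiplier steps used in this section follow the usual pattern for chi-square-type calibration with a single constraint. The Python sketch below works through that generic pattern only: it minimizes $\sum_i (w_i - d_i)^2/(d_i q_i)$ subject to $\sum_i w_i x_i = X$, which has the closed-form solution $w_i = d_i + d_i q_i x_i\,(X - \sum_j d_j x_j)/\sum_j d_j q_j x_j^2$. The function name, the argument names and the toy inputs are assumptions for illustration; the specific constraints and weights of the proposed estimator are not reproduced here.

```python
def calibrate_weights(design_weights, aux_values, aux_total, tuning=None):
    """Chi-square-distance calibration with one linear constraint.

    Returns weights w_i that minimise sum((w_i - d_i)^2 / (d_i * q_i))
    subject to sum(w_i * x_i) == aux_total.
    """
    if tuning is None:
        tuning = [1.0] * len(design_weights)  # q_i = 1 (assumed default)
    current_total = sum(d * x for d, x in zip(design_weights, aux_values))
    denominator = sum(
        d * q * x * x for d, q, x in zip(design_weights, tuning, aux_values)
    )
    if denominator == 0:
        raise ValueError("calibration constraint cannot be satisfied")
    # Scaled Lagrange multiplier obtained from the calibration constraint.
    multiplier = (aux_total - current_total) / denominator
    return [
        d + multiplier * d * q * x
        for d, q, x in zip(design_weights, tuning, aux_values)
    ]


# Toy example (hypothetical numbers).
d = [10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0]
w_unit_q = calibrate_weights(d, x, aux_total=66.0)
w_ratio = calibrate_weights(d, x, aux_total=66.0, tuning=[1.0 / xi for xi in x])
print(w_unit_q, w_ratio)
```

Choosing the tuning parameter as the reciprocal of the auxiliary value scales every design weight by the same factor, which is the familiar ratio-type adjustment; other choices spread the adjustment differently across units.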
5. Properties of Proposed Calibration Estimator
Let us rewrite the proposed calibration estimator
as
where
and
.
In order to obtain the
MSE of the proposed calibration estimator
, we use the theory of large sample approximations. Let us assume
Now, articulating Equation (29) in terms of
,
and neglecting the higher order terms, we get
Squaring both sides of Equation (30) and then taking the expectation, we have
Substituting the values of
,
and
into Equation (31), we get
Thus, the
MSE of the proposed calibration estimator
is given as follows:
where
,
and
.
is the correlation coefficient between measurement errors
and
in the
stratum.
is the correlation coefficient between measurement errors
and
for the non-response group in the
stratum.
6. Empirical Study
To assess the efficacy of the suggested calibration estimator under the influence of non-response and measurement error, the theoretical results need to be illustrated numerically. In this section, we use two artificially generated data sets to perform an empirical examination. The R programming language is used to execute the simulation analysis.
6.1. Data Set I
We created an artificial data set that defines a population of four strata with respective sizes 3000, 4500, 1200 and 2700 in order to gain some insight into effectiveness.
The data under the study variable $Y$ for each stratum are generated from a Normal distribution with mean $\mu$ and standard deviation $\sigma$, i.e., $Y \sim N(\mu, \sigma)$, with 10%, 20%, 30% and 40% weights of the non-respondent group. We followed the procedure given by [25] to create the data associated with the auxiliary variable $X$ for each stratum so that it has a specified correlation with the study variable $Y$. We first generate, independently of $Y$, the data under a dummy variable $Z$ for each stratum with the same distribution as that of $Y$. Then, we generate the data under the auxiliary variable $X$ for each stratum using the transformation $X = \rho Y + \sqrt{1 - \rho^2}\,Z$. Here, $\rho$ indicates the correlation coefficient between the study variable $Y$ and the auxiliary variable $X$. The particulars of the population are given in Table 1.
Further, we generate the data under the measurement errors $U$ and $V$ using Normal distributions. It is assumed that the measurement errors $U$ and $V$ are uncorrelated with each other. We then obtain the observed values as $y^* = Y + U$ and $x^* = X + V$.
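The data-generation steps described above can be sketched in Python as follows. This is only an illustrative sketch: the transformation $X = \rho Y + \sqrt{1-\rho^2}\,Z$ with $Z$ drawn independently from the same distribution as $Y$ is assumed here as the way of inducing correlation $\rho$, the measurement errors are assumed to have zero mean, and the mean, standard deviations and seed are hypothetical values, not the ones used in the paper.

```python
import math
import random


def generate_stratum(size, mean, sd, rho, sd_u, sd_v, rng):
    """Generate true and observed (error-contaminated) values for one stratum."""
    y = [rng.gauss(mean, sd) for _ in range(size)]
    z = [rng.gauss(mean, sd) for _ in range(size)]  # dummy variable, independent of y
    x = [rho * yi + math.sqrt(1.0 - rho ** 2) * zi for yi, zi in zip(y, z)]
    u = [rng.gauss(0.0, sd_u) for _ in range(size)]  # measurement error on Y (assumed zero mean)
    v = [rng.gauss(0.0, sd_v) for _ in range(size)]  # measurement error on X (assumed zero mean)
    y_obs = [yi + ui for yi, ui in zip(y, u)]
    x_obs = [xi + vi for xi, vi in zip(x, v)]
    return y, x, y_obs, x_obs


def pearson(a, b):
    """Sample Pearson correlation coefficient."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(1)  # hypothetical seed
y, x, y_obs, x_obs = generate_stratum(3000, 50.0, 10.0, 0.89, 2.0, 2.0, rng)
print(round(pearson(y, x), 3), round(pearson(y_obs, x_obs), 3))
```

Because the added errors are independent noise, the correlation between the observed values is typically somewhat weaker than the correlation between the true values.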
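Now, we fix the sample size as $n = 2000$. Using proportional allocation, we determine the stratum sample sizes and then select the sample from each stratum. Further, we compute the estimates of $\bar{Y}$ through the ref. [15] estimator, the separate ratio-type estimator, the ref. [24] estimator and the proposed calibration estimator utilizing the sample information. The steps of selecting the sample from each stratum and computing the estimates are replicated 1000 times. Finally, we calculate the approximate VAR/MSE (AVAR/AMSE) of an estimator $T$ using the following formula:

$$AVAR(T)\ \text{or}\ AMSE(T) = \frac{1}{1000}\sum_{j=1}^{1000}\left(T_j - \bar{Y}\right)^2,$$

where $T_j$ denotes the estimate obtained in the $j$th replication.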
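The replication scheme can be sketched in Python as below. The sketch uses proportional allocation, $n_h = n N_h / N$ rounded to the nearest integer, and approximates the MSE as the average squared deviation of the replicated estimates from the true population mean. As a simplification, a plain stratified sample mean under full response stands in for the estimators studied here, and the stratum means, standard deviation, seed and number of replications are hypothetical.

```python
import random


def proportional_allocation(stratum_sizes, total_sample_size):
    """Allocate n_h = n * N_h / N, rounded to the nearest integer."""
    population_size = sum(stratum_sizes)
    return [round(total_sample_size * size / population_size) for size in stratum_sizes]


def stratified_mean(strata, sample_sizes, rng):
    """Draw an SRSWOR sample in every stratum and return the weighted sample mean."""
    population_size = sum(len(stratum) for stratum in strata)
    estimate = 0.0
    for stratum, n_h in zip(strata, sample_sizes):
        sample = rng.sample(stratum, n_h)  # sampling without replacement
        estimate += (len(stratum) / population_size) * (sum(sample) / n_h)
    return estimate


def approximate_mse(estimates, true_mean):
    """Average squared deviation of replicated estimates from the true mean."""
    return sum((t - true_mean) ** 2 for t in estimates) / len(estimates)


rng = random.Random(2024)  # hypothetical seed
sizes = [3000, 4500, 1200, 2700]  # stratum sizes of Data Set I
hypothetical_means = [40.0, 50.0, 60.0, 70.0]
strata = [[rng.gauss(mu, 5.0) for _ in range(size)]
          for size, mu in zip(sizes, hypothetical_means)]

allocation = proportional_allocation(sizes, 2000)
true_mean = sum(sum(stratum) for stratum in strata) / sum(sizes)
replications = 200  # kept small for speed; the study uses 1000
estimates = [stratified_mean(strata, allocation, rng) for _ in range(replications)]
amse = approximate_mse(estimates, true_mean)
print(allocation, amse)
```

Rounded allocations do not always sum exactly to the target sample size, so a small manual adjustment may be needed for other stratum configurations.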
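Table 2 reports the AVAR/AMSE of the ref. [15] estimator, the separate ratio-type estimator, the ref. [24] estimator and the proposed calibration estimator. The percentage relative efficiency (PRE) of the separate ratio-type estimator, the ref. [24] estimator and the proposed calibration estimator with respect to the ref. [15] estimator is also reported.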
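The tabulated PRE values are consistent with the usual definition, namely 100 times the AVAR of the reference estimator divided by the AMSE of the estimator being compared. A minimal Python sketch of that calculation, using the rounded entries from the first row of Table 2, is given below; the function name is an assumption for illustration, and small differences from the tabulated PRE are expected because the inputs are rounded.

```python
def percentage_relative_efficiency(reference_avar, estimator_amse):
    """PRE of an estimator with respect to a reference estimator, in percent."""
    return 100.0 * reference_avar / estimator_amse


# First row of Table 2 (rounded values as tabulated).
avar_reference = 0.102745   # ref. [15] estimator
amse_ratio = 0.029351       # separate ratio-type estimator
amse_ref24 = 0.018459       # ref. [24] estimator
amse_proposed = 0.010394    # proposed calibration estimator

for label, amse in [("ratio", amse_ratio), ("ref24", amse_ref24), ("proposed", amse_proposed)]:
    print(label, round(percentage_relative_efficiency(avar_reference, amse), 2))
```

A PRE above 100 means the compared estimator has a smaller approximate MSE than the reference estimator.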
6.2. Data Set II
We generated another data set to examine further the performance of the proposed calibration estimator. The data generation and the computation of results follow the procedure described in Section 6.1. Table 3 describes the particulars of the population.
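Table 4 reports the AVAR/AMSE of the ref. [15] estimator, the separate ratio-type estimator, the ref. [24] estimator and the proposed calibration estimator. The PRE of the separate ratio-type estimator, the ref. [24] estimator and the proposed calibration estimator with respect to the ref. [15] estimator is also reported.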
7. Concluding Remarks
A new calibration estimator of the population mean under stratified random sampling in the presence of non-response and measurement error has been proposed. The expression for the MSE of the suggested calibration estimator has been derived up to the first order of approximation. To assess the effectiveness of the suggested calibration estimator, a simulation analysis on artificially generated data has been carried out. In the simulation analysis, the AVAR/AMSE has been used to assess the accuracy of the suggested calibration estimator relative to the ref. [15] estimator, the separate ratio-type estimator and the ref. [24] estimator. Table 2 and Table 4 reveal that the suggested calibration estimator provides the highest PRE among the estimators considered in every scenario examined. In these simulations, the proposed calibration estimator therefore performs better than the existing ones, and, hence, it can be useful in the circumstances that arise in practice.
Author Contributions
Conceptualization, M.K.C. and N.B.; Methodology, M.K.C. and N.B.; Software, M.K.C. and N.B.; Validation, M.K.C. and N.B.; Formal Analysis, M.M.H.; Investigation, M.M.A. and M.M.H.; Resources, M.M.A. and M.M.H.; Data Curation, M.M.A.; Writing—Original Draft, N.B.; Writing—Review and Editing, N.B.; Visualization, M.M.A. and M.M.H.; Supervision, M.K.C.; Funding Acquisition, M.M.A. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported and funded by the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) (grant number IMSIU-DDRSP2602).
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Acknowledgments
The authors extend their appreciation to the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University (IMSIU) for funding this work through this research group, grant number IMSIU-DDRSP2602.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Cochran, W.G. Sampling Techniques, 2nd ed.; Wiley Eastern Limited: New Delhi, India, 1963.
2. Cochran, W.G. Sampling Techniques, 3rd ed.; John Wiley and Sons: New York, NY, USA, 1977.
3. Fuller, W. Estimation in the presence of measurement error. Int. Stat. Rev. 1995, 63, 121–147.
4. Manisha; Singh, R.K. An estimation of population mean in the presence of measurement errors. J. Indian Soc. Agric. Stat. 2001, 54, 13–18.
5. Shalabh. Ratio method of estimation in the presence of measurement errors. J. Indian Soc. Agric. Stat. 1997, 50, 150–155.
6. Singh, H.; Karpe, N. On the estimation of ratio and product of two population means using supplementary information in presence of measurement errors. Statistica 2009, 69, 27–47.
7. Singh, H.; Karpe, N. Estimation of mean, ratio and product using auxiliary information in the presence of measurement errors in sample surveys. J. Stat. Theory Pract. 2010, 4, 111–136.
8. Wang, L. A simple adjustment for measurement errors in some dependent variable models. Stat. Probab. Lett. 2002, 58, 427–433.
9. Andersson, P.G.; Särndal, C.E. Calibration for nonresponse treatment: In one or two steps. Stat. J. IAOS 2016, 32, 375–381.
10. Chaudhary, M.K.; Kumar, A. Estimating the population mean in stratified random sampling using two-phase sampling in the presence of non-response. World Appl. Sci. J. 2015, 33, 874–882.
11. Chaudhary, M.K.; Ray, B.K. Treating the Problem of Non-Response in Stratified Random Sampling Under Calibration Approach. Commun. Stat.-Simul. Comput. 2024, 54, 4472–4480.
12. Chaudhary, M.K.; Ray, B.K.; Vishwakarma, G.K.; Kadilar, C. A calibration-based approach on estimation of mean of a stratified population in the presence of non-response. Commun. Stat.-Theory Methods 2024, 53, 7054–7068.
13. Chaudhary, M.K.; Singh, R.; Shukla, R.K.; Kumar, M.; Smarandache, F. A family of estimators for estimating population mean in stratified sampling under non-response. Pak. J. Stat. Oper. Res. 2009, 5, 47–54.
14. Dykes, L.; Singh, S.; Sedory, S.A.; Louis, V. Calibrated estimators of population mean for a mail survey design. Commun. Stat.-Theory Methods 2015, 44, 3403–3427.
15. Hansen, M.H.; Hurwitz, W.N. The problem of non-response in sample surveys. J. Am. Stat. Assoc. 1946, 41, 517–529.
16. Khare, B.B.; Sinha, R.R. Estimation of finite population ratio using two-phase sampling scheme in the presence of non-response. Aligarh J. Stat. 2004, 24, 43–56.
17. Okafor, F.C.; Lee, H. Double sampling for ratio and regression estimation with sub sampling the non-respondent. Surv. Methodol. 2000, 26, 183–188.
18. Rao, P.S.R.S. Ratio estimation with sub-sampling the non-respondents. Surv. Methodol. 1986, 12, 217–230.
19. Tabasum, R.; Khan, I.A. Double sampling ratio estimator for the population mean in presence of non-response. Assam Stat. Rev. 2006, 20, 73–83.
20. Azeem, M.; Hanif, M. Joint influence of measurement error and non-response on estimation of population mean. Commun. Stat.-Theory Methods 2017, 46, 1679–1693.
21. Chaudhary, M.K.; Vishwakarma, G.K. A general family of factor-type estimators of population mean in the presence of non-response and measurement errors. Int. J. Math. Stat. 2019, 20, 83–93.
22. Chaudhary, M.K.; Vishwakarma, G.K. Estimation of finite population mean using two auxiliary variables in the presence of non-response and measurement errors. J. Stat. Appl. Probab. 2021, 10, 579–586.
23. Kumar, S. Improved estimation of population mean in presence of nonresponse and measurement error. J. Stat. Theory Pract. 2016, 10, 707–720.
24. Singh, R.; Bouza, C.; Mishra, M. Estimation in stratified random Sampling in the presence of errors. Rev. Oper. 2020, 41, 125–137.
25. Reddy, M.K.; Rao, K.R.; Boiroju, N.K. Comparison of ratio estimators using Monte Carlo simulation. Int. J. Agric. Stat. Sci. 2010, 6, 517–527.
Table 1. Particulars of population.
| Stratum No. | Stratum Size $N_h$ | Sample Size $n_h$ | Distribution of Study Variable | Distribution of Auxiliary Variable | Correlation Coefficient |
|---|---|---|---|---|---|
| I | 3000 | 526 | | | 0.89 |
| II | 4500 | 789 | | | 0.88 |
| III | 1200 | 211 | | | 0.85 |
| IV | 2700 | 474 | | | 0.87 |
Table 2. AVAR/AMSE and PRE of the estimators.
| Inverse Sampling Rate | Non-Response Rate (in %) | AVAR, Ref. [15] | AMSE, Separate Ratio | AMSE, Ref. [24] | AMSE, Proposed | PRE, Separate Ratio | PRE, Ref. [24] | PRE, Proposed |
|---|---|---|---|---|---|---|---|---|
| 2 | 10 | 0.102745 | 0.029351 | 0.018459 | 0.010394 | 350.0553 | 556.6057 | 988.4678 |
| | 20 | 0.108292 | 0.034114 | 0.025782 | 0.011077 | 317.4438 | 420.0029 | 977.655 |
| | 30 | 0.113531 | 0.038108 | 0.03597 | 0.012882 | 315.6305 | 297.9212 | 881.2957 |
| | 40 | 0.117689 | 0.052610 | 0.037467 | 0.013235 | 314.1183 | 223.6457 | 889.214 |
| 3 | 10 | 0.103193 | 0.030059 | 0.018659 | 0.010506 | 343.3064 | 553.055 | 982.2718 |
| | 20 | 0.115118 | 0.039727 | 0.026796 | 0.013967 | 289.7689 | 429.8067 | 824.2161 |
| | 30 | 0.122605 | 0.045945 | 0.038118 | 0.014729 | 266.851 | 321.6481 | 832.4187 |
| | 40 | 0.130534 | 0.053623 | 0.051708 | 0.017295 | 252.4422 | 243.5195 | 754.7496 |
| 4 | 10 | 0.106304 | 0.032059 | 0.019981 | 0.0122 | 331.5906 | 532.025 | 871.3259 |
| | 20 | 0.118255 | 0.0411 | 0.028799 | 0.015376 | 287.7254 | 410.6241 | 769.0829 |
| | 30 | 0.130514 | 0.038219 | 0.05482 | 0.017757 | 237.9387 | 341.4958 | 735.0149 |
| | 40 | 0.131337 | 0.060333 | 0.056648 | 0.020196 | 217.6859 | 231.8475 | 650.3204 |
Table 3. Description of population.
| Stratum No. | Stratum Size $N_h$ | Sample Size $n_h$ | Distribution of Study Variable | Distribution of Auxiliary Variable |
|---|---|---|---|---|
| I | 3000 | 563 | | |
| II | 1500 | 281 | | |
| III | 750 | 141 | | |
| IV | 1850 | 346 | | |
| V | 900 | 169 | | |
Table 4. AVAR/AMSE and PRE of the estimators.
| Inverse Sampling Rate | Non-Response Rate (in %) | AVAR, Ref. [15] | AMSE, Separate Ratio | AMSE, Ref. [24] | AMSE, Proposed | PRE, Separate Ratio | PRE, Ref. [24] | PRE, Proposed |
|---|---|---|---|---|---|---|---|---|
| 2 | 10 | 0.046784 | 0.021893 | 0.014879 | 0.007855 | 214.6799 | 316.5576 | 595.6192 |
| | 20 | 0.048505 | 0.023145 | 0.01669 | 0.008826 | 210.4741 | 290.6266 | 555.8719 |
| | 30 | 0.048903 | 0.025269 | 0.019536 | 0.00909 | 193.6023 | 250.4525 | 543.9642 |
| | 40 | 0.049923 | 0.027112 | 0.024421 | 0.011597 | 184.1373 | 204.4239 | 430.4638 |
| 3 | 10 | 0.050241 | 0.022215 | 0.015877 | 0.008942 | 228.2089 | 316.438 | 561.8874 |
| | 20 | 0.050872 | 0.026484 | 0.018536 | 0.009114 | 192.3742 | 274.4542 | 564.3931 |
| | 30 | 0.05472 | 0.028544 | 0.019685 | 0.009841 | 193.0554 | 279.3968 | 556.0156 |
| | 40 | 0.056756 | 0.031071 | 0.026426 | 0.011683 | 182.7848 | 214.7699 | 485.8166 |
| 4 | 10 | 0.051982 | 0.024336 | 0.016787 | 0.009185 | 214.485 | 309.6545 | 566.5659 |
| | 20 | 0.05259 | 0.029532 | 0.019888 | 0.009386 | 178.1353 | 264.4309 | 560.2934 |
| | 30 | 0.055495 | 0.032645 | 0.021689 | 0.009988 | 169.9952 | 255.8705 | 556.181 |
| | 40 | 0.064398 | 0.036373 | 0.027424 | 0.013587 | 178.0275 | 234.8218 | 477.1321 |