Abstract
This article presents a new set of estimators designed to estimate the finite population variance of a study variable in two-phase sampling. These estimators utilize the information about extreme values and ranks of an auxiliary variable. Through a first-order approximation, we investigate the properties of these estimators, including biases and mean squared errors (MSEs). Furthermore, a comprehensive simulation study is conducted to assess their performance and validate our theoretical insights. Results demonstrate that our proposed class of estimators performs better in terms of percent relative efficiency (PRE) across various simulation scenarios compared to existing estimators. In addition, in the application section, we utilize three data sets to further validate the performance of our proposed estimators against conventional unbiased variance estimators, ratio and regression estimators, as well as other existing methods.
MSC: 62D05
1. Introduction
In sampling theory, it is standard practice to use auxiliary variables alongside the study variable, exploiting the relationship between them to improve both the sampling design and the efficiency of the estimators. However, in many practical scenarios, information about the auxiliary variables is not available before a survey is conducted. In such cases, two-phase sampling, also known as double sampling, is preferred; it selects the sample from the population in two distinct phases. Because it is cost-effective, two-phase sampling plays an important role in sample surveys and is widely used when the auxiliary information is not available in advance. The idea of two-phase sampling was first introduced by Neyman [1]. The topic then received little attention until the work of Sukhatme [2]. In recent years, two-phase sampling has attracted much interest because of its ability to screen variables at low cost. Studies on two-phase sampling include [3,4,5,6,7,8,9,10].
Variation is an inherent phenomenon of nature, and the estimation of the finite population variance is a problem of significant concern. Das and Tripathi [11] first discussed the use of auxiliary information to estimate the population variance, and this work was extended by Isaki [12]. The accuracy of the estimators can be improved by the careful use of auxiliary information. To estimate the population variance, Bahl and Tuteja [13] suggested ratio- and product-type exponential estimators. Many other scholars have proposed estimators of the finite population variance, including [14,15,16,17,18,19,20,21,22].
When extreme values are part of the sample, the survey data may contain abnormal observations, which can lead to biased results. For this reason, some researchers have worked on extreme values and proposed different kinds of transformations to estimate population characteristics. Using a linear transformation, Mohanty and Sahoo [23] provided two estimators based on the known minimum and maximum observations of the auxiliary variable. This line of work was not pursued further until Khan and Shabbir [24], who incorporated extreme values into several estimators of the finite population mean. Daraz et al. [25] improved the estimation of the finite population mean under extreme values in stratified random sampling. Through transformations of extreme values, Daraz and Khan [26] proposed several efficient classes of estimators of the population variance. A recent study by Daraz et al. [27] investigated the estimation of the finite population variance and presented various types of estimators with minimal mean squared errors. Using the extreme values of the auxiliary variable, Daraz et al. [28] suggested double exponential ratio-type estimators and examined their efficiency for estimating the population variance. To improve the accuracy of the estimators through a linear transformation of the extreme values and ranks of the auxiliary variable, Daraz et al. [29] obtained a class of efficient estimators that makes dual use of the auxiliary variable under simple random sampling. For more details, see [30,31,32] and the references therein.
The efficiency of classical estimators, measured by the mean squared error, tends to deteriorate when a data set contains extreme values. It may be tempting to exclude such observations from the sample. However, retaining them when estimating population characteristics is important for addressing this difficulty properly. If the auxiliary variable and the study variable are related, then the ranks of the auxiliary variable are also associated with the study variable. These ranks can therefore be used as an effective way to increase the accuracy of an estimator. In this article, the extreme values of the auxiliary variable are kept in the data and are used as supplementary information to increase the accuracy of the proposed class of estimators. Inspired by [26,27,28,29], we use the transformation technique to construct a new class of estimators that uses the known information on the extreme values and the ranks of the auxiliary variable to estimate the finite population variance in two-phase sampling.
The remainder of this article is organized as follows. The concepts and notations are presented in Section 2. A number of existing estimators are reviewed in Section 3. In Section 4, we describe our suggested class of estimators. The mathematical comparison is given in Section 5. To evaluate the theoretical results presented in Section 5, we simulate six distinct artificial populations using different probability distributions in Section 6. We also provide numerical examples in that section to validate our theoretical findings. Finally, Section 7 discusses the results and offers suggestions for further research.
2. Concepts and Notations
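Consider a finite population of N units. Let $y_i$, $x_i$, and $r_i$ ($i = 1, 2, \ldots, N$) denote the values of the dependent (study) variable Y, the independent (auxiliary) variable X, and the corresponding ranks R of the independent variable, respectively. The population variances of these variables are defined as
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)^2, \qquad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{X}\right)^2,$$
and
$$S_r^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(r_i - \bar{R}\right)^2,$$
where
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \bar{X} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{R} = \frac{1}{N}\sum_{i=1}^{N} r_i$$
are the corresponding population means of Y, X, and R. For these variables, the population coefficients of variation are
$$C_y = \frac{S_y}{\bar{Y}}, \qquad C_x = \frac{S_x}{\bar{X}}, \qquad C_r = \frac{S_r}{\bar{R}},$$
respectively. Additionally, the population correlation coefficients between Y and X, Y and R, and X and R are
$$\rho_{yx} = \frac{S_{yx}}{S_y S_x}, \qquad \rho_{yr} = \frac{S_{yr}}{S_y S_r}, \qquad \rho_{xr} = \frac{S_{xr}}{S_x S_r},$$
where
$$S_{yx} = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)\left(x_i - \bar{X}\right), \qquad S_{yr} = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{Y}\right)\left(r_i - \bar{R}\right),$$
and
$$S_{xr} = \frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{X}\right)\left(r_i - \bar{R}\right)$$
are the population covariances, respectively.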
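As an illustration, the population quantities introduced above (means, variances, coefficients of variation, and correlation coefficients involving the ranks of X) could be computed along the following lines. This is only a minimal sketch: the function name `population_summary` is hypothetical, the N − 1 divisor is an assumption, and ranks are assigned by sorting with no special treatment of ties.

```python
import numpy as np

def population_summary(y, x):
    """Summary quantities for Y, X and the ranks R of X (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Ranks 1..N of the auxiliary variable; ties are broken by position (assumption).
    r = np.argsort(np.argsort(x)) + 1.0

    out = {}
    for name, v in (("y", y), ("x", x), ("r", r)):
        out[f"mean_{name}"] = v.mean()
        out[f"var_{name}"] = v.var(ddof=1)             # assumed N - 1 divisor
        out[f"cv_{name}"] = v.std(ddof=1) / v.mean()   # coefficient of variation
    # Pearson correlation coefficients between the three variables.
    out["rho_yx"] = np.corrcoef(y, x)[0, 1]
    out["rho_yr"] = np.corrcoef(y, r)[0, 1]
    out["rho_xr"] = np.corrcoef(x, r)[0, 1]
    return out

# Toy population, for illustration only.
summary = population_summary(y=[1, 3, 2, 5, 4], x=[2, 4, 6, 8, 10])
```

In this toy example X is already sorted and equally spaced, so its ranks are a linear function of X and the correlation between X and R equals one.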
In this paper, we provide a set of estimators of the finite population variance of Y in the presence of the auxiliary variable X. The two-phase sampling scheme is defined as follows:
- In the first phase, a sample of size ń (ń < N) is chosen in order to estimate the population variance of the auxiliary variable X.
- In the second phase, a sample of size n (n < ń) is chosen in order to observe both y and x.
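The two-phase selection described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes simple random sampling without replacement in both phases, with the second-phase sample drawn from the first-phase sample as in the simulation study, and the function name and the sample sizes (300 and 60) are hypothetical.

```python
import numpy as np

def two_phase_sample(N, n_first, n_second, rng):
    """Return index arrays of the first- and second-phase SRSWOR samples."""
    if not (0 < n_second < n_first < N):
        raise ValueError("sizes must satisfy 0 < n_second < n_first < N")
    # Phase 1: draw n_first distinct units from the N population units.
    first = rng.choice(N, size=n_first, replace=False)
    # Phase 2: draw n_second distinct units from the first-phase sample.
    second = rng.choice(first, size=n_second, replace=False)
    return first, second

rng = np.random.default_rng(2024)
first_idx, second_idx = two_phase_sample(N=1500, n_first=300, n_second=60, rng=rng)
```

The auxiliary variable would be observed on `first_idx`, while both the study and auxiliary variables would be observed on `second_idx`.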
The biases and mean squared errors for various estimators can be derived by defining the following terms:
such that for .
where
Also,
where
Here, , and are the population coefficients of kurtosis.
3. Literature Review
This section reviews the existing estimators of the finite population variance with which the proposed class of estimators is later compared.
The usual variance estimator of for population variance is given by
Isaki [12] suggested a ratio estimator for population variance which is given by
The bias and of are expressed as follows:
and
Watson [33] proposed the linear regression estimator , which is defined as
where is the sample regression coefficient.
The of the estimator is expressed as follows:
where .
Bahl and Tuteja [13] introduced an exponential ratio-type estimator, which is defined as follows:
The bias and of are expressed as follows:
and
A ratio-type estimator developed by Upadhyaya and Singh [14], which employs the kurtosis of an auxiliary variable, is expressed as follows:
The bias and of are expressed as follows:
and
where .
Kadilar and Cingi [16] suggested the following ratio estimators:
and
The bias and of are expressed as follows:
and
where and .
4. Proposed Class of Estimators
This section presents an improved class of estimators inspired by prior works [26,27,28,29]. These estimators employ minimum and maximum values of auxiliary variables, along with their ranks, in two-phase sampling to estimate the variance of the finite population. The suggested estimator is defined as
where are known constants that take the value 1 or 2, and are the parameters of the auxiliary variables. The minimum and maximum observations of the independent (auxiliary) variable are represented by , while the minimum and maximum observations of the ranks of the independent variable are represented by . The known values of are given in Table 1, , and . Different classes of the suggested estimator can be obtained from (18); they are listed in Table 1.
Table 1. Some classes of the proposed estimator.
Now, we discuss the properties of the new proposed class of estimators; we rewrite (18) in terms of errors to obtain the bias and the of , i.e.,
where
and .
Applying a Taylor series expansion up to the first order of approximation, we obtain
Using Equation (20), the bias of is given by
After simplification, we get
where .
In order to obtain a first-order approximation of the , we square both sides of Equation (20) and take the expected value, which gives the following equation:
After simplification, we get
5. Mathematical Comparison
This section covers the comparisons between the suggested estimator and several existing estimators, such as , and .
For that is,
Similarly, that is,
If Conditions (23) or (24) hold true, the suggested estimator demonstrates a higher efficiency in comparison to .
For that is,
Similarly, that is,
If Conditions (25) or (26) hold true, the suggested estimator demonstrates a higher efficiency in comparison to .
For that is,
Similarly, that is,
If Conditions (27) or (28) hold true, the suggested estimator demonstrates a higher efficiency in comparison to .
For that is,
Similarly, that is,
If Conditions (29) or (30) hold true, the suggested estimator demonstrates a higher efficiency in comparison to .
For that is,
Similarly, that is,
If Conditions (31) or (32) hold true, the suggested estimator demonstrates a higher efficiency in comparison to .
For that is,
Similarly, that is,
6. Numerical Comparison
In this section, we assess the effectiveness of the proposed class of estimators as compared to other existing estimators through the percent relative efficiency (PREs). This evaluation is carried out using both simulated data sets and three distinct real data sets.
6.1. Simulation Study
To validate the theoretical results presented in Section 5, we follow the methodology of [27,29] and conduct a simulation study. The objective is to assess the performance of the proposed class of estimators, which are based on the known minimum and maximum values of the auxiliary variable and on its ranks, within the framework of two-phase sampling. Six different populations are generated for the auxiliary variable X from the following probability distributions.
- Population 1:
- Population 2:
- Population 3:
- Population 4:
- Population 5:
- Population 6:
Subsequently, the dependent variable Y is generated as
where
signifies the correlation coefficient between the dependent and independent variables, while
indicates the error term. The selected value of might reflect the quality and consistency of the data. A correlation coefficient suggests that there is relatively low noise and the relationship between x and y is consistent across the data set.
To calculate the percent relative efficiencies (PREs), we adopted the following procedure in R software (version 4.4.0).
- Step 1: We use the probability distributions listed above to generate a population of size 1500.
- Step 2: We apply simple random sampling without replacement (SRSWOR) to obtain a first-phase sample of size ń from the population of size N.
- Step 3: Using SRSWOR again, we obtain the second-phase sample of size n from the first-phase sample.
- Step 4: We calculate the population total and the minimum and maximum values of the auxiliary variable from the above steps.
- Step 5: For each population, we generate samples of different sizes using SRSWOR.
- Step 6: For each sample size, we compute the values of all the estimators discussed in this article.
- Step 7: We repeat Steps 5 and 6 a total of 65,000 times. The outcomes for the artificial populations are reported in Table 2, and the results for the real data sets are summarized in Table 6.
Table 2. Percent relative efficiency of the estimators based on artificial populations.
Finally, we use the following formulas to obtain the MSEs and PREs of each estimator across all replications:
and
where i is one of
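The simulation procedure of this subsection can be sketched in code as follows. This is a hedged sketch under several assumptions that are not taken from the article: the study itself was run in R, the gamma population and the linear model used to generate Y are hypothetical stand-ins, only the usual estimator and a standard two-phase ratio-type estimator (second-phase sample variance of y multiplied by the ratio of first-phase to second-phase sample variances of x) are compared, the PRE is taken relative to the usual estimator, and far fewer replications than 65,000 are used to keep the run short.

```python
import numpy as np

def simulate_pre(N=1500, n_first=300, n_second=60, rho=0.8, reps=2000, seed=1):
    """Monte Carlo MSE and PRE of two variance estimators (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Step 1: hypothetical population (distribution and model are assumptions).
    x = rng.gamma(shape=5.0, scale=2.0, size=N)
    y = rho * x + rng.normal(0.0, 1.0, size=N)
    true_var_y = y.var(ddof=1)

    est = {"usual": np.empty(reps), "ratio": np.empty(reps)}
    for k in range(reps):
        # Steps 2-3: first-phase SRSWOR, then second-phase SRSWOR from it.
        first = rng.choice(N, size=n_first, replace=False)
        second = rng.choice(first, size=n_second, replace=False)
        s2y = y[second].var(ddof=1)        # second-phase sample variance of y
        s2x = x[second].var(ddof=1)        # second-phase sample variance of x
        s2x_first = x[first].var(ddof=1)   # first-phase sample variance of x
        # Step 6: evaluate the estimators on this replication.
        est["usual"][k] = s2y
        est["ratio"][k] = s2y * s2x_first / s2x  # assumed two-phase ratio form

    # Empirical MSE over replications, and PRE relative to the usual estimator.
    mse = {name: float(np.mean((v - true_var_y) ** 2)) for name, v in est.items()}
    pre = {name: 100.0 * mse["usual"] / m for name, m in mse.items()}
    return mse, pre

mse, pre = simulate_pre()
```

By construction, the PRE of the usual estimator is 100, and a value above 100 for another estimator indicates a smaller empirical MSE in this sketch.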
6.2. Numerical Examples
We compared the percent relative efficiencies of the different estimators using three real data sets in order to assess the performance of the proposed estimators. The data sets are described below, and their summary statistics are given in Table 3, Table 4 and Table 5.
Table 3. Summary statistics for Data 1.
Table 4. Summary statistics for Data 2.
Table 5. Summary statistics for Data 3.
- Data 1. This data set was taken from the Bureau of Statistics [34] (page 226). The data were collected in Pakistan in 2012 and comprise 33 divisions. The data set can be downloaded from the Pakistan Bureau of Statistics web page via the following link: https://www.pbs.gov.pk/content/microdata (accessed on 5 August 2024).
  Y: Departmental employment levels in 2012.
  X: Number of registered factories by department in 2012.
  R: Ranks of the number of registered factories by department in 2012.
- Data 2. This data set was taken from Cochran [35] (page 23) and comprises 33 units of food cost and weekly income of families.
  Y: Food expenses related to the families’ employment.
  X: Families’ weekly income.
  R: Ranks of the families’ weekly income.
- Data 3. This data set was also taken from the Bureau of Statistics [34] (page 126). The data were collected in Pakistan in 2012 and comprise 33 divisions. The data set can be downloaded from the Pakistan Bureau of Statistics web page via the following link: https://www.pbs.gov.pk/content/microdata (accessed on 5 August 2024).
  Y: The total enrollment of students in 2012.
  X: Government elementary and secondary schools in 2012.
  R: Ranks of the government elementary and secondary schools in 2012.
Finally, we use the following formula to calculate the percent relative efficiencies (PREs):
where l is one of
We used simulation studies and three real data sets to assess the performance of the proposed class of estimators. The PRE criterion was used to compare the different estimators. The PRE values of the proposed and existing estimators obtained from the simulation study are given in Table 2, while the results for the real data sets are presented in Table 6. The following are some general findings:
Table 6. Percent relative efficiency using empirical data sets.
7. Conclusions
In order to estimate the finite population variance, this study presented a new family of efficient estimators. These estimators use the extreme values of the auxiliary variable together with its ranks. The suggested class of estimators is shown to be more efficient than the existing ones under the theoretical conditions given in Section 5. To examine these conditions, we conducted a simulation study and analyzed multiple empirical data sets. The simulation results in Table 2 show that the proposed class of estimators consistently outperforms the existing ones in terms of PRE. The results presented in Table 6 provide additional evidence for this conclusion, which is consistent with the theoretical findings of Section 5. Based on both the simulation and empirical results, we conclude that the suggested estimators are more efficient than the other estimators considered. Among these, is particularly preferred due to its minimal MSE.
In this study, we examined the properties of the proposed class of estimators only within a two-phase sampling framework. It is also feasible to develop new estimators under two-phase stratified sampling, and our findings may help in identifying more efficient estimators with lower MSEs. This is a promising topic for future research.
Author Contributions
Conceptualization, U.D.; software, U.D.; validation, U.D., M.A.A. and O.A.; formal analysis, U.D., M.A.A., O.A. and A.S.A.N.; investigation, U.D.; resources, U.D., M.A.A. and O.A.; data curation, U.D. and O.A.; writing—original draft, U.D.; writing—review and editing, U.D.; visualization, U.D., M.A.A. and O.A.; supervision, U.D. and M.A.A.; project administration, U.D., M.A.A., O.A. and A.S.A.N.; funding acquisition, U.D., M.A.A. and A.S.A.N. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU241734].
Data Availability Statement
The real data are secondary, and their sources are given in Section 6.2, while the simulated data were generated using R software (version 4.4.0).
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Neyman, J. Contribution to the theory of sampling human population. J. Am. Stat. Assoc. 1938, 33, 101–116.
2. Sukhatme, B.V. Some ratio-type estimators in two-phase sampling. J. Am. Stat. Assoc. 1962, 57, 628–632.
3. Erinola, A.Y.; Singh, R.V.K.; Audu, A.; James, T. Modified class of estimator for finite population mean under two-phase sampling using regression estimation approach. Asian. J. Prob. Stat. 2021, 4, 52–64.
4. Jabbar, M.; Javid, Z.; Zaheer, A.; Zainab, R. Ratio type exponential estimator for the estimation of finite population variance under two-stage sampling. Res. J. Appl. Sci. Eng. Technol. 2014, 7, 4095–4099.
5. Qureshi, M.N.; Tariq, M.U.; Hanif, M. Memory-type ratio and product estimators for population variance using exponentially weighted moving averages for time-scaled surveys. Commun. Stat. Simul. Comput. 2024, 53, 1484–1493.
6. Sanaullah, A.; Hanif, M.; Asghar, A. Generalized exponential estimators for population variance under two-phase sampling. Int. J. Appl. Comput. Math. 2016, 2, 75–84.
7. Singh, H.P.; Singh, S.; Kim, J.M. Efficient use of auxiliary variables in estimating finite population variance in two-phase sampling. Int. Commun. Stat. Appl. Methods 2010, 17, 165–181.
8. Khan, M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J. Stat. Appl. Probab. Lett. 2015, 2, 115–121.
9. Vishwakarma, G.K.; Zeeshan, S.M. Generalized ratio-cum-product estimator for finite population mean under two-phase sampling scheme. J. Mod. Appl. Stat. Meth. 2020, 19, 1–16.
10. Zaman, T.; Kadilar, C. New class of exponential estimators for finite population mean in two-phase sampling. Commun. Stat. Theory Methods 2021, 50, 874–889.
11. Das, A.K.; Tripathi, T.P. Use of auxiliary information in estimating the finite population variance. Sankhya Indian J. Stat. Ser. C 1978, 40, 139–148.
12. Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123.
13. Bahl, S.; Tuteja, R. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
14. Upadhyaya, L.; Singh, H. An estimator for population variance that utilizes the kurtosis of an auxiliary variable in sample surveys. Vikram Math. J. 1999, 19, 14–17.
15. Dubey, V.; Sharma, H. On estimating population variance using auxiliary information. Stat. Transit. New Ser. 2008, 9, 7–18.
16. Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
17. Singh, H.; Chandra, P. An alternative to ratio estimator of the population variance in sample surveys. J. Transp. Stat. 2008, 9, 89–103.
18. Shabbir, J.; Gupta, S. Some estimators of finite population variance of stratified sample mean. Commun. Stat. Theory Methods 2010, 39, 3001–3008.
19. Singh, H.P.; Solanki, R.S. A new procedure for variance estimation in simple random sampling using auxiliary information. J. Stat. Pap. 2013, 54, 479–497.
20. Yadav, S.K.; Kadilar, C.; Shabbir, J.; Gupta, S. Improved family of estimators of population variance in simple random sampling. J. Stat. Theory Pract. 2015, 9, 219–226.
21. Shabbir, J.; Gupta, S. Using rank of the auxiliary variable in estimating variance of the stratified sample mean. Int. J. Comput. Theor. Stat. 2019, 6.
22. Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat. Theory Methods 2023, 52, 2610–2624.
23. Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhyā Indian J. Stat. Ser. 1995, 57, 93–102.
24. Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
25. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
26. Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
27. Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
28. Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon 2024, 10, e33402.
29. Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry 2024, 16, 957.
30. Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
31. Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
32. Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
33. Watson, D.J. The estimation of leaf area in field crops. J. Agric. Sci. 1937, 27, 474–483.
34. Bureau of Statistics. Punjab Development Statistics Government of the Punjab, Lahore, Pakistan; Bureau of Statistics: Lahore, Pakistan, 2013.
35. Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).