Abstract
This article suggests an improved class of efficient estimators that use various transformations to estimate the finite population variance of the study variable. These estimators are particularly helpful when the minimum and maximum values of the auxiliary variable are known and the ranks of the auxiliary variable are associated with the study variable. In that case, the ranks can serve as an effective tool to improve the accuracy of the estimator. A first-order approximation is used to investigate the properties of the proposed class of estimators, such as the bias and mean squared error (MSE), under simple random sampling. A simulation study is carried out to measure the performance and verify the theoretical results. According to the results, the suggested class of estimators has a greater percent relative efficiency (PRE) than the existing estimators in all of the simulated situations. Three symmetric and asymmetric datasets are examined in the application section to show the superior performance of the proposed class of estimators over the existing estimators.
1. Introduction
Survey sampling collects accurate data on population characteristics to enhance estimation performance while reducing cost, time, and human resources. In many populations, a few extreme values exist, and estimates of unknown population characteristics that ignore such information can be quite sensitive to them. Consequently, outcomes may be overestimated or underestimated in certain cases. The accuracy of classical estimators, measured by the mean squared error (MSE), therefore usually decreases when extreme values are present in the dataset. It may be tempting to eliminate such observations from the sample. To address this problem adequately, however, it is important to include this information in the process of estimating population characteristics. Given the known smallest and largest observations of the auxiliary variable, ref. [1] offered two estimators by transforming them linearly. This line of work was not pursued further until ref. [2], which applied the concept of using extreme values to a variety of finite population mean estimators. Using a stratified random sampling method, ref. [3] improved the estimate of the finite population mean under extreme values. For more details, see refs. [4,5,6,7] and the references therein.
Estimating the finite population variance is an important problem, and controlling variability in applications is challenging. The problem arises, for example, in biological and agricultural research, where large variability signals that outcomes may deviate from the intended results. By carefully using supplementary information, the accuracy of the estimators can be increased. Ref. [8] was the first to discuss the use of auxiliary information in the estimation of the population variance. Ref. [9] proposed some ratio and product type exponential estimators to estimate the population variance. Ref. [10] suggested different efficient classes of estimators of the population variance through extreme-value transformations. Recently, ref. [11] used the concept of extreme values to introduce new classes of estimators of the population variance with minimum mean squared errors. Ref. [12] provided some new classes of difference-cum-ratio-type exponential estimators for a finite population variance in stratified random sampling by utilizing the known information about extreme values. Many other estimators of the population variance have been suggested; see refs. [13,14,15,16,17,18,19,20,21,22,23].
The rankings of the auxiliary variable are associated with the study variable when there is a relationship between the two variables. As a result, these rankings can be utilized as a valuable tool to enhance the accuracy of the estimator. This article retains the extreme values of the auxiliary variable in the data and utilizes them as auxiliary information. As discussed by refs. [10,11], this article aims to suggest an effective class of estimators for estimating the variance of a finite population. These estimators utilize the available information on the extreme values of an auxiliary variable, as well as the ranks of the auxiliary variable under simple random sampling, in order to enhance the accuracy.
This article is divided into the following sections. Section 2 presents the concepts and notations. This section also includes information on certain existing estimators. In Section 3, we explain our proposed class of estimators. Section 4 provides the mathematical comparison. In Section 5, we simulate six different artificial populations using various probability distributions to assess the theoretical findings described in Section 4. This section also includes numerical examples to support our theoretical results. Finally, Section 6 discusses the results, as well as suggestions for future studies.
2. Concepts and Notations
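Consider a finite population of N units. Let $Y_i$, $X_i$, and $R_i$ represent the ith unit values of the study variable Y, the auxiliary variable X, and the ranks of the auxiliary variable R, respectively. For these variables, we define the population variances
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i-\bar{Y}\right)^2, \qquad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i-\bar{X}\right)^2,$$
and
$$S_r^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(R_i-\bar{R}\right)^2,$$
where
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N}Y_i, \qquad \bar{X} = \frac{1}{N}\sum_{i=1}^{N}X_i,$$
and
$$\bar{R} = \frac{1}{N}\sum_{i=1}^{N}R_i$$
are the population means of Y, X, and R, respectively.
The population coefficients of variation for Y, X, and R are defined as
$$C_y = \frac{S_y}{\bar{Y}}, \qquad C_x = \frac{S_x}{\bar{X}},$$
and
$$C_r = \frac{S_r}{\bar{R}},$$
respectively. Furthermore, the population correlation coefficients between Y and X, Y and R, and X and R are
$$\rho_{yx} = \frac{S_{yx}}{S_y S_x}, \qquad \rho_{yr} = \frac{S_{yr}}{S_y S_r},$$
and
$$\rho_{xr} = \frac{S_{xr}}{S_x S_r},$$
where
$$S_{yx} = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i-\bar{Y}\right)\left(X_i-\bar{X}\right), \qquad S_{yr} = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i-\bar{Y}\right)\left(R_i-\bar{R}\right),$$
and
$$S_{xr} = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i-\bar{X}\right)\left(R_i-\bar{R}\right)$$
are the population covariances, respectively.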
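The population quantities above can be illustrated with a minimal Python sketch. It assumes the usual N − 1 divisor for variances and covariances and assigns average ranks to tied values; the tie rule and the toy data are assumptions for illustration only, not taken from the article.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share their average rank (assumed tie rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def mean(v):
    return sum(v) / len(v)

def covariance(a, b):
    """Covariance with the N - 1 divisor."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def variance(v):
    return covariance(v, v)

def coeff_variation(v):
    return math.sqrt(variance(v)) / mean(v)

def correlation(a, b):
    return covariance(a, b) / math.sqrt(variance(a) * variance(b))

# Hypothetical toy population (not from the article).
Y = [12.0, 15.0, 11.0, 20.0, 18.0]
X = [3.0, 5.0, 2.0, 9.0, 7.0]
R = ranks(X)

print("S_y^2 =", variance(Y), " S_x^2 =", variance(X), " S_r^2 =", variance(R))
print("rho_yx =", correlation(Y, X), " rho_yr =", correlation(Y, R))
```

Because R is a monotone function of X, the correlation between Y and R is typically close to that between Y and X, which is what makes the ranks usable as additional auxiliary information.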
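To estimate the unknown population variance $S_y^2$, we use simple random sampling without replacement (SRSWOR) to draw a random sample of n units from the population. Let us define the sample variances
$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2, \qquad s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2,$$
and
$$s_r^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(r_i-\bar{r}\right)^2,$$
where
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i,$$
and
$$\bar{r} = \frac{1}{n}\sum_{i=1}^{n}r_i$$
are the sample means of Y, X, and R, respectively. Additionally, the sample coefficients of variation are defined as
$$c_y = \frac{s_y}{\bar{y}}, \qquad c_x = \frac{s_x}{\bar{x}},$$
and
$$c_r = \frac{s_r}{\bar{r}},$$
where $s_y$, $s_x$, and $s_r$ denote the sample standard deviations, respectively.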
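As a rough illustration of the sampling step, the sketch below draws one SRSWOR sample and computes a sample variance with the n − 1 divisor. The helper names, the seed, and the toy data are hypothetical and chosen only for this example.

```python
import random

def srswor(population_size, n, rng):
    """Draw n distinct unit indices by simple random sampling without replacement."""
    return rng.sample(range(population_size), n)

def sample_variance(values):
    """Sample variance with the n - 1 divisor."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

# Hypothetical toy population (not from the article).
Y = [12.0, 15.0, 11.0, 20.0, 18.0, 14.0, 17.0, 13.0]

rng = random.Random(2024)          # fixed seed so the draw is reproducible
idx = srswor(len(Y), 4, rng)       # indices of the sampled units
y_sample = [Y[i] for i in idx]
s2_y = sample_variance(y_sample)   # estimate of the population variance of Y
print(idx, s2_y)
```

The same index set would be used to pick the sampled values of X and R, so that the three sample variances refer to the same units.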
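For each estimator, we define the following relative error terms in order to obtain the biases and mean squared errors:
$$e_0 = \frac{s_y^2 - S_y^2}{S_y^2}, \qquad e_1 = \frac{s_x^2 - S_x^2}{S_x^2}, \qquad \text{and} \qquad e_2 = \frac{s_r^2 - S_r^2}{S_r^2},$$
such that $E(e_i) = 0$ for i = 0, 1, 2.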
where , , and .
Also
where represents the population central moment with orders , and denotes the standard deviation of
The population coefficients of kurtosis are defined as
where
respectively. Here , and .
Next, we go over different existing estimators of finite population variances and compare them with the proposed class of estimators.
For the population variance $S_y^2$, the usual unbiased estimator is the sample variance $s_y^2$.
Ref. [8] suggested a ratio estimator for the population variance $S_y^2$, which is given by
$$s_y^2\,\frac{S_x^2}{s_x^2}.$$
The following are the formulas for the bias and MSE of this ratio estimator, which can be found in ref. [8]
and
The linear regression estimator proposed by ref. [24] is defined as
$$s_y^2 + b\left(S_x^2 - s_x^2\right),$$
where $b$ is the sample regression coefficient.
The following is the formula for the MSE of the regression estimator, which can be found in ref. [24]
where .
Ref. [9] suggested an exponential ratio-type estimator, which is expressed as
$$s_y^2 \exp\left(\frac{S_x^2 - s_x^2}{S_x^2 + s_x^2}\right).$$
The following are the formulas for the bias and MSE of this estimator, which can be found in ref. [9]
and
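In simple random sampling, ref. [20] proposed a ratio-type estimator that utilizes the coefficient of kurtosis $\beta_{2(x)}$ of the auxiliary variable:
$$s_y^2\,\frac{S_x^2 + \beta_{2(x)}}{s_x^2 + \beta_{2(x)}}.$$
The following are the formulas for the bias and MSE of this estimator, which can be found in ref. [20]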
and
where .
Ref. [15] proposed certain ratio estimators as follows
and
The following are the formulas for the bias and MSE of these estimators, which can be found in ref. [15]
and
where .
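To make the existing estimators reviewed in this section concrete, the following sketch implements four of them in Python. The functional forms are the ones commonly cited in the literature for the usual, ratio, exponential ratio-type, and kurtosis-based ratio estimators of the population variance; since the displayed formulas are not reproduced here, they should be read as assumptions rather than as the exact expressions of this article. Function and argument names are hypothetical.

```python
import math

def usual_estimator(s2_y):
    """Usual estimator: the sample variance of Y itself."""
    return s2_y

def ratio_estimator(s2_y, s2_x, S2_x):
    """Ratio estimator: s_y^2 * S_x^2 / s_x^2 (assumed form)."""
    return s2_y * S2_x / s2_x

def exp_ratio_estimator(s2_y, s2_x, S2_x):
    """Exponential ratio-type estimator:
    s_y^2 * exp((S_x^2 - s_x^2) / (S_x^2 + s_x^2)) (assumed form)."""
    return s2_y * math.exp((S2_x - s2_x) / (S2_x + s2_x))

def kurtosis_ratio_estimator(s2_y, s2_x, S2_x, beta2_x):
    """Kurtosis-based ratio estimator:
    s_y^2 * (S_x^2 + beta2_x) / (s_x^2 + beta2_x) (assumed form)."""
    return s2_y * (S2_x + beta2_x) / (s2_x + beta2_x)

# Hypothetical inputs: the sample underestimates the known variance of X,
# so every ratio-type estimator adjusts the sample variance of Y upward.
s2_y, s2_x, S2_x, beta2_x = 4.0, 2.0, 3.0, 3.0
print(usual_estimator(s2_y))
print(ratio_estimator(s2_y, s2_x, S2_x))
print(exp_ratio_estimator(s2_y, s2_x, S2_x))
print(kurtosis_ratio_estimator(s2_y, s2_x, S2_x, beta2_x))
```

When the sample variance of X equals its known population value, all four functions return the sample variance of Y unchanged, which is a quick sanity check on any implementation.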
3. Proposed Estimator
This section, which is inspired by refs. [10,11], presents a new class of effective estimators that estimate the finite population variance by using the largest and smallest values and rankings of the auxiliary variable under simple random sampling.
where represent known constant values, whereas represent auxiliary variable parameters. The largest and smallest values of the auxiliary variable are denoted by , while the largest and smallest values of the ranks of the auxiliary variable are denoted by . Table 1 shows the known values for , while and . Table 1 lists the various classes of the proposed estimator derived from (53).
Table 1. Some classes of the proposed estimator.
Properties of the Proposed Estimator
The bias and MSE of the proposed estimator are now obtained by rewriting (53) in terms of errors, i.e.,
where and .
Using the Taylor series under the first order of approximation, we obtain
Using (55), the bias of the proposed estimator is given by
We derive the MSE by squaring both sides of (55) and taking the expectation. The resulting expression is as follows
4. Mathematical Comparison
This section compares the suggested class of estimators with the existing estimators reviewed in Section 2.
5. Numerical Comparison
This section compares the mean squared errors (MSEs) of several estimators, including the proposed class of estimators, using both simulated and real datasets. The purpose is to evaluate the performance of these estimators. In addition, we compute the percent relative efficiencies (PREs) of both the proposed class of estimators and the existing estimators. For more details, see Appendix A and Appendix B.
5.1. Simulation Study
We use the approach outlined in refs. [10,11] to perform a simulation study in order to validate the theoretical results reported in Section 4. Using the probability distributions listed below, we artificially generate the auxiliary variable X for six different populations:
- Population 1:
- Population 2: ,
- Population 3:
- Population 4:
- Population 5:
- Population 6:
The variable of interest, Y, is calculated using the following formula:
where the error term is and the correlation coefficient between the study and auxiliary variables is
In order to calculate the mean squared errors (MSEs) and percent relative efficiencies (PREs) of the proposed class of estimators and the existing estimators, we performed the following procedure in R software:
- Step 1: A population of 1000 observations is initially generated by employing the above probability distributions.
- Step 2: We obtain the population total from Step 1 along with the smallest and largest values of the supplementary variable.
- Step 3: We use SRSWOR to obtain different sizes of samples for each population.
- Step 4: For each sample size, calculate the values of all the estimators discussed in this article.
- Step 5: Steps 3 and 4 are repeated 80,000 times. Table 2 and Table 3 present the outcomes for the artificial populations, while Table 4 and Table 5 present a summary for the real datasets.
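The five steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the article's simulations were run in R, and the gamma distribution for X, the model Y = ρX + e with a standard normal error, the value ρ = 0.8, and the small number of repetitions are all placeholders chosen here, not the article's settings. Only the usual and ratio estimators are computed.

```python
import random

def variance(v):
    """Variance with the (length - 1) divisor."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def simulate(n, reps, seed=1, N=1000, rho=0.8):
    rng = random.Random(seed)
    # Step 1: generate a population of N units (assumed gamma auxiliary variable).
    X = [rng.gammavariate(2.0, 3.0) for _ in range(N)]
    Y = [rho * x + rng.gauss(0.0, 1.0) for x in X]  # assumed model for Y
    # Step 2: population quantities, including the extreme values of X.
    S2_y, S2_x = variance(Y), variance(X)
    x_min, x_max = min(X), max(X)
    usual, ratio = [], []
    for _ in range(reps):                      # Step 5: repeat Steps 3 and 4
        idx = rng.sample(range(N), n)          # Step 3: SRSWOR sample of size n
        s2_y = variance([Y[i] for i in idx])
        s2_x = variance([X[i] for i in idx])
        usual.append(s2_y)                     # Step 4: estimator values
        ratio.append(s2_y * S2_x / s2_x)
    return {"S2_y": S2_y, "x_min": x_min, "x_max": x_max,
            "usual": usual, "ratio": ratio}

# A short run; the article reports 80,000 repetitions per sample size.
result = simulate(n=50, reps=200)
print(result["S2_y"], result["x_min"], result["x_max"])
```

The replicated values in `result["usual"]` and `result["ratio"]` are what the MSE and PRE summaries are computed from. Extending the loop to other estimators only requires appending their values inside Step 4.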
Table 2. MSEs of all the estimators using simulated data.
Table 3. PREs of all the estimators using simulated data.
Table 4. MSEs using empirical datasets.
Table 5. PREs using empirical datasets.
Finally, to obtain MSE and PRE for each estimator over all of the replications, we apply the following formulas:
and
where
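A minimal sketch of how the MSE and PRE can be computed from replicated estimates is given below. It assumes that the MSE is the average squared deviation of the replicated estimates from the true population variance, and that the PRE of an estimator is 100 times the MSE of the usual estimator divided by the MSE of that estimator. These are the conventional definitions and are stated here as assumptions; the numbers are hypothetical.

```python
def mse(estimates, true_value):
    """Average squared deviation of replicated estimates from the true value."""
    return sum((t - true_value) ** 2 for t in estimates) / len(estimates)

def pre(mse_estimator, mse_baseline):
    """Percent relative efficiency with respect to a baseline estimator."""
    return 100.0 * mse_baseline / mse_estimator

# Hypothetical replicated estimates of a population variance equal to 10.
true_S2_y = 10.0
usual_reps = [8.0, 12.0, 9.0, 11.0]     # baseline (usual) estimator
other_reps = [9.0, 11.0, 9.5, 10.5]     # some competing estimator

mse_usual = mse(usual_reps, true_S2_y)
mse_other = mse(other_reps, true_S2_y)
print(mse_usual, mse_other, pre(mse_other, mse_usual))
```

Under these definitions the usual estimator always has a PRE of 100, and a PRE above 100 indicates an estimator with a smaller MSE than the baseline.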
5.2. Numerical Examples
We evaluated the performance of the suggested estimators by comparing the mean squared errors (MSEs) and PREs of various estimators using three real-life datasets. The datasets, together with their summary statistics, are listed below:
Data 1. [Source: Ref. [25], p. 135]
Y: Enrollment of students in 2012,
X: Total number of schools in 2012,
R: Ranks of the total number of schools in 2012.
The summary statistics are as follows:
Data 2. [Source: Ref. [25], p. 226]
Y: Total number of workers in 2012,
X: Total number of registered factories in 2012,
R: Ranks of the total number of registered factories in 2012.
The summary statistics are as follows:
Data 3. [Source: Ref. [26], p. 24]
Y: Food costs associated with the family’s job,
X: The weekly earnings of families,
R: Ranks of the weekly earnings of families.
The summary statistics are as follows:
To find out how well the suggested class of estimators performs, we employed three real datasets and a simulation study. For comparing the various estimators, the MSE criterion has been adopted. For the simulation study, the MSE and PRE values of the suggested and existing estimators can be found in Table 2 and Table 3, respectively. Table 4 and Table 5 give the results obtained for the real datasets. Here are some general findings:
6. Conclusions
In this article, we introduced a class of effective estimators of the finite population variance. These estimators use the known minimum and maximum values of the auxiliary variable, as well as its ranks. In Section 4, we discussed theoretical conditions under which the suggested estimators are more efficient than the existing estimators. We performed a simulation study and examined several empirical datasets in order to validate these conditions. According to Table 3, the suggested estimators consistently perform better in terms of PRE than the existing estimators. The theoretical conclusions in Section 4 are further confirmed by the empirical results shown in Table 5. The simulation and empirical results lead us to conclude that the suggested estimators are more efficient than the other estimators under consideration. Among the suggested estimators, the one with the lowest MSE is particularly preferred.
We investigated the characteristics of the suggested efficient class of estimators under simple random sampling. Our findings can help in identifying more efficient estimators with low MSE under stratified random sampling. This topic is left for future research.
Author Contributions
Methodology, U.D.; Software, U.D.; Validation, U.D. and O.A.; Formal analysis, U.D., M.A.A. and O.A.; Investigation, U.D., M.A.A. and O.A.; Resources, U.D. and O.A.; Data curation, U.D., M.A.A. and O.A.; Writing—original draft, U.D.; Writing—review and editing, U.D.; Visualization, U.D.; Supervision, M.A.A.; Project administration, U.D., M.A.A. and O.A.; Funding acquisition, M.A.A. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU241416].
Data Availability Statement
Data are contained within the article.
Acknowledgments
We would like to express our sincere gratitude to the editor and the anonymous reviewers for their valuable feedback and insightful suggestions, which greatly improved the quality of this manuscript.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Numerical Examples
Mean squared errors:
Percent relative efficiency:
Appendix B. Simulation Study
Now, we perform simulations
Mean squared errors:
Percent relative efficiency:
References
- Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhyā Indian J. Stat. Ser. 1995, 57, 93–102.
- Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
- Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
- Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
- Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
- Khan, M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J. Stat. Appl. Probab. Lett. 2015, 2, 115–121.
- Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
- Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123.
- Bahl, S.; Tuteja, R. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
- Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
- Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
- Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon 2024, 10, e33402.
- Ahmad, S.; Al Mutairi, A.; Nassr, S.G.; Alsuhabi, H.; Kamal, M.; Rehman, M.U. A new approach for estimating variance of a population employing information obtained from a stratified random sampling. Heliyon 2023, 9, 1–13.
- Dubey, V.; Sharma, H. On estimating population variance using auxiliary information. Stat. Transit. New Ser. 2008, 9, 7–18.
- Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
- Shabbir, J.; Gupta, S. Some estimators of finite population variance of stratified sample mean. Commun. Stat. Theory Methods 2010, 39, 3001–3008.
- Shabbir, J.; Gupta, S. Using rank of the auxiliary variable in estimating variance of the stratified sample mean. Int. J. Comput. Theor. Stat. 2019, 6, 207.
- Singh, H.; Chandra, P. An alternative to ratio estimator of the population variance in sample surveys. J. Transp. Stat. 2008, 9, 89–103.
- Singh, H.P.; Solanki, R.S. A new procedure for variance estimation in simple random sampling using auxiliary information. J. Stat. Pap. 2013, 54, 479–497.
- Upadhyaya, L.; Singh, H. An estimator for population variance that utilizes the kurtosis of an auxiliary variable in sample surveys. Vikram Math. J. 1999, 19, 14–17.
- Yadav, S.K.; Kadilar, C.; Shabbir, J.; Gupta, S. Improved family of estimators of population variance in simple random sampling. J. Stat. Theory Pract. 2015, 9, 219–226.
- Yasmeen, U.; Noor-ul-Amin, M. Estimation of Finite Population Variance Under Stratified Sampling Technique. J. Reliab. Stat. Stud. 2021, 14, 565–584.
- Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat. Theory Methods 2023, 52, 2610–2624.
- Watson, D.J. The estimation of leaf area in field crops. J. Agric. Sci. 1937, 27, 474–483.
- Bureau of Statistics. Punjab Development Statistics Government of the Punjab, Lahore, Pakistan; Bureau of Statistics: Lahore, Pakistan, 2013.
- Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).