Abstract
This paper proposes an improved class of efficient estimators of the finite population variance of the study variable, built on different transformations. The estimators are particularly useful when both the minimum and maximum values of the auxiliary variable are known and the ranks of the auxiliary variable are associated with the study variable, because these rankings can then be used to improve precision. We examine the properties of the proposed class of estimators, including bias and mean squared error (MSE), to a first-order approximation under stratified random sampling. A simulation study is carried out to assess performance and to check the theoretical findings numerically. In all scenarios considered, the proposed class of estimators performs better than the other estimators in terms of mean squared error and percent relative efficiency. Three real data sets are examined in the application section to further illustrate that the improved class of estimators outperforms the existing estimators.
Keywords:
auxiliary information; study variable; minimum and maximum values; ranks; mean squared error; percent relative efficiency
MSC:
62D05
1. Introduction
Survey sampling aims to gather accurate information about various characteristics of a population while minimizing cost, time, and human resources. Many populations contain a few extreme values, and estimates of unknown population characteristics can be quite sensitive to them. Consequently, outcomes may be overestimated or underestimated in certain cases, and the accuracy of classical estimators, measured by mean squared error, usually decreases when extreme values are present in the data set. It may be tempting to eliminate such values from the sample. To address this challenge adequately, however, it is important to include this information in the process of estimating population characteristics. Given the known smallest and largest observations of the auxiliary variable, Mohanty and Sahoo [1] offered two estimators by transforming them linearly. This idea was not studied further until the work of Khan and Shabbir [2], who applied the concept of using extreme values to a variety of finite population mean estimators. Daraz et al. [3] improved the estimation of the finite population mean under the influence of extreme values by applying a stratified random sampling technique. For more details, see [4,5,6,7] and the references therein.
Estimating the finite population variance is an important problem, and controlling variability in applications is challenging. The problem arises, for example, in biological and agricultural research, where high variability can signal that results will deviate from what was intended. By carefully using supplementary information, the accuracy of the estimators can be increased. The use of auxiliary information to estimate population variance was first discussed by Das and Tripathi [8], and later expanded further by Isaki [9]. Different product and ratio-type exponential estimators were proposed by Bahl and Tuteja [10]. Through transformations of extreme values, Daraz and Khan [11] proposed many efficient classes of estimators to estimate the population variance. Recently, using the concept of extreme values, Daraz et al. [12] proposed new classes of estimators for estimating population variance with least mean squared errors. Daraz et al. [13] proposed double exponential ratio-type estimators and discussed their effectiveness for estimating the population variance by employing the extreme values of the auxiliary variable. Daraz et al. [14] made dual use of the auxiliary variable under simple random sampling to obtain a class of efficient estimators, addressing accuracy through the linear transformation of extreme values and rankings of the auxiliary variable. Many other estimators of the finite population variance have been suggested; see [15,16,17,18,19,20,21,22,23,24,25].
The objective of this article is to make effective use of both the extreme values and the ranks of the auxiliary variable when estimating the finite population variance, and to discuss the effectiveness and accuracy of the resulting estimators under various transformations. When the study variable and the auxiliary variable are related, the ranks of the auxiliary variable are also associated with the study variable. As a result, these rankings can be used as a valuable tool to enhance the accuracy of the estimator. Building on [12,14], we introduce a new class of estimators of the finite population variance that uses the known extreme values of the auxiliary variable and the ranks of the auxiliary variable under stratified random sampling.
This article is organized as follows. Section 2 presents the concepts and notations. Section 3 reviews some existing estimators. Section 4 describes the proposed class of estimators in detail. Section 5 gives the mathematical comparison. Section 6 reports a simulation study, in which six artificial populations are generated from different probability distributions to check the theoretical results of Section 5, together with numerical examples based on real data. Finally, Section 7 gives conclusions and suggestions for further research.
2. Concepts and Notations
Consider a finite population of size N units, denoted by . This population is divided into L strata, each of which is with the property that . Let , , and be the values of the study variable the auxiliary variable and the ranks of the auxiliary variable R in the stratum of the unit, respectively. Let
and
represent the population means of the study variable and auxiliary variable as well as the ranks of the auxiliary variable in the stratum that correspond to the population means
and
respectively, where the known stratum weight is denoted by .
For these variables, we define the population variances in the stratum as
where , and are defined above. The population coefficients of variations in the stratum are defined as
and
respectively. Further, let
and
be the population co-variances between , and in the stratum, respectively.
Simple random sampling without replacement is used to select a sample of size from the stratum. Let
and
be the sample means of the study variable and auxiliary variable as well as the ranks of the auxiliary variable in the stratum. The sample variances for these variables are defined as
and
respectively. Additionally, the sample coefficients of variation of and R in the stratum are defined as
and
respectively.
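As an illustration of the stratum-level quantities described in this section, the following minimal sketch computes stratum weights, means, variances, coefficients of variation, covariances, and extreme values from raw data. It is not the authors' code. The function names, the dictionary keys, the use of the divisor N_h - 1 for variances and covariances, and the tie-breaking rule for ranks are all assumptions made for the example.

```python
from statistics import mean, variance

def ranks(values):
    """Ascending ranks 1..n; ties are broken by position (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def covariance(a, b):
    """Covariance with divisor len(a) - 1 (assumed convention)."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def stratum_summary(y, x):
    """Summaries of y, x and the ranks of x within a single stratum."""
    r = ranks(x)
    out = {"N_h": len(y)}
    for name, v in (("y", y), ("x", x), ("r", r)):
        m, s2 = mean(v), variance(v)
        out["mean_" + name] = m
        out["var_" + name] = s2
        out["cv_" + name] = s2 ** 0.5 / m  # coefficient of variation
    out["cov_yx"] = covariance(y, x)
    out["cov_yr"] = covariance(y, r)
    out["cov_xr"] = covariance(x, r)
    out["x_min"], out["x_max"] = min(x), max(x)
    out["r_min"], out["r_max"] = min(r), max(r)
    return out

def population_summary(strata):
    """strata is a list of (y, x) pairs, one pair of lists per stratum."""
    N = sum(len(y) for y, _ in strata)
    summaries = [stratum_summary(y, x) for y, x in strata]
    for s in summaries:
        s["W_h"] = s["N_h"] / N  # stratum weight
    return summaries
```

Calling `population_summary([(y1, x1), (y2, x2)])` returns one dictionary per stratum; the same functions can be applied to a sample drawn from each stratum to obtain the corresponding sample quantities.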
3. Existing Estimators
In this section, we discuss some existing estimators of finite population variances and compare them with the proposed class of estimators. To derive the biases and mean square errors for various estimators, we define the following terms:
In stratified random sampling, the variance of the usual estimator is defined as follows:
The unbiased estimator of is defined as
The usual variance estimator of for population variance is given by
Isaki [9] suggested a ratio estimator for population variance , which is given by
The bias and MSE of this estimator are expressed as follows:
and
The linear regression estimator proposed by Watson [26] is defined as
where is the sample regression coefficient.
The MSE of the estimator is expressed as follows:
For population variance under stratified random sampling, Bahl and Tuteja [10] introduced an exponential ratio-type estimator , which is defined as
The bias and MSE of this estimator are expressed as follows:
and
In stratified random sampling, Upadhyaya and Singh [21] proposed a ratio-type estimator by utilizing the kurtosis of an auxiliary variable.
The bias and MSE of this estimator are expressed as follows:
and
where .
Kadilar and Cingi [16] proposed certain ratio estimators as
and
The bias and MSE of these estimators are expressed as follows:
and
where and .
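To make the structure of the existing estimators more concrete, the sketch below implements three of them in a commonly used stratified form. It assumes that the quantity being estimated is the variance of the stratified sample mean, that is, the sum over strata of W_h^2 (1/n_h - 1/N_h) S_yh^2, and that the ratio and exponential ratio-type estimators adjust each stratum's sample variance by the known auxiliary variance S_xh^2. These forms, and all function and key names, are assumptions made for illustration and may differ in detail from the expressions used in this section.

```python
import math

def lam(n_h, N_h):
    """Finite population correction term 1/n_h - 1/N_h."""
    return 1.0 / n_h - 1.0 / N_h

def usual_estimator(strata):
    """Unbiased estimator: sum of W_h^2 * lam_h * s_yh^2."""
    return sum(s["W"] ** 2 * lam(s["n"], s["N"]) * s["s2y"] for s in strata)

def ratio_estimator(strata):
    """Ratio-type form: each s_yh^2 is scaled by S_xh^2 / s_xh^2."""
    return sum(
        s["W"] ** 2 * lam(s["n"], s["N"]) * s["s2y"] * s["S2x"] / s["s2x"]
        for s in strata
    )

def exp_ratio_estimator(strata):
    """Exponential ratio-type form with exp((S_xh^2 - s_xh^2) / (S_xh^2 + s_xh^2))."""
    return sum(
        s["W"] ** 2 * lam(s["n"], s["N"]) * s["s2y"]
        * math.exp((s["S2x"] - s["s2x"]) / (s["S2x"] + s["s2x"]))
        for s in strata
    )
```

Each stratum is passed as a dictionary with the stratum weight `W`, sample size `n`, stratum size `N`, sample variances `s2y` and `s2x`, and the known population variance `S2x`. When the sample variance of the auxiliary variable equals its population value in every stratum, all three functions return the same number.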
4. Proposed Estimator
Motivated by [12,14], this section introduces an improved class of efficient estimators that use the extreme values of the auxiliary variable and rankings based on stratified random sampling to estimate the finite population variance.
where represent known constant values, whereas represent auxiliary variable parameters. Table 1 shows the known values for , and . The largest and smallest values of the auxiliary variable in the stratum are denoted by and , while the largest and smallest values of the ranks of the auxiliary variable are denoted by and . Table 2 presents the various classes of the proposed estimator derived from (18) and Table 1.
Table 1. Different parameters of the auxiliary variables.
Table 2. Some classes of the proposed estimator.
Properties of the Proposed Estimator
By rewriting (18) in terms of errors, we can derive the bias and MSE of the suggested estimator, i.e.,
where and .
Applying the Taylor series to the first approximation order, we obtain
Using (20), the bias of is given by
By squaring both sides of (20) and taking the expected value, we obtain a first-order approximate MSE, which is given by the following equation:
5. Mathematical Comparison
In this section, we discuss the comparisons between the proposed class of estimators, , with other existing estimators, , and .
6. Numerical Comparison
The objective of this section is to analyze the performance of various estimators by comparing their mean squared errors (MSEs) using simulated and real datasets. We further calculate the percent relative efficiency (PRE) of the proposed class of estimators in comparison to other existing estimators.
6.1. Simulation Study
The simulation study was conducted using the methodology described in [12,14] to verify the theoretical findings of Section 5. Six distinct populations for the auxiliary variable X were generated from the following probability distributions:
- Population 1: .
- Population 2: .
- Population 3: .
- Population 4: .
- Population 5: .
- Population 6: .
The following formula is used to calculate the variable of interest Y:
where the error term is and the correlation coefficient between the study and auxiliary variables is
The quality and consistency of the data can possibly be indicated by the selected value of . A correlation coefficient of indicates that the relationship between x and y is consistent throughout the data set and that there is relatively low noise.
We used R software (v. 4.4.0) to perform the following steps to estimate the mean squared errors (MSEs) and percent relative efficiencies (PREs) of the suggested class of estimators and other existing estimators:
- Step 1: In order to generate a population of size 1200, we first used the particular kinds of probability distributions defined above. Using stratified random sampling techniques, this population was then split into two strata in order to calculate different values for the existing and recommended class of estimators.
- Step 2: We derived the population total from Step 1, together with the minimum and maximum values of the auxiliary variable. Furthermore, we derived the maximum and minimum values of the ranks of the auxiliary variable.
- Step 3: Simple random sampling without replacement (SRSWOR) was used to draw samples of different sizes from each population. The specified sample sizes are 20%, 30%, and 40% of the total population.
- Step 4: We computed the MSE and PRE values for each sample size covered in this article.
- Step 5: Steps 3 and 4 were replicated 65,000 times. The results for the artificial populations are shown in Table 3 and Table 4, while the summary for the real data sets is shown in Table 5 and Table 6.
Table 3. MSEs of different estimators using the artificial populations.
Table 4. PREs of different estimators using the artificial populations.
Table 5. MSEs using empirical datasets.
Table 6. PREs using empirical datasets.
- Finally, we used the following formulae to obtain the MSE and PRE for each estimator over all replications:
and
where and
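The simulation steps above can be sketched as follows. This is a minimal illustration rather than the authors' R code. It assumes a gamma-distributed auxiliary variable, a study variable of the form Y = rho * X + e with standard normal error, a split of the population into two equal strata, and the variance of the stratified sample mean as the target. The empirical MSE is taken as the average squared deviation of the estimates from the target over the replications, and the PRE is taken relative to the usual unbiased estimator. Only the usual and ratio-type estimators are compared, and every name and parameter value is hypothetical.

```python
import random
from statistics import variance

def make_population(N=1200, rho=0.8, seed=1):
    """Step 1: generate X and Y, then split into two strata (assumed split)."""
    rng = random.Random(seed)
    x = [rng.gammavariate(2.0, 3.0) for _ in range(N)]
    y = [rho * xi + rng.gauss(0.0, 1.0) for xi in x]
    half = N // 2
    return [(y[:half], x[:half]), (y[half:], x[half:])]

def stratum_constants(strata, frac):
    """Per-stratum size, sample size, weight factor and known S_x^2."""
    N = sum(len(y) for y, _ in strata)
    consts = []
    for y, x in strata:
        N_h = len(y)
        n_h = round(frac * N_h)
        c_h = (N_h / N) ** 2 * (1.0 / n_h - 1.0 / N_h)
        consts.append((N_h, n_h, c_h, variance(x)))
    return consts

def one_replication(strata, consts, rng):
    """Steps 3 and 4: draw an SRSWOR sample per stratum and evaluate estimators."""
    usual = ratio = 0.0
    for (y, x), (N_h, n_h, c_h, S2x) in zip(strata, consts):
        idx = rng.sample(range(N_h), n_h)
        s2y = variance([y[i] for i in idx])
        s2x = variance([x[i] for i in idx])
        usual += c_h * s2y
        ratio += c_h * s2y * S2x / s2x
    return {"usual": usual, "ratio": ratio}

def simulate(frac=0.2, reps=2000, seed=2):
    """Step 5: replicate and return empirical MSEs and PREs."""
    strata = make_population()
    consts = stratum_constants(strata, frac)
    truth = sum(c[2] * variance(y) for (y, _), c in zip(strata, consts))
    rng = random.Random(seed)
    sq_err = {"usual": 0.0, "ratio": 0.0}
    for _ in range(reps):
        est = one_replication(strata, consts, rng)
        for k in sq_err:
            sq_err[k] += (est[k] - truth) ** 2
    mse = {k: v / reps for k, v in sq_err.items()}
    pre = {k: 100.0 * mse["usual"] / mse[k] for k in mse}
    return mse, pre
```

Calling `simulate(frac=0.3)` repeats the experiment for a 30% sampling fraction. The default of 2000 replications is chosen only to keep the sketch fast; the study itself reports 65,000 replications.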
6.2. Numerical Examples
We investigated the mean squared errors (MSEs) and percent relative efficiencies (PREs) of the recommended class of estimators and other existing estimators using three real data sets to assess their performances. The descriptions of the data sets are defined below, while summary statistics of the data sets are given in Table 7, Table 8 and Table 9.
Table 7. Summary statistics for data set-1.
Table 8. Summary statistics for data set-2.
Table 9. Summary statistics for data set-3.
- Data 1. This data set, which covers different divisions, was taken from the Bureau of Statistics [27], page 135, and was collected in Pakistan in 2012. It can be downloaded from the Pakistan Bureau of Statistics website: https://www.pbs.gov.pk/content/microdata (accessed on 30 July 2024).
- Y: The total enrollment of students in 2012.
- X: Government elementary and secondary schools in 2012.
- R: Ranks of the government elementary and secondary schools in 2012.
- Two groups were generated from the data, and the summary statistics of data set-1 are given in Table 7.
- Group 1: the divisions of Sargodha, Gujranwala, Rawalpindi, and Lahore.
- Group 2: the divisions of Multan, Bahawalpur, Faisalabad, D.G Khan, and Sahiwal.
- Data 2. This data set, which covers different divisions, was taken from the Bureau of Statistics [27], page 226, and was collected in Pakistan in 2012. It can be downloaded from the Pakistan Bureau of Statistics website: https://www.pbs.gov.pk/content/microdata (accessed on 30 July 2024).
- Y: Departmental employment levels in 2012.
- X: Number of factories the departments registered in 2012.
- R: Ranks of the number of factories the departments registered in 2012.
- Two groups were generated from the data, and the summary statistics of data set-2 are given in Table 8.
- Group 1: the divisions of Sargodha, Gujranwala, Rawalpindi, and Lahore.
- Group 2: the divisions of Multan, Bahawalpur, Faisalabad, D.G Khan, and Sahiwal.
- Data 3. The third data set was taken from Cochran [28], page 24, and comprises food cost and weekly income for a number of families.
- Y: Food expenses related to the families’ employment.
- X: Families’ weekly income.
- R: Ranks of the families’ weekly income.
- Two groups were generated from the data, and the summary statistics of data set-3 are given in Table 9.
Finally, we used the following formula to calculate the percent relative efficiency (PRE) for different data sets:
where K is one of or
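For orientation, the percent relative efficiency is conventionally defined as the ratio of the MSE of a baseline estimator to the MSE of the estimator under consideration, expressed as a percentage. The display below is a sketch under the assumption that the baseline is the usual unbiased variance estimator; the symbols $\hat{S}^2_{0}$ and $\hat{S}^2_{K}$ are placeholders introduced here and are not the notation of this article.

```latex
\mathrm{PRE}(K) \;=\; \frac{\mathrm{MSE}\big(\hat{S}^2_{0}\big)}{\mathrm{MSE}\big(\hat{S}^2_{K}\big)} \times 100
```

For example, if the baseline estimator has an MSE of 4.0 and estimator K has an MSE of 2.5, then PRE(K) = (4.0 / 2.5) × 100 = 160, so values above 100 indicate that estimator K is more efficient than the baseline.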
We used simulation studies and three real data sets to assess the performance of the proposed class of estimators. The MSE criterion was used for the comparisons between different estimators. The MSE and PRE values of the proposed and existing estimators obtained from the simulation study are given in Table 3 and Table 4, respectively. The outcomes for real data sets are presented in Table 5 and Table 6. The following are some general findings:
7. Conclusions
In this article, we proposed a new class of efficient estimators of the finite population variance based on different transformations. These estimators are particularly useful when both the minimum and maximum values of the auxiliary variable are known and the ranks of the auxiliary variable are associated with the study variable, because this information can be used to improve precision. In Section 5, we investigated the theoretical conditions under which the recommended estimators are more accurate than the existing estimators. To verify these conditions, we analyzed different empirical data sets and conducted a simulation study. According to Table 4, the recommended estimators consistently perform better in terms of PRE than existing estimators. The theoretical conclusions in Section 5 are further confirmed by the empirical results shown in Table 6. The simulation and empirical results lead us to conclude that the suggested estimators are more efficient than the other estimators under consideration. Since has the lowest MSE among these recommended estimators, it is particularly preferred.
In this work, we studied certain characteristics of the recommended efficient class of estimators under stratified random sampling only. Our results can be useful in identifying the more efficient estimators, that is, those that attain the lowest MSE. It may also be possible to develop new estimators of this kind under a two-stage sampling technique. Further research in this area could be valuable.
Author Contributions
Conceptualization, U.D.; Methodology, M.A.A.; Software, M.A.A. and U.D.; Validation, U.D.; Formal analysis, M.A.A. and U.D.; Investigation, M.A.A. and U.D.; Resources, U.D.; Data curation, U.D.; Writing—original draft, M.A.A. and U.D.; Writing—review and editing, U.D.; Visualization, U.D.; Supervision, U.D.; Project administration, M.A.A. and U.D.; Funding acquisition, M.A.A. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No.KFU241799].
Data Availability Statement
The real data are secondary, and their sources are given in the data section, while the simulated data have been generated using R software (latest v. 4.4.0).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhya Indian J. Stat. Ser. B 1995, 57, 93–102.
- Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
- Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
- Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
- Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
- Khan, M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J. Stat. Appl. Probab. Lett. 2015, 2, 1–7.
- Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
- Das, A.K.; Tripathi, T.P. Use of auxiliary information in estimating the finite population variance. Sankhya Indian J. Stat. Ser. C 1978, 40, 39–148.
- Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123.
- Bahl, S.; Tuteja, R. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
- Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
- Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
- Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon 2024, 10, e33402.
- Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry 2024, 16, 957.
- Dubey, V.; Sharma, H. On estimating population variance using auxiliary information. Stat. Transit. New Ser. 2008, 9, 7–18.
- Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
- Shabbir, J.; Gupta, S. Some estimators of finite population variance of stratified sample mean. Commun. Stat. Theory Methods 2010, 39, 3001–3008.
- Shabbir, J.; Gupta, S. Using rank of the auxiliary variable in estimating variance of the stratified sample mean. Int. J. Comput. Theor. Stat. 2019, 6, 171–181.
- Singh, H.; Chandra, P. An alternative to ratio estimator of the population variance in sample surveys. J. Transp. Stat. 2008, 9, 89–103.
- Singh, H.P.; Solanki, R.S. A new procedure for variance estimation in simple random sampling using auxiliary information. J. Stat. Pap. 2013, 54, 479–497.
- Upadhyaya, L.; Singh, H. An estimator for population variance that utilizes the kurtosis of an auxiliary variable in sample surveys. Vikram Math. J. 1999, 19, 14–17.
- Yadav, S.K.; Kadilar, C.; Shabbir, J.; Gupta, S. Improved family of estimators of population variance in simple random sampling. J. Stat. Theory Pract. 2015, 9, 219–226.
- Yasmeen, U.; Noor-ul-Amin, M. Estimation of Finite Population Variance Under Stratified Sampling Technique. J. Reliab. Stat. Stud. 2021, 14, 565–584.
- Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat. Theory Methods 2023, 52, 2610–2624.
- Daraz, U.; Alomair, M.A.; Albalawi, O.; Al Naim, A.S. New techniques for estimating finite population variance using ranks of Auxiliary Variable in Two-Stage Sampling. Mathematics 2024, 12, 2741.
- Watson, D.J. The estimation of leaf area in field crops. J. Agric. Sci. 1937, 27, 474–483.
- Bureau of Statistics. Punjab Development Statistics; Government of the Punjab: Lahore, Pakistan, 2013.
- Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).