Abstract
In this research, a logarithmic-type estimator was formulated for estimating the finite population variance in stratified random sampling. Conducting the sampling process symmetrically across the population helps to minimize bias and makes the sample more likely to be representative of the population as a whole. We conducted a numerical study and a simulation study to evaluate the performance of the proposed estimator. The mean squared error (MSE) values were computed for the proposed estimator and for several existing ones, including the standard unbiased variance estimator, the difference-type estimator, and other estimators from the literature. In both studies, the proposed log-type estimator outperformed the other considered estimators in terms of MSE and percentage relative efficiency. Graphical representations of the results are also provided to illustrate the efficiency of the proposed estimator. Based on these findings, we conclude that the proposed log-type estimator is a valuable addition to the existing literature on variance estimation in stratified random sampling. It provides a more efficient estimate of the population variance, which can be beneficial for various statistical applications.
1. Introduction
In survey sampling, it is critical to obtain accurate and precise estimates of population parameters. When creating strata in stratified sampling, symmetry can be applied to ensure that each stratum is internally homogeneous and balanced. This involves dividing the population into groups that exhibit similar characteristics, creating symmetric groupings. For example, when stratifying by income level, one might aim to create strata with similar income distributions within each group, thereby achieving symmetry.
This study explores the intricate realm of variance estimation in stratified random sampling (STRS), a technique often used to improve survey efficiency by splitting the population into distinct strata. Understanding and resolving the sources of variation within and between strata is crucial for creating accurate estimates. This work also emphasizes the importance of log-type estimators in the context of variance estimation. In the STRS paradigm, the use of log-type estimators can play a critical role in contributing to more robust and accurate variance estimations.
The role of stratified sampling in estimating population variance is discussed by [1]. Later, ref. [2] presented more precise estimators of population variance that leverage auxiliary information to reduce bias and improve estimation compared with existing approaches, thereby supporting the many sectors that rely on correct variance estimation. In simple random sampling (SRS), refs. [3,4] suggested variance estimators. The proficiency of the suggested estimator was compared with that of the traditional ratio estimator and the estimator of [2], and theoretical and numerical investigations were used to demonstrate its effectiveness. Later, ref. [5] extended this research to variance-type ratio estimators in both SRS and STRS, demonstrating the efficacy of the proposed estimator. Further, ref. [6] established a new ratio-type exponential estimator in SRS that is superior to the classic ratio and regression estimators and to the estimators of [2,5]. Later, ref. [7] introduced unbiased estimators for population variance using equilibrated stratification and obtained lower variances. Further, ref. [8] suggested exponential ratio- and product-type estimators using bivariate data of auxiliary variables and illustrated the efficiency of these estimators through empirical research.
Further, ref. [9] introduced a category of exponential estimators, demonstrating their effectiveness over other methods in terms of bias and MSE using the provided dataset. Ref. [10] introduced novel estimators that use known population parameters to estimate variance, compared them with established estimators, and showed their superiority under optimal conditions through bias and MSE analysis; an empirical study validated the proposed estimators’ effectiveness. Refs. [11,12] introduced a category of estimators and demonstrated its effectiveness over others by utilizing four datasets. Additionally, by analyzing large sample properties, they demonstrated its superior efficiency over various existing estimators through a numerical study. By utilizing bivariate auxiliary information to estimate population variance, ref. [13] proposed a novel generalized exponential estimator and showed its enhanced efficiency compared to existing estimators through empirical and simulated studies. For estimating population variance, ref. [14] suggested a log-type estimator. Ref. [15] introduced a set of exponential ratio estimators for population variance within the context of STRS, demonstrating equal optimal efficiency with regression estimators and outperforming classical ratio estimators in analytical and numerical results. Later, ref. [16] offered a few estimators for finite population variance, and ref. [17] proposed a new class of estimators using ranks in STRS for finite population variance, outperforming conventional estimators in efficiency in an empirical evaluation with real data. Further, ref. [18] introduced variance estimators using the ln-function in STRS that outperform conventional estimators; the separate method showed superior efficiency, validated by MSE derivation, numerical examples, and simulations. Ref. [19] proposed a variance estimator using the L-moments approach under double stratified sampling.
Later, ref. [20] recommended generalized variance estimators using single and double auxiliary variables and demonstrated their efficiency over others through empirical and simulation studies. Further, refs. [21,22,23] proposed various variance estimators. Ref. [24] proposed hybrid estimators in SRS; theoretical comparisons and empirical evidence showed their enhanced efficiency over other estimators. Further, ref. [25] introduced an improved variance estimator and compared its efficiency with others using three datasets. Ref. [26] proposed an advanced variance estimator and showed its superiority through numerical and simulation studies with real datasets. Further, ref. [27] proposed a nonparametric maximum likelihood estimator (MLE), developed using the EM algorithm and a likelihood based on order statistics, which outperforms the other considered estimators. Later, ref. [28] suggested an exponential ratio-cum-product estimator for the estimation of population variance in SRS; empirical validation confirmed the theoretical findings. Further, ref. [29] introduced finite population variance estimation in random responses via SRS for applied and environmental sciences and demonstrated its effectiveness over others. Refs. [30,31] explored innovative approaches for variance estimation in sampling methodologies, particularly focusing on L-moments and calibration techniques. Their work contributed to refining variance estimation methods, especially in the context of stratified and double stratified random sampling, with practical applications including analyses related to the COVID-19 pandemic. A ratio-type estimator was proposed by [32], and ref. [33] suggested an estimator in conditional and unconditional post-stratification.
Expanding on the contributions of [31], future directions may involve refining variance estimation methods through a deeper exploration of calibration approaches and the integration of L-moments in diverse sampling frameworks. Additionally, there is potential for investigating the robustness and scalability of these methods across various domains, with a focus on enhancing their applicability in real-world data analysis contexts beyond epidemiological studies. Furthermore, efforts to streamline implementation and improve computational efficiency could enhance the practical utility of these variance estimators in large-scale surveys and monitoring programs.
The existing literature lacks a comprehensive exploration of variance estimation through log-type estimators. In this study, our objective is to introduce a log-type estimator tailored for estimating population variance within the framework of stratified random sampling. We develop a logarithmic estimator for population variance, detailed in Section 4. Through a comparative analysis with established methods outlined in the current literature, and considering the conditions delineated in Section 5, we derive valuable insights. Empirical findings presented in Section 6, along with simulation investigations, corroborate the superior efficiency of our proposed estimator over alternative approaches.
2. Notations
Consider a finite population comprising $N$ units distributed across $L$ strata. Let $(y_{hi}, x_{hi})$ represent the characteristics of the study variable $Y$ and auxiliary variable $X$ in stratum $h$, such that $\sum_{h=1}^{L} N_h = N$. A sample of $n_h$ units is drawn from $N_h$ in each stratum, satisfying $\sum_{h=1}^{L} n_h = n$.
Let $s_{yh}^2$ and $s_{xh}^2$ represent the sample variances corresponding to the population variances $S_{yh}^2$ and $S_{xh}^2$. Here, $\bar{y}_h$ and $\bar{x}_h$ represent the sample means corresponding to the population means $\bar{Y}_h$ and $\bar{X}_h$.
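As a minimal illustration of the per-stratum quantities described above, the following sketch computes stratum weights, sample sizes, sample means, and sample variances for a study variable and an auxiliary variable. The data, the stratum sizes, and all names (`sample`, `stratum_sizes`, `summary`) are hypothetical and chosen only for illustration; the weight $N_h/N$ is the conventional stratum weight and is an assumption here.

```python
import statistics

# Hypothetical sample: stratum label -> list of (y, x) pairs,
# where y is the study variable and x the auxiliary variable.
sample = {
    1: [(12.0, 30.0), (15.0, 36.0), (11.0, 27.0)],
    2: [(40.0, 80.0), (44.0, 95.0), (38.0, 78.0), (47.0, 99.0)],
}
stratum_sizes = {1: 60, 2: 40}   # hypothetical stratum sizes N_h
N = sum(stratum_sizes.values())  # population size

summary = {}
for h, pairs in sample.items():
    ys = [y for y, _ in pairs]
    xs = [x for _, x in pairs]
    summary[h] = {
        "weight": stratum_sizes[h] / N,    # conventional weight N_h / N
        "n_h": len(pairs),                 # stratum sample size
        "mean_y": statistics.mean(ys),     # sample mean of y in stratum h
        "mean_x": statistics.mean(xs),     # sample mean of x in stratum h
        "var_y": statistics.variance(ys),  # sample variance, divisor n_h - 1
        "var_x": statistics.variance(xs),
    }
```

The sample variances use the divisor $n_h - 1$, which is what `statistics.variance` computes.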
We assume error terms to obtain the equations for bias and MSE for the variance estimators as
where
3. Review of the Literature
The literature contains various variance estimators utilized in STRS, accompanied by their respective Var/MSE formulae. Employing these estimators, we compared them with the proposed estimator, identifying pertinent conditions crucial for evaluating efficiency in comparisons.
- 1. The unbiased variance estimator is
Variance of is given by
- 2. The usual difference-type estimator is
where is unknown. Its optimum value is .
The minimum variance of is attained at the optimum value of ,
where .
- 3. The unbiased estimator of the population variance provided by [3] is
and its variance is given by
- 4. We adapted the estimator of [10] to STRS as
where is a suitable constant.
The minimum MSE of for the optimum value of is provided as
- 5. Ref. [16] suggested the estimator
where .
By using the optimum value of we obtain MSE as
- 6. The generalized exponential ratio-cum-product estimator suggested by [28] is
where is a suitable value chosen to minimize the MSE of , as follows:
4. Proposed Estimator
To improve the precision and reliability of variance estimation, we introduce a log-type estimator as a more effective alternative to conventional approaches. The proposed estimator combines the difference-type and ratio-type logarithmic forms:
By taking deviation on both sides with , we have
Taking expectations on both sides of Equation (10) yields the bias.
Squaring both sides of Equation (10) and then taking expectations gives the mean squared error (MSE) as
By differentiating Equation (9) with respect to and and equating the derivatives to zero, we obtain
where
By substituting the values of and into Equation (9), we obtain the minimum MSE as
where
5. Comparison of Efficiency
In this section, we derive the theoretical conditions under which the proposed estimator is more efficient than the traditional and existing estimators considered above. This comparison provides insight into why the proposed estimator outperforms the others with regard to MSE and percentage relative efficiency (PRE).
From (1) and (10), we obtain
From (2) and (10), we obtain
From (3) and (10), we obtain
From (4) and (10), we obtain
From (5) and (10), we obtain
From (6) and (10), we obtain
5.1. Quantitative Assessment
Population-I: We used the data from [5], which record apple production amounts (the study variable) and the number of apple trees (the auxiliary variable) for 854 villages across Turkey in 1999, sourced from the Institute of Statistics, Republic of Turkey. The data were first stratified by region of Turkey. Symmetry can also be applied in determining the allocation of sample units to each stratum. Symmetric allocation ensures that each stratum receives a fair representation in the sample relative to its size and variability. This can involve proportional allocation based on the size of each stratum or optimal allocation methods that consider both stratum size and variability.
Following this stratification, a random sampling approach was employed to select villages from each region using Neyman allocation to determine sample sizes per stratum (region). Specifically, a predetermined sample size of was utilized. Subsequently, after analyzing the outcomes of the sample sizes for individual regions, a decision was made to merge the two regions. Consequently, the data were organized into six strata, designated as follows: (1) Marmara, (2) Aegean, (3) Mediterranean, (4) Central Anatolia, (5) Black Sea, and (6) East and Southeast Anatolia.
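The Neyman allocation mentioned above assigns each stratum a sample size proportional to the product of its size and its standard deviation. The following is a minimal sketch of that rule under stated assumptions: the stratum sizes and standard deviations are hypothetical (they are not the apple-production statistics), the function name `neyman_allocation` is illustrative, and the allocation is left real-valued, so any rounding scheme would be an additional choice.

```python
def neyman_allocation(stratum_sizes, stratum_sds, total_n):
    """Real-valued Neyman allocation: n_h proportional to N_h * S_h."""
    products = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(products)
    return [total_n * p / total for p in products]

# Hypothetical strata, for illustration only.
sizes = [100, 200, 300]   # N_h
sds = [10.0, 20.0, 5.0]   # S_h of the study variable
alloc = neyman_allocation(sizes, sds, total_n=60)
```

In this hypothetical example the second stratum receives the largest share because it combines a moderate size with the largest variability, while the largest stratum receives less than its proportional share because it is the least variable.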
The theoretical conditions outlined in Equations (11)–(16) are not only theoretically sound but were also validated numerically. Employing the data statistics provided in Table 1, we calculated the MSE values for the estimators, as detailed in Table 2. The results reveal that the proposed estimator exhibits a lower MSE value, coupled with a significantly higher PRE value. This indicates that, among the estimators considered in this study, the proposed estimator boasts the highest PRE, underscoring its superior performance.
Table 1. Data statistics.
Table 2. MSE and PRE values of the considered and proposed estimators for Population-I.
5.2. Simulation Analysis
A simulation exercise was performed in R to compare the performance of the proposed and considered estimators on two populations.
- (a) Population-II: We subdivided a population of N = 1500 units into four subpopulations of varying sizes. We conducted 10,000 iterations to obtain stable results. The models are as follows:
- (b) Population-III: We divided a population of 2000 units into four strata and applied optimum allocation to obtain the stratum sample sizes. We conducted 20,000 iterations to obtain the MSE values of the estimators. The models considered in this population are as follows:
The PRE of the estimators was determined by employing
where r = 1, 2, 3, 4, 5, 6, prop.
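To make the simulation logic concrete, the following sketch estimates the MSE of two variance estimators by repeated sampling and then computes a PRE. It is a simplified illustration and not the study's code: it uses simple random sampling without replacement from a single hypothetical normal population rather than the stratified designs above, it compares the usual sample variance with the classical ratio-type estimator $s_y^2 S_x^2 / s_x^2$ rather than the estimators of this paper, and it assumes the conventional definition PRE = 100 × MSE(baseline) / MSE(estimator). All names and parameter values are illustrative.

```python
import random
import statistics

def pre(mse_baseline, mse_estimator):
    """Percentage relative efficiency with respect to a baseline estimator."""
    return 100.0 * mse_baseline / mse_estimator

def simulate_mse(y, x, n, reps, seed=1):
    """Monte Carlo MSE of two estimators of the finite population variance of y."""
    rng = random.Random(seed)
    N = len(y)
    S2_y = statistics.variance(y)  # target: population variance (divisor N - 1)
    S2_x = statistics.variance(x)  # auxiliary population variance, assumed known
    se_usual = 0.0
    se_ratio = 0.0
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # SRS without replacement
        s2_y = statistics.variance([y[i] for i in idx])
        s2_x = statistics.variance([x[i] for i in idx])
        t_usual = s2_y                # usual sample variance
        t_ratio = s2_y * S2_x / s2_x  # ratio-type estimator
        se_usual += (t_usual - S2_y) ** 2
        se_ratio += (t_ratio - S2_y) ** 2
    return se_usual / reps, se_ratio / reps

# Hypothetical population: x is normal, y depends linearly on x plus noise.
gen = random.Random(0)
x = [gen.gauss(50, 10) for _ in range(500)]
y = [2.0 * xi + gen.gauss(0, 5) for xi in x]

mse_usual, mse_ratio = simulate_mse(y, x, n=50, reps=2000)
pre_ratio = pre(mse_usual, mse_ratio)
```

A PRE above 100 would indicate that the estimator is more efficient than the baseline in this simulated setting; by construction, the baseline itself has a PRE of 100.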
5.3. Discussion of Results
This article introduced a logarithmic-type estimator specifically developed for estimating the finite population variance of a study variable. This estimator leverages the information from an auxiliary variable to enhance the precision of the variance estimation. This study emphasizes the importance of variance estimation to enhance the reliability of survey outcomes. We suggested an estimator and derived its bias and MSE equations, and we also considered the existing variance estimators from the literature and derived their MSE equations. When the considered estimator’s efficiency was compared with the proposed estimator’s efficiency, we obtained the theoretical conditions from (11) to (16).
Using a real dataset in Population-I, we assessed the performance of the estimators under consideration, including the proposed estimator, through their MSE and PRE values. Table 2 shows that the MSE of the proposed estimator is lower than those of the other estimators and that its PRE is the highest, which indicates the usefulness of the proposed estimator.
Moreover, the simulation studies support the effectiveness of the proposed estimator. Table 3 reveals that the proposed estimator demonstrates superior efficiency in comparison to the existing methods. Here, we considered two populations, generated data from a normal distribution, and performed a simulation. In Populations II and III, we performed 10,000 and 20,000 replications, respectively. After the replications, we obtained the data statistics as averages over the replications. Then, we found the values of MSE and PRE for all the considered and suggested estimators.
Table 3. Comparisons between the proposed estimator and other considered estimators through simulation.
In the graphical representation plotted in Figure 1, the red, blue, and green colors indicate the PRE values of Population-I, Population-II, and Population-III, respectively. Estimator 7 in the figure denotes the proposed estimator. The PRE values of the proposed estimator are higher than those of the other estimators in all three populations. This performance highlights the potential of the suggested log-type estimator as a more efficient and reliable alternative to the existing approaches in variance estimation.
Figure 1. Graphical representation of PRE values of the three populations.
These results contribute to the ongoing discourse on refining statistical methodologies for survey research, providing a robust alternative for enhancing the precision of survey outcomes. As we navigate the implications of our findings, the proposed estimator stands as a promising avenue for further exploration and potential adoption in diverse sampling contexts, signaling a positive step forward in the evolution of variance estimation techniques.
6. Conclusions
In this study, we formulated a logarithmic-type estimator for finite population variance estimation and we conducted an in-depth analysis of its effectiveness using a real-world dataset. Our investigation delved into a meticulous comparison of our proposed estimator against the established methods, aiming to evaluate its performance comprehensively. The computation of MSE values served as a pivotal metric in assessing the efficiency of the proposed estimator and several existing ones.
The comparison set included well-known estimators such as the standard unbiased variance, difference type, and those proposed by [3,10,16,28]. In both numerical and simulation studies, we examined our proposed estimator’s performance across three distinct populations to understand its characteristics. From Table 2 and Table 3, it is clear that the proposed estimator performed better. From the graphical representation (Figure 1), we can also conclude that the proposed estimator achieved the greatest efficiency in comparison to the other considered estimators.
Taken together, our findings indicate the superior performance of the proposed logarithmic-type estimator. The proposed estimator consistently attained the lowest MSE and the highest PRE values, suggesting greater accuracy and efficiency compared to the considered alternatives.
Author Contributions
Conceptualization, G.R.V.T. and F.D.; methodology, F.D., G.R.V.T. and O.A.; software, G.R.V.T.; validation, G.R.V.T. and F.D.; formal analysis, G.R.V.T.; investigation, F.D.; resources, F.D., G.R.V.T. and O.A.; data curation, G.R.V.T.; writing—original draft preparation, G.R.V.T. and F.D.; writing—review and editing, F.D. and O.A.; visualization, G.R.V.T. and F.D.; supervision, F.D.; project administration, F.D. and O.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
Data are contained within the article.
Acknowledgments
We highly appreciate the efforts of the reviewers, the assigned editor, and the Assistant Editor in improving this article.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Wakimoto, K. Stratified random sampling (1) estimation of the population variance. Ann. Inst. Stat. Math. 1971, 23, 233–252.
- Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123.
- Prasad, B.; Singh, H.P. Unbiased estimators of finite population variance using auxiliary information in sample surveys. Commun. Stat. Theory Methods 1992, 21, 1367–1376.
- Kadilar, C.; Cingi, H. Improvement in variance estimation using auxiliary information. Hacet. J. Math. Stat. 2006, 35, 111–115.
- Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
- Shabbir, J.; Gupta, S.A.T. On improvement in variance estimation using auxiliary information. Commun. Stat. Theory Methods 2007, 36, 2177–2185.
- Espejo, M.R.; Singh, H.P.; Pineda, M.D.; Nadarajah, S. Optimal estimation of population variance using equilibrated stratified sampling from infinite populations. J. Korean Stat. Soc. 2008, 37, 375–383.
- Singh, R.; Chauhan, P.; Sawan, N.; Smarandache, F. Improvement in estimating the population mean using exponential estimator in simple random sampling. Int. J. Stat. Econ. 2009, 3, 13–18.
- Koyuncu, N. Improved Estimators of Finite Population Variance in Stratified Random Sampling. World Appl. Sci. J. 2013, 23, 130–137.
- Yadav, S.K.; Kadilar, C. A class of ratio-cum-dual to ratio estimator of population variance. J. Reliab. Stat. Stud. 2013, 6, 29–34.
- Yadav, S.K.; Kadilar, C.; Shabbir, J.; Gupta, S. Improved family of estimators of population variance in simple random sampling. J. Stat. Theory Pract. 2015, 9, 219–226.
- Yadav, S.K.; Mishra, S.S.; Kumar, S.; Kadilar, C. A new improved class of estimators for the population variance. J. Stat. Appl. Probab. 2016, 5, 385–392.
- Sanaullah, A.; Asghar, A.; Hanif, M. General class of exponential estimator for estimating finite population variance. J. Reliab. Stat. Stud. 2017, 10, 1–16.
- Bhushan, S.; Kumari, C. A new log type estimator for estimating the population variance. Int. J. Comp. App. Math. 2018, 13, 43–54.
- Etebong, P.C. Improved family of ratio estimators of finite population variance in stratified random sampling. Biostat. Biom. Open Access J. 2018, 5, 55659.
- Muili, J.O.; Singh, R.V.K.; Audu, A. Study of Efficiency of Some Finite Population Variance Estimators in Stratified Random Sampling. Cont. J. Appl. Sci. 2018, 13, 1–17.
- Shabbir, J.; Gupta, S. Using rank of the auxiliary variable in estimating variance of the stratified sample mean. Int. J. Comput. Theor. Stat. 2019, 6, 172–181.
- Cekim, H.O.; Kadilar, C. ln-type estimators for the population variance in stratified random sampling. Commun. Stat. Simul. Comput. 2020, 49, 1665–1677.
- Shahzad, U.; Ahmad, I.; Almanjahie, I.M.; Al-Noor, N.H. L-Moments Based Calibrated Variance Estimators Using Double Stratified Sampling. Comput. Mater. Contin. 2021, 68, 3412–3430.
- Yasmeen, U.; Noor-ul-Amin, M. Estimation of Finite Population Variance Under Stratified Sampling Technique. J. Reliab. Stat. Stud. 2021, 14, 565–584.
- Ahmad, S.; Hussain, S.; Shabbir, J.; Zahid, E.; Aamir, M.; Onyango, R. Improved estimation of finite population variance using dual supplementary information under stratified random sampling. Math. Probl. Eng. 2022, 2022, 3813952.
- Aloraini, B.; Khalil, S.; Qureshi, M.N.; Gupta, S. Estimation of Population Variance for a Sensitive Variable in Stratified Sampling Using Randomized Response Technique. REVSTAT-Stat. J. 2022. Available online: https://revstat.ine.pt/index.php/REVSTAT/article/view/508 (accessed on 2 March 2024).
- Niaz, I.; Sanaullah, A.; Saleem, I.; Shabbir, J. An improved efficient class of estimators for the population variance. Concurr. Comput. Pract. Exp. 2022, 34, e6620.
- Sanaullah, A.; Niaz, I.; Shabbir, J.; Ehsan, I. A class of hybrid type estimators for variance of a finite population in simple random sampling. Commun. Stat. Simul. Comput. 2022, 51, 5609–5619.
- Ahmad, S.; Adichwal, N.K.; Aamir, M.; Shabbir, J.; Alsadat, N.; Elgarhy, M.; Ahmad, H. An enhanced estimator of finite population variance using two auxiliary variables under simple random sampling. Sci. Rep. 2023, 13, 21444.
- Ahmad, S.; Al Mutairi, A.; Nassr, S.G.; Alsuhabi, H.; Kamal, M.; Rehman, M.U. A new approach for estimating variance of a population employing information obtained from a stratified random sampling. Heliyon 2023, 9, e21477.
- Frey, J.; Zhang, Y. Nonparametric maximum likelihood estimation of the distribution function using ranked-set sampling. J. Korean Stat. Soc. 2023, 52, 901–920.
- Jan, R.; Jan, T.R.; Danish, F. Generalised Exponential Ratio-Cum-Product Estimator for Estimating Population Variance in Simple Random Sampling. Reliab. Theory Appl. 2023, 18, 625–631.
- Javed, S.; Masood, S.; Shokri, A. Generalized Class of Finite Population Variance in the Presence of Random Nonresponse Using Simulation Approach. Complexity 2023, 2023, 6643435.
- Shahzad, U.; Ahmad, I.; Almanjahie, I.M.; Al-Noor, N.H.; Hanif, M. A novel family of variance estimators based on L-moments and calibration approach under stratified random sampling. Commun. Stat. Simul. Comput. 2023, 52, 3782–3795.
- Shahzad, U.; Ahmad, I.; Mufrah Almanjahie, I.; Hanif, M.; Al-Noor, N.H. L-Moments and calibration-based variance estimators under double stratified random sampling scheme: Application of Covid-19 pandemic. Sci. Iran. 2023, 30, 814–821.
- Triveni, G.R.V.; Danish, F. Heuristical Approach for Optimizing Population Mean Using Ratio Estimator in Stratified Random Sampling. J. Reliab. Stat. Stud. 2023, 16, 137–152.
- Triveni, G.R.V.; Danish, F. Leveraging Auxiliary Variables: Advancing Mean Estimation Through Conditional and Unconditional Post-Stratification. Reliab. Theory Appl. 2023, 18, 57–68.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
