New Class of Estimators for Finite Population Mean Under Stratified Double Phase Sampling with Simulation and Real-Life Application
Abstract
1. Introduction
- Traditional estimators of the finite population mean often disregard extreme values, which are typically treated as a nuisance because they can produce misleading results or inflate the mean squared error (MSE). The resulting loss of efficiency under stratified two-phase sampling designs calls for a more robust approach that addresses these concerns.
- The complex data structure of stratified two-phase sampling creates significant challenges for existing estimators. This highlights the importance of developing estimators that are more reliable and efficient.
- Two-phase sampling allows researchers to select specific clusters or strata that accurately represent the entire population, ensuring that different sub-populations are properly represented.
- Two-phase sampling is a useful method in a variety of research situations because it offers more accurate representation and control of variability along with cost savings, enhanced precision, and flexibility.
- This paper introduces a new class of estimators for the finite population mean, using extreme values of auxiliary variables to enhance accuracy and stability, particularly in the presence of outliers.
- The new estimators reduce bias and MSE relative to existing methods by exploiting extreme values, giving more accurate estimates of the finite population mean, particularly for data with outliers or skewness.
- The study includes theoretical analysis of the biases and MSE of the estimators, along with Monte Carlo simulations and real-life applications. These show that the proposed estimators outperform traditional methods in percent relative efficiency (PRE), confirming the theoretical improvements with practical results.
- The estimators presented in this paper are mathematically robust and practical for real-life applications in which auxiliary data are frequently available, such as economic data analysis, public health, and education.
2. Methodology and Notation
- The population mean is estimated in the first phase by selecting a sample of size .
- In order to observe v and u, respectively, a sample size of is chosen for the second phase.
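The two-phase selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `two_phase_srswor`, the stratum labels, and the sizes `(N_h, n1_h, n2_h)` are hypothetical, and simple random sampling without replacement is assumed in both phases within each stratum.

```python
import numpy as np

def two_phase_srswor(stratum_size, n_first, n_second, rng):
    """Draw a first-phase SRSWOR sample of unit indices from one stratum,
    then a second-phase SRSWOR subsample from the first-phase units."""
    if not 0 < n_second <= n_first <= stratum_size:
        raise ValueError("need 0 < n_second <= n_first <= stratum_size")
    first = rng.choice(stratum_size, size=n_first, replace=False)
    second = rng.choice(first, size=n_second, replace=False)
    return first, second

rng = np.random.default_rng(0)

# Hypothetical design: (stratum size, first-phase size, second-phase size).
design = {"stratum_1": (600, 180, 60), "stratum_2": (900, 270, 90)}

samples = {
    name: two_phase_srswor(N_h, n1_h, n2_h, rng)
    for name, (N_h, n1_h, n2_h) in design.items()
}

for name, (first, second) in samples.items():
    print(name, first.size, second.size)
```

Because the second-phase units are drawn from the first-phase indices, any variable recorded in the first phase is automatically available for every second-phase unit.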
3. Existing Estimators
4. Proposed Generalized Estimators
Properties of the Suggested Estimator
5. Mathematical Comparison
6. Results and Discussion
6.1. Simulation Study
- Population 1:
- Population 2:
- Population 3:
- Population 4: ,
- Population 5:
- Population 6:
- Population 7:
- Population 8:
- Step 1: Generate a population of 1500 observations from the probability distributions defined above.
- Step 2: Apply the simple random sampling without replacement (SRSWOR) approach to obtain a first-phase sample of size from a population of size .
- Step 3: Select the second-phase sample from the first-phase sample using the SRSWOR approach.
- Step 4: Using the information from both phases, estimate the population total and obtain the extreme values of the auxiliary variable and of its ranks.
- Step 5: Use SRSWOR to draw samples of different sizes from each stratum in each population. The sampling fractions considered are 15%, 30%, and 45%.
- Step 6: Using all of the estimators described in this article, compute the mean squared error (MSE) and percent relative efficiency (PRE) for each sample size, so that each estimator’s MSE and PRE are examined across the full set of sample sizes.
- Step 8: Finally, one can obtain the MSEs and PREs for each estimator over all replications using the following formulas:
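The simulation steps above can be sketched as follows. This is a hedged illustration under several assumptions that are not taken from the paper: a single stratum is used for brevity, the population is an arbitrary gamma-based example, the number of replications `reps` is hypothetical, and the two estimators shown (the second-phase sample mean and the classical double-sampling ratio estimator) are textbook stand-ins rather than the proposed class. The MSE is taken as the average squared deviation from the true mean over replications, and the PRE as 100 times the baseline MSE divided by the estimator's MSE, which is the usual convention.

```python
import numpy as np

def simulate_mse_pre(y, x, n_first, n_second, reps=1000, seed=0):
    """Monte Carlo MSE and PRE for two illustrative two-phase estimators."""
    rng = np.random.default_rng(seed)
    N = y.size
    y_bar = y.mean()                      # true population mean
    est = np.empty((reps, 2))
    for r in range(reps):
        # First phase: SRSWOR from the population; only x is used here.
        first = rng.choice(N, size=n_first, replace=False)
        # Second phase: SRSWOR subsample of the first-phase units.
        second = rng.choice(first, size=n_second, replace=False)
        x1_bar = x[first].mean()
        y2_bar, x2_bar = y[second].mean(), x[second].mean()
        est[r, 0] = y2_bar                      # baseline: sample mean
        est[r, 1] = y2_bar * x1_bar / x2_bar    # double-sampling ratio estimator
    mse = ((est - y_bar) ** 2).mean(axis=0)
    pre = 100.0 * mse[0] / mse            # baseline has PRE = 100 by construction
    return mse, pre

# Hypothetical population of 1500 units with a strong positive x-y relation.
pop_rng = np.random.default_rng(1)
x = pop_rng.gamma(shape=4.0, scale=2.0, size=1500)
y = 3.0 * x + pop_rng.normal(0.0, 2.0, size=1500)

mse, pre = simulate_mse_pre(y, x, n_first=450, n_second=150)
print("MSE:", mse)
print("PRE:", pre)
```

A PRE above 100 indicates that an estimator has a smaller simulated MSE than the baseline. Extending the sketch to several strata would repeat the two-phase draw within each stratum and combine the stratum estimates with the stratum weights.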
6.2. Numerical Examples
- Data 1. This dataset, which was compiled in Pakistan in 2012 and contains data from various divisions, was taken from page 135 of the Bureau of Statistics [30]. The Pakistan Bureau of Statistics website provides a download link: https://www.pbs.gov.pk/content/microdata (accessed on 1 November 2024). The variables contained in the dataset are described as follows:
- Y: The total number of students enrolled in schools in 2012, covering all registered students across all schools.
- X: The total number of public elementary and secondary schools in 2012.
- Group 1: Consists of the divisions of Lahore, Sargodha, Gujranwala, and Rawalpindi; these divisions reflect areas with perhaps different student populations and educational facilities.
- Group 2: Consists of the divisions of Multan, Bahawalpur, Faisalabad, D.G. Khan, and Sahiwal. These divisions differ from Group 1 in their educational characteristics and give a broader view of regional education.
- Data 2. This population, which consists of information from different divisions, was taken from page 226 of the Bureau of Statistics [30] and was collected in Pakistan in 2012. It is available for download from the Pakistan Bureau of Statistics website: https://www.pbs.gov.pk/content/microdata (accessed on 1 November 2024).
- Y: The employment level reported across the divisions in 2012, a measure of the distribution of the workforce.
- X: The number of factories officially registered in the divisions in 2012, an indicator of the level of industrial activity.
- Group 1: Includes the divisions of Sargodha, Gujranwala, Rawalpindi, and Lahore, which represent some of the more industrialized and heavily inhabited districts.
- Group 2: Comprises the divisions of Multan, Bahawalpur, Faisalabad, D.G. Khan, and Sahiwal, including areas noted for agricultural activities as well as expanding industrial sectors.
- Data 3. This dataset, obtained from page 24 [1], provides statistics on food costs and the weekly income of households.
- Y: Indicates the family’s food expenditure, which is connected to their working status and shows how food prices can change depending on their work environment.
- X: Represents the family’s weekly income, which is an important reflection of their financial situation.
7. Discussion
- The mean squared error values of all suggested estimators are consistently smaller than those of the estimators discussed in Section 3, across all simulated scenarios and real datasets, as shown in Table 2 and Table 7. The suggested estimators are therefore more precise than the existing estimators.
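The link between the reported MSE and PRE values can be checked with a short calculation. The sketch below assumes the usual definition, PRE = 100 × MSE(baseline) / MSE(estimator), with the first-listed estimator as the baseline; the helper name `percent_relative_efficiency` is hypothetical. The inputs are three entries from the Data 3 column of the real-data MSE table.

```python
def percent_relative_efficiency(mse_baseline, mse_estimator):
    """PRE of an estimator relative to a baseline, in percent."""
    return 100.0 * mse_baseline / mse_estimator

baseline = 4.991  # Data 3 MSE of the first-listed estimator

for mse in (4.564, 2.940, 2.261):
    print(mse, round(percent_relative_efficiency(baseline, mse), 3))
```

The three results, 109.356, 169.762, and 220.743, match the corresponding Data 3 entries of the real-data PRE table, so a smaller MSE translates directly into a larger PRE.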
7.1. Limitations and Practical Challenges
7.1.1. Impact of Small Sample Sizes
7.1.2. Sensitivity to Highly Skewed Populations
7.1.3. Dependence on Auxiliary Data
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
- Khoshnevisan, M.; Singh, R.; Chauhan, P.; Sawan, N. A general family of estimators for estimating population mean using known value of some population parameter(s). Far East J. Theor. Stat. 2007, 22, 181–191.
- Rueda, M.M.; Arcos, A.; Martínez-Miranda, M.D.; Román, Y. Some improved estimators of finite population quantile using auxiliary information in sample surveys. Comput. Stat. Data Anal. 2004, 45, 825–848.
- Särndal, C.E. Sample survey theory vs. general statistical theory: Estimation of the population mean. Int. Stat. Rev. Int. Stat. 1972, 40, 1–12.
- Tarima, S.; Pavlov, D. Using auxiliary information in statistical function estimation. ESAIM Probab. Stat. 2006, 10, 11–23.
- Sukhatme, B.V. Some ratio-type estimators in two-phase sampling. J. Am. Stat. Assoc. 1962, 57, 628–632.
- Erinola, A.Y.; Singh, R.V.K.; Audu, A.; James, T. Modified class of estimator for finite population mean under two-phase sampling using regression estimation approach. Asian J. Probab. Stat. 2021, 4, 52–64.
- Garg, N.; Srivastava, M. A general class of estimators of a finite population mean using multi-auxiliary information under two stage sampling scheme. J. Reliab. Stat. Stud. 2009, 2, 103–118.
- Guha, S.; Chandra, H. Improved estimation of finite population mean in two-phase sampling with subsampling of the nonrespondents. Math. Popul. Stud. 2021, 28, 24–44.
- Kamal, A.; Amir, N.; Dastagir, H. Some exponential type predictive estimators of finite population mean in two-phase sampling. J. Stat. Comput. Interdiscip. Res. 2020, 2, 51–57.
- Khare, B.B.; Khare, S. Generalized synthetic estimator for domain mean in two phase sampling using single auxiliary character. J. Reliab. Stat. Stud. 2019, 12, 139–151.
- Singh, H.P.; Vishwakarma, G.K. Modified exponential ratio and product estimators for finite population mean in double sampling. Austrian J. Stat. 2007, 36, 217–225.
- Singh, H.P.; Espejo, M.R. Double sampling ratio-product estimator of a finite population mean in sample surveys. J. Appl. Stat. 2007, 34, 71–85.
- Vishwakarma, G.K.; Zeeshan, S.M. Generalized ratio-cum-product estimator for finite population mean under two-phase sampling scheme. J. Mod. Appl. Stat. Methods 2020, 19, 1–16.
- Zaman, T.; Kadilar, C. New class of exponential estimators for finite population mean in two-phase sampling. Commun. Stat.-Theory Methods 2021, 50, 874–889.
- Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhyā Indian J. Stat. Ser. B 1995, 57, 93–102.
- Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
- Khan, M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J. Stat. Appl. Probab. Lett. 2015, 2, 115–121.
- Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
- Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 1–15.
- Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
- Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
- Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
- Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
- Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon 2024, 10, e33402.
- Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry 2024, 16, 957.
- Dawoud, I.; Awwad, F.A.; Tag Eldin, E.; Abonazel, M.R. New Robust Estimators for Handling Multicollinearity and Outliers in the Poisson Model: Methods, Simulation and Applications. Axioms 2022, 11, 612.
- Bhushan, S.; Kumar, A.; Alsadat, N.; Mustafa, M.S.; Alsolmi, M.M. Some Optimal Classes of Estimators Based on Multi-Auxiliary Information. Axioms 2023, 12, 515.
- Ullah, A.; Shabbir, J.; Alomair, A.M.; Alomair, M.A. Ratio-type estimator for estimating the neutrosophic population mean in simple random sampling under intuitionistic fuzzy cost function. Axioms 2023, 12, 890.
- Bureau of Statistics. Punjab Development Statistics; Government of the Punjab: Lahore, Pakistan, 2013.
Subsets of | ||||
---|---|---|---|---|
1 | −1 | |||
−1 | 1 | |||
−1 | −1 | |||
0 | 1 | |||
1 | 1 | |||
0 | −1 | |||
1 | 0 | |||
−1 | 0 |
Estimator | Data-1 | Data-2 | Data-3 | Data-4 | Data-5 | Data-6 | Data-7 | Data-8 |
---|---|---|---|---|---|---|---|---|
| | 9.18 × | 8.85 × | 3.96 × | 4.05 × | 7.15 × | 8.05 × | 6.92 × | 7.452 × |
| | 8.62 × | 8.40 × | 3.05 × | 3.15 × | 5.12 × | 6.00 × | 5.972 × | 6.50 × |
| | 7.95 × | 8.25 × | 2.08 × | 2.48 × | 4.69 × | 5.35 × | 5.58 × | 5.95 × |
| | 7.75 × | 7.80 × | 2.65 × | 2.29 × | 3.40 × | 4.05 × | 4.10 × | 4.58 × |
| | 8.15 × | 8.00 × | 3.50 × | 3.75 × | 6.58 × | 6.95 × | 5.47 × | 5.95 × |
| | 7.02 × | 7.45 × | 3.10 × | 3.05 × | 3.20 × | 3.88 × | 3.55 × | 3.88 × |
| | 8.70 × | 8.45 × | 3.95 × | 5.20 × | 8.05 × | 8.45 × | 7.60 × | 7.92 × |
| | 6.10 × | 7.45 × | 3.35 × | 4.55 × | 3.95 × | 4.35 × | 5.05 × | 5.30 × |
| | 6.45 × | 7.65 × | 4.80 × | 5.50 × | 6.50 × | 6.92 × | 5.00 × | 5.48 × |
| | 5.10 × | 5.90 × | 4.30 × | 5.05 × | 6.80 × | 7.25 × | 4.30 × | 4.58 × |
| | 5.40 × | 6.25 × | 3.80 × | 3.20 × | 3.25 × | 3.78 × | 4.55 × | 4.88 × |
| | 4.25 × | 5.00 × | 2.55 × | 3.15 × | 2.65 × | 3.28 × | 3.05 × | 3.28 × |
| | 4.60 × | 5.40 × | 2.40 × | 2.30 × | 2.40 × | 2.90 × | 3.45 × | 3.65 × |
| | 2.90 × | 3.25 × | 1.15 × | 1.70 × | 2.05 × | 2.48 × | 2.45 × | 2.62 × |
Estimator | Data-1 | Data-2 | Data-3 | Data-4 | Data-5 | Data-6 | Data-7 | Data-8 |
---|---|---|---|---|---|---|---|---|
| | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 106.49 | 105.07 | 129.83 | 128.57 | 139.64 | 132.87 | 129.50 | 124.50 |
| | 115.53 | 107.27 | 190.38 | 163.31 | 152.48 | 145.76 | 141.24 | 134.32 |
| | 118.45 | 113.46 | 149.43 | 176.99 | 208.82 | 189.32 | 189.10 | 175.32 |
| | 112.64 | 111.87 | 113.14 | 109.06 | 109.19 | 145.53 | 136.42 | 132.43 |
| | 130.83 | 119.73 | 127.74 | 133.12 | 225.11 | 211.54 | 222.54 | 211.20 |
| | 286.78 | 604.73 | 253.16 | 212.65 | 243.57 | 267.14 | 282.89 | 271.12 |
| | 410.73 | 324.20 | 319.35 | 239.87 | 491.56 | 462.12 | 422.43 | 396.54 |
| | 389.19 | 314.87 | 221.58 | 201.82 | 298.51 | 271.18 | 427.60 | 398.90 |
| | 489.52 | 407.33 | 250.84 | 217.77 | 285.94 | 261.34 | 492.32 | 468.30 |
| | 465.64 | 386.71 | 263.08 | 339.89 | 589.72 | 523.74 | 465.99 | 432.45 |
| | 584.34 | 484.15 | 402.98 | 348.93 | 729.62 | 698.34 | 700.21 | 645.18 |
| | 538.94 | 446.71 | 423.96 | 468.63 | 797.84 | 726.12 | 620.44 | 578.32 |
| | 852.26 | 741.82 | 869.51 | 630.85 | 922.98 | 898.27 | 868.23 | 810.20 |
Estimator | Data 1 | Data 2 | Data 3 |
---|---|---|---|
| | 2458644474 | 2277471492 | 4.991 |
| | 2325327219 | 2106243059 | 4.564 |
| | 2086419318 | 2005323090 | 4.879 |
| | 2360073402 | 2123289511 | 4.628 |
| | 2621040437 | 2568789003 | 4.879 |
| | 2279370249 | 2091394778 | 4.540 |
| | 1685405190 | 1691442012 | 3.108 |
| | 1489232432 | 1479656632 | 2.940 |
| | 1405592234 | 1378267780 | 2.728 |
| | 1482944924 | 1451874310 | 2.835 |
| | 1637713868 | 1660835587 | 3.206 |
| | 1243061255 | 1290701980 | 2.501 |
| | 1553568342 | 1523808743 | 2.730 |
| | 1163004043 | 1211251209 | 2.261 |
Estimator | Data 1 | Data 2 | Data 3 |
---|---|---|---|
| | 100 | 100 | 100 |
| | 105.733 | 108.120 | 109.356 |
| | 117.840 | 113.571 | 117.840 |
| | 104.177 | 107.262 | 107.842 |
| | 103.804 | 105.659 | 102.293 |
| | 107.865 | 108.8973 | 109.696 |
| | 145.879 | 134.647 | 160.586 |
| | 165.095 | 153.919 | 169.762 |
| | 174.919 | 165.242 | 179.532 |
| | 165.795 | 156.864 | 176.049 |
| | 150.127 | 137.128 | 155.676 |
| | 197.780 | 176.452 | 199.560 |
| | 158.258 | 149.459 | 182.820 |
| | 211.405 | 188.026 | 220.743 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alghamdi, A.S.; Alrweili, H. New Class of Estimators for Finite Population Mean Under Stratified Double Phase Sampling with Simulation and Real-Life Application. Mathematics 2025, 13, 329. https://doi.org/10.3390/math13030329