A Novel Robust Transformation Approach to Finite Population Median Estimation Using Monte Carlo Simulation and Empirical Data
Abstract
1. Introduction
Rationale for the Proposed Class of Estimators
- Handling outliers and heavy tails: Conventional median-type estimators often lose efficiency when the auxiliary variable is affected by extreme values. Robust transformations such as Hodges–Lehmann location and Gini mean difference, and resistant averages such as trimmed and winsorized means provide stable adjustments that reduce the influence of outliers.
- Quartile-based measures: Median ratio, Bowley’s skewness, interquartile range, geometric quartile mean, quartile deviation, and percentile ranges describe spread and asymmetry.
- Robust dispersion: The median absolute deviation and skewness-adjusted indices help reduce the influence of extreme values.
- Variability and shape: The coefficient of variation and Moors’ kurtosis provide insight into variability and tail behavior, especially in skewed or heavy-tailed data.
- Improved efficiency: By combining resistant measures of location, scale, and shape, the proposed class can achieve lower bias and mean squared error compared to traditional quantile-based methods.
- Novelty in median estimation: While robust measures such as trimmed means or Gini mean difference have been studied in the context of mean estimation, their use in designing median estimators under two-phase sampling has not been explored. This gives the proposed class originality and adds value to the literature.
- Practical relevance: Many real-life datasets in economics, health, and social sciences exhibit skewness or contain irregular observations. The proposed transformations make the estimators more reliable for such applications.
- Overall contribution: Together, these measures and transformations enhance the efficiency, stability, and reliability of median-based estimation in the presence of outliers or non-normality.
2. Survey Design and Preliminaries
- (i) First phase: The process begins with selecting a first-phase sample of elements. At this stage, information is collected only on the auxiliary variable x, which is then used to estimate the population median of x.
- (ii) Second phase: A second-phase sub-sample is taken from the first-phase selection. At this stage, data are collected on both the study variable y and the auxiliary variable x.
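The two-phase design described above can be sketched as follows. This is a minimal illustration that assumes simple random sampling without replacement at both phases; the function name `two_phase_sample`, the sizes `n1` and `n2`, and the toy population are hypothetical and are not taken from the article.

```python
import numpy as np

def two_phase_sample(x, y, n1, n2, rng):
    """Draw a two-phase sample by simple random sampling without replacement.

    Phase I  : n1 units, only the auxiliary variable x is observed.
    Phase II : n2 units sub-sampled from phase I, both x and y are observed.
    """
    N = len(x)
    if not (0 < n2 <= n1 <= N):
        raise ValueError("sizes must satisfy 0 < n2 <= n1 <= N")
    first_idx = rng.choice(N, size=n1, replace=False)
    second_idx = rng.choice(first_idx, size=n2, replace=False)
    return x[first_idx], x[second_idx], y[second_idx], first_idx, second_idx

# Toy population (illustrative values only).
rng = np.random.default_rng(0)
x_pop = rng.uniform(16.0, 22.0, size=1000)
y_pop = 2.0 * x_pop + rng.normal(0.0, 1.0, size=1000)

x_first, x_second, y_second, first_idx, second_idx = two_phase_sample(
    x_pop, y_pop, n1=200, n2=50, rng=rng
)
median_x_first = np.median(x_first)    # phase-I estimate of the median of x
median_x_second = np.median(x_second)  # phase-II median of x
median_y_second = np.median(y_second)  # phase-II median of y
```

The phase-I median of x plays the role of the unknown population median of x when the estimators are built from the phase-II pairs.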
3. Existing Approaches for Median Estimation
4. Proposed Class of Robust Median Estimators
- Hodges–Lehmann:
- Gini mean difference:
- Trimmed mean (10%):
- Winsorized mean (10%):
- Median ratio:
- Bowley skewness:
- Interquartile range:
- Geometric quartile mean:
- Quartile deviation:
- 10–90% range:
- Median absolute deviation (MAD):
- Skewness adjusted:
- Coefficient of variation:
- Moors’ kurtosis: Defined from the sample octiles, this measure is simple, bounded, and robust, making it a suitable choice when the target is the median.
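Several of the measures listed above have standard textbook definitions, which can be sketched in Python as below. This is an illustration only: the function names are hypothetical, the definitions are the usual ones (Walsh averages for Hodges–Lehmann, pairwise absolute differences for the Gini mean difference, octiles for Moors’ kurtosis), and they may differ in detail from the versions used in the article. Measures whose definition cannot be recovered from the text, such as the median ratio and the skewness-adjusted index, are left out.

```python
import numpy as np

def hodges_lehmann(x):
    """Median of the Walsh averages (x_i + x_j) / 2 over all pairs i <= j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x))
    return np.median((x[i] + x[j]) / 2.0)

def gini_mean_difference(x):
    """Mean absolute difference over all distinct pairs i < j."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(len(x), k=1)
    return np.mean(np.abs(x[i] - x[j]))

def trimmed_mean(x, prop=0.10):
    """Mean after dropping the lowest and highest floor(prop * n) values."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(prop * len(x)))
    return x[k:len(x) - k].mean()

def winsorized_mean(x, prop=0.10):
    """Mean after replacing the k extreme values on each side by the nearest kept value."""
    x = np.sort(np.asarray(x, dtype=float))  # np.sort returns a copy
    k = int(np.floor(prop * len(x)))
    if k > 0:
        x[:k] = x[k]
        x[-k:] = x[-k - 1]
    return x.mean()

def mad(x):
    """Median absolute deviation from the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def quartile_measures(x):
    """Interquartile range, quartile deviation, and Bowley skewness."""
    q1, q2, q3 = np.quantile(x, [0.25, 0.50, 0.75])
    return {
        "iqr": q3 - q1,
        "quartile_deviation": (q3 - q1) / 2.0,
        "bowley_skewness": (q3 + q1 - 2.0 * q2) / (q3 - q1),
    }

def moors_kurtosis(x):
    """Moors' octile-based kurtosis: ((E7 - E5) + (E3 - E1)) / (E6 - E2)."""
    e = np.quantile(x, np.arange(1, 8) / 8.0)  # e[0] = E1, ..., e[6] = E7
    return ((e[6] - e[4]) + (e[2] - e[0])) / (e[5] - e[1])

# A sample with one gross outlier: the resistant measures barely react to it.
sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0])
summary = {
    "mean": sample.mean(),
    "trimmed_mean": trimmed_mean(sample),
    "winsorized_mean": winsorized_mean(sample),
    "mad": mad(sample),
}
```

In this toy sample the ordinary mean is pulled far above the bulk of the data by the single value 100, while the trimmed and winsorized means stay near the centre, which is the behaviour that motivates using such measures as transformations of the auxiliary variable.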
5. Theoretical Comparison with Existing Estimators
- (i)
- (ii)
- (iii)
- (iv)
- (v)
- (vi)
- (vii)
- By comparing the MSE of the newly developed family of estimators (A4) with the MSE in equation , the following condition is obtained, capturing their relative performance:
- (viii)
6. Results and Discussion
6.1. Monte Carlo Simulation Study
- Population 1: The random variable X follows a Cauchy distribution, characterized by its location and scale parameters. Because the Cauchy distribution has no defined mean or variance, the theoretical population correlation with Y is undefined. The correlation reported for this population is therefore the sample correlation calculated from the generated data set, and it reflects the observed inverse association between X and Y.
- Population 2: X follows a uniform distribution bounded between 16 and 22 and is uncorrelated with Y.
- Population 3: X is modeled using a strongly skewed exponential distribution, and its correlation with Y is positive.
- Population 4: X follows a gamma distribution with specified shape and scale parameters, and its correlation with Y is positive.
- Population 5: X follows a mildly skewed log-normal distribution and is correlated with Y.
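The five populations above could be generated along the following lines. This is a sketch under stated assumptions: apart from the uniform bounds 16 and 22, every distribution parameter below is an illustrative placeholder, and the linear link used to produce Y (`slope * x + noise`) is a hypothetical choice made only to induce negative, zero, or positive association. The article's actual parameter values and generating model for Y are not reproduced here.

```python
import numpy as np

def generate_x(kind, size, rng):
    """Draw the auxiliary variable X for one population.

    Only the uniform bounds (16, 22) come from the text; all other
    parameter values are assumptions chosen for illustration.
    """
    if kind == "cauchy":
        return rng.standard_cauchy(size)                      # location 0, scale 1 (assumed)
    if kind == "uniform":
        return rng.uniform(16.0, 22.0, size)                  # bounds from the text
    if kind == "exponential":
        return rng.exponential(scale=1.0, size=size)          # rate 1 (assumed)
    if kind == "gamma":
        return rng.gamma(shape=2.0, scale=1.5, size=size)     # assumed
    if kind == "lognormal":
        return rng.lognormal(mean=0.0, sigma=0.5, size=size)  # assumed
    raise ValueError(f"unknown population kind: {kind}")

def generate_y(x, slope, noise_sd, rng):
    """Hypothetical linear link between X and Y (not the article's model)."""
    return slope * x + rng.normal(0.0, noise_sd, size=x.shape)

rng = np.random.default_rng(2025)
N = 5000
# Assumed slopes: negative for the Cauchy case, zero for the uniform case.
design = [("cauchy", -1.0), ("uniform", 0.0), ("exponential", 1.0),
          ("gamma", 1.0), ("lognormal", 1.0)]

populations = {}
for kind, slope in design:
    x = generate_x(kind, N, rng)
    y = generate_y(x, slope, noise_sd=1.0, rng=rng)
    sample_corr = np.corrcoef(x, y)[0, 1]  # observed correlation in the generated data
    populations[kind] = {"x": x, "y": y, "corr": sample_corr}
```

For the Cauchy case only the sample correlation is meaningful, which matches the remark in the text that the theoretical correlation is undefined.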
6.2. Simulation Steps Under Two-Phase Sampling
- Population setup: Generate the study population by selecting one of the five distributions for the auxiliary variable X (Cauchy, Uniform, Exponential, Gamma, or Log-normal) with its specified parameters. For each unit, the corresponding value of the study variable Y is then generated.
- First-phase sampling: From the generated population of size N, draw a first-phase simple random sample. At this stage, only the values of the auxiliary variable X are recorded.
- Second-phase sampling: From the first-phase sample, select a smaller subsample. For the units in this subsample, observe both the auxiliary variable X and the study variable Y.
- Estimator formation: Using the information from phase-I and the paired observations from phase II, construct the proposed estimators based on the sample median.
- Repetition: Repeat the above process a large number of times (for instance, 25,000 iterations) to assess the sampling behavior of the estimators.
- Performance assessment: For each estimator, compute the mean squared error (MSE), the relative efficiency, and, when applicable, the coverage probability of confidence intervals.
- Comparison across populations: Finally, compare the results across the five populations to evaluate how the estimators perform under heavy-tailed, skewed, and moderately correlated settings.
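The simulation steps above can be sketched as a short Monte Carlo loop. The code below is a stand-in, not the proposed class: it compares the plain second-phase sample median of Y with the classical ratio-type two-phase median estimator (sample median of Y multiplied by the ratio of the phase-I to phase-II medians of X). The population, the sizes `n1` and `n2`, and the number of replications are assumptions chosen to keep the example fast; the text mentions 25,000 replications.

```python
import numpy as np

def simulate_pre(x, y, n1, n2, reps, rng):
    """Monte Carlo MSE and percent relative efficiency under two-phase sampling.

    Column 0: second-phase sample median of y (baseline).
    Column 1: ratio-type estimator, median(y2) * median(x1) / median(x2).
    """
    N = len(x)
    true_median = np.median(y)
    estimates = np.empty((reps, 2))
    for r in range(reps):
        first = rng.choice(N, size=n1, replace=False)       # phase I: x only
        second = rng.choice(first, size=n2, replace=False)  # phase II: x and y
        med_y = np.median(y[second])
        med_x_first = np.median(x[first])
        med_x_second = np.median(x[second])
        estimates[r, 0] = med_y
        estimates[r, 1] = med_y * med_x_first / med_x_second
    mse = np.mean((estimates - true_median) ** 2, axis=0)
    pre = 100.0 * mse[0] / mse  # baseline has PRE = 100 by construction
    return mse, pre

rng = np.random.default_rng(7)
N = 2000
x_pop = rng.gamma(shape=2.0, scale=1.5, size=N)        # assumed parameters
y_pop = 2.0 * x_pop + rng.normal(0.0, 1.0, size=N)     # hypothetical link

mse, pre = simulate_pre(x_pop, y_pop, n1=400, n2=100, reps=2000, rng=rng)
```

Swapping in another estimator only requires adding a column to `estimates`; the MSE and PRE calculations stay the same.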
6.3. Application to Survey Data
- Set 1, Y: Average number of employees per district in 2010, reflecting workforce distribution across regions.
- Set 1, X: Total value of factory registrations in the same year, indicating the scale of industrial activity in each district.
- Set 2, Y: Aggregate number of students registered in all schools during the 2012–2013 academic session.
- Set 2, X: Total number of government-managed middle school institutions recorded for the 2012–2013 academic session.
- Set 3, Y: Aggregate employment across all industrial sectors in each district.
- Set 3, X: Fraction of registered factories in the corresponding district for the year 2012.
6.4. Analysis of Simulation and Empirical Results
- Performance on simulated data: Table 3 shows that the proposed estimators consistently deliver higher percent relative efficiency (PRE) than the conventional approaches in all five simulated populations. Figure 1 illustrates the same pattern, with the proposed estimators emerging as the most efficient in every scenario. These findings support their robustness under diverse distributional conditions.
- Application to real data sets: The benefits of the proposed estimators are not confined to artificial populations; they extend to real-world data. Table 4 summarizes their performance across socio-economic and environmental populations, where they consistently achieve higher PRE values than their traditional counterparts. Figure 2 shows the same trend, with the proposed methods outperforming the exponential and difference-type estimators in all three populations considered. These results underscore the practical reliability of the proposed class of estimators when applied to empirical datasets.
- Impact of correlation and sample size: Another important aspect of efficiency is its stability across varying survey conditions. Figures 1 and 2 indicate that the proposed estimators remain effective regardless of the correlation level between the study and auxiliary variables. Tables 3 and 4 further show that efficiency does not deteriorate even when the second-phase sample size is much smaller than the first-phase size. This property is particularly valuable for practitioners conducting surveys with limited resources, because reliable results can still be obtained under constrained sampling conditions.
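For reference, the PRE values discussed above are conventionally computed from simulated or empirical MSEs as in the block below. The notation is an assumption introduced here, since the article's own symbols are not recoverable from this text: $\hat{M}_0$ denotes the baseline estimator (the row with PRE equal to 100), $\hat{M}_k$ any competing estimator, $M_y$ the true population median of the study variable, and $R$ the number of Monte Carlo replications.

```latex
\mathrm{MSE}(\hat{M}_k) = \frac{1}{R}\sum_{r=1}^{R}\bigl(\hat{M}_{k}^{(r)} - M_y\bigr)^{2},
\qquad
\mathrm{PRE}(\hat{M}_k) = \frac{\mathrm{MSE}(\hat{M}_0)}{\mathrm{MSE}(\hat{M}_k)} \times 100 .
```

Under this convention a PRE of 200 means the estimator's MSE is half that of the baseline, and a PRE below 100 means it is less efficient than the baseline.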
7. Conclusions and Research Directions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
- A proof of Theorem 1, as detailed in Section 4
Estimator | ||||
---|---|---|---|---|
Hodges–Lehmann(X) | GMD(X) | 1 | Winsor_range10%(X) | |
Trim–Mean10%(X) | Winsorized Mean10%(X) | 1 | MAD(X) | |
Skew_adj(X) | 1 | CV(X) | ||
Median_ratio | 1 | 1 | Bowley_skewness(X) | |
1 | ||||
1 | ||||
Hodges–Lehmann | MAD(X) | 1 | Skew_adj(X) | |
1 |
| Symbols | Set 1 (Statistic) | Set 2 (Statistic) | Set 3 (Statistic) |
|---|---|---|---|
| N | 36 | 36 | 36 |
| | 168.5 | 1016.5 | 171.5 |
| | 10,484.5 | 115,223.5 | 10,494 |
| | 0.00015 | 0.00022 | 0.00019 |
| | 0.00016 | 0.00024 | 0.00021 |
| | 199.50 | 1027 | 201.25 |
| | 192.22 | 441.95 | 396.49 |
| | 235.50 | 1019.13 | 236.00 |
| | 270.67 | 1031.61 | 270.67 |
| | 89.5 | 729 | 87.5 |
| | 347 | 1242.25 | 352.5 |
| | 3.88 | 1.703 | 4.03 |
| | 0.386 | −0.1225 | 0.366 |
| | 257.5 | 512.25 | 175.62 |
| | 128.75 | 256.125 | 265 |
| | 751 | 844 | 762.5 |
| | 92.50 | 289 | 99 |
| | 0.386 | −0.1225 | 0.366 |
| | 2.60 | 0.396 | 2.64 |
| | 2.10 | 1.125 | 2.18 |
| | 0.912 | 0.796 | 0.519 |
| | 0.0107 | 0.0242 | 0.0288 |
| | 0.0035 | 0.0123 | 0.0140 |
| | 0.00072 | 0.0119 | 0.0148 |

Table 3. Percent relative efficiency (PRE) of the estimators for the five simulated populations.

| Estimator | Pop1 | Pop2 | Pop3 | Pop4 | Pop5 |
|---|---|---|---|---|---|
| | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 196.15 | 169.71 | 95.99 | 190.70 | 151.17 |
| | 164.68 | 159.12 | 99.42 | 232.71 | 139.17 |
| | 160.64 | 169.87 | 114.34 | 159.98 | 193.08 |
| | 60.91 | 90.10 | 72.55 | 60.59 | 47.22 |
| | 199.77 | 244.77 | 127.85 | 242.25 | 200.80 |
| | 224.02 | 199.57 | 154.70 | 252.88 | 230.97 |
| | 200.79 | 238.59 | 143.50 | 250.09 | 255.57 |
| | 274.81 | 300.02 | 191.82 | 281.45 | 261.83 |
| | 284.20 | 310.00 | 200.71 | 292.62 | 274.32 |
| | 284.42 | 300.00 | 188.92 | 299.87 | 299.23 |
| | 300.65 | 319.99 | 199.79 | 200.30 | 259.98 |
| | 296.34 | 280.01 | 179.73 | 278.66 | 269.67 |
| | 272.02 | 315.02 | 254.16 | 281.47 | 251.80 |
| | 301.87 | 325.00 | 239.22 | 260.07 | 270.99 |
| | 294.46 | 276.00 | 205.80 | 255.72 | 310.09 |

Table 4. Percent relative efficiency (PRE) of the estimators for the three real data sets.

| Estimator | Set 1 | Set 2 | Set 3 |
|---|---|---|---|
| | 100.00 | 100.00 | 100.00 |
| | 134.54 | 123.16 | 143.53 |
| | 140.82 | 145.87 | 155.98 |
| | 130.74 | 142.56 | 175.06 |
| | 95.62 | 65.92 | 86.49 |
| | 206.36 | 245.20 | 216.71 |
| | 239.47 | 252.38 | 274.63 |
| | 227.10 | 285.95 | 251.85 |
| | 251.88 | 310.43 | 290.91 |
| | 258.79 | 315.91 | 301.07 |
| | 292.68 | 298.08 | 299.37 |
| | 299.47 | 302.83 | 308.74 |
| | 286.98 | 299.55 | 284.46 |
| | 261.80 | 290.24 | 310.50 |
| | 257.94 | 331.82 | 334.29 |
| | 297.72 | 352.29 | 321.82 |

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alshanbari, H.M. A Novel Robust Transformation Approach to Finite Population Median Estimation Using Monte Carlo Simulation and Empirical Data. Axioms 2025, 14, 737. https://doi.org/10.3390/axioms14100737