Median Estimation with Quantile Transformations: Applications to Stratified Two-Phase Sampling
Abstract
1. Introduction
1.1. Quantile Background
1.2. Research Objectives
1.3. Key Contributions
2. Methodology and Notations
3. Stratified Two-Phase Existing Estimators
4. A New Stratified Family of Estimators
4.1. Conceptual Explanation of the Proposed Transformations
4.2. Underlying Logic for Each Quantile Measure Used in the Methodology
- Quartile deviation (QD): When outliers inflate the variation, the QD is a useful tool for capturing variability around the median.
- Median absolute deviation (MAD): A highly robust scale measure that reduces the impact of extreme observations.
- Trimmed mean (TM): Reduces the influence of tail values under high skewness to provide a stable location measure.
- Mid-range (MR): Useful for auxiliary variables with a wide range; it reflects the behavior of extreme values.
- Interquartile range (IQR): Provides moderate efficiency and robustness by summarizing the middle half of the distribution.
- Decile mean (DM): The average of the distribution’s decile values; it summarizes the shape of the distribution and captures aspects of inequality that go beyond conventional dispersion measures.
- Skewness measure: Corrects the estimator for directional asymmetry in each stratum.
- Quartile average (QA): A smooth measure of central tendency that offers a balanced alternative to the median.
- Product measure (PM): Represents a balanced distribution around the median, which is particularly helpful in skewed environments.
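As a minimal sketch, several of the measures listed above can be computed for an auxiliary variable with NumPy. The formulas used here are the common textbook definitions (for example, QD as half the IQR and MR as the average of the minimum and maximum), which is an assumption: the exact definitions used in the methodology may differ. The function name `quantile_measures` and the 10% trimming proportion are hypothetical choices made for illustration. The skewness and product measures are omitted because their definitions are not recoverable from the text.

```python
import numpy as np

def quantile_measures(x, trim=0.10):
    """Return common quantile-based summaries of a 1-D sample.

    Assumed textbook definitions; `trim` is the proportion removed
    from EACH tail before averaging (hypothetical default of 10%).
    """
    x = np.sort(np.asarray(x, dtype=float))
    q1, med, q3 = np.quantile(x, [0.25, 0.50, 0.75])
    deciles = np.quantile(x, np.arange(1, 10) / 10)  # D1, ..., D9
    k = int(np.floor(trim * x.size))                 # units cut per tail
    trimmed = x[k:x.size - k] if k > 0 else x
    return {
        "IQR": q3 - q1,                     # spread of the middle half
        "QD": (q3 - q1) / 2,                # quartile deviation
        "QA": (q1 + q3) / 2,                # quartile average
        "MAD": np.median(np.abs(x - med)),  # median absolute deviation
        "TM": trimmed.mean(),               # trimmed mean
        "MR": (x[0] + x[-1]) / 2,           # mid-range (min and max)
        "DM": deciles.mean(),               # decile mean
    }

# Example: one extreme value moves the mid-range far more than the MAD.
clean = quantile_measures(np.arange(1, 102))
contaminated = quantile_measures(np.append(np.arange(1, 101), 10000.0))
print(clean["MR"], contaminated["MR"])
print(clean["MAD"], contaminated["MAD"])
```

The example illustrates why the list distinguishes measures that reflect extreme values (MR) from those that resist them (MAD, QD, IQR).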
4.3. Remark on the Structure of the Proposed Family
5. Evaluation Framework and Conditions
- (i) Upon comparing the MSE expression derived for the proposed estimator family (27) with the corresponding variance of the sample median (2), the condition stated below is obtained:
- (ii) When the MSE from Equation (27) is compared with that from Equation (5), the following condition is derived:
- (iii) A specific condition can be derived by evaluating the MSE of the estimators from Equation (27) against the MSE presented in Equation (7):
- (iv) A specific condition follows from evaluating the MSE of the estimators in Equation (27) against that in Equation (12):
- (v) The following condition is derived by examining the mean squared error of the estimators in Equation (27) alongside the MSE provided in Equation (13):
- (vi) The MSE of the proposed estimator family, as expressed in Equation (27), is examined to establish the following condition:
- (vii) A comparison between the MSE of the new estimator family in (27) and MSE yields the following condition:
- (viii) A specific condition is derived by analyzing the mean squared error of the estimators expressed in Equation (27) against the MSE formulation in Equation (22):
Practical Feasibility and Computational Considerations
6. Analysis of Results
6.1. Simulation Study
- Population 1: The first population assumes a heavy-tailed Cauchy distribution for X, specified with location and scale . The association between X and Y is negative, fixed at .
- Population 2: In the second case, X is uniformly distributed between 17 and 24. This distribution is considered independent of Y, i.e., no correlation is introduced.
- Population 3: For the third population, X is modeled by an exponential distribution with parameter , capturing a strong right-skew. The correlation with Y is set at .
- Population 4: The fourth design specifies X as following a gamma distribution, parameterized by and . The correlation with Y is moderately strong, at .
- Population 5: Finally, the fifth population generates X from a log-normal distribution with parameters 11 and 6, representing a mildly skewed distribution. A correlation of is introduced with Y.
- Step 1: As part of the simulation design, a finite population of N units is generated for the variables X and Y. The population is partitioned into L strata, which are defined either through prior knowledge of strata boundaries or through the use of an auxiliary variable.
- Step 2: The first phase of sampling involves drawing a stratified sample of total size m. Within each stratum h, a first-phase subsample is selected using simple random sampling without replacement (SRSWOR). The first-phase sample size is distributed across strata by fixed-quota allocation rules.
- Step 3: Following the first-phase stratified sampling, a second-phase subsample comprising n total observations is selected. Within each stratum h, the second-phase units are drawn from the first-phase units using SRSWOR.
- Step 4: Consistent with the two-phase design, multiple settings of the first- and second-phase sample sizes (m, n) are examined. For each pair, the stratum-specific allocations are assigned based on the selected allocation strategy.
- Scheme 1: The total first-phase sample size m takes the values 300, 500, and 800, and for each m, the second-phase size n takes the values 0.10m, 0.20m, 0.30m, and 0.40m (rounded). Both phase sample sizes are distributed equally across all strata.
- Scheme 2: Four (m, n) pairs are examined, keeping equal stratum allocation.
- Scheme 3: A finer set of designs combines m = 150, 250, 400, 600, 900 with n = 0.10m, 0.25m, 0.40m, deriving the stratum-level sample sizes by equal allocation across strata.
- Step 5: The efficiency of the estimators is examined by deriving the necessary stratum-level statistics from the selected samples in accordance with the previously outlined methodology. In the case of existing estimators that depend on unknown constants, the corresponding parameters are optimized using stratified estimates.
- Step 6 (Simulation of stratified samples): For each population and each chosen (m, n) pair:
  1. Per-stratum allocations are determined according to the allocation rule.
  2. From each stratum h, the first-phase units are drawn from the population units by SRSWOR.
  3. From the first-phase units in each stratum, the second-phase units are drawn by SRSWOR.
- Step 7: For each (m, n) pairing and for all estimators, the MSE is evaluated using the stratified sampling design. This involves applying the design weights together with the stratum-level statistics obtained from the sampled data.
- Step 8: To obtain reliable results, the sampling process is repeated 20,000 times. For every estimator and arrangement, the empirical mean squared error (MSE) is computed as the average squared deviation over these replications:

  $$\mathrm{MSE}(t) = \frac{1}{20000}\sum_{u=1}^{20000}\left(t_u - M_Y\right)^2,$$

  where $t_u$ is the estimate of estimator $t$ from replication $u$, and $M_Y$ is the true population parameter (the population median of Y).
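The steps above can be sketched in Python with NumPy. This is an illustrative skeleton under stated assumptions, not the authors' implementation: the population model (exponential X with unit scale, Y linear in X plus noise), the number of strata (L = 3), the reduced replication count, and all function names are hypothetical. The estimator shown is a plain weighted combination of second-phase stratum medians, used only as a stand-in for the estimators compared in the study. The percent relative efficiency helper assumes the usual ratio form with the baseline fixed at 100, which matches the layout of the efficiency tables but is not confirmed by a recoverable formula in the text.

```python
import numpy as np

def make_strata(x, L):
    """Split unit indices into L near-equal strata ordered by the auxiliary variable."""
    return np.array_split(np.argsort(x), L)

def two_phase_draw(stratum, m_h, n_h, rng):
    """SRSWOR of m_h units from the stratum, then SRSWOR of n_h units from those."""
    first = rng.choice(stratum, size=m_h, replace=False)
    second = rng.choice(first, size=n_h, replace=False)
    return first, second

def weighted_median_estimate(y, samples, weights):
    """Stand-in estimator: stratum-weight average of stratum sample medians."""
    return float(sum(w * np.median(y[s]) for w, s in zip(weights, samples)))

def empirical_mse(y, strata, m, n, target, reps=500, seed=0):
    """Average squared deviation of the estimate from `target` over replications."""
    rng = np.random.default_rng(seed)
    L = len(strata)
    m_h, n_h = m // L, n // L              # equal allocation across strata
    N = sum(len(s) for s in strata)
    weights = [len(s) / N for s in strata]
    estimates = np.empty(reps)
    for u in range(reps):
        # The first-phase sample is drawn but unused by this simple estimator;
        # estimators using auxiliary information would need its X values.
        second = [two_phase_draw(s, m_h, n_h, rng)[1] for s in strata]
        estimates[u] = weighted_median_estimate(y, second, weights)
    return float(np.mean((estimates - target) ** 2))

def percent_relative_efficiency(mse_baseline, mse_estimator):
    """Assumed form: 100 * MSE(baseline) / MSE(estimator)."""
    return 100.0 * mse_baseline / mse_estimator

# Hypothetical population: right-skewed X, Y positively related to X.
rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=3000)
y = 2.0 * x + rng.normal(size=3000)
strata = make_strata(x, L=3)
weights = [len(s) / len(x) for s in strata]
target = weighted_median_estimate(y, strata, weights)  # parameter matched to the stand-in
mse = empirical_mse(y, strata, m=300, n=30, target=target)
print(mse, percent_relative_efficiency(mse, mse))
```

Passing `target` explicitly keeps the MSE routine independent of how the true parameter is defined, so the same loop can be reused with a different estimator or with the full 20,000 replications.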
6.2. Real-Life Application
- Population 1. The data, taken from [32], provide comprehensive details on government schools for the academic year 2012–2013 and are used for empirical evaluation. Primary and middle school enrollment data by gender represent the population. In particular, denotes the total number of government primary schools for both boys and girls, while represents the total number of enrolled students. Simultaneously, represents the total number of government-run middle schools that accept both genders, while records the overall number of students enrolled. The source is available for download at: https://repository.lahoreschool.edu.pk/xmlui/bitstream/handle/123456789/13900/Dev-2014.pdf?sequence=1&isAllowed=y (accessed on 28 September 2025). The summary statistics are obtained as:
- Population 2. A finite population is examined using statistics from [33] to demonstrate the empirical performance of the suggested estimators. These data provide information on industrial activity at the district and division levels, particularly the number of registered factories and the related employment levels. In this case, represents the total number of factories registered in 2010, while represents employment by division and district in 2010. The stated employment levels for 2012 are represented by , while the number of registered factories is represented by . The source is available for download at: https://repository.lahoreschool.edu.pk/xmlui/bitstream/handle/123456789/13023/2013.pdf?sequence=1&isAllowed=y (accessed on 28 September 2025). The summary statistics are presented as:
- Population 3. The data on page [1] illustrates the amount of money people earn and spend on food. Here, Y represents the family’s food expenses, which vary according to their employment status and demonstrate how work can impact food expenditures. Weekly income is reflected in the variable X, providing a quick overview of the household’s financial situation. The data is divided into two groups, and the statistics are summarized as follows:
6.3. Discussion and Results
- The proposed transformation-based estimators consistently outperformed the existing ones on both simulated and real-world data. They attained significantly smaller mean squared errors in the simulated populations (Table 4, Figure 1), which suggests more accurate median estimation. The same trends were observed with the empirical data, where the approach was highly effective (Table 5, Figure 2). Regardless of the underlying data, the suggested estimators achieved the smallest MSE values in every practical situation where they could be applied. These findings show that the transformation-based method produces reliable and accurate median estimates in a variety of contexts.
- Figures 1 and 2 reveal that the proposed estimators perform reliably across varying levels of correlation between the study and auxiliary variables. As indicated in Tables 4 and 5, their efficiency is sustained even when the second-phase sample size is considerably smaller than the first-phase sample size, making them suitable for cost-limited stratified sampling.
- In addition to Figures 1 and 2, the comparative patterns shown in Figures 3 and 4 demonstrate the consistency of the proposed estimators across both simulated and real data settings. Figure 3 highlights the relative efficiency trends presented in Table 6, where several of the quantile-based estimators show much higher efficiency values than the traditional approaches, regardless of the shape of the distribution or the degree of skewness. The efficiency curves remain stable and clearly separated from those of the existing methods, confirming the strength of the transformation-based formulation under a wide range of population conditions. Similarly, Figure 4, which summarizes the empirical findings from Table 7, supports this pattern using real data. The proposed estimators provide higher precision and smaller sampling variability in all three populations, showing that the quantile transformations work effectively beyond the simulated framework. Taken together, the graphical and tabular evidence confirms that the new family of estimators improves numerical accuracy and performs more reliably across different and realistic data situations.
- When the underlying population deviates from normality, especially under moderate to high skewness or when a small percentage of the observations are extreme, the suggested estimators show notable efficiency gains. According to the simulation results, the quantile-based transformations maintain stability and produce smaller mean squared errors than conventional ratio or regression-type estimators when applied to right-skewed and heavy-tailed distributions, such as exponential or log-normal models. The performance advantage increases further when contamination levels of up to 10–15% are added, demonstrating the stability of the suggested class of estimators in irregular and heterogeneous data environments.
| Estimator | | | | | |
|---|---|---|---|---|---|
| | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| | 112.13 | 106.85 | 113.71 | 118.26 | 106.23 |
| | 129.06 | 114.88 | 133.02 | 139.02 | 120.00 |
| | 123.02 | 112.19 | 125.14 | 130.73 | 117.68 |
| | 69.51 | 88.64 | 78.87 | 74.03 | 68.60 |
| | 134.12 | 116.75 | 136.59 | 143.94 | 124.93 |
| | 137.90 | 118.68 | 138.27 | 146.15 | 126.04 |
| | 142.50 | 121.28 | 140.88 | 148.44 | 128.31 |

| Estimator | Population-1 | Population-2 | Population-3 |
|---|---|---|---|
| | 100 | 100 | 100 |
| | 186 | 250 | 270 |
| | 186 | 272 | 332 |
| | 150 | 234 | 240 |
| | 65 | 123 | 104 |
| | 200 | 614 | 57 |
| | 814 | 590 | 98 |
| | 785 | 554 | 169 |
| | 3480 | 1442 | 471 |
| | 1833 | 763 | 413 |
| | 1786 | 1016 | 459 |
| | 4373 | 2016 | 509 |
| | 1972 | 633 | 429 |
| | 1805 | 880 | 431 |
| | 1833 | 773 | 439 |
| | 1964 | 628 | 397 |




7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
2. Särndal, C.E. Sample survey theory vs. general statistical theory: Estimation of the population mean. Int. Stat. Rev. Int. Stat. 1972, 40, 1–12.
3. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
4. Alomair, M.A.; Daraz, U. Dual transformation of auxiliary variables by using outliers in stratified random sampling. Mathematics 2024, 12, 2829.
5. Gross, S. Median estimation in sample surveys. In Proceedings of the Section on Survey Research Methods, American Statistical Association Ithaca, Alexandria, VA, USA, 7–9 May 1980.
6. Sedransk, J.; Meyer, J. Confidence intervals for the quantiles of a finite population: Simple random and stratified simple random sampling. J. R. Stat. Soc. Ser. B (Methodol.) 1978, 40, 239–252.
7. Philip, S.; Sedransk, J. Lower bounds for confidence coefficients for confidence intervals for finite population quantiles. Commun. Stat.-Theory Methods 1983, 12, 1329–1344.
8. Kuk, A.Y.C.; Mak, T.K. Median estimation in the presence of auxiliary information. J. R. Stat. Soc. Ser. B 1989, 51, 261–269.
9. Rao, T.J. On certain methods of improving ratio and regression estimators. Commun. Stat.-Theory Methods 1991, 20, 3325–3340.
10. Singh, S.; Joarder, A.H.; Tracy, D.S. Median estimation using double sampling. Aust. N. Z. J. Stat. 2001, 43, 33–46.
11. Khoshnevisan, M.; Singh, H.P.; Singh, S.; Smarandache, F. A General Class of Estimators of Population Median Using Two Auxiliary Variables in Double Sampling; Virginia Polytechnic Institute and State University: Blacksburg, VA, USA, 2002.
12. Gupta, S.; Shabbir, J.; Ahmad, S. Estimation of median in two-phase sampling using two auxiliary variables. Commun. Stat.-Theory Methods 2008, 37, 1815–1822.
13. Singh, S. Advanced Sampling Theory with Applications: How Michael Selected Amy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; Volume 2.
14. Subzar, M.; Lone, S.A.; Ekpenyong, E.J.; Salam, A.; Aslam, M.; Raja, T.A.; Almutlak, S.A. Efficient class of ratio cum median estimators for estimating the population median. PLoS ONE 2023, 18, e0274690.
15. Iseh, M.J. Model formulation on efficiency for median estimation under a fixed cost in survey sampling. Model Assist. Stat. Appl. 2023, 18, 373–385.
16. Hoaglin, D.C.; Mosteller, F.; Tukey, J.W. Understanding Robust and Exploratory Data Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2000.
17. Chen, M.; Chen, W.X.; Yang, R.; Zhou, Y.W. Exponential-Poisson parameters estimation in moving extremes ranked set sampling design. Acta Math. Appl. Sin. Engl. Ser. 2025, 41, 973–984.
18. Alshanbari, H.M. A generalized estimation strategy for the finite population median using transformation methods under a two-phase sampling design. Symmetry 2025, 17, 1696.
19. Alshanbari, H.M. A novel robust transformation approach to finite population median estimation using Monte Carlo simulation and empirical data. Axioms 2025, 14, 737.
20. Alghamdi, A.S.; Almulhim, F.A. Improved median estimation in stratified surveys via nontraditional auxiliary measures. Symmetry 2025, 17, 1136.
21. Alghamdi, A.S.; Almulhim, F.A. Stratified median estimation using auxiliary transformations: A robust and efficient approach in asymmetric populations. Symmetry 2025, 17, 1127.
22. Shabbir, J.; Gupta, S. A generalized class of difference type estimators for population median in survey sampling. Hacet. J. Math. Stat. 2017, 46, 1015–1028.
23. Irfan, M.; Maria, J.; Shongwe, S.C.; Zohaib, M.; Bhatti, S.H. Estimation of population median under robust measures of an auxiliary variable. Math. Probl. Eng. 2021, 2021, 4839077.
24. Shabbir, J.; Gupta, S.; Narjis, G. On improved class of difference type estimators for population median in survey sampling. Commun. Stat.-Theory Methods 2022, 51, 3334–3354.
25. Hussain, M.A.; Javed, M.; Zohaib, M.; Shongwe, S.C.; Awais, M.; Zaagan, A.A.; Irfan, M. Estimation of population median using bivariate auxiliary information in simple random sampling. Heliyon 2024, 10, e28891.
26. Bhushan, S.; Kumar, A.; Lone, S.A.; Anwar, S.; Gunaime, N.M. An efficient class of estimators in stratified random sampling with an application to real data. Axioms 2023, 12, 576.
27. Stigler, S.M. Linear functions of order statistics. Ann. Math. Stat. 1969, 40, 770–788.
28. Daraz, U.; Almulhim, F.A.; Alomair, M.A.; Alomair, A.M. Population median estimation using auxiliary variables: A simulation study with real data across sample sizes and parameters. Mathematics 2025, 13, 1660.
29. Aladag, S.; Cingi, H. Improvement in estimating the population median in simple random sampling and stratified random sampling using auxiliary information. Commun. Stat.-Theory Methods 2015, 44, 1013–1032.
30. Solanki, R.S.; Singh, H.P. Some classes of estimators for median estimation in survey sampling. Commun. Stat.-Theory Methods 2015, 44, 1450–1465.
31. Singh, H.P.; Vishwakarma, G.K. Modified exponential ratio and product estimators for finite population mean in double sampling. Austrian J. Stat. 2007, 36, 217–225.
32. Bureau of Statistics. Punjab Development Statistics 2014; Government of the Punjab, Lahore: Islamabad, Pakistan, 2014.
33. Bureau of Statistics. Punjab Development Statistics 2013; Government of the Punjab, Lahore: Islamabad, Pakistan, 2013.

| Symbol | Description | Symbol | Description |
|---|---|---|---|
| N | Population size | L | Number of strata |
| | Units in stratum h | | First-phase sample size in stratum h |
| | Second-phase sample size in stratum h | m | Total first-phase sample size |
| n | Total second-phase sample size | Y | Study variable |
| X | Auxiliary variable | | Population medians of Y and X |
| | Second-phase sample medians | | First-phase sample median of X |
| | Probability density at medians | | Stratum weight |
| | Correlation between Y and X | | Joint probability function |
| | Relative error terms | | Median coefficients |
| | Covariance coefficient | | Sampling constants |
| | First and third quartiles | | Interquartile range |
| | Quartile deviation | | Quartile average |
| | Trimmed mean | | Decile mean |
| | Median absolute deviation | | Mid-range of X |
| | Standard deviation of X | | Skewness of X |
| – | Proposed estimators (quantile-based) | | Mean squared error |
| | Bias of estimator | | Transformation constants |
| | Calibration parameters | | Covariance of Y and X in stratum h |

| Estimator | ||||
|---|---|---|---|---|
| 1 | ||||
| 1 | ||||
| 1 | ||||
| 1 | 1 | |||
| 1 | ||||
| 1 | ||||
| 1 | ||||
| 1 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Almulhim, F.A.; Aljohani, H.M. Median Estimation with Quantile Transformations: Applications to Stratified Two-Phase Sampling. Entropy 2025, 27, 1191. https://doi.org/10.3390/e27121191

