Article

Median Estimation with Quantile Transformations: Applications to Stratified Two-Phase Sampling

by Fatimah A. Almulhim 1 and Hassan M. Aljohani 2,*
1 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
2 Department of Mathematics and Statistics, College of Science, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
* Author to whom correspondence should be addressed.
Entropy 2025, 27(12), 1191; https://doi.org/10.3390/e27121191
Submission received: 2 October 2025 / Revised: 11 November 2025 / Accepted: 17 November 2025 / Published: 24 November 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Most traditional estimators assume normality and remain sensitive to extreme observations, which limits their usefulness in practical applications. To improve accuracy, we introduce quantile-based median estimators using transformation methods in a stratified two-phase sampling technique. The design allows for efficient use of auxiliary data and enhances robustness across heterogeneous strata. Stratified sampling further reduces variability by ensuring representation from all subgroups within the population. Bias and mean squared error expressions are obtained through first-order approximations. The efficiency of the proposed estimators is evaluated using the mean squared error (MSE) as the benchmark criterion. The effectiveness of the proposed estimators is examined by conducting simulations under various skewed distributions. To strengthen the conclusions, additional analysis is performed on real population datasets. Simulation and empirical studies confirm the superior performance of the proposed methods. The findings show that the suggested estimators perform well in practical situations involving median estimation and achieve higher precision and effectiveness than existing estimators.

1. Introduction

In survey research, obtaining precise estimates is more challenging when the population is skewed or includes outliers. The median provides a more accurate picture of central tendency in such cases, while the arithmetic mean can give misleading results. In fields such as computer network traffic monitoring, ecological species distribution studies, and public policy evaluation, where extreme responses strongly influence average outcomes, median-based analysis has gained prominence due to its statistical reliability. Stratified two-phase sampling is an effective modification of stratification methods that balances survey efficiency and precision. The population is divided into strata, an initial sample is selected from each stratum, and a second subsample is then drawn to refine the estimates. This method is well suited to estimating the finite population median because of its robustness against outliers and skewness. Real-life applications highlight the method's benefits: classifying by occupation in economic surveys helps produce accurate estimates of median income, and stratifying by age or illness severity in medical studies improves the computation of median recovery times. These applications emphasize the effectiveness of stratified two-phase sampling as an accurate survey method. Details concerning auxiliary information are available in [1,2,3,4].
The precision of median estimation has progressively improved through the use of auxiliary information. Foundational studies [5,6,7] introduced the role of auxiliary variables in developing efficient estimators. Regression and ratio-based approaches [8,9] further enhanced estimation accuracy. Later, generalized estimators employing multiple auxiliary variables and double sampling methods [10] were proposed to address the absence of complete auxiliary data [11]. Two-phase designs offered an optimal balance between cost and precision [12], while known auxiliary medians enabled the formulation of unbiased estimators [13]. An efficient class of ratio-cum-median estimators for estimating the population median was proposed by [14], while [15] discussed the formulation of efficiency for median estimation under a fixed cost in survey sampling. A detailed understanding of robust and exploratory data analysis was introduced by [16], providing valuable insights into handling outliers and non-normal data structures. This foundation supports the development of more reliable and efficient statistical estimators in complex sampling scenarios.
Over time, scholars have also explored transformation-based and non-parametric strategies to enhance the robustness of median estimators under non-normal or skewed conditions. The inclusion of auxiliary information not only improves efficiency but also mitigates the influence of extreme values, ensuring stability across diverse population structures. Recent advancements integrate techniques such as exponential-type transformations, calibration adjustments, and Bayesian refinements to achieve higher precision without inflating survey costs. Exponential-Poisson parameter estimation under the moving extremes ranked set sampling design was discussed by [17]. Some generalized classes of estimators under the two-phase sampling approach were introduced by [18,19]. Improved median estimation in stratified surveys via non-traditional auxiliary measures was discussed by [20,21]. Furthermore, simulation-based validation and empirical analyses using real-world datasets have confirmed the practical utility of these enhanced estimators in socioeconomic, agricultural, and industrial sampling frameworks. Detailed reviews on this topic can be found in [22,23,24,25,26,27,28,29,30], which collectively provide comprehensive insights into the theoretical evolution, comparative efficiency, and modern extensions of median estimation methodologies.
Populations that deviate from normality or display skewed patterns often challenge the reliability of conventional estimators such as the ratio, regression, exponential, and product types. These estimators assume data symmetry and tend to perform poorly in the presence of extreme or irregular values. To overcome the constraints identified by [28], this study introduces a class of transformation-based estimators constructed within a two-phase stratified random sampling design. The proposed framework strengthens estimator performance by combining the robustness of the central measure with the efficiency improvement obtained through auxiliary data collected during the first phase. This dual advantage makes our estimators uniquely suited for modern survey contexts, where complex populations rarely align with standard distributions. These transformations capture distributional features such as spread, skewness, and tail behavior, allowing the proposed estimators to remain efficient and stable even in highly irregular settings. This technique is especially effective for heterogeneous and unstable datasets, such as those produced by climate-sensitive agricultural yields, defective manufacturing processes, or uneven academic performance distributions. The proposed estimator class offers major improvements in survey methodology. It performs reliably in skewed or non-normal populations and remains stable against outliers, ensuring accurate results under real data conditions. The two-phase stratified design provides cost-efficient robustness by utilizing first-phase auxiliary data, while transformation-based refinements enhance efficiency beyond classical ratio and regression estimators. Its flexibility supports applications in agriculture, healthcare, and economics, with simulation results confirming strong cross-distributional consistency and reliability.

1.1. Quantile Background

Quantiles summarize a distribution by dividing the ordered data into groups and identifying the values that mark those divisions. The $p$th quantile of a continuous random variable $X$, with distribution function $F(x)$, is defined as
$$Q(p) = \inf\{x : F(x) \ge p\}, \quad 0 < p < 1.$$
The median $Q(0.50)$, the first and third quartiles $Q(0.25)$ and $Q(0.75)$, and quantile-based measures such as the interquartile range and the median absolute deviation are important special cases. Unlike means and variances, quantiles are largely unaffected by extreme values and remain stable under skewed or heavy-tailed distributions. These properties make quantile-based methods well suited to robust inference, especially in survey sampling, where population values often contain strong outliers or depart from normality.
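The definition above can be applied directly to an empirical distribution function. The following is a minimal sketch, not the authors' code: the function name `empirical_quantile` and the toy data are assumptions made for illustration, and the example only contrasts the quartiles with the mean when one gross outlier is present.

```python
import math

def empirical_quantile(data, p):
    """Q(p) = inf{x : F_n(x) >= p} for the empirical distribution F_n.

    The p values used below (0.25, 0.50, 0.75) are exactly representable
    as floats, so n * p is computed without rounding error here.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    xs = sorted(data)
    n = len(xs)
    # Smallest order statistic x_(k) whose empirical CDF value k/n reaches p.
    k = math.ceil(n * p)
    return xs[k - 1]

# Toy data (hypothetical) with one gross outlier.
sample = [2, 3, 5, 7, 11, 13, 17, 1000]
q1, med, q3 = (empirical_quantile(sample, p) for p in (0.25, 0.50, 0.75))
iqr = q3 - q1
mean = sum(sample) / len(sample)
print(q1, med, q3, iqr, mean)
```

In this toy sample the quartiles stay among the bulk of the data, while the mean is pulled far above every observation except the outlier.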

1.2. Research Objectives

The primary purpose of this research is to develop efficient median estimators under stratified two-phase sampling, particularly for populations that exhibit skewness or extreme values. The proposed approach increases accuracy and robustness through quantile-based transformations that capture distributional characteristics such as shape, spread, and tail behavior. The paper also aims to derive the theoretical properties of these estimators, including expressions for the first-order bias and mean squared error, and to construct analytical efficiency conditions. Extensive Monte Carlo simulations are carried out over several skewed distributions to evaluate practical performance, and empirical utility is demonstrated by analyzing actual survey data from various application domains.

1.3. Key Contributions

A general family of double-exponential median estimators using quantile-based measures, such as quartile deviation, trimmed mean, decile mean, quartile average, product measure, interquartile range, and median absolute deviation, is shown in this study. The suggested estimators overcome the drawbacks of existing two-phase stratified median techniques by improving robustness to skewness, wide tails, and heterogeneous strata. Under the stratified two-phase approach, unified theoretical results for bias and mean squared error are presented, together with analytical superiority criteria that illustrate situations in which new estimators attain higher efficiency. In comparison to conventional and modern estimators, empirical assessments, involving simulations and actual datasets, consistently show improvements in mean squared error. With every aspect considered, the study provides a flexible methodological framework that can be expanded to include multi-phase sampling designs and additional reliable location measures.

2. Methodology and Notations

In this section, we lay out the notation and earlier work on median estimation. Assume a population of size $N$ is expressed as:
$$\xi = \{\xi_1, \xi_2, \ldots, \xi_N\}.$$
Let the population be divided into $L$ non-overlapping strata, with the $h$th stratum containing $N_h$ units, such that
$$\sum_{h=1}^{L} N_h = N.$$
For every stratum, $Y$ is the main variable under study, while $X$ is treated as an auxiliary variable that supports estimation. The first phase consists of selecting a simple random sample without replacement of size $m_h$ from the $h$th stratum, where the overall first-phase sample size is given by:
$$\sum_{h=1}^{L} m_h = m.$$
Only values of the auxiliary variable $X$ are collected at this point. This phase helps in estimating the overall population median of $X$, denoted by $M_x$.
The second-phase sample sizes in each stratum, drawn from the corresponding first-phase samples, are denoted by $n_h$, with $n_h < m_h$, and satisfy the condition
$$\sum_{h=1}^{L} n_h = n.$$
All sub-sampling is carried out using simple random sampling without replacement (SRSWOR) within each stratum. This design ensures that auxiliary information obtained in the first phase enhances estimation in the second phase. In this stage, both Y and X values are recorded.
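The two-phase design described above can be sketched in a few lines. This is an illustrative sketch only: the function name `two_phase_stratified_sample`, the dictionary layout, and the toy strata are assumptions, not part of the original study.

```python
import random

def two_phase_stratified_sample(strata, m, n, seed=0):
    """Draw a stratified two-phase SRSWOR sample.

    strata: dict mapping a stratum label to a list of (x, y) units.
    m, n:   dicts mapping the same labels to first- and second-phase sizes.
    Returns (first_phase_x, second_phase_xy), both keyed by stratum label.
    """
    rng = random.Random(seed)
    first_phase_x, second_phase_xy = {}, {}
    for h, units in strata.items():
        if not 0 < n[h] < m[h] <= len(units):
            raise ValueError(f"need 0 < n_h < m_h <= N_h in stratum {h!r}")
        phase1 = rng.sample(units, m[h])    # SRSWOR of size m_h from the stratum
        phase2 = rng.sample(phase1, n[h])   # SRSWOR subsample of size n_h
        first_phase_x[h] = [x for x, _ in phase1]  # only X is recorded in phase one
        second_phase_xy[h] = phase2                # both X and Y are recorded in phase two
    return first_phase_x, second_phase_xy

# Hypothetical toy population with two strata.
strata = {"A": [(i, 2 * i) for i in range(50)],
          "B": [(i, 3 * i) for i in range(80)]}
phase1_x, phase2_xy = two_phase_stratified_sample(
    strata, m={"A": 20, "B": 30}, n={"A": 5, "B": 10})
```

Because the second-phase units are drawn from the first-phase sample, every second-phase X value also appears among the first-phase X values of the same stratum.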
In the $h$th stratum, the population medians of $Y$ and $X$ are denoted by $M_{yh}$ and $M_{xh}$. Their sample analogues are $\acute{M}_{xh}$ (first phase) and $\hat{M}_{xh}$ and $\hat{M}_{yh}$ (second phase). The probability densities at the respective medians are expressed as $f_{yh}(M_{yh})$ and $f_{xh}(M_{xh})$. The correlation between these medians is defined by
$$\rho(M_{yh}, M_{xh}) = 4P_{11h}(y_h, x_h) - 1,$$
with $P_{11h}$ representing the probability
$$P_{11h}(y_h, x_h) = P(y_h \le M_{yh},\; x_h \le M_{xh}).$$
The following expressions for the relative errors and their expected values serve as the basis for first-order approximations of biases and mean squared errors:
$$e_{0h} = \frac{\hat{M}_{yh} - M_{yh}}{M_{yh}},$$
$$e_{1h} = \frac{\hat{M}_{xh} - M_{xh}}{M_{xh}}$$
and
$$e_{2h} = \frac{\acute{M}_{xh} - M_{xh}}{M_{xh}},$$
such that $E(e_{ih}) = 0$ for $i = 0, 1, 2$.
Moreover,
$$E(e_{0h}^2) = \theta_{1h} C_{M_{yh}}^2,$$
$$E(e_{1h}^2) = \theta_{1h} C_{M_{xh}}^2,$$
$$E(e_{2h}^2) = \theta_{2h} C_{M_{xh}}^2,$$
$$E(e_{0h} e_{1h}) = \theta_{1h} C_{M_{yxh}},$$
$$E(e_{0h} e_{2h}) = \theta_{2h} C_{M_{yxh}},$$
$$E(e_{1h} e_{2h}) = \theta_{2h} C_{M_{xh}}^2,$$
where
$$C_{M_{yh}} = \frac{1}{M_{yh} f_{yh}(M_{yh})},$$
$$C_{M_{xh}} = \frac{1}{M_{xh} f_{xh}(M_{xh})},$$
$$C_{M_{yxh}} = \rho_{yxh} C_{M_{yh}} C_{M_{xh}},$$
$$\mathrm{Cov}(M_{yh}, M_{xh}) = \frac{\rho_{yxh}}{4 f_{yh}(M_{yh}) f_{xh}(M_{xh})},$$
$$\theta_{1h} = \frac{1}{4}\left(\frac{1}{n_h} - \frac{1}{N_h}\right), \quad \theta_{2h} = \frac{1}{4}\left(\frac{1}{m_h} - \frac{1}{N_h}\right), \quad \theta_{3h} = \frac{1}{4}\left(\frac{1}{n_h} - \frac{1}{m_h}\right).$$
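The three sampling constants are simple functions of the stratum and sample sizes, and they satisfy the identity θ3h = θ1h − θ2h that is used later in the derivations. The sketch below computes them exactly with rational arithmetic; the function name `sampling_constants` and the stratum sizes are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def sampling_constants(N_h, m_h, n_h):
    """Return (theta1, theta2, theta3) for one stratum as exact fractions."""
    if not 0 < n_h < m_h <= N_h:
        raise ValueError("need 0 < n_h < m_h <= N_h")
    quarter = Fraction(1, 4)
    theta1 = quarter * (Fraction(1, n_h) - Fraction(1, N_h))  # second phase vs population
    theta2 = quarter * (Fraction(1, m_h) - Fraction(1, N_h))  # first phase vs population
    theta3 = quarter * (Fraction(1, n_h) - Fraction(1, m_h))  # second phase vs first phase
    return theta1, theta2, theta3

# Hypothetical stratum sizes, chosen only for illustration.
theta1, theta2, theta3 = sampling_constants(N_h=1000, m_h=200, n_h=50)
print(theta1, theta2, theta3)
```

For these sizes the constants are 19/4000, 1/1000 and 3/800, and the difference of the first two equals the third.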
Table 1 summarizes all the notations and symbols used in this study.

3. Stratified Two-Phase Existing Estimators

Various researchers have introduced median estimators whose efficiency depends on the sampling design and the underlying population. In this section, we summarize these estimators along with their theoretical properties, including bias, variance, and mean squared error. The review provides a reference point for comparison and highlights the limitations of current methods, which motivates the need for more efficient ones.
The conventional sample median estimator and its variance expression, originally introduced by [5], are extended to the framework of stratified two-phase sampling and are defined as follows:
$$\hat{M}_{yst} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \tag{1}$$
and
$$\mathrm{Var}(\hat{M}_{yst}) = \sum_{h=1}^{L} \theta_{1h} W_h^2 M_{yh}^2 C_{M_{yh}}^2. \tag{2}$$
The ratio estimator proposed by [10] in the context of two-phase sampling is adapted in this study to stratified two-phase sampling, defined as:
$$\hat{M}_{Ast} = \sum_{h=1}^{L} W_h \frac{\hat{M}_{yh}}{\hat{M}_{xh}} \acute{M}_{xh}. \tag{3}$$
The following expressions represent the first-order approximations of the bias and MSE for $\hat{M}_{Ast}$:
$$\mathrm{Bias}(\hat{M}_{Ast}) \cong \sum_{h=1}^{L} \theta_{3h} W_h M_{yh} \left( C_{M_{xh}}^2 - C_{M_{yxh}} \right) \tag{4}$$
and
$$\mathrm{MSE}(\hat{M}_{Ast}) \cong \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[ \theta_{1h} C_{M_{yh}}^2 + \theta_{3h} \left( C_{M_{xh}}^2 - 2 C_{M_{yxh}} \right) \right]. \tag{5}$$
In line with the approach described in [13], the difference-type estimator is modified for stratified two-phase sampling, leading to the expression for $\hat{M}_{D_1st}$:
$$\hat{M}_{D_1st} = \sum_{h=1}^{L} W_h \left[ \hat{M}_{yh} + d_{1h} \left( \acute{M}_{xh} - \hat{M}_{xh} \right) \right]. \tag{6}$$
Based on the first-order approximation, the minimum MSE of $\hat{M}_{D_1st}$ corresponding to the optimal $d_{1h}$ is:
$$\mathrm{MSE}(\hat{M}_{D_1st})_{\min} \cong \sum_{h=1}^{L} W_h^2 M_{yh}^2 C_{M_{yh}}^2 \left( \theta_{1h} - \theta_{3h} \rho_{yxh}^2 \right), \tag{7}$$
where
$$d_{1h(opt)} = \frac{\rho_{yxh} M_{yh} C_{M_{yh}}}{M_{xh} C_{M_{xh}}}.$$
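Within the two-phase sampling framework, Ref. [31] introduced median versions of the exponential ratio and product estimators. In this study, these estimators are further generalized to stratified two-phase sampling as follows:
$$\hat{M}_{Re\,st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh}} \right) \tag{8}$$
and
$$\hat{M}_{Pe\,st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( \frac{\hat{M}_{xh} - \acute{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh}} \right). \tag{9}$$
The first-order approximations of the biases and MSEs for $(\hat{M}_{Re\,st}, \hat{M}_{Pe\,st})$ are given below:
$$\mathrm{Bias}(\hat{M}_{Re\,st}) \cong \frac{1}{2} \sum_{h=1}^{L} \theta_{3h} W_h M_{yh} \left( \frac{3}{4} C_{M_{xh}}^2 - C_{M_{yxh}} \right), \tag{10}$$
$$\mathrm{Bias}(\hat{M}_{Pe\,st}) \cong \frac{1}{2} \sum_{h=1}^{L} \theta_{3h} W_h M_{yh} \left( \frac{3}{4} C_{M_{xh}}^2 + C_{M_{yxh}} \right), \tag{11}$$
$$\mathrm{MSE}(\hat{M}_{Re\,st}) \cong \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[ \theta_{1h} C_{M_{yh}}^2 + \theta_{3h} C_{M_{xh}}^2 \left( \frac{1}{4} - K_h \right) \right] \tag{12}$$
and
$$\mathrm{MSE}(\hat{M}_{Pe\,st}) \cong \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[ \theta_{1h} C_{M_{yh}}^2 + \theta_{3h} C_{M_{xh}}^2 \left( \frac{1}{4} + K_h \right) \right], \tag{13}$$
where
$$K_h = \frac{\rho_{yxh} C_{M_{yh}}}{C_{M_{xh}}}.$$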
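To make the estimators in this section concrete, the sketch below evaluates the stratified sample median, the ratio estimator, and the exponential ratio and product estimators from per-stratum sample medians. It is an illustration under stated assumptions: the function name `existing_estimators`, the argument layout (one list entry per stratum), and the numeric inputs are hypothetical.

```python
import math

def existing_estimators(W, My_hat, Mx_hat, Mx_first):
    """Evaluate four existing stratified two-phase median estimators.

    W:        stratum weights W_h
    My_hat:   second-phase sample medians of Y
    Mx_hat:   second-phase sample medians of X
    Mx_first: first-phase sample medians of X
    """
    out = {"median": 0.0, "ratio": 0.0, "exp_ratio": 0.0, "exp_product": 0.0}
    for w, my, mx, mx1 in zip(W, My_hat, Mx_hat, Mx_first):
        u = (mx1 - mx) / (mx1 + mx)                # argument of the exponential ratio form
        out["median"] += w * my                    # stratified sample median
        out["ratio"] += w * my * mx1 / mx          # ratio estimator
        out["exp_ratio"] += w * my * math.exp(u)   # exponential ratio estimator
        out["exp_product"] += w * my * math.exp(-u)  # exponential product estimator
    return out

# Hypothetical single-stratum inputs.
res = existing_estimators(W=[1.0], My_hat=[10.0], Mx_hat=[4.0], Mx_first=[8.0])
print(res)
```

When the first-phase and second-phase medians of X coincide in every stratum, all four estimators reduce to the stratified sample median, which is a convenient sanity check.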
Following the work of [9,12], who introduced difference-type estimators for median estimation in a two-phase setting, this study extends those formulations to stratified two-phase sampling, defined as:
$$\hat{M}_{D_2st} = \sum_{h=1}^{L} W_h \left[ d_{2h} \hat{M}_{yh} + d_{3h} \left( \acute{M}_{xh} - \hat{M}_{xh} \right) \right], \tag{14}$$
$$\hat{M}_{D_3st} = \sum_{h=1}^{L} W_h \left[ d_{4h} \hat{M}_{yh} + d_{5h} \left( \acute{M}_{xh} - \hat{M}_{xh} \right) \right] \frac{\acute{M}_{xh}}{\hat{M}_{xh}}, \tag{15}$$
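$$\hat{M}_{D_4st} = \sum_{h=1}^{L} W_h \left[ d_{6h} \hat{M}_{yh} + d_{7h} \left( \acute{M}_{xh} - \hat{M}_{xh} \right) \right] \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh}}. \tag{16}$$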
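The estimators $(\hat{M}_{D_2st}, \hat{M}_{D_3st}, \hat{M}_{D_4st})$ attain their minimum biases and mean squared errors, under first-order approximation, at the optimal values of the constants, which are given as:
$$\mathrm{Bias}(\hat{M}_{D_2st})_{\min} \cong -\sum_{h=1}^{L} W_h M_{yh} \frac{\theta_{1h} C_{M_{yh}}^2 C_{M_{xh}}^2 - \theta_{3h} C_{M_{yxh}}^2}{C_{M_{xh}}^2 \left( 1 + \theta_{1h} C_{M_{yh}}^2 \right) - \theta_{3h} C_{M_{yxh}}^2}, \tag{17}$$
$$\mathrm{Bias}(\hat{M}_{D_3st})_{\min} \cong \sum_{h=1}^{L} W_h \left[ M_{yh} \left( d_{4h(opt)} - 1 \right) + \theta_{3h} \left\{ d_{4h(opt)} M_{yh} \left( C_{M_{xh}}^2 - C_{M_{yxh}} \right) + d_{5h(opt)} M_{xh} C_{M_{xh}}^2 \right\} \right], \tag{18}$$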
B i a s M ^ D 4 s t m i n h = 1 L W h M y h 1 + θ 3 h 4 C M x h 2 1 + d 6 h ( o p t ) M x h + 8 ρ y x h M y h M x h ,
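$$\mathrm{MSE}(\hat{M}_{D_2st})_{\min} \cong \sum_{h=1}^{L} W_h^2 M_{yh}^2 \frac{\theta_{1h} C_{M_{yh}}^2 C_{M_{xh}}^2 - \theta_{3h} C_{M_{yxh}}^2}{C_{M_{xh}}^2 \left( 1 + \theta_{1h} C_{M_{yh}}^2 \right) - \theta_{3h} C_{M_{yxh}}^2}, \tag{20}$$
$$\mathrm{MSE}(\hat{M}_{D_3st})_{\min} \cong \sum_{h=1}^{L} W_h^2 \left[ \mathrm{Bias}(\hat{M}_{D_3st})_{\min}^2 + \theta_{1h} M_{yh}^2 d_{4h(opt)}^2 C_{M_{yh}}^2 + \theta_{3h} \left\{ C_{M_{xh}}^2 \left( d_{4h(opt)} M_{yh} + d_{5h(opt)} M_{xh} \right)^2 - 2 d_{4h(opt)} M_{yh} C_{M_{yxh}} \left( d_{4h(opt)} M_{yh} + d_{5h(opt)} M_{xh} \right) \right\} \right] \tag{21}$$
and
$$\mathrm{MSE}(\hat{M}_{D_4st})_{\min} \cong \sum_{h=1}^{L} W_h^2 \left[ \mathrm{Bias}(\hat{M}_{D_4st})_{\min} + \frac{1}{4} \theta_{3h} d_{6h(opt)}^2 M_{yh}^2 C_{M_{xh}}^2 \right], \tag{22}$$
where
$$d_{2h(opt)} = \frac{C_{M_{xh}}^2}{C_{M_{xh}}^2 \left( 1 + \theta_{1h} C_{M_{yh}}^2 \right) - \theta_{3h} C_{M_{yxh}}^2},$$
$$d_{3h(opt)} = \frac{M_{yh} C_{M_{yxh}}}{M_{xh} \left[ C_{M_{xh}}^2 \left( 1 + \theta_{1h} C_{M_{yh}}^2 \right) - \theta_{3h} C_{M_{yxh}}^2 \right]},$$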
d 4 h ( o p t ) = C M x h 2 C M x h 2 1 + θ 1 h C M y h 2 1 + θ 3 h C M x h 2 + θ 3 h C M y x h 2 1 + C M x h 2 ,
d 5 h ( o p t ) = M y h C M x h 2 θ 1 h C M y h 2 1 + C M y x h 1 + θ 3 h C M y x h C M x h 2 1 + θ 1 h C M y h 2 1 + θ 3 h C M x h 2 + θ 3 h C M y x h 2 1 + C M x h 2 ,
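$$d_{6h(opt)} = \frac{1}{8} \cdot \frac{8 - \theta_{2h} C_{M_{xh}}^2}{1 + \theta_{1h} C_{M_{xh}}^2 \left( 1 - \rho_{yxh}^2 \right)},$$
and
$$d_{7h(opt)} = \frac{M_{yh}}{M_{xh}} \left[ \frac{1}{2} + d_{6h(opt)} \left( \rho_{yxh} \frac{M_{yh}}{M_{xh}} - 1 \right) \right].$$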

4. A New Stratified Family of Estimators

Inspired by the modified forms of estimators proposed by [28], we introduce a class of stratified transformation-based double exponential estimators within a two-phase stratified random sampling design. The framework strengthens estimation accuracy by combining the stability of a robust central tendency measure with the efficiency derived from auxiliary information obtained during the first phase. This combination provides a practical and reliable tool for modern survey research, where complex populations rarely follow theoretical assumptions. A stratified transformed double exponential family of estimators is expressed as follows:
$$\hat{E}_{st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{t_{1h} \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{t_{1h} \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2 t_{2h}} \right) \exp\left( V_{2h} \frac{t_{3h} \left( \acute{M}_{xh} - \hat{M}_{xh} \right)}{t_{3h} \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2 t_{4h}} \right), \tag{23}$$
where $V_{ih}$, $i = 1, 2$, are predetermined constants. The symbols $t_{1h}, t_{2h}, t_{3h}, t_{4h}$ correspond to known characteristics of the population related to the auxiliary variable $X$. Using Equation (23) as a basis, new estimators are derived through different selections of these quantities. The chosen parameter values are specified in Table 2, and the resulting estimator formulations are displayed in Table 3.
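The quantile-based measures used in these selections are defined as follows:
$$IQR_h = Q_{3h} - Q_{1h},$$
$$MR_h = \frac{X_{h\max} + X_{h\min}}{2},$$
$$QA_h = \frac{Q_{3h} + Q_{1h}}{2},$$
$$QD_h = \frac{Q_{3h} - Q_{1h}}{2},$$
$$TM_h = \frac{Q_{1h} + 2 Q_{2h} + Q_{3h}}{4},$$
$$DM_h = \frac{\sum_{i=1}^{9} D_{ih}}{9},$$
$$MAD_h = \mathrm{median}\left\{ \left| X_{ih} - M_{xh} \right| : i = 1, \ldots, N_h \right\},$$
$$\sigma_{X_h} = \sqrt{\frac{1}{N_h} \sum_{i=1}^{N_h} \left( X_{ih} - \bar{X}_h \right)^2},$$
$$Sk(X_h) = \frac{\frac{1}{N_h} \sum_{i=1}^{N_h} \left( X_{ih} - \bar{X}_h \right)^3}{\sigma_{X_h}^3},$$
$$PM_h = Q_{1h} Q_{3h}.$$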
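The measures listed above can all be computed from the auxiliary values of a single stratum. The sketch below is one possible implementation, not the authors' code. It assumes the quantile convention Q(p) = inf{x : F(x) ≥ p} from Section 1.1 for the quartiles, deciles, and the two medians; other quantile conventions would give slightly different values. The names `quantile` and `auxiliary_measures` are hypothetical.

```python
import math

def quantile(xs_sorted, num, den):
    """Q(num/den) = inf{x : F_n(x) >= num/den}, using integer arithmetic."""
    k = -(-len(xs_sorted) * num // den)  # ceil(n * num / den) without float rounding
    return xs_sorted[max(k, 1) - 1]

def auxiliary_measures(x):
    """Quantile-based and moment-based summaries of one stratum's X values."""
    xs = sorted(x)
    n = len(xs)
    q1, q2, q3 = (quantile(xs, j, 4) for j in (1, 2, 3))
    deciles = [quantile(xs, i, 10) for i in range(1, 10)]
    mean = sum(xs) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    abs_dev = sorted(abs(v - q2) for v in xs)  # deviations from the stratum median
    return {
        "IQR": q3 - q1,
        "MR": (xs[-1] + xs[0]) / 2,
        "QA": (q3 + q1) / 2,
        "QD": (q3 - q1) / 2,
        "TM": (q1 + 2 * q2 + q3) / 4,
        "DM": sum(deciles) / 9,
        "MAD": quantile(abs_dev, 1, 2),
        "sigma": sigma,
        "Sk": sum((v - mean) ** 3 for v in xs) / n / sigma ** 3,
        "PM": q1 * q3,
    }

# Hypothetical stratum: the integers 1..20.
measures = auxiliary_measures(range(1, 21))
print(measures)
```

For the symmetric toy stratum the skewness is zero and the quartile average, trimmed mean, and decile mean coincide, which would not hold for a skewed stratum.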
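Table 3. Generalized family of estimators under stratified two-phase sampling.
$$\hat{E}_{1st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{QD_h \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{QD_h \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2\, MAD_h} \right) \exp\left( V_{2h} \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh} + 2 \left( X_{h\max} - X_{h\min} \right)} \right)$$
$$\hat{E}_{2st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{TM_h \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{TM_h \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2\, MR_h} \right) \exp\left( V_{2h} \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh} + 2\, IQR_h} \right)$$
$$\hat{E}_{3st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{DM_h \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{DM_h \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2\, MAD_h} \right) \exp\left( V_{2h} \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh} + 2\, QD_h} \right)$$
$$\hat{E}_{4st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{Sk(X_h) \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{Sk(X_h) \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2} \right) \exp\left( V_{2h} \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh} + 2 \left( Q_{3h} - Q_{2h} \right)} \right)$$
$$\hat{E}_{5st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{\log(Q_{3h} - 1) \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{\log(Q_{3h} - 1) \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2 \log(Q_{1h} - 1)} \right) \exp\left( V_{2h} \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh} + 2 \log(MR_h - 1)} \right)$$
$$\hat{E}_{6st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{QA_h \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{QA_h \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2 \sigma_{X_h}} \right) \exp\left( V_{2h} \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh} + 2\, QD_h} \right)$$
$$\hat{E}_{7st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{X_{h\,\mathrm{median}} \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{X_{h\,\mathrm{median}} \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2\, QR_h} \right) \exp\left( V_{2h} \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh} + 2\, IQR_h} \right)$$
$$\hat{E}_{8st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left( V_{1h} \frac{Q_{1h} Q_{3h} \left( \hat{M}_{xh} - \acute{M}_{xh} \right)}{Q_{1h} Q_{3h} \left( \acute{M}_{xh} + \hat{M}_{xh} \right) + 2 \left( X_{h\max} - X_{h\min} \right)} \right) \exp\left( V_{2h} \frac{\acute{M}_{xh} - \hat{M}_{xh}}{\acute{M}_{xh} + \hat{M}_{xh} + 2\, IQR_h} \right)$$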
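All members of the family share the same double-exponential form and differ only in the constants t1h to t4h, so one generic function covers the whole table. The sketch below is an illustration under stated assumptions: the function name `proposed_family`, the list-per-stratum argument layout, and the numeric inputs are hypothetical, and V1h and V2h are passed as single scalars for brevity.

```python
import math

def proposed_family(W, My_hat, Mx_hat, Mx_first, t1, t2, t3, t4, V1=1.0, V2=1.0):
    """Evaluate the double-exponential family for given per-stratum constants.

    Each of W, My_hat, Mx_hat, Mx_first, t1..t4 holds one value per stratum.
    Mx_first is the first-phase median of X, Mx_hat the second-phase median.
    """
    total = 0.0
    for h in range(len(W)):
        s = Mx_first[h] + Mx_hat[h]
        # First factor: product-type exponent scaled by (t1, t2).
        a = V1 * t1[h] * (Mx_hat[h] - Mx_first[h]) / (t1[h] * s + 2 * t2[h])
        # Second factor: ratio-type exponent scaled by (t3, t4).
        b = V2 * t3[h] * (Mx_first[h] - Mx_hat[h]) / (t3[h] * s + 2 * t4[h])
        total += W[h] * My_hat[h] * math.exp(a) * math.exp(b)
    return total

# Hypothetical two-stratum inputs.
W, My_hat = [0.4, 0.6], [10.0, 20.0]
Mx_hat, Mx_first = [5.0, 8.0], [5.5, 7.5]
# Example constants in the spirit of the first table row: t1 = QD, t2 = MAD, t3 = 1, t4 = range.
est = proposed_family(W, My_hat, Mx_hat, Mx_first,
                      t1=[2.0, 3.0], t2=[1.5, 2.5], t3=[1.0, 1.0], t4=[12.0, 20.0])
print(est)
```

Two simple checks follow from the form of the estimator: if the first-phase and second-phase medians of X agree, the estimate equals the stratified sample median, and if (t1, t2) = (t3, t4) with V1 = V2, the two exponents cancel.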

4.1. Conceptual Explanation of the Proposed Transformations

The calibration constants $t_{ih}$ and $V_{ih}$ control how auxiliary data influence the exponential structure of the estimator. Moment-based measures, which are commonly used in classical exponential estimators, can prove ineffective when the population is skewed or contains extreme values. By introducing quantile-based scaling quantities, the suggested framework instead provides stability and resistance against heterogeneous strata, asymmetric distributions, and heavy tails. Each chosen transformation reflects a meaningful distributional feature, such as median-centered spread, tail behavior, or resistance to extreme observations. As a result, the auxiliary variable supplies distributionally relevant information even under departures from normality, which improves efficiency and maintains robustness for median estimation under stratified two-phase sampling.

4.2. Underlying Logic for Each Quantile Measure Used in the Methodology

  • Quartile deviation (QD): Captures variability around the median and remains useful when outliers inflate the overall variation.
  • Median absolute deviation (MAD): A highly robust scale measure that reduces the impact of extreme observations.
  • Trimmed mean (TM): Reduces the influence of tail values under high skewness to provide a stable location measure.
  • Mid range (MR): Reflects the behavior of extreme values and is useful for auxiliary variables with a wide range.
  • Interquartile range (IQR): Provides moderate efficiency and robustness by summarizing the middle half of the distribution.
  • Decile mean (DM): The average of the distribution's decile values; it summarizes the shape of the distribution and captures aspects of inequality that go beyond conventional dispersion measures.
  • Skewness measure $Sk(X_h)$: Corrects the estimator for directional asymmetry in each stratum.
  • Quartile average (QA): A smooth measure of central tendency that serves as a balanced alternative to the median.
  • Product measure (PM): Represents a balanced distribution around the median, which is particularly helpful in skewed environments.

4.3. Remark on the Structure of the Proposed Family

The suggested class of estimators has eight distinct variants, $\hat{E}_{1st}$ through $\hat{E}_{8st}$. Since each estimator is developed using a different quantile-based transformation of the auxiliary variable, they show different tail behavior and sensitivity to distributional asymmetry. For example, $\hat{E}_{1st}$ and $\hat{E}_{3st}$ are more resilient to extreme values because they rely on the quartile deviation and the median absolute deviation. By using skewness and logarithmic quantile measures, estimators like $\hat{E}_{4st}$ and $\hat{E}_{5st}$ can adapt more directly to asymmetry. By combining composite quantile summaries such as the quartile average and the geometric quartile product, $\hat{E}_{6st}$ to $\hat{E}_{8st}$ offer a balanced response to both central and tail information. Even when the population deviates greatly from symmetry, this adaptable structure ensures that the family can accommodate different distributional patterns across strata, enabling effective median estimation.
The following theorem presents the mathematical expressions of the bias and mean squared error associated with the ratio–product-type family of estimators $\hat{E}_{st}$.
Theorem 1. 
Let $\hat{E}_{st}$ denote an exponential ratio–product family of estimators constructed under a stratified two-phase sampling scheme for the estimation of the population median $M_y$. The resulting bias and mean squared error (MSE) formulations are presented below:
$$\mathrm{Bias}(\hat{E}_{st}) \cong \sum_{h=1}^{L} W_h M_{yh} \left[ \frac{1}{2} \left( k_{1h} - k_{2h} \right) \theta_{3h} C_{M_{yxh}} - \frac{1}{8} \left( k_{1h}^2 + k_{2h}^2 + 2 k_{1h} k_{2h} \right) \theta_{3h} C_{M_{xh}}^2 \right]$$
and
$$\mathrm{MSE}(\hat{E}_{st}) \cong \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[ \theta_{1h} C_{M_{yh}}^2 + \frac{1}{4} \theta_{3h} C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + \theta_{3h} C_{M_{yxh}} \left( k_{1h} - k_{2h} \right) \right].$$
Proof. 
For completeness, the quantities needed to prove the theorem are summarized below:
$$e_{0h} = \frac{\hat{M}_{yh} - M_{yh}}{M_{yh}},$$
$$e_{1h} = \frac{\hat{M}_{xh} - M_{xh}}{M_{xh}}$$
and
$$e_{2h} = \frac{\acute{M}_{xh} - M_{xh}}{M_{xh}},$$
such that $E(e_{ih}) = 0$ for $i = 0, 1, 2$.
Moreover,
$$E(e_{0h}^2) = \theta_{1h} C_{M_{yh}}^2,$$
$$E(e_{1h}^2) = \theta_{1h} C_{M_{xh}}^2,$$
$$E(e_{2h}^2) = \theta_{2h} C_{M_{xh}}^2,$$
$$E(e_{0h} e_{1h}) = \theta_{1h} C_{M_{yxh}},$$
$$E(e_{0h} e_{2h}) = \theta_{2h} C_{M_{yxh}},$$
$$E(e_{1h} e_{2h}) = \theta_{2h} C_{M_{xh}}^2,$$
where
$$C_{M_{yh}} = \frac{1}{M_{yh} f_{yh}(M_{yh})},$$
$$C_{M_{xh}} = \frac{1}{M_{xh} f_{xh}(M_{xh})},$$
$$C_{M_{yxh}} = \rho_{yxh} C_{M_{yh}} C_{M_{xh}},$$
$$\mathrm{Cov}(M_{yh}, M_{xh}) = \frac{\rho_{yxh}}{4 f_{yh}(M_{yh}) f_{xh}(M_{xh})},$$
$$\theta_{1h} = \frac{1}{4}\left(\frac{1}{n_h} - \frac{1}{N_h}\right), \quad \theta_{2h} = \frac{1}{4}\left(\frac{1}{m_h} - \frac{1}{N_h}\right),$$
$$\theta_{3h} = \frac{1}{4}\left(\frac{1}{n_h} - \frac{1}{m_h}\right).$$
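Expressing (23) through the relative errors gives:
$$\hat{E}_{st} = \sum_{h=1}^{L} W_h M_{yh} \left( 1 + e_{0h} \right) \exp\left[ V_{1h} \frac{k_{1h} \left( e_{1h} - e_{2h} \right)}{2} \left( 1 + \frac{k_{1h} \left( e_{1h} + e_{2h} \right)}{2} \right)^{-1} \right] \times \exp\left[ V_{2h} \frac{k_{2h} \left( e_{2h} - e_{1h} \right)}{2} \left( 1 + \frac{k_{2h} \left( e_{1h} + e_{2h} \right)}{2} \right)^{-1} \right], \tag{24}$$
where $V_{1h}$, $V_{2h}$, $k_{1h}$, and $k_{2h}$ are defined as:
$$V_{1h} = V_{2h} = 1,$$
$$k_{1h} = \frac{t_{1h} M_{xh}}{t_{1h} M_{xh} + t_{2h}}$$
and
$$k_{2h} = \frac{t_{3h} M_{xh}}{t_{3h} M_{xh} + t_{4h}}.$$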
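The right-hand side of Equation (24) is approximated via a Taylor expansion, disregarding terms of order higher than two in the $e_{ih}$ because their effects are negligible. This yields:
$$\hat{E}_{st} = \sum_{h=1}^{L} W_h M_{yh} \left( 1 + e_{0h} \right) \exp\left[ \frac{k_{1h} \left( e_{1h} - e_{2h} \right)}{2} \left( 1 - \frac{k_{1h} \left( e_{1h} + e_{2h} \right)}{2} + \frac{k_{1h}^2 \left( e_{1h} + e_{2h} \right)^2}{4} \right) \right] \times \exp\left[ \frac{k_{2h} \left( e_{2h} - e_{1h} \right)}{2} \left( 1 - \frac{k_{2h} \left( e_{1h} + e_{2h} \right)}{2} + \frac{k_{2h}^2 \left( e_{1h} + e_{2h} \right)^2}{4} \right) \right],$$
$$\hat{E}_{st} = \sum_{h=1}^{L} W_h M_{yh} \left( 1 + e_{0h} \right) \exp\left[ \frac{k_{1h} \left( e_{1h} - e_{2h} \right)}{2} - \frac{k_{1h}^2 \left( e_{1h}^2 - e_{2h}^2 \right)}{4} \right] \exp\left[ \frac{k_{2h} \left( e_{2h} - e_{1h} \right)}{2} - \frac{k_{2h}^2 \left( e_{1h}^2 - e_{2h}^2 \right)}{4} \right].$$
After simplifying, we obtain:
$$\hat{E}_{st} - \sum_{h=1}^{L} W_h M_{yh} \cong \sum_{h=1}^{L} W_h M_{yh} \Big[ e_{0h} + \frac{k_{1h}}{2} \left( e_{1h} - e_{2h} + e_{0h} e_{1h} - e_{0h} e_{2h} \right) + \frac{k_{2h}}{2} \left( e_{2h} - e_{1h} - e_{0h} e_{1h} + e_{0h} e_{2h} \right) - \frac{k_{1h}^2}{8} \left( e_{1h}^2 - 3 e_{2h}^2 + 2 e_{1h} e_{2h} \right) - \frac{k_{2h}^2}{8} \left( e_{1h}^2 - 3 e_{2h}^2 + 2 e_{1h} e_{2h} \right) - \frac{k_{1h} k_{2h}}{4} \left( e_{1h}^2 + e_{2h}^2 - 2 e_{1h} e_{2h} \right) \Big]. \tag{25}$$
The bias of $\hat{E}_{st}$ is computed by applying the expectation operator to Equation (25) and replacing each term $(e_{0h}, e_{1h}, e_{2h}, e_{0h}^2, e_{1h}^2, e_{2h}^2, e_{0h} e_{1h}, e_{0h} e_{2h}, e_{1h} e_{2h})$ with its expected value, leading to:
$$\mathrm{Bias}(\hat{E}_{st}) \cong \sum_{h=1}^{L} W_h M_{yh} \Big[ \frac{1}{2} \left( k_{1h} - k_{2h} \right) \left( \theta_{1h} C_{M_{yxh}} - \theta_{2h} C_{M_{yxh}} \right) - \frac{1}{8} \left( k_{1h}^2 + k_{2h}^2 \right) \left( \theta_{1h} C_{M_{xh}}^2 - \theta_{2h} C_{M_{xh}}^2 \right) - \frac{k_{1h} k_{2h}}{4} \left( \theta_{1h} C_{M_{xh}}^2 - \theta_{2h} C_{M_{xh}}^2 \right) \Big].$$
After simplification, we get:
$$\mathrm{Bias}(\hat{E}_{st}) \cong \sum_{h=1}^{L} W_h M_{yh} \left[ \frac{1}{2} \left( k_{1h} - k_{2h} \right) \theta_{3h} C_{M_{yxh}} - \frac{1}{8} \left( k_{1h}^2 + k_{2h}^2 + 2 k_{1h} k_{2h} \right) \theta_{3h} C_{M_{xh}}^2 \right], \tag{26}$$
where
$$\theta_{3h} = \theta_{1h} - \theta_{2h}.$$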
Under a first-order approximation, the MSE of $\hat{E}_{st}$ is derived by squaring Equation (25) and taking expectations, keeping only terms up to second order in the $e_{ih}$:
$$\mathrm{MSE}(\hat{E}_{st}) \cong \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[ \theta_{1h} C_{M_{yh}}^2 + \frac{1}{4} \theta_{3h} C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + \theta_{3h} C_{M_{yxh}} \left( k_{1h} - k_{2h} \right) \right]. \tag{27}$$
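The first-order MSE depends on the transformation constants only through the difference k1h − k2h, which makes it easy to evaluate numerically. The sketch below is an illustration under stated assumptions: the function names `k_constant` and `mse_proposed` and all numeric inputs are hypothetical, and the per-stratum quantities are assumed to be known or estimated beforehand.

```python
def k_constant(t_a, t_b, Mx):
    """k = t_a * M_x / (t_a * M_x + t_b), the form shared by k_1h and k_2h."""
    return t_a * Mx / (t_a * Mx + t_b)

def mse_proposed(W, My, C_My, C_Mx, rho, theta1, theta3, k1, k2):
    """First-order MSE of the proposed family, summed over strata."""
    total = 0.0
    for h in range(len(W)):
        C_Myx = rho[h] * C_My[h] * C_Mx[h]
        d = k1[h] - k2[h]
        total += W[h] ** 2 * My[h] ** 2 * (
            theta1[h] * C_My[h] ** 2
            + 0.25 * theta3[h] * C_Mx[h] ** 2 * d ** 2
            + theta3[h] * C_Myx * d
        )
    return total

# Hypothetical single-stratum inputs.
args = dict(W=[1.0], My=[50.0], C_My=[0.8], C_Mx=[0.6], rho=[0.7],
            theta1=[0.00475], theta3=[0.00375])
var_median = mse_proposed(k1=[0.3], k2=[0.3], **args)  # k1 = k2: no auxiliary adjustment
mse_exp_ratio = mse_proposed(k1=[0.0], k2=[1.0], **args)  # t1 = 0, t3 = 1, t4 = 0
print(var_median, mse_exp_ratio)
```

With k1h = k2h the expression reduces to the variance of the stratified sample median in (2), and with k1h = 0 and k2h = 1 it reduces to the MSE of the exponential ratio estimator in (12), which gives two quick consistency checks.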

5. Evaluation Framework and Conditions

From the MSE formulation of the proposed estimator (Theorem 1, Equation (27)),
$$\mathrm{MSE}(\hat{E}_{st}) \cong \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[ \theta_{1h} C_{M_{yh}}^2 + \frac{1}{4} \theta_{3h} C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + \theta_{3h} C_{M_{yxh}} \left( k_{1h} - k_{2h} \right) \right],$$
the superiority of $\hat{E}_{st}$ over its counterparts in Section 3 is ensured under the following inequalities.
(i)
Comparing the MSE expression derived for the proposed estimator family (27) with the corresponding variance of the sample median (2) gives the condition stated below:
$\mathrm{Var}(\hat{M}_{yst}) > \mathrm{MSE}(\hat{E}_{st})$ if
$$\sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 \left[ C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + 4 C_{M_{yxh}} \left( k_{1h} - k_{2h} \right) \right] < 0.$$
(ii)
When the MSE from Equation (27) is compared with that from Equation (5), the following condition is derived:
$\mathrm{MSE}(\hat{M}_{Ast}) > \mathrm{MSE}(\hat{E}_{st})$ if
$$4 \sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 C_{M_{xh}}^2 > \sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 \left[ C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + 4 C_{M_{yxh}} \left( k_{1h} - k_{2h} + 2 \right) \right].$$
(iii)
A specific condition can be derived by evaluating the MSE of the estimators from Equation (27) against the MSE presented in Equation (7):
$\mathrm{MSE}(\hat{M}_{D_1st})_{\min} > \mathrm{MSE}(\hat{E}_{st})$ if
$$-4 \sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 C_{M_{yh}}^2 \rho_{yxh}^2 > \sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 \left[ C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + 4 C_{M_{yxh}} \left( k_{1h} - k_{2h} \right) \right].$$
(iv)
A specific condition follows from evaluating the MSE of the estimators in Equation (27) against that in Equation (12):
$\mathrm{MSE}(\hat{M}_{Re\,st}) > \mathrm{MSE}(\hat{E}_{st})$ if
$$\sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 C_{M_{xh}}^2 \left( 1 - 4 K_h \right) > \sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 \left[ C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + 4 C_{M_{yxh}} \left( k_{1h} - k_{2h} \right) \right].$$
(v)
The following condition is derived by examining the mean squared error of the estimators in Equation (27) alongside the MSE provided in Equation (13):
$\mathrm{MSE}(\hat{M}_{Pe\,st}) > \mathrm{MSE}(\hat{E}_{st})$ if
$$\sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 C_{M_{xh}}^2 \left( 1 + 4 K_h \right) > \sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 \left[ C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + 4 C_{M_{yxh}} \left( k_{1h} - k_{2h} \right) \right].$$
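(vi)
The MSE of the proposed estimator family, as expressed in Equation (27), is compared with the minimum MSE of $\hat{M}_{D_2st}$ to establish the following condition:
$\mathrm{MSE}(\hat{M}_{D_2st})_{\min} > \mathrm{MSE}(\hat{E}_{st})$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2 \frac{\left( 1 - R_h \right) \left( \theta_{1h} C_{M_{yh}}^2 C_{M_{xh}}^2 - \theta_{3h} C_{M_{yxh}}^2 \right) - R_h C_{M_{xh}}^2}{C_{M_{xh}}^2 \left( 1 + \theta_{1h} C_{M_{yh}}^2 \right) - \theta_{3h} C_{M_{yxh}}^2} > 0,$$
where
$$R_h = \theta_{1h} C_{M_{yh}}^2 + \frac{1}{4} \theta_{3h} C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + \theta_{3h} C_{M_{yxh}} \left( k_{1h} - k_{2h} \right).$$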
(vii)
A comparison between the MSE of the new estimator family in (27) and $\mathrm{MSE}(\hat{M}_{D_3st})_{\min}$ yields the following condition:
$\mathrm{MSE}(\hat{M}_{D_3st})_{\min} > \mathrm{MSE}(\hat{E}_{st})$ if
$$4 \sum_{h=1}^{L} W_h^2 \left( B_{D_3h} + \Psi_{D_3h} \right) > \sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 \left[ C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + 4 C_{M_{yxh}} \left( k_{1h} - k_{2h} \right) \right],$$
where
$$\Psi_{D_3h} = \theta_{3h} \left\{ C_{M_{xh}}^2 \left( d_{4h(opt)} M_{yh} + d_{5h(opt)} M_{xh} \right)^2 - 2 d_{4h(opt)} M_{yh} C_{M_{yxh}} \left( d_{4h(opt)} M_{yh} + d_{5h(opt)} M_{xh} \right) \right\}$$
and
$$B_{D_3h} = \mathrm{Bias}(\hat{M}_{D_3st})_{\min}^2 + \theta_{1h} M_{yh}^2 \left( d_{4h(opt)}^2 - 1 \right) C_{M_{yh}}^2.$$
(viii)
A specific condition is derived by comparing the mean squared error of the estimators expressed in Equation (27) with the MSE formulation in Equation (22):
$\mathrm{MSE}(\hat{M}_{D_4st})_{\min} > \mathrm{MSE}(\hat{E}_{st})$ if
$$4 \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[ B_{D_4st} + \frac{\theta_{3h}}{4} d_{6h(opt)}^2 C_{M_{xh}}^2 - \theta_{1h} C_{M_{yh}}^2 \right] > \sum_{h=1}^{L} \theta_{3h} W_h^2 M_{yh}^2 Z_h,$$
where
$$B_{D_4st} = \frac{\mathrm{Bias}(\hat{M}_{D_4st})_{\min}}{M_{yh}^2}$$
and
$$Z_h = C_{M_{xh}}^2 \left( k_{1h}^2 + k_{2h}^2 - 2 k_{1h} k_{2h} \right) + 4 C_{M_{yxh}} \left( k_{1h} - k_{2h} \right).$$

Practical Feasibility and Computational Considerations

The quantile-based measures suggested for use in this study, including the median absolute deviation, interquartile range, and trimmed mean, are practical to compute even for large first-phase samples. Modern statistical software and fast algorithms calculate these measures with little processing time. Although they require slightly more work than traditional mean-based or variance-based measures, the extra cost is small compared with the cost of data collection. More importantly, these robust measures give stable and consistent results in the presence of outliers or skewed distributions, which is where simpler measures often fail. Hence, the modest increase in computational effort is justified by the improvement in robustness, precision, and reliability that these methods offer in stratified two-phase sampling.

6. Analysis of Results

The performance of the proposed estimators is investigated through a combination of simulation and empirical analysis. The simulations assess efficiency under controlled scenarios, using populations generated under positively skewed and other non-normal distributional forms. Evidence from real datasets complements these findings, so that the conclusions are both theoretically sound and practically applicable.

6.1. Simulation Study

A comprehensive simulation study was conducted under stratified two-phase sampling to assess the finite-sample behavior of the proposed estimators. The simulated populations were derived from various non-normal distributions, including heavy-tailed, asymmetric, and moderately skewed models. Different combinations of first-phase and second-phase sample sizes were examined to reflect practical survey conditions. For each design, the empirical mean squared error and percent relative efficiency were calculated through repeated sampling to assess estimation accuracy. To demonstrate the effectiveness of the suggested estimators, their performance was compared with that of several existing estimators. This setup provides a controlled setting in which to study how the new estimators behave under different distributional structures and sampling schemes.
The distributions used in a simulation study should represent the conditions under which the estimators will be applied. As a robust measure of location, the median is most useful when the data are skewed, heavy-tailed, or otherwise irregular. To test the estimators in such situations, the auxiliary variable X is generated from five representative distributions, each selected for its distributional properties.
  • Population 1: The first population assumes a heavy-tailed Cauchy distribution for $X$, with location $\lambda_1 = 21$ and scale $\lambda_2 = 16$. The association between $X$ and $Y$ is negative, fixed at $-0.40$.
  • Population 2: In the second case, $X$ is uniformly distributed between 17 and 24 and is taken to be independent of $Y$, i.e., no correlation is introduced.
  • Population 3: For the third population, $X$ follows an exponential distribution with parameter $\lambda_5 = 0.5$, capturing a strong right skew. The correlation with $Y$ is set at 0.60.
  • Population 4: The fourth design specifies $X$ as gamma distributed, with parameters $\lambda_6 = 23$ and $\lambda_7 = 15$. The correlation with $Y$ is moderately strong, at 0.68.
  • Population 5: Finally, the fifth population generates $X$ from a log-normal distribution with parameters 11 and 6, representing a mildly skewed distribution. A correlation of 0.57 with $Y$ is introduced.
In all populations, the outcome variable is expressed as
$$Y = \rho_{yx} X + e ,$$
where $e$ is drawn from a standard normal distribution, which adds random noise to the linear relationship.
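The five populations described above can be generated along the following lines. This is a Python sketch rather than the R code used in the study, and several parameterizations are assumptions of the example: 0.5 is read as the exponential rate, (23, 15) as the gamma shape and rate, and (11, 6) as the log-normal mean and standard deviation on the log scale. The function name `make_population` and the seed are likewise illustrative.

```python
import numpy as np

def make_population(name, rng, N=1000):
    """Return (X, Y) with Y = rho * X + e and e ~ N(0, 1)."""
    if name == "cauchy":          # location 21, scale 16, rho = -0.40
        x, rho = 21 + 16 * rng.standard_cauchy(N), -0.40
    elif name == "uniform":       # U(17, 24), no correlation introduced
        x, rho = rng.uniform(17, 24, N), 0.0
    elif name == "exponential":   # assumed: 0.5 is a rate, so scale = 1 / 0.5
        x, rho = rng.exponential(1 / 0.5, N), 0.60
    elif name == "gamma":         # assumed: shape 23 and rate 15
        x, rho = rng.gamma(shape=23, scale=1 / 15, size=N), 0.68
    elif name == "lognormal":     # assumed: meanlog 11, sdlog 6
        x, rho = rng.lognormal(mean=11, sigma=6, size=N), 0.57
    else:
        raise ValueError(f"unknown population: {name}")
    e = rng.standard_normal(N)    # standard normal error term
    return x, rho * x + e

rng = np.random.default_rng(2025)  # seed chosen only for reproducibility
names = ["cauchy", "uniform", "exponential", "gamma", "lognormal"]
populations = {name: make_population(name, rng) for name in names}
for name, (x, y) in populations.items():
    print(name, float(np.median(x)), float(np.median(y)))
```

Note that the model uses $\rho_{yx}$ as the slope of $Y$ on $X$, so the realized correlation in each generated population depends on the spread of $X$ relative to the unit-variance error.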
The study compared the MSE of the proposed estimators with those of existing alternatives across different distributions and correlation levels. The procedures adopted are consistent with the approach presented in [28]. All calculations were carried out in R (version 4.4.0), allowing a systematic assessment of robustness and efficiency within the stratified two-phase sampling scheme.
  • Step 1: As part of the simulation design, a finite population of $N = 1000$ units is generated for the variables $X$ and $Y$. The population is partitioned into $L$ strata, defined either through prior knowledge of the strata boundaries or through an auxiliary variable.
  • Step 2: In the first phase, a stratified sample of total size $m$ is drawn. For the $h$th stratum ($h = 1, 2, \ldots, L$), a subsample of size $m_h$ is selected using SRSWOR. The $m_h$ are distributed across strata by fixed-quota allocation rules.
  • Step 3: Following the first-phase stratified sampling, a second-phase subsample of $n$ observations in total is selected. Within each stratum $h$, $n_h$ units are drawn from the $m_h$ first-phase units using SRSWOR.
  • Step 4: Consistent with the two-phase design, multiple $(m, n)$ settings are examined with $n < m < N$. For each pair, the stratum-specific allocations $\{m_h, n_h\}$ are assigned according to the selected allocation strategy, under one of three schemes:
    • Scheme 1: The total first-phase sample size $m$ takes the values 300, 500, and 800, and for each $m$ the second-phase size $n$ is $0.10m$, $0.20m$, $0.30m$, or $0.40m$ (rounded). Both $m_h$ and $n_h$ are distributed equally across all strata.
    • Scheme 2: Four $(m, n)$ pairs, $(200, 50)$, $(300, 75)$, $(500, 125)$, and $(800, 200)$, are examined, keeping equal stratum allocation.
    • Scheme 3: A finer set of designs combines $m \in \{150, 250, 400, 600, 900\}$ with $n \in \{0.10m, 0.25m, 0.40m\}$, deriving $m_h$ and $n_h$ by equal allocation across strata.
  • Step 5: The efficiency of the estimators is examined by deriving the necessary stratum-level statistics from the selected samples in accordance with the previously outlined methodology. In the case of existing estimators that depend on unknown constants, the corresponding parameters are optimized using stratified estimates.
  • Step 6 (Simulation of stratified samples): For each population and each chosen $(m, n)$:
    1. Per-stratum allocations $\{m_h, n_h\}$ are determined according to the allocation rule.
    2. From each stratum $h$, $m_h$ units are drawn from the $N_h$ population units by SRSWOR (first phase).
    3. From the $m_h$ first-phase units in each stratum, $n_h$ units are drawn by SRSWOR (second phase).
  • Step 7: For each pairing of ( m , n ) and for all estimators, the MSE is evaluated using the stratified sampling design. This involves applying the design weights together with the stratum-level statistics obtained from the sampled data.
  • Step 8: To obtain reliable results, the sampling process is repeated 20,000 times. For every estimator and $(m, n)$ arrangement, the mean squared error (MSE) is computed as the average squared error over these replications, and the percent relative efficiency (PRE) is computed relative to $\hat{M}_{yst}$:
    $$\mathrm{MSE}(\hat{M}_{t})_{\min} = \frac{\sum_{u=1}^{U} \left( \hat{M}_{tu} - M_y \right)^2}{U}$$
    and
    $$\mathrm{PRE} = 100 \times \frac{\mathrm{Var}(\hat{M}_{yst})}{\mathrm{MSE}(\hat{M}_{t})_{\min}} ,$$
    where $t \in \{yst, Ast, D1st, Rest, Pest, D2st, D3st, D4st, Est1, Est2, \ldots, Est8\}$, $U = 20{,}000$, $\hat{M}_{tu}$ is the estimate from replication $u$, and $M_y$ is the true population median.
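Steps 1 to 8 can be sketched in a few lines of Python. The sketch below is not the authors' R implementation and does not implement the proposed estimators; it only illustrates the two-phase draw and the empirical MSE and PRE calculations for the stratified sample median and, as a stand-in competitor, a simple two-phase ratio-type median estimator. The following are assumptions of the example: strata are formed by cutting the population sorted on $X$ into $L = 4$ equal parts, the target is the weighted combination of stratum medians, the baseline's empirical MSE is used in the PRE numerator, and the number of replications is reduced to keep the run short.

```python
import numpy as np

def simulate(x, y, L=4, m=300, n=90, U=2000, seed=1):
    """Empirical MSE and PRE under stratified two-phase SRSWOR.

    Column 0: stratified sample median of Y (baseline).
    Column 1: two-phase ratio-type median estimator (illustrative stand-in).
    """
    rng = np.random.default_rng(seed)
    order = np.argsort(x)
    strata = np.array_split(order, L)          # assumed: strata cut on sorted X
    W = np.array([len(s) for s in strata]) / len(x)
    target = sum(w * np.median(y[s]) for w, s in zip(W, strata))
    m_h, n_h = m // L, n // L                  # equal allocation across strata
    est = np.empty((U, 2))
    for u in range(U):
        plain = ratio = 0.0
        for w, s in zip(W, strata):
            first = rng.choice(s, size=m_h, replace=False)        # first phase
            second = rng.choice(first, size=n_h, replace=False)   # second phase
            my = np.median(y[second])
            plain += w * my
            ratio += w * my * np.median(x[first]) / np.median(x[second])
        est[u] = plain, ratio
    mse = ((est - target) ** 2).mean(axis=0)   # average squared error over U
    pre = 100.0 * mse[0] / mse                 # baseline = stratified median
    return mse, pre

# Gamma population as in Population 4 (shape 23, rate 15 assumed).
rng = np.random.default_rng(0)
x = rng.gamma(shape=23, scale=1 / 15, size=1000)
y = 0.68 * x + rng.standard_normal(1000)
mse, pre = simulate(x, y, U=500)
print("MSE:", mse)
print("PRE:", pre)
```

Replacing the second column with any estimator built from the first-phase and second-phase stratum samples gives the corresponding empirical MSE and PRE under the same design.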
The proposed estimators showed the greatest improvements when the underlying population exhibited skewness or heavy tails. This behavior is expected because the methodology uses quantile-based information that remains stable under extreme values and asymmetric shapes. Traditional estimators that depend on means or variance measures are more sensitive to outlying observations, which can distort median estimation in irregular populations. By contrast, the quantile-driven structure used here captures central tendency and scale in a way that reflects the true distributional form, leading to superior precision in these challenging settings.
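For reference, the $(m, n)$ designs of Schemes 1 to 3 in Step 4 can be enumerated programmatically. The Python sketch below assumes $L = 4$ strata, half-up rounding of $n$, and integer division for the equal per-stratum allocation; none of these three choices is stated in the text, and the function name `designs` is illustrative.

```python
def designs(L=4):
    """Return {scheme: [(m, n, m_h, n_h), ...]} under equal allocation."""
    def n_of(m, pct):
        # n = pct% of m, rounded half up (assumed rounding rule)
        return (m * pct + 50) // 100

    schemes = {
        1: [(m, n_of(m, p)) for m in (300, 500, 800) for p in (10, 20, 30, 40)],
        2: [(200, 50), (300, 75), (500, 125), (800, 200)],
        3: [(m, n_of(m, p)) for m in (150, 250, 400, 600, 900) for p in (10, 25, 40)],
    }
    # Equal allocation: each stratum receives m // L and n // L units.
    return {k: [(m, n, m // L, n // L) for m, n in v] for k, v in schemes.items()}

d = designs()
for scheme, rows in d.items():
    print(f"Scheme {scheme}: {len(rows)} designs, first = {rows[0]}")
```

Every design satisfies $n < m < N$ with $N = 1000$, as required by the two-phase setup.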

6.2. Real-Life Application

This section reports an empirical study utilizing real population data, with the key characteristics of the datasets summarized below. These datasets serve as practical benchmarks, allowing the proposed estimators to be evaluated under varied and representative conditions.
  • Population 1. The data are taken from [32], which provides comprehensive details on government schools for the academic year 2012–2013. The population consists of primary and middle school enrollment data by gender. In particular, $X_1$ denotes the total number of government primary schools for both boys and girls, while $Y_1$ represents the total number of students enrolled in them. Similarly, $X_2$ represents the total number of government middle schools for both genders, while $Y_2$ records the total number of students enrolled in them. The source can be downloaded from the following URL: https://repository.lahoreschool.edu.pk/xmlui/bitstream/handle/123456789/13900/Dev-2014.pdf?sequence=1&isAllowed=y (accessed on 28 September 2025). The summary statistics are:
    $N_1 = 36$, $m_1 = 18$, $n_1 = 9$, $N_2 = 36$, $m_2 = 18$, $n_2 = 9$, $X_{1\min} = 388$, $X_{1\max} = 1534$, $X_{2\min} = 84$, $X_{2\max} = 478$, $M_{x1} = 1016.500$, $M_{y1} = 116230$, $\sigma_{x1} = 402.609$, $\sigma_{x2} = 424.937$, $M_{x2} = 206$, $M_{y2} = 49661$, $f_{x1}(M_{x1}) = 0.000951993$, $f_{y1}(M_{y1}) = 0.00000835$, $f_{x2}(M_{x2}) = 0.004094403$, $f_{y2}(M_{y2}) = 0.0000143374$, $\rho_{y1x1} = 0.084$, $\rho_{y2x2} = 0.875$, $TM_1 = 891.188$, $TM_2 = 210.688$, $DM_1 = 982.650$, $DM_2 = 231$, $Sk_1 = 1.008$, $Sk_2 = 1.023$, $QA_1 = 891.875$, $QA_2 = 215.375$, $QD_1 = 982.650$, $QD_2 = 62.875$, $MAD_1 = 289$, $MAD_2 = 267$, $QR_1 = 378.250$, $QR_2 = 125.750$, $MR_1 = 961$, $MR_2 = 281$.
  • Population 2. A finite population is examined using statistics from [33] to demonstrate the empirical performance of the suggested estimators. These data describe industrial activity at the district and division levels, in particular the number of registered factories and the related employment levels. Here, $X_1$ represents the total number of factories registered in 2010, while $Y_1$ represents employment by division and district in 2010. Likewise, $X_2$ represents the number of registered factories and $Y_2$ the stated employment levels for 2012. The source can be downloaded from the following URL: https://repository.lahoreschool.edu.pk/xmlui/bitstream/handle/123456789/13023/2013.pdf?sequence=1&isAllowed=y (accessed on 28 September 2025). The summary statistics are:
    $N_1 = 36$, $m_1 = 18$, $n_1 = 9$, $N_2 = 36$, $m_2 = 18$, $n_2 = 9$, $X_{1\min} = 24$, $X_{1\max} = 1986$, $X_{2\min} = 24$, $X_{2\max} = 2055$, $M_{x1} = 168.500$, $M_{y1} = 10484.500$, $\sigma_1 = 438.519$, $\sigma_2 = 452.713$, $M_{x2} = 171.500$, $M_{y2} = 10494.500$, $f_{x1}(M_{x1}) = 0.002463666$, $f_{y1}(M_{y1}) = 0.00004033736$, $f_{x2}(M_{x2}) = 0.002315051$, $f_{y2}(M_{y2}) = 0.00004086913$, $\rho_{y1x1} = 0.912$, $\rho_{y2x2} = 0.5194465$, $TM_1 = 193.438$, $TM_2 = 195.750$, $DM_1 = 432.500$, $DM_2 = 431.500$, $Sk_1 = 2.106$, $Sk_2 = 2.345$, $QA_1 = 218.375$, $QA_2 = 220$, $QD_1 = 127.125$, $QD_2 = 132.500$, $MAD_1 = 193.438$, $MAD_2 = 99$, $QR_1 = 252.25$, $QR_2 = 265$, $MR_1 = 1005$, $MR_2 = 1039.500$.
  • Population 3. The data from [1] describe household income and food expenditure. Here, $Y$ represents the family's food expenditure, which varies with employment status, and $X$ is the weekly income, which summarizes the household's financial situation. The data are divided into two strata, with the following summary statistics:
    $N_1 = 36$, $m_1 = 18$, $n_1 = 9$, $N_2 = 36$, $m_2 = 18$, $n_2 = 9$, $X_{1\min} = 28$, $X_{1\max} = 95$, $X_{2\min} = 15$, $X_{2\max} = 75$, $M_{x1} = 60.664$, $M_{y1} = 52.728$, $\sigma_1 = 10.130$, $\sigma_2 = 12.750$, $M_{x2} = 44.205$, $M_{y2} = 46.339$, $f_{x1}(M_{x1}) = 0.0004729$, $f_{y1}(M_{y1}) = 0.0000219$, $f_{x2}(M_{x2}) = 0.0003791$, $f_{y2}(M_{y2}) = 0.0000481$, $\rho_{y1x1} = 0.337$, $\rho_{y2x2} = 0.496$, $TM_1 = 61.623$, $TM_2 = 45.287$, $DM_1 = 61.600$, $DM_2 = 45.242$, $Sk_1 = 0.300$, $Sk_2 = 0.500$, $QA_1 = 54.287$, $QA_2 = 48.940$, $QD_1 = 6.840$, $QD_2 = 8.614$, $MAD_1 = 8.100$, $MAD_2 = 10.200$, $QR_1 = 13.721$, $QR_2 = 17.201$, $MR_1 = 61.500$, $MR_2 = 45$.

6.3. Discussion and Results

The transformation-based estimators introduced here offer clear benefits over conventional median estimation methods. They consistently produced smaller mean squared errors in both simulated and real data, which implies greater accuracy in estimating the population median. They remained reliable for symmetric, moderately skewed, and heavy-tailed data, so their advantage did not depend on a particular distributional shape. Table 4, Table 5, Table 6 and Table 7 summarize the main results and reveal several notable trends.
  • The proposed transformation-based estimators outperformed the existing ones on both simulated and real-world data. They attained markedly smaller mean squared errors in the simulated populations (Table 4, Figure 1), which indicates more accurate median estimation. The same trend was observed with the empirical data (Table 5, Figure 2). Across all populations considered, the suggested estimators attained the smallest MSE values. These findings show that the transformation-based method produces reliable and accurate median estimates in a variety of contexts.
  • Figure 1 and Figure 2 reveal that the proposed estimators perform reliably across varying correlation levels between study and auxiliary variables. As indicated in Table 4 and Table 5, their efficiency is sustained even when n h is considerably smaller than m h , making them suitable for cost-limited stratified sampling.
  • In addition to Figure 1 and Figure 2, the comparative patterns in Figure 3 and Figure 4 demonstrate the consistency of the proposed estimators across both simulated and real data settings. Figure 3 displays the relative efficiency trends of Table 6, where quantile-based estimators such as $\hat{E}_{5st}$, $\hat{E}_{6st}$, and $\hat{E}_{8st}$ show much higher efficiency values than the traditional approaches, regardless of the shape of the distribution or the degree of skewness. The efficiency curves remain stable and clearly separated from those of the existing methods, confirming the strength of the transformation-based formulation under a wide range of population conditions. Similarly, Figure 4, which summarizes the empirical findings of Table 7, supports this pattern using real data. The proposed estimators provide higher precision and smaller sampling variability in all three populations, showing that the quantile transformations work effectively beyond the simulated framework. Taken together, the graphical and tabular evidence confirms that the new family of estimators improves numerical accuracy and performs more reliably in diverse and realistic data situations.
  • When the underlying population deviates from normality, especially when the data show moderate to high skewness or a small percentage of extreme observations, the suggested estimators show notable efficiency gains. According to the simulation results, the quantile-based transformations remain stable and produce smaller mean squared errors than conventional ratio or regression-type estimators when applied to right-skewed and heavy-tailed distributions, such as exponential or log-normal models. The performance advantage increases further when contamination levels of up to 10–15% are added, demonstrating the stability of the suggested class of estimators in irregular and heterogeneous data environments.
Table 4. MSE values under stratified two-phase sampling for simulation data sets.
Estimator C ( 21 , 16 ) Uni ( 17 , 24 ) Exp ( 0.5 ) Gam ( 23 , 15 ) LN ( 11 , 6 )
M ^ y s t 3.42 × 10 2 7.18 × 10 3 1.12 × 10 1 2.85 × 10 0 4.26 × 10 2
M ^ A s t 3.05 × 10 2 6.72 × 10 3 9.85 × 10 2 2.41 × 10 0 4.01 × 10 2
M ^ D 1 s t 2.65 × 10 2 6.25 × 10 3 8.42 × 10 2 2.05 × 10 0 3.55 × 10 2
M ^ R e s t 2.78 × 10 2 6.40 × 10 3 8.95 × 10 2 2.18 × 10 0 3.62 × 10 2
M ^ P e s t 4.92 × 10 2 8.10 × 10 3 1.42 × 10 1 3.85 × 10 0 6.21 × 10 2
M ^ D 2 s t 2.55 × 10 2 6.15 × 10 3 8.20 × 10 2 1.98 × 10 0 3.41 × 10 2
M ^ D 3 s t 2.48 × 10 2 6.05 × 10 3 8.10 × 10 2 1.95 × 10 0 3.38 × 10 2
M ^ D 4 s t 2.40 × 10 2 5.92 × 10 3 7.95 × 10 2 1.92 × 10 0 3.32 × 10 2
E ^ 1 s t 1 . 95 × 10 2 4 . 82 × 10 3 6 . 45 × 10 2 1 . 58 × 10 0 2 . 85 × 10 2
E ^ 2 s t 1 . 85 × 10 2 4 . 65 × 10 3 6 . 22 × 10 2 1 . 52 × 10 0 2 . 76 × 10 2
E ^ 3 s t 1 . 72 × 10 2 4 . 50 × 10 3 6 . 05 × 10 2 1 . 48 × 10 0 2 . 69 × 10 2
E ^ 4 s t 1 . 80 × 10 2 4 . 58 × 10 3 6 . 12 × 10 2 1 . 50 × 10 0 2 . 71 × 10 2
E ^ 5 s t 1 . 68 × 10 2 4 . 40 × 10 3 5 . 95 × 10 2 1 . 45 × 10 0 2 . 65 × 10 2
E ^ 6 s t 1 . 60 × 10 2 4 . 25 × 10 3 5 . 80 × 10 2 1 . 41 × 10 0 2 . 58 × 10 2
E ^ 7 s t 1 . 74 × 10 2 4 . 52 × 10 3 6 . 10 × 10 2 1 . 49 × 10 0 2 . 70 × 10 2
E ^ 8 s t 1 . 66 × 10 2 4 . 35 × 10 3 5 . 88 × 10 2 1 . 44 × 10 0 2 . 62 × 10 2
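Table 5. Analysis of mean squared error performance using empirical populations.

| Estimator | Population-1 | Population-2 | Population-3 |
|---|---|---|---|
| $\hat{M}_{yst}$ | $7.04 \times 10^{4}$ | $2.48 \times 10^{3}$ | $3.62 \times 10^{3}$ |
| $\hat{M}_{Ast}$ | $3.79 \times 10^{4}$ | $9.92 \times 10^{2}$ | $1.34 \times 10^{3}$ |
| $\hat{M}_{D1st}$ | $3.78 \times 10^{4}$ | $9.12 \times 10^{2}$ | $1.09 \times 10^{3}$ |
| $\hat{M}_{Rest}$ | $4.68 \times 10^{4}$ | $1.06 \times 10^{3}$ | $1.51 \times 10^{3}$ |
| $\hat{M}_{Pest}$ | $1.08 \times 10^{5}$ | $2.01 \times 10^{3}$ | $3.47 \times 10^{3}$ |
| $\hat{M}_{D2st}$ | $3.52 \times 10^{4}$ | $4.04 \times 10^{2}$ | $6.32 \times 10^{3}$ |
| $\hat{M}_{D3st}$ | $8.64 \times 10^{3}$ | $4.20 \times 10^{2}$ | $3.71 \times 10^{3}$ |
| $\hat{M}_{D4st}$ | $8.96 \times 10^{3}$ | $4.48 \times 10^{2}$ | $2.14 \times 10^{3}$ |
| $\hat{E}_{1st}$ | $2.02 \times 10^{3}$ | $1.72 \times 10^{2}$ | $7.68 \times 10^{2}$ |
| $\hat{E}_{2st}$ | $3.84 \times 10^{3}$ | $3.25 \times 10^{2}$ | $8.76 \times 10^{2}$ |
| $\hat{E}_{3st}$ | $3.94 \times 10^{3}$ | $2.44 \times 10^{2}$ | $7.88 \times 10^{2}$ |
| $\hat{E}_{4st}$ | $1.61 \times 10^{3}$ | $1.23 \times 10^{2}$ | $7.12 \times 10^{2}$ |
| $\hat{E}_{5st}$ | $3.57 \times 10^{3}$ | $3.92 \times 10^{2}$ | $8.44 \times 10^{2}$ |
| $\hat{E}_{6st}$ | $3.90 \times 10^{3}$ | $2.82 \times 10^{2}$ | $8.40 \times 10^{2}$ |
| $\hat{E}_{7st}$ | $3.84 \times 10^{3}$ | $3.21 \times 10^{2}$ | $8.24 \times 10^{2}$ |
| $\hat{E}_{8st}$ | $3.58 \times 10^{3}$ | $3.95 \times 10^{2}$ | $9.12 \times 10^{2}$ |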
Table 6. Relative efficiencies (baseline = $\hat{M}_{yst}$) for simulation data sets.

| Estimator | C(21, 16) | Uni(17, 24) | Exp(0.5) | Gam(23, 15) | LN(11, 6) |
|---|---|---|---|---|---|
| $\hat{M}_{yst}$ | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| $\hat{M}_{Ast}$ | 112.13 | 106.85 | 113.71 | 118.26 | 106.23 |
| $\hat{M}_{D1st}$ | 129.06 | 114.88 | 133.02 | 139.02 | 120.00 |
| $\hat{M}_{Rest}$ | 123.02 | 112.19 | 125.14 | 130.73 | 117.68 |
| $\hat{M}_{Pest}$ | 69.51 | 88.64 | 78.87 | 74.03 | 68.60 |
| $\hat{M}_{D2st}$ | 134.12 | 116.75 | 136.59 | 143.94 | 124.93 |
| $\hat{M}_{D3st}$ | 137.90 | 118.68 | 138.27 | 146.15 | 126.04 |
| $\hat{M}_{D4st}$ | 142.50 | 121.28 | 140.88 | 148.44 | 128.31 |
| $\hat{E}_{1st}$ | 175.38 | 148.96 | 173.64 | 180.38 | 149.47 |
| $\hat{E}_{2st}$ | 184.86 | 154.41 | 180.06 | 187.50 | 154.35 |
| $\hat{E}_{3st}$ | 198.84 | 159.56 | 185.12 | 192.57 | 158.36 |
| $\hat{E}_{4st}$ | 190.00 | 156.77 | 183.01 | 190.00 | 157.20 |
| $\hat{E}_{5st}$ | 203.57 | 163.18 | 188.24 | 196.55 | 160.75 |
| $\hat{E}_{6st}$ | 213.75 | 168.94 | 193.10 | 202.13 | 165.12 |
| $\hat{E}_{7st}$ | 196.55 | 158.85 | 183.61 | 191.28 | 157.78 |
| $\hat{E}_{8st}$ | 206.02 | 165.06 | 190.48 | 197.92 | 162.60 |
Table 7. Relative efficiencies (baseline = $\hat{M}_{yst}$) for actual data sets.

| Estimator | Population-1 | Population-2 | Population-3 |
|---|---|---|---|
| $\hat{M}_{yst}$ | 100 | 100 | 100 |
| $\hat{M}_{Ast}$ | 186 | 250 | 270 |
| $\hat{M}_{D1st}$ | 186 | 272 | 332 |
| $\hat{M}_{Rest}$ | 150 | 234 | 240 |
| $\hat{M}_{Pest}$ | 65 | 123 | 104 |
| $\hat{M}_{D2st}$ | 200 | 614 | 57 |
| $\hat{M}_{D3st}$ | 814 | 590 | 98 |
| $\hat{M}_{D4st}$ | 785 | 554 | 169 |
| $\hat{E}_{1st}$ | 3480 | 1442 | 471 |
| $\hat{E}_{2st}$ | 1833 | 763 | 413 |
| $\hat{E}_{3st}$ | 1786 | 1016 | 459 |
| $\hat{E}_{4st}$ | 4373 | 2016 | 509 |
| $\hat{E}_{5st}$ | 1972 | 633 | 429 |
| $\hat{E}_{6st}$ | 1805 | 880 | 431 |
| $\hat{E}_{7st}$ | 1833 | 773 | 439 |
| $\hat{E}_{8st}$ | 1964 | 628 | 397 |
Figure 1. Mean squared error (MSE) values of the proposed and conventional estimators, calculated from the simulated datasets.
Figure 2. Mean squared error (MSE) values of the proposed and conventional estimators, calculated from the three real populations.
Figure 3. Relative efficiency values of the proposed and conventional estimators, calculated from the simulated datasets.
Figure 4. Relative efficiency values of the proposed and conventional estimators, calculated from the three real populations.
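The relative efficiencies in Table 7 follow from the MSE values in Table 5 as $\mathrm{PRE} = 100 \times \mathrm{MSE}(\hat{M}_{yst}) / \mathrm{MSE}(\hat{M}_t)$. The short Python sketch below reproduces a few Population-1 entries, reading the Table 5 values as $7.04 \times 10^4$, $3.79 \times 10^4$, $1.08 \times 10^5$, $2.02 \times 10^3$, and $1.61 \times 10^3$. The dictionary keys and the function name `pre` are illustrative labels, not notation from the article.

```python
# Population-1 MSE values for a subset of estimators (Table 5).
mse_pop1 = {
    "M_yst": 7.04e4,   # baseline: stratified sample median
    "M_Ast": 3.79e4,
    "M_Pest": 1.08e5,
    "E_1st": 2.02e3,
    "E_4st": 1.61e3,
}

def pre(mse, baseline="M_yst"):
    """Percent relative efficiency of each estimator against the baseline."""
    return {name: 100.0 * mse[baseline] / value for name, value in mse.items()}

p = pre(mse_pop1)
for name, value in p.items():
    print(f"{name}: {value:.0f}")
```

Because the published MSE values are rounded to three significant figures, the recomputed efficiencies can differ slightly in the last digits from the tabulated ones.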

7. Conclusions

This study developed a family of transformation-based median estimators for stratified two-phase sampling using quantile and robust-scale measures. The transformations considered include quartile deviation, interquartile range, median absolute deviation, trimmed mean, decile mean, quartile average, mid-range, and skewness. These measures provide a balanced way to use auxiliary information while maintaining resistance to outliers and skewness. Results from simulation and empirical analyses consistently demonstrated that the new estimators achieve lower mean squared errors and higher relative efficiencies than conventional median estimators. These improvements hold across various population structures, confirming the robustness and adaptability of the proposed approach.
Most existing median estimators in stratified two-phase sampling assume symmetric distributions and stable auxiliary information between phases. Their performance declines with skewed populations, outliers, or heterogeneous strata. Traditional ratio, regression, and exponential estimators were designed for standard conditions and cannot adapt to extreme data or varied distributions. This study proposed a generalized double-exponential median estimator using quantile-based transformations that capture spread, skewness, and tail behavior, making it more efficient and stable in irregular and non-normal stratified populations.
The proposed framework is highly flexible across survey types, samples, and strata, making it useful for socio-economic, environmental, and health studies. Future work may extend it to multi-phase and multi-auxiliary settings, complex sampling designs (clustered, multistage, or adaptive), and robust location measures such as quantile regression or trimmed means. Further research can also develop optimal data-based calibration rules and confidence interval procedures to enhance efficiency and applicability in modern, high-dimensional survey contexts.

Author Contributions

Conceptualization, F.A.A. and H.M.A.; Methodology, F.A.A. and H.M.A.; Software, F.A.A. and H.M.A.; Validation, F.A.A. and H.M.A.; Formal analysis, F.A.A. and H.M.A.; Investigation, F.A.A. and H.M.A.; Resources, F.A.A. and H.M.A.; Data curation, F.A.A. and H.M.A.; Writing—original draft, F.A.A.; Writing—review & editing, H.M.A.; Visualization, H.M.A.; Supervision, H.M.A.; Project administration, F.A.A. and H.M.A.; Funding acquisition, F.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R515), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
  2. Särndal, C.E. Sample survey theory vs. general statistical theory: Estimation of the population mean. Int. Stat. Rev. Int. Stat. 1972, 40, 1–12.
  3. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
  4. Alomair, M.A.; Daraz, U. Dual transformation of auxiliary variables by using outliers in stratified random sampling. Mathematics 2024, 12, 2829.
  5. Gross, S. Median estimation in sample surveys. In Proceedings of the Section on Survey Research Methods, American Statistical Association Ithaca, Alexandria, VA, USA, 7–9 May 1980.
  6. Sedransk, J.; Meyer, J. Confidence intervals for the quantiles of a finite population: Simple random and stratified simple random sampling. J. R. Stat. Soc. Ser. B (Methodol.) 1978, 40, 239–252.
  7. Philip, S.; Sedransk, J. Lower bounds for confidence coefficients for confidence intervals for finite population quantiles. Commun. Stat.-Theory Methods 1983, 12, 1329–1344.
  8. Kuk, A.Y.C.; Mak, T.K. Median estimation in the presence of auxiliary information. J. R. Stat. Soc. Ser. B 1989, 51, 261–269.
  9. Rao, T.J. On certain methods of improving ratio and regression estimators. Commun. Stat.-Theory Methods 1991, 20, 3325–3340.
  10. Singh, S.; Joarder, A.H.; Tracy, D.S. Median estimation using double sampling. Aust. N. Z. J. Stat. 2001, 43, 33–46.
  11. Khoshnevisan, M.; Singh, H.P.; Singh, S.; Smarandache, F. A General Class of Estimators of Population Median Using Two Auxiliary Variables in Double Sampling; Virginia Polytechnic Institute and State University: Blacksburg, VA, USA, 2002.
  12. Gupta, S.; Shabbir, J.; Ahmad, S. Estimation of median in two-phase sampling using two auxiliary variables. Commun. Stat.-Theory Methods 2008, 37, 1815–1822.
  13. Singh, S. Advanced Sampling Theory with Applications: How Michael Selected Amy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; Volume 2.
  14. Subzar, M.; Lone, S.A.; Ekpenyong, E.J.; Salam, A.; Aslam, M.; Raja, T.A.; Almutlak, S.A. Efficient class of ratio cum median estimators for estimating the population median. PLoS ONE 2023, 18, e0274690.
  15. Iseh, M.J. Model formulation on efficiency for median estimation under a fixed cost in survey sampling. Model Assist. Stat. Appl. 2023, 18, 373–385.
  16. Hoaglin, D.C.; Mosteller, F.; Tukey, J.W. Understanding Robust and Exploratory Data Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2000.
  17. Chen, M.; Chen, W.X.; Yang, R.; Zhou, Y.W. Exponential-Poisson parameters estimation in moving extremes ranked set sampling design. Acta Math. Appl. Sin. Engl. Ser. 2025, 41, 973–984.
  18. Alshanbari, H.M. A generalized estimation strategy for the finite population median using transformation methods under a two-phase sampling design. Symmetry 2025, 17, 1696.
  19. Alshanbari, H.M. A novel robust transformation approach to finite population median estimation using Monte Carlo simulation and empirical data. Axioms 2025, 14, 737.
  20. Alghamdi, A.S.; Almulhim, F.A. Improved median estimation in stratified surveys via nontraditional auxiliary measures. Symmetry 2025, 17, 1136.
  21. Alghamdi, A.S.; Almulhim, F.A. Stratified median estimation using auxiliary transformations: A robust and efficient approach in asymmetric populations. Symmetry 2025, 17, 1127.
  22. Shabbir, J.; Gupta, S. A generalized class of difference type estimators for population median in survey sampling. Hacet. J. Math. Stat. 2017, 46, 1015–1028.
  23. Irfan, M.; Maria, J.; Shongwe, S.C.; Zohaib, M.; Bhatti, S.H. Estimation of population median under robust measures of an auxiliary variable. Math. Probl. Eng. 2021, 2021, 4839077.
  24. Shabbir, J.; Gupta, S.; Narjis, G. On improved class of difference type estimators for population median in survey sampling. Commun. Stat.-Theory Methods 2022, 51, 3334–3354.
  25. Hussain, M.A.; Javed, M.; Zohaib, M.; Shongwe, S.C.; Awais, M.; Zaagan, A.A.; Irfan, M. Estimation of population median using bivariate auxiliary information in simple random sampling. Heliyon 2024, 10, e28891.
  26. Bhushan, S.; Kumar, A.; Lone, S.A.; Anwar, S.; Gunaime, N.M. An efficient class of estimators in stratified random sampling with an application to real data. Axioms 2023, 12, 576.
  27. Stigler, S.M. Linear functions of order statistics. Ann. Math. Stat. 1969, 40, 770–788.
  28. Daraz, U.; Almulhim, F.A.; Alomair, M.A.; Alomair, A.M. Population median estimation using auxiliary variables: A simulation study with real data across sample sizes and parameters. Mathematics 2025, 13, 1660.
  29. Aladag, S.; Cingi, H. Improvement in estimating the population median in simple random sampling and stratified random sampling using auxiliary information. Commun. Stat.-Theory Methods 2015, 44, 1013–1032.
  30. Solanki, R.S.; Singh, H.P. Some classes of estimators for median estimation in survey sampling. Commun. Stat.-Theory Methods 2015, 44, 1450–1465.
  31. Singh, H.P.; Vishwakarma, G.K. Modified exponential ratio and product estimators for finite population mean in double sampling. Austrian J. Stat. 2007, 36, 217–225.
  32. Bureau of Statistics. Punjab Development Statistics 2014; Government of the Punjab, Lahore: Islamabad, Pakistan, 2014.
  33. Bureau of Statistics. Punjab Development Statistics 2013; Government of the Punjab, Lahore: Islamabad, Pakistan, 2013.
Table 1. List of notations used in the study.

| Symbol | Description | Symbol | Description |
|---|---|---|---|
| $N$ | Population size | $L$ | Number of strata |
| $N_h$ | Units in stratum $h$ | $m_h$ | First-phase sample size in stratum $h$ |
| $n_h$ | Second-phase sample size in stratum $h$ | $m$ | Total first-phase sample size |
| $n$ | Total second-phase sample size | $Y$ | Study variable |
| $X$ | Auxiliary variable | $M_{yh}$, $M_{xh}$ | Population medians of $Y$ and $X$ |
| $\hat{M}_{yh}$, $\hat{M}_{xh}$ | Second-phase sample medians | $\dot{M}_{xh}$ | First-phase sample median of $X$ |
| $f_{yh}$, $f_{xh}$ | Probability density at medians | $W_h$ | Stratum weight |
| $\rho_{yxh}$ | Correlation between $Y$ and $X$ | $P_{11h}$ | Joint probability function |
| $e_{0h}$, $e_{1h}$, $e_{2h}$ | Relative error terms | $C_{M_{yh}}$, $C_{M_{xh}}$ | Median coefficients |
| $C_{M_{yxh}}$ | Covariance coefficient | $\theta_{1h}$, $\theta_{2h}$, $\theta_{3h}$ | Sampling constants |
| $Q_{1h}$, $Q_{3h}$ | First and third quartiles | $IQR_h$ | Interquartile range |
| $QD_h$ | Quartile deviation | $QA_h$ | Quartile average |
| $TM_h$ | Trimmed mean | $DM_h$ | Decile mean |
| $MAD_h$ | Median absolute deviation | $MR_h$ | Mid-range of $X_h$ |
| $\sigma_{X_h}$ | Standard deviation of $X$ | $Sk(X_h)$ | Skewness of $X$ |
| $E_{1st}$–$E_{8st}$ | Proposed estimators (quantile-based) | $MSE$ | Mean squared error |
| $Bias$ | Bias of estimator | $k_{1h}$, $k_{2h}$ | Transformation constants |
| $t_{ih}$, $V_{ih}$ | Calibration parameters | $\mathrm{Cov}(M_{yh}, M_{xh})$ | Covariance of $Y$ and $X$ in stratum $h$ |
Table 2. Proposed estimator transformations in stratified two-phase sampling.

| Estimator | $t_{1h}$ | $t_{2h}$ | $t_{3h}$ | $t_{4h}$ |
|---|---|---|---|---|
| $\hat{E}_{st1}$ | $QD_h$ | $MAD_h$ | 1 | $X_{h\max} - X_{h\min}$ |
| $\hat{E}_{st2}$ | $TM_h$ | $MR_h$ | 1 | $IQR_h$ |
| $\hat{E}_{st3}$ | $DM_h$ | $MAD_h$ | 1 | $QD_h$ |
| $\hat{E}_{st4}$ | $\mathrm{Skewness}(X_h)$ | 1 | 1 | $Q_{3h} - Q_{2h}$ |
| $\hat{E}_{st5}$ | $\log(Q_{3h} + 1)$ | $\log(Q_{1h} + 1)$ | 1 | $\log(MR_h + 1)$ |
| $\hat{E}_{st6}$ | $QA_h$ | $\sigma_{X_h}$ | 1 | $QD_h$ |
| $\hat{E}_{st7}$ | $X_{\text{median}}$ | $IQR_h$ | 1 | $MAD_h$ |
| $\hat{E}_{st8}$ | $Q_{1h} \cdot Q_{3h}$ | $X_{h\max} \cdot X_{h\min}$ | 1 | $IQR_h$ |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
