A Generalized Estimation Strategy for the Finite Population Median Using Transformation Methods Under a Two-Phase Sampling Design

Alshanbari, Huda M.

doi:10.3390/sym17101696

by

Huda M. Alshanbari

Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O.Box 84428, Riyadh 11671, Saudi Arabia

Symmetry 2025, 17(10), 1696; https://doi.org/10.3390/sym17101696

Submission received: 16 August 2025 / Revised: 24 September 2025 / Accepted: 2 October 2025 / Published: 10 October 2025

(This article belongs to the Section Mathematics)

Abstract

This paper introduces an improved and efficient class of estimators for the finite population median under a two-phase sampling scheme. The proposed estimators are developed using transformation techniques to improve the estimation precision over that of conventional approaches. Two-phase sampling is employed to reduce data collection costs and enhance estimation accuracy, especially when complete auxiliary information is not easily available. Expressions for the bias and mean squared error (MSE) are derived using a first-order approximation. To assess performance, simulation studies were carried out using data generated from various statistical distributions, alongside several real-life datasets. Estimators are compared using the mean squared error criterion, and the results show that the proposed methods consistently outperform the existing ones in terms of accuracy and efficiency. Graphical comparisons further support the improved performance of the new estimators, highlighting their practical effectiveness in median estimation problems.

Keywords:

auxiliary information; two-phase sampling; transformation methods; distributions; bias; mean squared errors

1. Introduction

Median estimation is important in survey sampling, especially when data are skewed or contain outliers, as it offers a more reliable measure of central tendency than the mean. A two-phase sampling design allows researchers to use auxiliary information collected in the first phase, reducing costs while improving the accuracy of median estimates in the second phase. This approach is particularly useful when complete auxiliary data are not available initially, allowing for better resource allocation and more dependable inference for population parameters in complex surveys. More details about auxiliary information can be found in [1,2,3,4,5].

The median offers a more reliable measure of central tendency than the mean when data are skewed or contain extreme values. Its robustness makes it especially useful in real-life applications such as income analysis, healthcare outcomes, and environmental studies, where outliers can distort average-based measures. Over the years, the use of auxiliary information has greatly contributed to more accurate estimation of the population median in survey sampling. The initial foundation for this work was laid by early contributions such as [6,7,8], which inspired a wide range of estimators developed for finite populations using different sampling methods [9]. Several improved regression and ratio estimators were later introduced to enhance the precision of median estimates, as discussed in [10]. To address situations with limited auxiliary data, double sampling techniques were explored by [11,12], and a more general class of estimators using two auxiliary variables was proposed in [13]. Using the known median of the auxiliary variable, ref. [14] introduced an estimator that was minimum and unbiased. Further developments to increase the accuracy of median estimation using two-phase sampling were offered by [15]. Additional estimators under both simple and stratified sampling designs were developed by [16,17]. More recently, new estimators utilizing auxiliary data under various sampling strategies have been proposed by [18,19,20,21], demonstrating the effectiveness of these methods. For an in-depth understanding of this area, we refer to [22,23,24,25,26,27].

Traditional estimators such as ratio, exponential, regression, and product types often assume symmetrical distributions and are highly sensitive to extreme values, which limits their reliability in skewed or non-normal populations. To overcome these shortcomings, we introduce new estimators using transformation techniques with a two-phase sampling approach to enhance accuracy and stability. These methods are particularly effective in scenarios where data variability is a concern. For instance, in agricultural research, yield data often fluctuate due to climatic factors; in industrial production, occasional defects can distort measurements; and in academic testing, performance scores tend to deviate from normality. In such cases, the median offers a more dependable summary than the mean. Two-phase sampling further supports cost efficiency by using initial auxiliary data before collecting full information. The newly developed estimators, designed to be more resistant to data irregularities, provide improved performance and consistency, making them well suited to complex survey applications.

Advantages of the Proposed Estimators under Two-Phase Sampling

In practical scenarios, the enhanced median estimation approach introduced in this study offers significant benefits and meaningful insights:

Enhanced Resistance to Data Irregularities: Utilizing measures such as the interquartile range and mid-range, the estimators effectively minimize the impact of extreme observations and asymmetric distributions. The approach offers consistent median estimates across diverse sampling conditions, outperforming many classical methods.
Efficient Use of Partial Auxiliary Information: By employing multiple transformation techniques, these estimators flexibly adjust to different underlying population characteristics. The two-phase sampling scheme utilizes initial auxiliary data to improve estimates with minimal additional sampling effort.
Practical Advantages for Complex Surveys: The proposed methods are particularly useful when full auxiliary data are unavailable or costly to obtain, ensuring reliable median estimation in challenging scenarios.
Practical Applications: The two-phase sampling scheme utilizes preliminary auxiliary information to enhance the estimation accuracy while minimizing additional data collection efforts. This method proves particularly effective in real-life contexts such as forestry management, where initial satellite imagery data (first phase) guide the selection of sample plots for detailed ground measurements (second phase). By combining these data sources, the proposed estimators deliver reliable median estimates of tree biomass, supporting sustainable resource planning and conservation efforts.

Unlike existing approaches that primarily extend ratio, regression, or exponential estimators under restrictive distributional assumptions, the proposed methods introduce a new combination of transformation techniques with two-phase sampling. This combination provides a systematic way to achieve median estimation that is both resistant to skewness and adaptable to heterogeneous populations. Importantly, the framework is not a minor modification of earlier methods but represents a broader advancement by allowing flexible use of auxiliary information, offering improved stability and efficiency across various real-world survey conditions.

2. Methodology and Notations

In this section, we discuss both the newly developed median-based estimators and those previously introduced in the literature. Let the target population include N elements, expressed as

δ = {1, 2, \dots, N} .

This study focuses on a variable of interest Y, supported by an auxiliary variable X. A simple random sample of size n is drawn without replacement, yielding paired observations:

(y_{i}, x_{i}), i = 1, 2, \dots, n .

Although Y and X are strongly correlated, the true population median $M_{x}$ of the auxiliary variable is not available. The subsequent discussion sets out the steps of the two-phase sampling strategy.

(i): First, a fixed sample $s_{m}$ of m units is selected from the population

$s_{m} \subset δ .$

At this step, only observations of the auxiliary variable x are recorded to estimate the population median $M_{x}$ .
(ii): Next, a sub-sample $s_{n}$ of n elements is chosen from within $s_{m}$ . In this phase, information on both the study variable y and the auxiliary variable x is obtained.
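The two steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the author's implementation: it assumes both phases use simple random sampling without replacement, and the function name `two_phase_sample` and the toy population are hypothetical.

```python
import random
import statistics

def two_phase_sample(population, m, n, rng):
    """Draw a first-phase sample of m units and a nested second-phase sample of n units.

    population is a list of (y, x) pairs. Only x is kept for the first phase;
    both y and x are kept for the second phase.
    """
    if not (0 < n < m < len(population)):
        raise ValueError("need 0 < n < m < N")
    s_m = rng.sample(population, m)        # phase 1: SRSWOR from the population
    s_n = rng.sample(s_m, n)               # phase 2: SRSWOR from within s_m
    first_phase_x = [x for _, x in s_m]    # y is not observed in phase 1
    return first_phase_x, s_n

rng = random.Random(1)
# Hypothetical toy population of (y, x) pairs, for illustration only.
pop = [(2.0 * i + rng.gauss(0.0, 1.0), float(i)) for i in range(1, 201)]
x_first, pairs = two_phase_sample(pop, m=60, n=20, rng=rng)

m_x_first = statistics.median(x_first)            # first-phase median of x
m_y_hat = statistics.median(y for y, _ in pairs)  # second-phase median of y
m_x_hat = statistics.median(x for _, x in pairs)  # second-phase median of x
```

The three medians computed at the end are the only sample quantities that the estimators discussed later require.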

For clarity of notation, let $M_{y}$ and $M_{x}$ denote the population medians of the study and auxiliary variables. From the first-phase sample, the median of the auxiliary variable is denoted by ${\overset{´}{M}}_{x}$. In the second-phase sample, ${\hat{M}}_{y}$ and ${\hat{M}}_{x}$ represent the medians of the study and auxiliary variables, respectively. The probability density functions evaluated at the population medians are written as $f_{y} (M_{y})$ for the study variable and $f_{x} (M_{x})$ for the auxiliary variable. The relationship between $M_{y}$ and $M_{x}$ is quantified by the correlation coefficient $ρ_{y x}$, given by

ρ_{y x} = ρ (M_{y}, M_{x}) = 4 P_{11} (y, x) - 1,

where $P_{11} = P (y \leq M_{y} \cap x \leq M_{x})$.
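As an illustration of this definition, the sketch below estimates the coefficient from paired data by replacing $P_{11}$ with the observed share of units lying at or below both medians. The function name `median_correlation` is an assumption made for the example.

```python
import statistics

def median_correlation(y, x):
    """Return 4 * P11 - 1, with P11 the share of units having y <= M_y and x <= M_x."""
    m_y, m_x = statistics.median(y), statistics.median(x)
    p11 = sum(1 for yi, xi in zip(y, x) if yi <= m_y and xi <= m_x) / len(y)
    return 4.0 * p11 - 1.0

x = [1, 2, 3, 4, 5, 6, 7, 8]
same_order = median_correlation(x, x)                  # P11 = 0.5 here
reversed_order = median_correlation([-v for v in x], x)  # P11 = 0 here
```

With identical orderings half of the units fall below both medians, so the coefficient reaches its upper limit of 1; with fully reversed orderings no unit does, and it reaches -1.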

First-order approximations of biases and mean squared errors are obtained using the relative error expressions and their associated expectations. Let

e_{0} = (\frac{{\hat{M}}_{y} - M_{y}}{M_{y}}),

e_{1} = (\frac{{\hat{M}}_{x} - M_{x}}{M_{x}})

and

e_{2} = (\frac{{\overset{´}{M}}_{x} - M_{x}}{M_{x}}),

such that $E (e_{i}) = 0$ for $i = 0, 1, 2$.

Moreover,

E (e_{0}^{2}) = θ_{1} C_{M_{y}}^{2},

E (e_{1}^{2}) = θ_{1} C_{M_{x}}^{2},

E (e_{2}^{2}) = θ_{2} C_{M_{x}}^{2},

E (e_{0} e_{1}) = θ_{1} C_{M_{y x}},

E (e_{0} e_{2}) = θ_{2} C_{M_{y x}},

E (e_{1} e_{2}) = θ_{2} C_{M_{x}}^{2},

where

C_{M_{y x}} = ρ_{y x} C_{M_{y}} C_{M_{x}},

C_{M_{y}} = \frac{1}{M_{y} f_{y} (M_{y})},

C_{M_{x}} = \frac{1}{M_{x} f_{x} (M_{x})} .

Here $C_{M_{y}}$ and $C_{M_{x}}$ are the population coefficients of variation for the medians of Y and X, and

θ_{1} = \frac{1}{4} (\frac{1}{n} - \frac{1}{N}),

θ_{2} = \frac{1}{4} (\frac{1}{m} - \frac{1}{N}),

θ_{3} = \frac{1}{4} (\frac{1}{n} - \frac{1}{m}) .
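A small numerical sketch of these constants may help when checking later formulas. The helper names `theta_constants` and `median_cv` are hypothetical, and the sample sizes are chosen only for illustration.

```python
def theta_constants(N, m, n):
    """Return (theta1, theta2, theta3) for population size N and sample sizes n < m <= N."""
    if not (0 < n < m <= N):
        raise ValueError("need 0 < n < m <= N")
    theta1 = 0.25 * (1.0 / n - 1.0 / N)
    theta2 = 0.25 * (1.0 / m - 1.0 / N)
    theta3 = 0.25 * (1.0 / n - 1.0 / m)
    return theta1, theta2, theta3

def median_cv(median, density_at_median):
    """Coefficient of variation of a median: 1 / (M * f(M))."""
    return 1.0 / (median * density_at_median)

theta1, theta2, theta3 = theta_constants(N=1100, m=300, n=60)
```

Because $n < m$, $θ_{3}$ is positive and equals $θ_{1} - θ_{2}$, which is the identity used repeatedly in the derivations.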

All of the abbreviations and notations used throughout this manuscript are summarized for clarity. Table 1 provides a comprehensive list to ensure uniformity and ease of reference.

Before presenting the new class of estimators, we examine several existing median estimators that have been introduced by different researchers. Their associated bias, variance, and mean squared error formulations are outlined as benchmarks for comparison.

The usual sample median estimator and its variance formula proposed by [6] are presented below.

{\hat{M}}_{G R} = {\hat{M}}_{y}

(1)

and

V ({\hat{M}}_{y}) = θ_{1} M_{y}^{2} C_{M_{y}}^{2} .

(2)

In the setting of two-phase sampling, a ratio-type estimator was suggested by [11] to achieve a greater estimation efficiency, and it is defined as

{\hat{M}}_{S A} = (\frac{\hat{M_{y}}}{\hat{M_{x}}}) \overset{´}{M_{x}} .

(3)

For ${\hat{M}}_{S A}$, the first-order expressions of the bias and mean squared error (MSE) are as follows:

B i a s ({\hat{M}}_{S A}) ≅ θ_{3} M_{y} (C_{M_{x}}^{2} - C_{M_{y x}})

(4)

and

M S E ({\hat{M}}_{S A}) ≅ M_{y}^{2} [θ_{1} C_{M_{y}}^{2} + θ_{3} (C_{M_{x}}^{2} - 2 C_{M_{y x}})] .

(5)
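Equations (2), (3), and (5) translate directly into code. The sketch below is illustrative only; the function names are hypothetical, and it takes $C_{M_{y x}} = ρ_{y x} C_{M_{y}} C_{M_{x}}$ as in the notation of this section.

```python
def ratio_estimator(m_y_hat, m_x_hat, m_x_first):
    """Two-phase ratio estimator of Equation (3)."""
    return (m_y_hat / m_x_hat) * m_x_first

def sample_median_variance(M_y, C_y, theta1):
    """First-order variance of the sample median, Equation (2)."""
    return theta1 * M_y**2 * C_y**2

def ratio_mse(M_y, C_y, C_x, rho, theta1, theta3):
    """First-order MSE of the ratio estimator, Equation (5)."""
    C_yx = rho * C_y * C_x
    return M_y**2 * (theta1 * C_y**2 + theta3 * (C_x**2 - 2.0 * C_yx))

# Hypothetical parameter values, for illustration only.
var_median = sample_median_variance(M_y=50.0, C_y=0.4, theta1=0.004)
mse_ratio = ratio_mse(M_y=50.0, C_y=0.4, C_x=0.4, rho=0.7, theta1=0.004, theta3=0.003)
```

Comparing the two expressions shows that the ratio estimator improves on the sample median, to first order, only when $C_{M_{x}}^{2} < 2 C_{M_{y x}}$.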

Based on the formulation provided in [14], the difference-type estimator ${\hat{M}}_{D_{1}}$ is defined as

{\hat{M}}_{D_{1}} = {\hat{M}}_{y} + d_{1} ({\overset{´}{M}}_{x} - {\hat{M}}_{x}) .

(6)

The first-order approximation yields the following expression for the minimum mean squared error of ${\hat{M}}_{D_{1}}$ when $d_{1}$ is chosen optimally:

M S E {({\hat{M}}_{D_{1}})}_{m i n} ≅ M_{y}^{2} C_{M_{y}}^{2} (θ_{1} - θ_{3} ρ_{y x}^{2}),

(7)

where

d_{1 (o p t)} = \frac{ρ_{y x} M_{y} C_{M_{y}}}{M_{x} C_{M_{x}}} .
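The role of the optimal constant can be checked numerically. In the sketch below, `difference_mse` is the first-order MSE for an arbitrary $d_{1}$, obtained by squaring $M_{y} e_{0} + d_{1} M_{x} (e_{2} - e_{1})$ and applying the expectations listed earlier; it is a reconstruction under those moments, not a formula quoted from [14]. All function names and numbers are hypothetical.

```python
def d1_optimal(M_y, M_x, C_y, C_x, rho):
    """Optimal constant d1 for the difference estimator."""
    return rho * M_y * C_y / (M_x * C_x)

def difference_estimator(m_y_hat, m_x_hat, m_x_first, d1):
    """Difference-type estimator of Equation (6)."""
    return m_y_hat + d1 * (m_x_first - m_x_hat)

def difference_mse(M_y, M_x, C_y, C_x, rho, theta1, theta3, d1):
    """First-order MSE for a given d1 (derived under the stated moments)."""
    C_yx = rho * C_y * C_x
    return (theta1 * M_y**2 * C_y**2
            + theta3 * (d1**2 * M_x**2 * C_x**2 - 2.0 * d1 * M_y * M_x * C_yx))

def difference_min_mse(M_y, C_y, rho, theta1, theta3):
    """Minimum first-order MSE, Equation (7)."""
    return M_y**2 * C_y**2 * (theta1 - theta3 * rho**2)

params = dict(M_y=50.0, M_x=20.0, C_y=0.4, C_x=0.5, rho=0.6, theta1=0.004, theta3=0.003)
d_opt = d1_optimal(50.0, 20.0, 0.4, 0.5, 0.6)
mse_at_opt = difference_mse(d1=d_opt, **params)
mse_off_opt = difference_mse(d1=1.1 * d_opt, **params)
```

Evaluating `difference_mse` at `d_opt` reproduces Equation (7), and moving away from `d_opt` increases the MSE.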

As introduced by [28], the median forms of the exponential ratio and product estimators under two-phase sampling are defined below:

{\hat{M}}_{R e} = {\hat{M}}_{y} exp (\frac{{\overset{´}{M}}_{x} - {\hat{M}}_{x}}{{\overset{´}{M}}_{x} + {\hat{M}}_{x}})

(8)

and

{\hat{M}}_{P e} = {\hat{M}}_{y} exp (\frac{{\hat{M}}_{x} - {\overset{´}{M}}_{x}}{{\overset{´}{M}}_{x} + {\hat{M}}_{x}}) .

(9)

The first-order approximations for the biases and mean squared errors of ${\hat{M}}_{R e}$ and ${\hat{M}}_{P e}$ are given by

Bias ({\hat{M}}_{R e}) ≅ \frac{θ_{3}}{2} M_{y} (\frac{3}{4} C_{M_{x}}^{2} - C_{M_{y x}}),

(10)

Bias ({\hat{M}}_{P e}) ≅ \frac{θ_{3}}{2} M_{y} (C_{M_{y x}} - \frac{1}{4} C_{M_{x}}^{2}),

(11)

M S E ({\hat{M}}_{R e}) ≅ M_{y}^{2} [θ_{1} C_{M_{y}}^{2} + θ_{3} C_{M_{x}}^{2} (\frac{1}{4} - K)]

(12)

and

M S E ({\hat{M}}_{P e}) ≅ M_{y}^{2} [θ_{1} C_{M_{y}}^{2} + θ_{3} C_{M_{x}}^{2} (\frac{1}{4} + K)],

(13)

where

K = \frac{ρ_{y x} C_{M_{y}}}{C_{M_{x}}} .
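The exponential estimators in Equations (8) and (9) and their MSEs in Equations (12) and (13) can be sketched as follows. Function names and parameter values are hypothetical and serve only to illustrate the formulas.

```python
import math

def exp_ratio_estimator(m_y_hat, m_x_hat, m_x_first):
    """Exponential ratio estimator, Equation (8)."""
    return m_y_hat * math.exp((m_x_first - m_x_hat) / (m_x_first + m_x_hat))

def exp_product_estimator(m_y_hat, m_x_hat, m_x_first):
    """Exponential product estimator, Equation (9)."""
    return m_y_hat * math.exp((m_x_hat - m_x_first) / (m_x_first + m_x_hat))

def exp_mse(M_y, C_y, C_x, rho, theta1, theta3, kind):
    """First-order MSE, Equation (12) for kind='ratio' and (13) for kind='product'."""
    K = rho * C_y / C_x
    sign = -1.0 if kind == "ratio" else 1.0
    return M_y**2 * (theta1 * C_y**2 + theta3 * C_x**2 * (0.25 + sign * K))

mse_re = exp_mse(50.0, 0.4, 0.5, 0.6, 0.004, 0.003, kind="ratio")
mse_pe = exp_mse(50.0, 0.4, 0.5, 0.6, 0.004, 0.003, kind="product")
```

The two MSEs differ only in the sign attached to $K$, so the ratio form is preferable when $ρ_{y x} > 0$ and the product form when $ρ_{y x} < 0$.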

According to [10,15], the difference-based estimators for the median are expressed as

{\hat{M}}_{D_{2}} = d_{2} {\hat{M}}_{y} + d_{3} ({\overset{´}{M}}_{x} - {\hat{M}}_{x}),

(14)

{\hat{M}}_{D_{3}} = [d_{4} {\hat{M}}_{y} + d_{5} ({\overset{´}{M}}_{x} - {\hat{M}}_{x})] (\frac{{\overset{´}{M}}_{x}}{{\hat{M}}_{x}}),

(15)

{\hat{M}}_{D_{4}} = [d_{6} {\hat{M}}_{y} + d_{7} ({\overset{´}{M}}_{x} - {\hat{M}}_{x})] (\frac{{\overset{´}{M}}_{x} - {\hat{M}}_{x}}{{\overset{´}{M}}_{x} + {\hat{M}}_{x}}) .

(16)

The first-order approximations of the biases and minimum mean squared errors of these estimators, corresponding to the optimal values of $d_{i}$ $(i = 2, 3, \dots, 7)$, are

Bias {({\hat{M}}_{D_{2}})}_{min} ≅ - M_{y} [\frac{θ_{1} C_{M_{y}}^{2} C_{M_{x}}^{2} - θ_{3} C_{M_{y x}}^{2}}{C_{M_{x}}^{2} (1 + θ_{1} C_{M_{y}}^{2}) - θ_{3} C_{M_{y x}}^{2}}],

(17)

B i a s {({\hat{M}}_{D_{3}})}_{m i n} ≅ M_{y} (d_{4 (o p t)} - 1) + θ_{3} [d_{4 (o p t)} M_{y} (C_{M_{x}}^{2} - C_{M_{y x}}) + d_{5 (o p t)} M_{x} C_{M_{x}}^{2}],

(18)

B i a s {({\hat{M}}_{D_{4}})}_{m i n} ≅ M_{y} [- 1 + \frac{θ_{3}}{4} C_{M_{x}}^{2} \{- 1 + d_{6 (o p t)} (\frac{M_{x} + 8 ρ_{y x} M_{y}}{M_{x}})\}],

(19)

MSE {({\hat{M}}_{D_{2}})}_{min} ≅ M_{y}^{2} [\frac{θ_{1} C_{M_{y}}^{2} C_{M_{x}}^{2} - θ_{3} C_{M_{y x}}^{2}}{C_{M_{x}}^{2} (1 + θ_{1} C_{M_{y}}^{2}) - θ_{3} C_{M_{y x}}^{2}}],

(20)

\begin{matrix} MSE {({\hat{M}}_{D_{3}})}_{min} ≅ & Bias {({\hat{M}}_{D_{3}})}_{min}^{2} + θ_{1} M_{y}^{2} d_{4 (opt)}^{2} C_{M_{y}}^{2} \\ & + θ_{3} [C_{M_{x}}^{2} {(d_{4 (opt)} M_{y} + d_{5 (opt)} M_{x})}^{2} - 2 d_{4 (opt)} M_{y} C_{M_{y x}} (d_{4 (opt)} M_{y} + d_{5 (opt)} M_{x})] \end{matrix}

(21)

and

M S E {({\hat{M}}_{D_{4}})}_{m i n} ≅ B i a s {({\hat{M}}_{D_{4}})}_{m i n} + \frac{θ_{3}}{4} d_{6 (o p t)}^{2} M_{y}^{2} C_{M_{x}}^{2},

(22)

where

d_{2 (o p t)} = \frac{C_{M_{x}}^{2}}{C_{M_{x}}^{2} (1 + θ_{1} C_{M_{y}}^{2}) - θ_{3} C_{M_{y x}}^{2}},

d_{3 (o p t)} = \frac{M_{y} C_{M_{y x}}}{M_{x} [C_{M_{x}}^{2} (1 + θ_{1} C_{M_{y}}^{2}) - θ_{3} C_{M_{y x}}^{2}]},

d_{4 (o p t)} = [\frac{C_{M_{x}}^{2}}{C_{M_{x}}^{2} (1 + θ_{1} C_{M_{y}}^{2} \{1 + θ_{3} C_{M_{x}}^{2}\}) θ_{3} C_{M_{y x}}^{2} (1 + C_{M_{x}}^{2})}],

d_{5 (o p t)} = M_{y} [\frac{C_{M_{x}}^{2} (θ_{1} C_{M_{y}}^{2} - 1) + C_{M_{y x}} (1 + - θ_{3} C_{M_{y x}})}{C_{M_{x}}^{2} (1 + θ_{1} C_{M_{y}}^{2} \{1 + θ_{3} C_{M_{x}}^{2}\}) θ_{3} C_{M_{y x}}^{2} (1 + C_{M_{x}}^{2})}],

d_{6 (o p t)} = \frac{1}{8} [\frac{8 - θ_{2} C_{M_{x}}^{2}}{1 + θ_{1} C_{M_{x}}^{2} (1 - ρ_{y x}^{2})}],

and

d_{7 (o p t)} = \frac{M_{y}}{M_{x}} [\frac{1}{2} + d_{6 (o p t)} (\frac{ρ_{y x} M_{y}}{M_{x}}) - 1] .

3. Proposed General Class of Estimators

Motivated by the modified estimators suggested in [29,30,31], we propose a generalized double-exponential class of median estimators within the framework of two-phase sampling. This approach utilizes various transformations of the auxiliary variable to estimate the finite population median. The family of estimators is defined as follows:

{\hat{M}}_{A} = {\hat{M}}_{y} exp [V_{1} \{\frac{t_{1} ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{t_{1} ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 t_{2}}\}] exp [V_{2} \{\frac{t_{3} ({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{t_{3} ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 t_{4}}\}],

(23)

where $V_{i}$ $(i = 1, 2)$ are predetermined constants and $t_{1}, t_{2}, t_{3}, t_{4}$ denote known population quantities related to the auxiliary variable X. Starting from Equation (23), new estimators can be obtained by selecting different combinations of these population quantities. The specific parameter settings appear in Table 2, while the complete expressions for the resulting estimators are provided in Table 3.
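Equation (23) can be written as a single function of the three sample medians and the four transformation constants. The sketch below is an illustration under the assumption $V_{1} = V_{2} = 1$ by default; the function name and the numerical inputs are hypothetical, and the $t$ values shown are not taken from Table 2.

```python
import math

def proposed_estimator(m_y_hat, m_x_hat, m_x_first, t1, t2, t3, t4, v1=1.0, v2=1.0):
    """Generalized double-exponential estimator of Equation (23)."""
    total = m_x_first + m_x_hat
    # Product-type component: numerator uses (second-phase - first-phase) median.
    first = v1 * t1 * (m_x_hat - m_x_first) / (t1 * total + 2.0 * t2)
    # Ratio-type component: numerator uses (first-phase - second-phase) median.
    second = v2 * t3 * (m_x_first - m_x_hat) / (t3 * total + 2.0 * t4)
    return m_y_hat * math.exp(first) * math.exp(second)

# Special case: t1 = 0 switches off the first component, and t3 = 1, t4 = 0
# turns the second component into the exponential ratio estimator of Equation (8).
special = proposed_estimator(10.0, 4.0, 6.0, t1=0.0, t2=1.0, t3=1.0, t4=0.0)
```

Writing the estimator this way makes it easy to see that the existing exponential ratio and product estimators are members of the family.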

The population quantities used in these settings are defined as follows:

I Q R = Q_{3} - Q_{1},

M R = \frac{X_{m a x} + X_{m i n}}{2},

Q A = \frac{Q_{3} + Q_{1}}{2},

Q D = \frac{Q_{3} - Q_{1}}{2},

T M = \frac{Q_{1} + 2 Q_{2} + Q_{3}}{4},

D M = \frac{\sum_{i = 1}^{9} D_{i}}{9},

M A D = median (|X_{i} - M_{x}| : i = 1, \dots, N),

σ_{X} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(X_{i} - \bar{X})}^{2}},

S k (X) = \frac{\frac{1}{N} \sum_{i = 1}^{N} {(X_{i} - \bar{X})}^{3}}{σ_{X}^{3}} .
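These summaries can be computed with the Python standard library, as in the sketch below. Quartile and decile conventions differ between software packages; this example assumes the default ("exclusive") method of `statistics.quantiles`, which need not match the convention used for the paper's tables. The function name `auxiliary_summaries` is hypothetical.

```python
import statistics

def auxiliary_summaries(x):
    """Return the transformation quantities defined above for a list of x values."""
    q1, q2, q3 = statistics.quantiles(x, n=4)      # quartiles (default method)
    deciles = statistics.quantiles(x, n=10)        # nine deciles D1..D9
    m_x = statistics.median(x)
    mean = statistics.fmean(x)
    sigma = statistics.pstdev(x)                   # population standard deviation
    third_moment = sum((v - mean) ** 3 for v in x) / len(x)
    return {
        "IQR": q3 - q1,
        "MR": (max(x) + min(x)) / 2.0,
        "QA": (q3 + q1) / 2.0,
        "QD": (q3 - q1) / 2.0,
        "TM": (q1 + 2.0 * q2 + q3) / 4.0,
        "DM": sum(deciles) / 9.0,
        "MAD": statistics.median(abs(v - m_x) for v in x),
        "sigma": sigma,
        "Sk": third_moment / sigma**3,             # undefined when all x are equal
    }

summary = auxiliary_summaries(list(range(1, 12)))
```

For a symmetric set of values such as 1 to 11, the location summaries coincide with the median and the skewness is zero, which gives a quick sanity check.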

Justification for the Choice of Transformation Components

The selection of the transformation components $t_{1}, t_{2}, t_{3}, t_{4}$ in the proposed class of estimators is guided by the goal of enhancing robustness and efficiency in the estimation of the population median under two-phase sampling. Since the auxiliary variable X is only fully observed in the first-phase sample, it is crucial to extract as much reliable information as possible using robust statistical summaries. The rationale for using different combinations is outlined below:

Robustness to outliers: Measures such as the quartile deviation (QD), median absolute deviation (MAD), and interquartile range (IQR) are less influenced by extreme observations compared to conventional measures like the mean or standard deviation. Incorporating these robust statistics into $t_{1}, t_{2}$ , or $t_{4}$ enhances the estimator performance in datasets with heavy tails, such as those generated from Cauchy or log-normal distributions.
Combining location and spread: Estimators like ${\hat{M}}_{A_{2}}$ and ${\hat{M}}_{A_{6}}$ simultaneously use measures of central tendency (e.g., trimean, quartile average) and dispersion (e.g., IQR, standard deviation). This dual usage improves the adaptability of estimators to different shapes of distributions, particularly when the data exhibit moderate skewness or non-normality.
Skewness sensitivity: The estimator ${\hat{M}}_{A_{4}}$ uses the skewness of X to adjust dynamically to asymmetry in the distribution. By accounting for skewness explicitly in $t_{1}$ or $t_{4}$ , this estimator is particularly useful when the auxiliary variable is significantly non-symmetric.
Transformational stability: Transformations such as $log (Q_{3} + 1)$ , $log (Q_{1} + 1)$ , and $\sqrt{Q_{1} \cdot Q_{3}}$ , used in estimators like ${\hat{M}}_{A_{5}}$ and ${\hat{M}}_{A_{8}}$ , contribute to scale stability. These transformations are known to mitigate the effect of skewness and reduce heteroscedasticity, thereby stabilizing the variance in the estimator.
Geometric and midrange features: Estimators such as ${\hat{M}}_{A_{7}}$ and ${\hat{M}}_{A_{8}}$ utilize geometric means or midrange components to capture distributional symmetry and central spread. These are especially effective in settings where the auxiliary variable is uniformly distributed or symmetrically bounded.
Computational simplicity and availability: All transformation components used in the proposed estimators, such as $Q_{1}$ , $Q_{3}$ , $M A D$ , $I Q R$ , and midrange, can be readily computed from the first-phase sample data. This makes the estimators highly practical and convenient, especially in real survey applications where full population data is inaccessible.

In summary, the proposed estimators are built upon a carefully selected set of transformation techniques that collectively aim to enhance estimator robustness, reduce the bias and MSE, and offer flexible applicability across various underlying population distributions. Expressions for the bias and MSE of the family of ratio–product-type estimators ${\hat{M}}_{A}$ are presented in the theorem below.

Theorem 1.

Consider a two-phase sampling approach where ${\hat{M}}_{A}$ represents an exponential ratio–product family designed for estimating the median $M_{y}$ of the finite population. The corresponding bias and MSE formulations are outlined below:

Bias ({\hat{M}}_{A}) ≅ M_{y} θ_{3} [\frac{1}{2} (k_{1} - k_{2}) C_{M_{y x}} - \frac{1}{8} (k_{1}^{2} + 2 k_{1} k_{2} - 3 k_{2}^{2}) C_{M_{x}}^{2}]

and

MSE ({\hat{M}}_{A}) ≅ M_{y}^{2} [θ_{1} C_{M_{y}}^{2} + \frac{1}{4} θ_{3} C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + θ_{3} C_{M_{y x}} (k_{1} - k_{2})],

where $k_{1} = t_{1} M_{x} / (t_{1} M_{x} + t_{2})$ and $k_{2} = t_{3} M_{x} / (t_{3} M_{x} + t_{4})$.

Proof.

The following quantities from Section 2 are restated, as they are used in the proof:

e_{0} = (\frac{{\hat{M}}_{y} - M_{y}}{M_{y}}),

e_{1} = (\frac{{\hat{M}}_{x} - M_{x}}{M_{x}})

and

e_{2} = (\frac{{\overset{´}{M}}_{x} - M_{x}}{M_{x}}),

such that $E (e_{i}) = 0$ for $i = 0, 1, 2$, and

E (e_{0}^{2}) = θ_{1} C_{M_{y}}^{2},

E (e_{1}^{2}) = θ_{1} C_{M_{x}}^{2},

E (e_{2}^{2}) = θ_{2} C_{M_{x}}^{2},

E (e_{0} e_{1}) = θ_{1} C_{M_{y x}},

E (e_{0} e_{2}) = θ_{2} C_{M_{y x}},

E (e_{1} e_{2}) = θ_{2} C_{M_{x}}^{2},

where

C_{M_{y x}} = ρ_{y x} C_{M_{y}} C_{M_{x}},

C_{M_{y}} = \frac{1}{M_{y} f_{y} (M_{y})},

C_{M_{x}} = \frac{1}{M_{x} f_{x} (M_{x})},

θ_{1} = \frac{1}{4} (\frac{1}{n} - \frac{1}{N}),

θ_{2} = \frac{1}{4} (\frac{1}{m} - \frac{1}{N})

and

θ_{3} = θ_{1} - θ_{2} = \frac{1}{4} (\frac{1}{n} - \frac{1}{m}) .

Substituting ${\hat{M}}_{y} = M_{y} (1 + e_{0})$, ${\hat{M}}_{x} = M_{x} (1 + e_{1})$, and ${\overset{´}{M}}_{x} = M_{x} (1 + e_{2})$ into Equation (23) expresses ${\hat{M}}_{A}$ in terms of relative errors:

\begin{matrix} {\hat{M}}_{A} = M_{y} (1 + e_{0}) exp [V_{1} \{\frac{k_{1} (e_{1} - e_{2})}{2} {(1 + \frac{k_{1} (e_{1} + e_{2})}{2})}^{- 1}\}] \times \\ exp [V_{2} \{\frac{- k_{2} (e_{1} - e_{2})}{2} {(1 + \frac{k_{2} (e_{1} + e_{2})}{2})}^{- 1}\}] \end{matrix}

(24)

where $V_{1}$, $V_{2}$, $k_{1}$, and $k_{2}$ are defined as

V_{1} = V_{2} = 1,

k_{1} = \frac{t_{1} M_{x}}{t_{1} M_{x} + t_{2}}

and

k_{2} = \frac{t_{3} M_{x}}{t_{3} M_{x} + t_{4}} .

Expanding the right-hand side of Equation (24) in a Taylor series and neglecting terms in the $e_{i}$ of order higher than two, due to their minimal effect, we obtain

\begin{matrix} {\hat{M}}_{A} ≅ M_{y} (1 + e_{0}) exp [\frac{k_{1} (e_{1} - e_{2})}{2} \{1 - \frac{k_{1} (e_{1} + e_{2})}{2} + \frac{k_{1}^{2} {(e_{1} + e_{2})}^{2}}{4}\}] \times \\ exp [- \frac{k_{2} (e_{1} - e_{2})}{2} \{1 - \frac{k_{2} (e_{1} + e_{2})}{2} + \frac{k_{2}^{2} {(e_{1} + e_{2})}^{2}}{4}\}], \end{matrix}

{\hat{M}}_{A} ≅ M_{y} (1 + e_{0}) exp [\frac{k_{1} (e_{1} - e_{2})}{2} - \frac{k_{1}^{2} (e_{1}^{2} - e_{2}^{2})}{4}] exp [\frac{k_{2} (e_{2} - e_{1})}{2} + \frac{k_{2}^{2} (e_{1}^{2} - e_{2}^{2})}{4}],

and therefore

\begin{matrix} {\hat{M}}_{A} - M_{y} ≅ & M_{y} [e_{0} + \frac{k_{1}}{2} (e_{1} - e_{2} + e_{0} e_{1} - e_{0} e_{2}) + \frac{k_{2}}{2} (e_{2} - e_{1} - e_{0} e_{1} + e_{0} e_{2}) \\ & - \frac{k_{1}^{2}}{8} (e_{1}^{2} - 3 e_{2}^{2} + 2 e_{1} e_{2}) + \frac{k_{2}^{2}}{8} (3 e_{1}^{2} - e_{2}^{2} - 2 e_{1} e_{2}) - \frac{k_{1} k_{2}}{4} (e_{1}^{2} + e_{2}^{2} - 2 e_{1} e_{2})] . \end{matrix}

(25)

To compute the bias of ${\hat{M}}_{A}$, we take the expectation of Equation (25) and replace each of $e_{0}, e_{1}, e_{2}, e_{1}^{2}, e_{2}^{2}, e_{0} e_{1}, e_{0} e_{2}, e_{1} e_{2}$ with its expected value. This results in

\begin{matrix} Bias ({\hat{M}}_{A}) ≅ & M_{y} [\frac{1}{2} (k_{1} - k_{2}) (θ_{1} C_{M_{y x}} - θ_{2} C_{M_{y x}}) - \frac{1}{8} (k_{1}^{2} - 3 k_{2}^{2}) (θ_{1} C_{M_{x}}^{2} - θ_{2} C_{M_{x}}^{2}) \\ & - \frac{k_{1} k_{2}}{4} (θ_{1} C_{M_{x}}^{2} - θ_{2} C_{M_{x}}^{2})] . \end{matrix}

After simplification, we obtain

Bias ({\hat{M}}_{A}) ≅ M_{y} θ_{3} [\frac{1}{2} (k_{1} - k_{2}) C_{M_{y x}} - \frac{1}{8} (k_{1}^{2} + 2 k_{1} k_{2} - 3 k_{2}^{2}) C_{M_{x}}^{2}],

(26)

where $θ_{3} = θ_{1} - θ_{2}$. As a check, setting $k_{1} = 0$ and $k_{2} = 1$ in Equation (26) recovers the bias of ${\hat{M}}_{R e}$ in Equation (10).

The mean squared error of ${\hat{M}}_{A}$ is derived next using a first-order approximation. Squaring both sides of Equation (25), taking expectations, ignoring terms in the $e_{i}$ of order higher than two, and inserting the respective expected values gives

\begin{matrix} M S E ({\hat{M}}_{A}) ≅ M_{y}^{2} [θ_{1} C_{M_{y}}^{2} + \frac{1}{4} θ_{3} C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + θ_{3} C_{M_{y x}} (k_{1} - k_{2})] . \end{matrix}

(27)

□
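The MSE in Equation (27) depends on the transformation constants only through $k_{1} - k_{2}$, since $k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2} = (k_{1} - k_{2})^{2}$. The sketch below evaluates it and confirms two special cases; function names and parameter values are hypothetical.

```python
def k_constant(t_a, t_b, M_x):
    """k = t_a * M_x / (t_a * M_x + t_b), used for both k1 (t1, t2) and k2 (t3, t4)."""
    return t_a * M_x / (t_a * M_x + t_b)

def proposed_mse(M_y, C_y, C_x, rho, theta1, theta3, k1, k2):
    """First-order MSE of the proposed family, Equation (27)."""
    C_yx = rho * C_y * C_x
    d = k1 - k2
    return M_y**2 * (theta1 * C_y**2 + 0.25 * theta3 * C_x**2 * d**2 + theta3 * C_yx * d)

base = dict(M_y=50.0, C_y=0.4, C_x=0.5, rho=0.6, theta1=0.004, theta3=0.003)

# t1 = 0 and t4 = 0 give k1 = 0 and k2 = 1: the exponential ratio estimator.
k1 = k_constant(0.0, 1.0, M_x=20.0)
k2 = k_constant(1.0, 0.0, M_x=20.0)
mse_special = proposed_mse(k1=k1, k2=k2, **base)

# Equal constants cancel the auxiliary information entirely.
mse_equal = proposed_mse(k1=0.5, k2=0.5, **base)
```

With $k_{1} = 0$ and $k_{2} = 1$ the value agrees with Equation (12), and with $k_{1} = k_{2}$ it reduces to the variance of the sample median in Equation (2).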

4. Explicit Comparison Conditions

Using the MSE of the proposed estimator (Theorem 1, Equation (27)),

MSE ({\hat{M}}_{A}) ≅ M_{y}^{2} [θ_{1} C_{M_{y}}^{2} + \frac{1}{4} θ_{3} C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + θ_{3} C_{M_{y x}} (k_{1} - k_{2})],

the proposed estimator ${\hat{M}}_{A}$ is superior to each comparator in Section 2 under the following explicit inequalities.

(i) Compared with the usual sample median estimator ${\hat{M}}_{y}$ : Comparing the mean squared error of the proposed family in Equation (27) with the variance of the sample median in Equation (2) yields the condition stated below. Because $θ_{3} > 0$ when $n < m$, the proposed estimator gains over the sample median only when the $θ_{3}$ term in Equation (27) is negative:

MSE ({\hat{M}}_{y}) > MSE ({\hat{M}}_{A}) if \frac{1}{4} C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + C_{M_{y x}} (k_{1} - k_{2}) < 0 .

(ii) Compared with the ratio estimator ${\hat{M}}_{R}$ (the estimator ${\hat{M}}_{S A}$ of Equation (3)): From the evaluation of Equation (27) for the proposed family of estimators against the MSE in Equation (5), a specific condition is obtained:

MSE ({\hat{M}}_{R}) > MSE ({\hat{M}}_{A}) if C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + 4 C_{M_{y x}} (2 + k_{1} - k_{2}) < 4 C_{M_{x}}^{2} .

(iii) Compared with the difference estimator ${\hat{M}}_{D_{1}}$ : The following condition is derived by comparing the mean squared error of the proposed family of estimators in Equation (27) with the minimum MSE provided in Equation (7):

MSE {({\hat{M}}_{D_{1}})}_{min} > MSE ({\hat{M}}_{A}) if C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + 4 C_{M_{y x}} (k_{1} - k_{2}) < - 4 C_{M_{y}}^{2} ρ_{y x}^{2} .

(iv) Compared with the exponential–ratio ${\hat{M}}_{R e}$ : Conducting a comparison between the mean squared error derived for the new family of estimators (27) and the MSE obtained in Equation (12) yields the condition stated below.

M S E ({\hat{M}}_{R e}) > M S E ({\hat{M}}_{A}) if C_{M_{x}}^{2} (1 - 4 K) > C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + 4 C_{M_{y x}} (k_{1} - k_{2}) .

(v) Compared with the exponential–product ${\hat{M}}_{P e}$ : From the evaluation of Equation (27) for the proposed family of estimators against the MSE derived in Equation (13), a specific condition emerges:

M S E ({\hat{M}}_{P e}) > M S E ({\hat{M}}_{A}) if C_{M_{x}}^{2} (1 + 4 K) > C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + 4 C_{M_{y x}} (k_{1} - k_{2}) .

(vi) Compared with the difference-type estimator ${\hat{M}}_{D_{2}}$ : Comparing the mean squared error of the proposed family in Equation (27) with the minimum MSE of ${\hat{M}}_{D_{2}}$ in Equation (20), after dividing both by $M_{y}^{2}$, leads to the following condition.

MSE {({\hat{M}}_{D_{2}})}_{min} > MSE ({\hat{M}}_{A}) if θ_{3} [\frac{1}{4} C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + C_{M_{y x}} (k_{1} - k_{2})] < \frac{θ_{1} C_{M_{y}}^{2} C_{M_{x}}^{2} - θ_{3} C_{M_{y x}}^{2}}{C_{M_{x}}^{2} (1 + θ_{1} C_{M_{y}}^{2}) - θ_{3} C_{M_{y x}}^{2}} - θ_{1} C_{M_{y}}^{2} .

(vii) Compared with the difference-type estimator ${\hat{M}}_{D_{3}}$ : Comparing the mean squared error of the proposed family in Equation (27) with the minimum MSE of ${\hat{M}}_{D_{3}}$ in Equation (21) gives the condition

MSE {({\hat{M}}_{D_{3}})}_{min} > MSE ({\hat{M}}_{A}) if \frac{B_{D_{3}}}{θ_{3}} + Ψ_{D_{3}} > \frac{1}{4} C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + C_{M_{y x}} (k_{1} - k_{2}),

where

Ψ_{D_{3}} = \frac{1}{M_{y}^{2}} [C_{M_{x}}^{2} {(d_{4 (opt)} M_{y} + d_{5 (opt)} M_{x})}^{2} - 2 d_{4 (opt)} M_{y} C_{M_{y x}} (d_{4 (opt)} M_{y} + d_{5 (opt)} M_{x})]

and

B_{D_{3}} = \frac{Bias {({\hat{M}}_{D_{3}})}_{min}^{2}}{M_{y}^{2}} + θ_{1} (d_{4 (opt)}^{2} - 1) C_{M_{y}}^{2} .

(viii) Compared with the difference-type estimator ${\hat{M}}_{D_{4}}$ : From the evaluation of Equation (27) for the proposed family of estimators against the MSE in Equation (22), a specific condition is obtained:

M S E {({\hat{M}}_{D_{4}})}_{min} > M S E ({\hat{M}}_{A}) if \frac{B_{D_{4}}}{θ_{3}} + \frac{1}{4} d_{6 (o p t)}^{2} C_{M_{x}}^{2} - \frac{θ_{1}}{θ_{3}} C_{M_{y}}^{2} > \frac{1}{4} C_{M_{x}}^{2} (k_{1}^{2} + k_{2}^{2} - 2 k_{1} k_{2}) + C_{M_{y x}} (k_{1} - k_{2}),

where

B_{D_{4}} = \frac{B i a s {({\hat{M}}_{D_{4}})}_{m i n}}{M_{y}^{2}} .
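Rather than checking each inequality separately, the first-order MSE expressions can be evaluated side by side for given population constants. The sketch below does this for the estimators whose MSEs have closed forms in Equations (2), (5), (7), (12), (13), and (27); the function name `mse_table` and all numerical inputs are hypothetical.

```python
def mse_table(M_y, C_y, C_x, rho, theta1, theta3, k1, k2):
    """Return first-order MSEs, keyed by estimator, for one parameter setting."""
    C_yx = rho * C_y * C_x
    K = rho * C_y / C_x
    d = k1 - k2
    base = theta1 * C_y**2                      # common sample-median term
    return {
        "sample_median": M_y**2 * base,
        "ratio": M_y**2 * (base + theta3 * (C_x**2 - 2.0 * C_yx)),
        "difference": M_y**2 * C_y**2 * (theta1 - theta3 * rho**2),
        "exp_ratio": M_y**2 * (base + theta3 * C_x**2 * (0.25 - K)),
        "exp_product": M_y**2 * (base + theta3 * C_x**2 * (0.25 + K)),
        "proposed": M_y**2 * (base + theta3 * (0.25 * C_x**2 * d**2 + C_yx * d)),
    }

table = mse_table(M_y=50.0, C_y=0.4, C_x=0.5, rho=0.6,
                  theta1=0.004, theta3=0.003, k1=0.02, k2=0.98)
ranking = sorted(table, key=table.get)   # estimator names, smallest MSE first
```

A table of this kind makes it straightforward to see, for a particular choice of $t_{1}, \dots, t_{4}$, which of the comparison conditions hold.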

5. Results and Discussion

The efficiency of the suggested family of estimators is evaluated against existing methods. Five simulated populations, drawn from heavy-tailed, uniform, and positively skewed probability distributions, are considered along with three real datasets to validate the empirical performance.

5.1. Simulation Study

Selecting an appropriate distribution for median estimation relies on dataset features and distributional behavior. Due to its robustness against skewness, extreme values, and non-normal structures, the median is particularly suitable in these settings. Here, the auxiliary variable X is drawn from one of the five distributions described below.

Population 1: Let X be distributed according to a heavy-tailed Cauchy model, where $α_{1} = 12$ and $α_{2} = 8$ , with a negative correlation ( $ρ_{y x} = - 0.50$ ) to Y.
Population 2: The variable X follows a baseline uniform distribution with a lower bound $ν_{1} = 10$ and an upper bound $ν_{2} = 15$ , and it is uncorrelated with Y ( $ρ_{y x} = 0$ ).
Population 3: Let X be distributed according to a high-skew exponential model, where the parameter $ϕ$ takes the value $0.5$ , and the correlation between X and Y is $ρ_{y x} = 0.70$ .
Population 4: The variable X follows a gamma distribution with moderate skewness and dispersion, characterized by $β_{1} = 7$ and $β_{2} = 5$ , and it has a correlation of $0.65$ with Y.
Population 5: Let X be distributed according to a slightly skewed log-normal model with the parameters $μ_{1} = 5$ and $μ_{2} = 3$ , and it has a correlation of $0.50$ with Y.

For the purpose of evaluating the stability and efficiency of the proposed estimators across multiple settings, these five distributions are considered most suitable. The variable Y is determined through the following mathematical relationship:

Y = ρ_{y x} \times X + e,

where $ρ_{y x}$ is the correlation and $e \sim N (0, 1)$ represents the error term.
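The five populations and the relationship for Y can be generated as sketched below. The paper's computations were carried out in R; this Python version is only an illustration, and the parameterizations are assumptions: location 12 and scale 8 for the Cauchy model, a rate of 0.5 for the exponential model, shape 7 and scale 5 for the gamma model, and log-scale mean 5 and standard deviation 3 for the log-normal model.

```python
import math
import random

# Correlation value attached to each population in the description above.
RHO = {"cauchy": -0.50, "uniform": 0.0, "exponential": 0.70, "gamma": 0.65, "lognormal": 0.50}

def draw_x(kind, rng):
    """Draw one auxiliary value X (parameterizations are assumptions, see lead-in)."""
    if kind == "cauchy":
        return 12.0 + 8.0 * math.tan(math.pi * (rng.random() - 0.5))  # inverse CDF
    if kind == "uniform":
        return rng.uniform(10.0, 15.0)
    if kind == "exponential":
        return rng.expovariate(0.5)        # argument is the rate
    if kind == "gamma":
        return rng.gammavariate(7.0, 5.0)  # (shape, scale)
    if kind == "lognormal":
        return rng.lognormvariate(5.0, 3.0)
    raise ValueError(f"unknown population: {kind}")

def generate_population(kind, N, rng):
    """Return lists (xs, ys) with Y = rho * X + e and e ~ N(0, 1)."""
    xs = [draw_x(kind, rng) for _ in range(N)]
    ys = [RHO[kind] * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys

xs, ys = generate_population("gamma", 1100, random.Random(2025))
```

Fixing the seed of the generator makes the simulated population reproducible across runs.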

The mean squared errors (MSEs) of both the proposed and existing estimators were examined for each distribution and correlation scenario, following the simulation procedure outlined in [29,30,31], to assess their robustness and efficiency. All computations were performed in R (version 4.4.0).

Step 1: Applying the methods described above, we generated $N = 1100$ simulated data points for the variables X and Y.

Step 2: In the first phase, a sample of size m is selected from the entire population of size N using simple random sampling without replacement (SRSWOR).

Step 3: From the first phase sample, a second-phase sub-sample of size n is chosen using simple random sampling without replacement.

Step 4 (Sample size design): To evaluate how the choice of m and n affects estimator performance, we consider multiple $(m, n)$ combinations with $n < m < N$. The following schemes are used:

Scheme A (fixed m, varying n): $m \in \{300, 500, 800\}$ and for each m, $n \in \{0.10 m, 0.20 m, 0.30 m, 0.40 m\}$ (rounded);
Scheme B (paired growth): $(m, n) \in \{(200, 50), (300, 75), (500, 125), (800, 200)\}$ ;
Scheme C (fine grid): $m \in \{150, 250, 400, 600, 900\}$ crossed with $n \in \{0.10 m, 0.25 m, 0.40 m\}$ .

Step 5: The performance of the estimators is evaluated by calculating the required statistics from the sampled data using the methods explained above. The optimal parameters for the existing estimators that involve unknown constants are also determined.

Step 6: Samples of different sizes, according to the chosen $(m, n)$ grid, are generated for each population using the SRSWOR approach.

Step 7: For each $(m, n)$ combination and each estimator, the MSE values are calculated.

Step 8: After repeating Steps 6 and 7 a total of 40,000 times, the MSE values for each estimator are determined using the formula presented below, and the results are given in Table 4:

$$MSE {({\hat{M}}_{q})}_{min} = \frac{\sum_{v = 1}^{40000} {({\hat{M}}_{q v} - M_{y})}^{2}}{40000},$$

where $q = y, R, D_{1}, R e, P e, D_{2}, D_{3}, D_{4}, A_{1}, A_{2}, \dots, A_{8}$ indexes the estimators and ${\hat{M}}_{q v}$ is the value of estimator q in the v-th replication.
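The steps above can be sketched as a short Monte Carlo routine. This is a minimal sketch in Python, not the R code used in the study. It covers the two-phase SRSWOR draw, the Scheme A grid, and the empirical MSE formula, and it evaluates only the plain second-phase sample median. All function names, the seed, and the reduced replication count in the demonstration are assumptions made for illustration.

```python
import random
import statistics


def scheme_a(first_phase_sizes=(300, 500, 800), percents=(10, 20, 30, 40)):
    """Scheme A grid: each m paired with n = 10%, 20%, 30%, 40% of m (rounded)."""
    return [(m, round(m * p / 100)) for m in first_phase_sizes for p in percents]


def two_phase_sample(pop_size, m, n, rng):
    """Steps 2-3: SRSWOR of m unit indices, then SRSWOR of n indices from those m."""
    first = rng.sample(range(pop_size), m)
    second = rng.sample(first, n)
    return first, second


def sample_median(y_second, x_second, x_first):
    """Baseline estimator: second-phase sample median of Y (ignores X)."""
    return statistics.median(y_second)


def monte_carlo_mse(xs, ys, m, n, estimator, reps=40000, seed=7):
    """Empirical MSE: average of (estimate - population median)^2 over reps draws."""
    if not n < m < len(xs):
        raise ValueError("sample sizes must satisfy n < m < N")
    rng = random.Random(seed)
    true_median = statistics.median(ys)
    total = 0.0
    for _ in range(reps):
        first, second = two_phase_sample(len(xs), m, n, rng)
        estimate = estimator(
            [ys[i] for i in second],  # second-phase Y values
            [xs[i] for i in second],  # second-phase X values
            [xs[i] for i in first],   # first-phase X values
        )
        total += (estimate - true_median) ** 2
    return total / reps


# Demonstration on a Population 2 style dataset (X uniform on [10, 15], rho = 0).
pop_rng = random.Random(1)
xs = [pop_rng.uniform(10, 15) for _ in range(1100)]
ys = [0.0 * x + pop_rng.gauss(0.0, 1.0) for x in xs]

# Assumption: 200 replications keep the demo fast; the study uses 40,000.
mse = monte_carlo_mse(xs, ys, m=300, n=30, estimator=sample_median, reps=200)
print(scheme_a()[:4], mse)
```

Any other estimator can be passed in place of `sample_median`, provided it accepts the second-phase Y values, the second-phase X values, and the first-phase X values in that order.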

5.2. Real-Life Application

This section presents an empirical evaluation using three real population datasets, the characteristics of which are described below. The numerical analysis compares existing and proposed estimators of the population median based on the mean squared error criterion.

Population 1. An actual finite population is examined using figures from the 2013 edition of Punjab Development Statistics (p. 226) [32], which reports the count of registered factories in 2010 and corresponding employment figures for different districts and divisions. It is accessible for download on the Pakistan Bureau of Statistics website using the following URL: https://www.pbs.gov.pk/content/microdata (accessed on 12 August 2025).

Y denotes the average number of employees per district in 2010.

X denotes the total value of factory registrations in the same year.

Population 2. Data from Punjab Development Statistics 2014 (p. 135) [33] are used to examine a real finite population, comprising gender-disaggregated enrollment records for government-managed primary and middle schools. The Pakistan Bureau of Statistics website provides access to the microdata at: https://www.pbs.gov.pk/content/microdata (accessed on 12 August 2025).

Y: Total students registered in all schools for the 2012–2013 academic year.

X: Total government middle school institutions recorded in 2012–2013.

Population 3. This study considers a real finite population, drawing data from the 2013 issue of Punjab Development Statistics (p. 226) [32]. The dataset provides information on the number of registered factories and associated employment levels across various districts and divisions, including employment distribution by division and district, for 2012, as well as factory registration counts for the same year. It is accessible for download on the Pakistan Bureau of Statistics website using the following URL: https://www.pbs.gov.pk/content/microdata (accessed on 12 August 2025).

Y represents the total employment across all industries in a district.

X represents the proportion of registered factories within that district in 2012.

A statistical overview of the datasets is presented in Table 5, while Table 6 shows the mean squared errors for the proposed and existing estimators computed from these real datasets.

5.3. Results from Simulation Studies and Real Data Applications

The comparative assessment between the proposed transformation-based estimators and a range of established median estimation methods reveals several notable patterns. Across both the simulated and empirical datasets, the newly developed estimators consistently attained smaller mean squared error (MSE) values, confirming their capacity to deliver more precise estimates of the population median. This improvement was evident regardless of the shape or skewness of the underlying distribution, indicating that the method retains its efficiency under heavy-tailed, moderately skewed, and even symmetric distributions. Table 4 and Table 6 summarize the results for simulated as well as real-life datasets, leading to the following noteworthy observations.

Across both simulated and real datasets, the proposed transformation-based estimators outperformed ratio-type, exponential-type, and difference-type estimators (${\hat{M}}_{D_{1}}$ to ${\hat{M}}_{D_{4}}$). Table 4 and Figure 1 highlight these gains in simulated populations, while Table 6 and Figure 2 show similar improvements in real-life applications. In almost every case, the MSE of the proposed estimators was the smallest among all methods tested.
In Table 4, the proposed estimators consistently achieved lower MSEs than the traditional methods for all five distributional settings (Cauchy, uniform, exponential, gamma, and log-normal), covering heavy-tailed, skewed, and symmetric cases. Figure 1a–e visually confirms this trend, where the proposed estimators form the lowest bars across each distribution.
The outcomes in Table 6 demonstrate that the proposed estimators also perform strongly with real survey data on factory employment and school enrollment. Figure 2a–c shows that, for all three populations, the proposed estimators consistently appear among the best-performing results on the MSE scale, surpassing both exponential- and difference-type estimators; in Table 6, ${\hat{M}}_{A_{4}}$ attains the smallest MSE in each population.
The trends in Figure 1 and Figure 2 indicate that the new estimators remain effective across a wide range of correlation strengths between the study and auxiliary variables. Table 4 and Table 6 further show that their efficiency holds even when the second-phase sample size n is small relative to m, making them highly practical for budget-restricted surveys.

6. Conclusions

This work introduces a flexible and robust family of double-exponential median estimators designed for two-phase sampling situations where auxiliary information is incomplete or expensive to obtain. By applying carefully chosen transformation components, emphasizing resistant measures of spread and location, in the estimation framework, the proposed approach delivers consistent improvements in accuracy over a variety of existing methods.

Theoretical derivations established the bias and mean squared error expressions for the new class, and explicit superiority conditions were provided to clarify when these estimators will outperform common alternatives. Extensive Monte Carlo simulations across multiple distribution types, as well as applications to real survey datasets, confirmed the theoretical advantages. In almost every scenario, the proposed estimators yielded the smallest MSE values, demonstrating both statistical efficiency and practical utility.

Compared with established estimators such as ratio, regression, and exponential types, the proposed measures demonstrate clear strengths in their robustness to skewness, extreme observations, and incomplete auxiliary information. While classical methods may perform well under ideal conditions with symmetric distributions, their efficiency often deteriorates in more complex or irregular populations. A possible limitation of the proposed approach is that its performance may depend on the careful choice of transformation components, which could require additional consideration in practice. Nonetheless, the distinctive benefit of these estimators lies in their ability to deliver consistent and reliable median estimates across a wide range of real-world scenarios, thereby offering a practical advantage over existing methods.

In summary, the transformation-based framework offers a reliable solution for median estimation in complex survey designs, especially when data are skewed or heavy-tailed or contain outliers. Its adaptability to different population structures and sample size arrangements makes it suitable for a wide range of applied research fields, from socio-economic surveys to environmental monitoring. As a natural extension, future work may also include additional performance measures such as the median absolute deviation (MAD), which provides a more robust alternative to MSE in the presence of extreme observations, thereby offering a fuller assessment of estimator efficiency. Ultimately, these findings emphasize not only the methodological contributions of this work but also its real-world significance in supporting effective and evidence-based decision-making.

Funding

Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R 299), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The real data are secondary, and their sources are given in the data section, while the simulated data were generated using R software (latest v. 4.4.0). The codes used in this study are available from the corresponding author upon request.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat.-Theory Methods 2023, 52, 2610–2624.
2. Daraz, U.; Alomair, M.A.; Albalawi, O.; Al Naim, A.S. New techniques for estimating finite population variance using ranks of auxiliary variable in two-stage sampling. Mathematics 2024, 12, 2741.
3. Zaman, T.; Bulut, H. A simulation study: Robust ratio double sampling estimator of finite population mean in the presence of outliers. Sci. Iran. 2021, 31, 1330–1341.
4. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 1–15.
5. Alomair, M.A.; Daraz, U. Dual transformation of auxiliary variables by using outliers in stratified random sampling. Mathematics 2024, 12, 2839.
6. Gross, S. Median Estimation in Sample Surveys. In Proceedings of the Section on Survey Research Methods. American Statistical Association Ithaca: Alexandria, VA, USA, 1980. Available online: http://www.asasrms.org/Proceedings/papers/1980_037.pdf (accessed on 1 October 2025).
7. Sedransk, J.; Meyer, J. Confidence intervals for the quantiles of a finite population: Simple random and stratified simple random sampling. J. R. Stat. Soc. Ser. B 1978, 40, 239–252.
8. Philip, S.; Sedransk, J. Lower bounds for confidence coefficients for confidence intervals for finite population quantiles. Commun. Stat.-Theory Methods 1983, 12, 1329–1344.
9. Kuk, Y.C.A.; Mak, T.K. Median estimation in the presence of auxiliary information. J. R. Stat. Soc. Ser. B 1989, 51, 261–269.
10. Rao, T.J. On certain methods of improving ratio and regression estimators. Commun. Stat.-Theory Methods 1991, 20, 3325–3340.
11. Singh, S.; Joarder, A.H.; Tracy, D.S. Median estimation using double sampling. Aust. N. Z. J. Stat. 2001, 43, 33–46.
12. Singh, S.; Joarder, A.H. Estimation of distribution function and median in two phase sampling. Pak. J. Stat.-All Ser. 2002, 18, 301–320.
13. Khoshnevisan, M.; Singh, H.P.; Singh, S.; Smarandache, F. A General Class of Estimators of Population Median Using Two Auxiliary Variables in Double Sampling; Virginia Polytechnic Institute and State University: Blacksburg, VA, USA, 2002.
14. Singh, S. Advanced Sampling Theory with Applications: How Michael Selected Amy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; Volume 2.
15. Gupta, S.; Shabbir, J.; Ahmad, S. Estimation of median in two-phase sampling using two auxiliary variables. Commun. Stat.-Theory Methods 2008, 37, 1815–1822.
16. Aladag, S.; Cingi, H. Improvement in estimating the population median in simple random sampling and stratified random sampling using auxiliary information. Commun. Stat.-Theory Methods 2015, 44, 1013–1032.
17. Solanki, R.S.; Singh, H.P. Some classes of estimators for median estimation in survey sampling. Commun. Stat.-Theory Methods 2015, 44, 1450–1465.
18. Daraz, U.; Almulhim, F.A.; Alomair, M.A.; Alomair, A.M. Population median estimation using auxiliary variables: A simulation study with real data across sample sizes and parameters. Mathematics 2025, 13, 1660.
19. Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
20. Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon 2024, 10, e33402.
21. Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry 2024, 16, 957.
22. Shabbir, J.; Gupta, S. A generalized class of difference type estimators for population median in survey sampling. Hacet. J. Math. Stat. 2017, 46, 1015–1028.
23. Irfan, M.; Maria, J.; Shongwe, S.C.; Zohaib, M.; Bhatti, S.H. Estimation of population median under robust measures of an auxiliary variable. Math. Probl. Eng. 2021, 2021, 4839077.
24. Shabbir, J.; Gupta, S.; Narjis, G. On improved class of difference type estimators for population median in survey sampling. Commun. Stat.-Theory Methods 2022, 51, 3334–3354.
25. Hussain, M.A.; Javed, M.; Zohaib, M.; Shongwe, S.C.; Awais, M.; Zaagan, A.A.; Irfan, M. Estimation of population median using bivariate auxiliary information in simple random sampling. Heliyon 2024, 10, e28891.
26. Bhushan, S.; Kumar, A.; Lone, S.A.; Anwar, S.; Gunaime, N.M. An efficient class of estimators in stratified random sampling with an application to real data. Axioms 2023, 12, 576.
27. Stigler, S.M. Linear functions of order statistics. Ann. Math. Stat. 1969, 40, 770–788.
28. Singh, H.P.; Vishwakarma, G.K. Modified exponential ratio and product estimators for finite population mean in double sampling. Austrian J. Stat. 2007, 36, 217–225.
29. Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1–15.
30. Daraz, U.; Wu, J.; Agustiana, D.; Emam, W. Finite population variance estimation using Monte Carlo simulation and real life application. Symmetry 2025, 17, 84.
31. Daraz, U.; Agustiana, D.; Wu, J.; Emam, W. Twofold auxiliary information under two-phase sampling: An improved family of double-transformed variance estimators. Axioms 2025, 14, 64.
32. Bureau of Statistics. Punjab Development Statistics Government of the Punjab, Lahore, Pakistan; Bureau of Statistics: Islamabad, Pakistan, 2013.
33. Bureau of Statistics. Punjab Development Statistics Government of the Punjab, Lahore, Pakistan; Bureau of Statistics: Islamabad, Pakistan, 2014.

Figure 1. The graphical summary shows MSE values for both suggested and established estimators obtained through simulated data. For ease of reference, the estimators are labeled with numbers between 1 and 16.

Figure 2. The graphical summary shows MSE values for both suggested and established estimators obtained using real datasets. For ease of reference, the estimators are labeled with numbers between 1 and 16. (a) Source: [32]. (b) Source: [33]. (c) Source: [32].

Table 1. List of abbreviations and notations.

Symbol	Definition	Symbol	Definition
N	Population size	m	First-phase sample size
n	Second-phase sample size	Y	Study variable
X	Auxiliary variable	QA	Quartile average
$M_{y}$	Population median of Y	IQR	Interquartile range
$M_{x}$	Population median of X	MR	Midrange
${\dot{M}}_{x}$	First-phase sample median of X	QD	Quartile deviation
${\hat{M}}_{y}$	Second-phase sample median of Y	TM	Trimean
${\hat{M}}_{x}$	Second-phase sample median of X	DM	Decile mean
$f_{y} (M_{y})$	Probability density function of $M_{y}$	$S k (X)$	Skewness of X
$f_{x} (M_{x})$	Probability density function of $M_{x}$	$X_{min}$	Minimum of X
$ρ_{y x}$	Correlation coefficient $(Y, X)$	$X_{max}$	Maximum of X
$C_{M y}$	Coefficient of variation in $M_{y}$	MSE	Mean squared error
$C_{M y x}$	Covariance term between $M_{y}$ and $M_{x}$	$θ_{1}, θ_{2}, θ_{3}$	Sampling constants
MAD	Median absolute deviation	$σ_{X}$	Standard deviation of X
$C_{M x}$	Coefficient of variation in $M_{x}$	$k_{1}, k_{2}$	Ratios used in bias/MSE
$Q_{1}, Q_{2}, Q_{3}$	Quartiles of X	$e_{0}$	Relative error of ${\hat{M}}_{y}$
$e_{1}$	Relative error of ${\hat{M}}_{x}$	$e_{2}$	Relative error of ${\dot{M}}_{x}$
$V_{1}, V_{2}$	Constants in proposed estimators	$t_{1}, t_{2}, t_{3}, t_{4}$	Transformation components

Table 2. Suggested transformations for proposed estimators under two-phase sampling.

Estimator	$t_{1}$	$t_{2}$	$t_{3}$	$t_{4}$
${\hat{M}}_{A_{1}}$	$Q D$	$M A D$	1	$X_{max} - X_{min}$
${\hat{M}}_{A_{2}}$	$T M$	$M R$	1	$I Q R$
${\hat{M}}_{A_{3}}$	$D M$	$M A D$	1	$Q D$
${\hat{M}}_{A_{4}}$	$Skewness (X)$	1	1	$Q_{3} - Q_{2}$
${\hat{M}}_{A_{5}}$	$\log (Q_{3} + 1)$	$\log (Q_{1} + 1)$	1	$\log (M R + 1)$
${\hat{M}}_{A_{6}}$	$Q A$	$σ_{X}$	1	$Q D$
${\hat{M}}_{A_{7}}$	$X_{median}$	$I Q R$	1	$M A D$
${\hat{M}}_{A_{8}}$	$\sqrt{Q_{1} \cdot Q_{3}}$	$\sqrt{X_{max} \cdot X_{min}}$	1	$\sqrt{I Q R}$
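The transformation components in Table 2 are standard robust summaries of the auxiliary variable. The sketch below shows one way to compute them in Python, using the usual textbook definitions (for example, QD as half the IQR and TM as $(Q_{1} + 2 Q_{2} + Q_{3}) / 4$). The function name `robust_summaries` and the "inclusive" quantile convention are assumptions, since the source does not state which quantile rule was used.

```python
import statistics


def robust_summaries(x):
    """Return the robust location and spread measures used as t1, t2, t4.

    Assumption: quartiles and deciles use the 'inclusive' interpolation rule;
    a different convention changes the values slightly for small samples.
    """
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    q2 = statistics.median(x)
    deciles = statistics.quantiles(x, n=10, method="inclusive")  # D1..D9
    x_min, x_max = min(x), max(x)
    return {
        "IQR": q3 - q1,                       # interquartile range
        "QD": (q3 - q1) / 2,                  # quartile deviation
        "QA": (q1 + q3) / 2,                  # quartile average
        "MR": (x_min + x_max) / 2,            # midrange
        "TM": (q1 + 2 * q2 + q3) / 4,         # trimean
        "DM": statistics.fmean(deciles),      # decile mean
        "MAD": statistics.median([abs(v - q2) for v in x]),
        "RANGE": x_max - x_min,               # X_max - X_min
    }


summary = robust_summaries([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```

As a consistency check on these definitions, the Dataset-3 column of Table 5 satisfies them: QD = IQR / 2 = 132.5, MR = (2055 + 24) / 2 = 1039.5, and TM = (QA + $M_{x}$) / 2 = 195.75.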

Table 3. A family of proposed estimators under two-phase sampling.

${\hat{M}}_{A_{1}} = {\hat{M}}_{y} exp [V_{1} \{\frac{Q D ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{Q D ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 M A D}\}] exp [V_{2} \{\frac{({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 (X_{m a x} - X_{m i n})}\}]$
${\hat{M}}_{A_{2}} = {\hat{M}}_{y} exp [V_{1} \{\frac{T M ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{T M ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 M R}\}] exp [V_{2} \{\frac{({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 I Q R}\}]$
${\hat{M}}_{A_{3}} = {\hat{M}}_{y} exp [V_{1} \{\frac{D M ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{D M ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 M A D}\}] exp [V_{2} \{\frac{({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 Q D}\}]$
${\hat{M}}_{A_{4}} = {\hat{M}}_{y} exp [V_{1} \{\frac{S k (X) ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{S k (X) ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2}\}] exp [V_{2} \{\frac{({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 (Q_{3} - Q_{2})}\}]$
${\hat{M}}_{A_{5}} = {\hat{M}}_{y} exp [V_{1} \{\frac{log (Q_{3} - 1) ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{l o g (Q_{3} - 1) ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 log (Q_{1} - 1)}\}] exp [V_{2} \{\frac{({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 log (M R - 1)}\}]$
${\hat{M}}_{A_{6}} = {\hat{M}}_{y} exp [V_{1} \{\frac{Q A ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{Q A ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 σ_{X}}\}] exp [V_{2} \{\frac{({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 Q D}\}]$
${\hat{M}}_{A_{7}} = {\hat{M}}_{y} exp [V_{1} \{\frac{X_{m e d i a n} ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{X_{m e d i a n} ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 I Q R}\}] exp [V_{2} \{\frac{({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 I Q R}\}]$
${\hat{M}}_{A_{8}} = {\hat{M}}_{y} exp [V_{1} \{\frac{\sqrt{Q_{1} Q_{3}} ({\hat{M}}_{x} - {\overset{´}{M}}_{x})}{\sqrt{Q_{1} Q_{3}} ({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 \sqrt{X_{m a x} X_{m i n}}}\}] exp [V_{2} \{\frac{({\overset{´}{M}}_{x} - {\hat{M}}_{x})}{({\overset{´}{M}}_{x} + {\hat{M}}_{x}) + 2 \sqrt{I Q R}}\}]$
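Every row of Table 3 shares one functional form, with $t_{1}$, $t_{2}$, $t_{4}$ swapped in and $t_{3} = 1$. A minimal Python sketch of that common form follows. The function name and argument names are hypothetical, and the constants $V_{1}$ and $V_{2}$ are treated as user-supplied inputs; the values in the demonstration are placeholders, not optimal values from the paper.

```python
import math


def transformed_median_estimator(m_y_hat, m_x_hat, m_x_first,
                                 t1, t2, t4, v1, v2, t3=1.0):
    """Double-exponential median estimator in the general form read off Table 3.

    m_y_hat   : second-phase sample median of Y
    m_x_hat   : second-phase sample median of X
    m_x_first : first-phase sample median of X
    t1..t4    : transformation components (Table 2); t3 defaults to 1
    v1, v2    : constants V1 and V2, supplied by the caller
    """
    first_term = t1 * (m_x_hat - m_x_first) / (
        t1 * (m_x_first + m_x_hat) + 2.0 * t2
    )
    second_term = t3 * (m_x_first - m_x_hat) / (
        t3 * (m_x_first + m_x_hat) + 2.0 * t4
    )
    return m_y_hat * math.exp(v1 * first_term) * math.exp(v2 * second_term)


# Illustration with the A6 components (t1 = QA, t2 = sigma_X, t4 = QD) taken
# from the Dataset-3 column of Table 5. The sample medians and the constants
# v1 = v2 = 0.5 are made-up placeholder inputs.
estimate = transformed_median_estimator(
    m_y_hat=150.0, m_x_hat=180.0, m_x_first=170.0,
    t1=220.0, t2=452.713, t4=132.5, v1=0.5, v2=0.5,
)
print(estimate)
```

When the two phase medians of X coincide, both exponents vanish and the estimator reduces to the plain sample median ${\hat{M}}_{y}$, which is a quick sanity check on any implementation.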

Table 4. Mean squared error (MSE) values for simulation datasets.

Estimator	$C (12, 8)$	$Uni (10, 15)$	$Exp (0.5)$	$Gam (7, 5)$	$LN (5, 3)$
${\hat{M}}_{y}$	$6.20 \times 10^{- 1}$	$2.40 \times 10^{- 2}$	$6.30 \times 10^{- 2}$	$1.34 \times 10^{0}$	$1.435 \times 10^{3}$
${\hat{M}}_{S A}$	$5.90 \times 10^{- 1}$	$2.40 \times 10^{- 2}$	$4.00 \times 10^{- 2}$	$9.50 \times 10^{- 1}$	$1.410 \times 10^{3}$
${\hat{M}}_{D_{1}}$	$4.85 \times 10^{- 1}$	$2.40 \times 10^{- 2}$	$3.55 \times 10^{- 2}$	$8.55 \times 10^{- 1}$	$1.130 \times 10^{3}$
${\hat{M}}_{R e}$	$5.00 \times 10^{- 1}$	$2.40 \times 10^{- 2}$	$4.20 \times 10^{- 2}$	$9.10 \times 10^{- 1}$	$1.135 \times 10^{3}$
${\hat{M}}_{P e}$	$9.80 \times 10^{- 1}$	$2.40 \times 10^{- 2}$	$1.10 \times 10^{- 1}$	$2.26 \times 10^{0}$	$2.350 \times 10^{3}$
${\hat{M}}_{D_{2}}$	$4.80 \times 10^{- 1}$	$2.39 \times 10^{- 2}$	$3.55 \times 10^{- 2}$	$8.45 \times 10^{- 1}$	$1.120 \times 10^{3}$
${\hat{M}}_{D_{3}}$	$4.78 \times 10^{- 1}$	$2.39 \times 10^{- 2}$	$3.52 \times 10^{- 2}$	$8.42 \times 10^{- 1}$	$1.115 \times 10^{3}$
${\hat{M}}_{D_{4}}$	$4.75 \times 10^{- 1}$	$2.38 \times 10^{- 2}$	$3.50 \times 10^{- 2}$	$8.40 \times 10^{- 1}$	$1.110 \times 10^{3}$
${\hat{M}}_{A_{1}}$	$4.40 \times 10^{- 1}$	$1.90 \times 10^{- 2}$	$2.90 \times 10^{- 2}$	$7.50 \times 10^{- 1}$	$1.070 \times 10^{3}$
${\hat{M}}_{A_{2}}$	$4.30 \times 10^{- 1}$	$1.90 \times 10^{- 2}$	$2.80 \times 10^{- 2}$	$7.40 \times 10^{- 1}$	$1.065 \times 10^{3}$
${\hat{M}}_{A_{3}}$	$4.20 \times 10^{- 1}$	$1.80 \times 10^{- 2}$	$2.70 \times 10^{- 2}$	$7.30 \times 10^{- 1}$	$1.060 \times 10^{3}$
${\hat{M}}_{A_{4}}$	$4.30 \times 10^{- 1}$	$1.90 \times 10^{- 2}$	$2.80 \times 10^{- 2}$	$7.40 \times 10^{- 1}$	$1.065 \times 10^{3}$
${\hat{M}}_{A_{5}}$	$4.10 \times 10^{- 1}$	$1.80 \times 10^{- 2}$	$2.70 \times 10^{- 2}$	$7.20 \times 10^{- 1}$	$1.055 \times 10^{3}$
${\hat{M}}_{A_{6}}$	$4.00 \times 10^{- 1}$	$1.80 \times 10^{- 2}$	$2.60 \times 10^{- 2}$	$7.10 \times 10^{- 1}$	$1.050 \times 10^{3}$
${\hat{M}}_{A_{7}}$	$4.20 \times 10^{- 1}$	$1.90 \times 10^{- 2}$	$2.80 \times 10^{- 2}$	$7.30 \times 10^{- 1}$	$1.060 \times 10^{3}$
${\hat{M}}_{A_{8}}$	$4.10 \times 10^{- 1}$	$1.80 \times 10^{- 2}$	$2.70 \times 10^{- 2}$	$7.20 \times 10^{- 1}$	$1.055 \times 10^{3}$

Table 5. Descriptive statistics of the datasets.

Dataset-1	Dataset-2	Dataset-3
$N = 36$	$N = 36$	$N = 36$
$σ_{x} = 438.519$	$σ_{x} = 402.609$	$σ_{x} = 452.713$
$M_{x} = 128.50$	$M_{x} = 136.50$	$M_{x} = 171.50$
$M_{y} = 104.50$	$M_{y} = 124.50$	$M_{y} = 144.5$
$X_{m a x} = 1986$	$X_{m a x} = 2370$	$X_{m a x} = 2055$
$X_{m i n} = 24$	$X_{m i n} = 84$	$X_{m i n} = 24$
$f_{x} (M_{x}) = 0.00015$	$f_{x} (M_{x}) = 0.00022$	$f_{x} (M_{x}) = 0.00019$
$f_{y} (M_{y}) = 0.00016$	$f_{y} (M_{y}) = 0.00024$	$f_{y} (M_{y}) = 0.00021$
$I Q R = 252.250$	$I Q R = 378.250$	$I Q R = 265$
$M R = 1005$	$M R = 961$	$M R = 1039.500$
$Q A = 218.375$	$Q A = 891.875$	$Q A = 220$
$Q D = 127.125$	$Q D = 982.650$	$Q D = 132.500$
$T M = 193.438$	$T M = 891.188$	$T M = 195.750$
$M A D = 92.500$	$M A D = 289$	$M A D = 99$
$D M = 432.500$	$D M = 982.650$	$D M = 431.500$
$S k (x) = 2.345$	$S k (x) = 1.008$	$S k (x) = 2.106$
$ρ_{y x} = 0.912$	$ρ_{y x} = 0.795$	$ρ_{y x} = 0.800$
$θ_{1} = 0.0045$	$θ_{1} = 0.0045$	$θ_{1} = 0.0045$
$θ_{2} = 0.0020$	$θ_{2} = 0.0020$	$θ_{2} = 0.0020$
$θ_{3} = 0.0025$	$θ_{3} = 0.0025$	$θ_{3} = 0.0025$

Table 6. Comparison of the MSEs based on empirical population data.

Estimator	Population-1	Population-2	Population-3
${\hat{M}}_{y}$	$1.76 \times 10^{5}$	$6.21 \times 10^{3}$	$9.05 \times 10^{3}$
${\hat{M}}_{R}$	$9.48 \times 10^{4}$	$2.48 \times 10^{3}$	$3.35 \times 10^{3}$
${\hat{M}}_{D_{1}}$	$9.46 \times 10^{4}$	$2.28 \times 10^{3}$	$2.72 \times 10^{3}$
${\hat{M}}_{R e}$	$1.17 \times 10^{5}$	$2.65 \times 10^{3}$	$3.78 \times 10^{3}$
${\hat{M}}_{P e}$	$2.71 \times 10^{5}$	$5.03 \times 10^{3}$	$8.68 \times 10^{3}$
${\hat{M}}_{D_{2}}$	$8.79 \times 10^{4}$	$1.01 \times 10^{3}$	$1.58 \times 10^{4}$
${\hat{M}}_{D_{3}}$	$2.16 \times 10^{4}$	$1.05 \times 10^{3}$	$9.28 \times 10^{3}$
${\hat{M}}_{D_{4}}$	$2.24 \times 10^{4}$	$1.12 \times 10^{3}$	$5.35 \times 10^{3}$
${\hat{M}}_{A_{1}}$	$5.05 \times 10^{3}$	$4.29 \times 10^{2}$	$1.92 \times 10^{3}$
${\hat{M}}_{A_{2}}$	$9.60 \times 10^{3}$	$8.12 \times 10^{2}$	$2.19 \times 10^{3}$
${\hat{M}}_{A_{3}}$	$9.85 \times 10^{3}$	$6.09 \times 10^{2}$	$1.97 \times 10^{3}$
${\hat{M}}_{A_{4}}$	$4.02 \times 10^{3}$	$3.08 \times 10^{2}$	$1.78 \times 10^{3}$
${\hat{M}}_{A_{5}}$	$8.92 \times 10^{3}$	$9.81 \times 10^{2}$	$2.11 \times 10^{3}$
${\hat{M}}_{A_{6}}$	$9.74 \times 10^{3}$	$7.05 \times 10^{2}$	$2.10 \times 10^{3}$
${\hat{M}}_{A_{7}}$	$9.60 \times 10^{3}$	$8.03 \times 10^{2}$	$2.06 \times 10^{3}$
${\hat{M}}_{A_{8}}$	$8.95 \times 10^{3}$	$9.88 \times 10^{2}$	$2.28 \times 10^{3}$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

