Article

Population Median Estimation Using Auxiliary Variables: A Simulation Study with Real Data Across Sample Sizes and Parameters

by Umer Daraz 1, Fatimah A. Almulhim 2, Mohammed Ahmed Alomair 3,* and Abdullah Mohammed Alomair 3
1 School of Mathematics and Statistics, Central South University, Changsha 410017, China
2 Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
3 Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa 31982, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(10), 1660; https://doi.org/10.3390/math13101660
Submission received: 12 April 2025 / Revised: 14 May 2025 / Accepted: 15 May 2025 / Published: 19 May 2025

Abstract

This paper introduces an enhanced class of ratio estimators, which employ the transformation technique on an auxiliary variable under simple random sampling to estimate the population median. The transformation strategy can reduce both the bias and mean square error, which can help estimators become more efficient. The bias and mean square error of proposed estimators are investigated up to the first order of approximation. Through simulation studies and the analysis of various data sets, the performance of the proposed estimators is compared to existing methods. The proposed class of estimators improves the precision and efficiency of median estimation, ensuring more accurate and dependable results in various practical scenarios. The findings reveal that the new estimators show superior performance under the given conditions compared to traditional estimators.

1. Introduction

The use of auxiliary information is essential for improving estimator efficiency in survey sampling, at both the planning and the estimation stages. The median is frequently a more suitable measure of central tendency than the mean for variables with highly skewed distributions, such as income, expenditure, taxation, consumption, and production. Because these variables regularly show pronounced skewness, the median is an essential tool for researchers who want to locate the center of their data. However, efficient techniques for estimating medians in finite populations have received comparatively little attention, despite the extensive literature on the estimation of means, variances, proportions, regression coefficients, and totals. This gap calls for further work on methods that use auxiliary data to estimate the median more accurately. For more details, we refer to [1,2].
In many situations, the median plays an important role in analyzing skewed data or data with outliers. It gives a more representative summary of survey responses in the social sciences, income in economics, pollutant levels in environmental research, patient outcomes in health, and real estate market patterns. The estimation of the median under simple random sampling was initially examined by [3,4,5,6], who investigated how to calculate it using auxiliary information. Following the work of [6], several estimators have been developed for the population median under different sampling methods. Certain methods for enhancing ratio and regression estimators for the median were suggested by [7]. Using double sampling, [8] introduced some methods for median estimation. An improved family of estimators using two auxiliary variables in double sampling to estimate the population median was obtained by [9]. The minimum unbiased estimator, using the known median of the auxiliary variable, was proposed by [10]. Later, improvements in the accuracy of median estimation using two auxiliary variables under two-phase sampling were discussed by [11]. Some improved estimators of the population median under simple random and stratified random sampling were obtained by [12,13]. Under different sampling methods, and using auxiliary information for different population parameters, some new types of estimators were obtained by [14,15]. While this study focuses on robust median estimation in survey sampling, we acknowledge classical approaches such as least squares estimation (LSE) and multi-innovation least squares (MILS) methods, which are widely used in parameter estimation and model fitting under noisy conditions [16]. In recent years, many researchers have focused on introducing new efficient estimators of the population median under different sampling methods. For more details about median estimation, we refer to [17,18,19,20,21,22,23,24,25] and the references therein.
The median is less affected by outliers than the mean and is better suited to skewed data sets. Because it is not influenced by extreme values, it gives a more representative picture of typical values. This makes it more reliable in environmental studies and surveys, where results may be affected by sharp fluctuations or extreme responses. Motivated by [26,27], this paper introduces an enhanced class of ratio estimators, which apply transformation techniques to an auxiliary variable under simple random sampling to estimate the population median. The use of transformation methods can help reduce both the bias and the mean square error, leading to more efficient estimators. Below are some important aspects of the newly proposed class of estimators for practical applications:
  • These improved estimators reduce bias and MSE, making median estimates more accurate and reliable. Our estimators use unique transformation techniques that include the inter-quartile range, mid-range, quartile average, quartile deviation, and robust measures (trimean and decile mean) on auxiliary variables to improve precision. This helps estimators handle data variability and enhance efficiency.
  • The suggested estimators adapt well to skewed distributions and data sets with outliers, in contrast to many estimators previously in use. Because of this flexibility, they are especially useful in domains such as environmental research, healthcare, and income analysis.
  • The suggested estimators perform well with simple random sampling, which is one of the most often used sampling methods in survey research. Their usefulness across multiple areas can be improved by adapting them to diverse data distributions. This practical applicability makes them valuable tools for policymakers, researchers, and industry professionals.
  • By considering additional data characteristics, such as the relationships between the target and auxiliary variable, the new estimators enhance their effectiveness in more complex survey designs, while filling a gap in the existing statistical literature. This contribution not only advances practical applications but also paves the way for future developments in the field.
Limitations of existing estimators: Traditional median estimators often perform poorly when the data are skewed or contain outliers, which is common in practical fields such as income analysis, healthcare, and environmental studies. These estimators typically rely on conventional measures that are sensitive to extreme values.
Motivation for robust transformations: To address these limitations, we introduce a new class of estimators that incorporate robust and non-conventional measures (e.g., interquartile range, trimean, quartile deviation, decile mean) through transformation techniques. These measures enhance stability and efficiency by reducing sensitivity to skewness and outliers.
Improvement in efficiency: The motivation is also grounded in improving estimator performance in terms of the bias and mean squared error (MSE). The proposed transformation-based estimators consistently demonstrate higher percent relative efficiency (PRE) in both simulated and real-world data sets, as shown in our analysis.
The structure of this paper is organized as follows: Section 2 provides a thorough explanation of the methods and notations utilized in this study. A review of existing estimators is outlined in Section 3. The proposed class of estimators is introduced and examined in detail in Section 4. Section 5 presents a rigorous mathematical comparison of these estimators. To validate the theoretical findings from Section 5, a simulation study is conducted in Section 6, where five different artificial populations are generated using various probability distributions. Additionally, this section includes numerical examples to illustrate the practical implementation of the theoretical results. Finally, Section 7 discusses the key results and provides potential directions for future research.

2. Variables and Notation

Suppose the auxiliary variable is denoted by $X$ and the study variable by $Y$ for a finite population of $N$ units, denoted $\Gamma = (\Gamma_1, \Gamma_2, \ldots, \Gamma_N)$. For each unit $i$, where $i = 1, 2, \ldots, N$, the corresponding values of the auxiliary and study variables are $x_i$ and $y_i$, respectively. Let a random sample of size $n$, with $n < N$, be selected from the population under simple random sampling without replacement (SRSWOR). The population medians of the study and auxiliary variables are $M_y$ and $M_x$, and the sample medians are $\hat{M}_y$ and $\hat{M}_x$. Let $C_{M_y}$ and $C_{M_x}$ represent the population coefficients of variation of the study variable $Y$ and the auxiliary variable $X$, respectively. The associated probability density functions evaluated at the population medians are $f_y(M_y)$ and $f_x(M_x)$. The correlation coefficient between $M_y$ and $M_x$ is denoted by $\rho_{yx}$ and is defined as $\rho_{(M_y, M_x)} = 4P_{11}(y, x) - 1$, where $P_{11} = P(y \le M_y \cap x \le M_x)$.
To determine the mathematical properties of the different estimators, the following relative error terms are used:
$$e_0 = \frac{\hat{M}_y - M_y}{M_y} \quad \text{and} \quad e_1 = \frac{\hat{M}_x - M_x}{M_x},$$
such that $E(e_i) = 0$ for $i = 0, 1$, and
$$E(e_0^2) = \delta C_{M_y}^2, \quad E(e_1^2) = \delta C_{M_x}^2, \quad E(e_0 e_1) = \delta C_{M_{yx}} = \delta \rho_{yx} C_{M_y} C_{M_x},$$
where
$$C_{M_y} = \frac{1}{M_y f_y(M_y)}, \quad C_{M_x} = \frac{1}{M_x f_x(M_x)}$$
denote the population coefficients of variation of the study variable $Y$ and the auxiliary variable $X$, and
$$\delta = \frac{1}{4}\left(\frac{1}{n} - \frac{1}{N}\right)$$
is the finite population correction factor.
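As a small worked example of this notation, the sketch below evaluates $\delta$, $C_{M_y}$, $C_{M_x}$ and $C_{M_{yx}}$ in Python. The function names are hypothetical, the inputs are the Population 2 summaries quoted in Section 6.2, and the code assumes the usual definition $C_{M_x} = 1/(M_x f_x(M_x))$; it is an illustration of the formulas, not the authors' code.

```python
def fpc_delta(n: int, N: int) -> float:
    """delta = (1/4) * (1/n - 1/N), as defined in Section 2."""
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    return 0.25 * (1.0 / n - 1.0 / N)


def median_cv(median: float, density_at_median: float) -> float:
    """C_M = 1 / (M * f(M)); assumed form for both Y and X."""
    return 1.0 / (median * density_at_median)


# Population 2 summaries as quoted in Section 6.2.
delta = fpc_delta(n=45, N=128)
c_my = median_cv(686, 0.00092)
c_mx = median_cv(4.715, 0.1154)
c_myx = 0.468 * c_my * c_mx  # rho_yx * C_My * C_Mx
```

Keeping these four quantities as plain floats makes it easy to plug them into the first-order bias and MSE expressions of the later sections.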

3. Previously Proposed Estimators

This section explores the biases and mean squared errors associated with existing estimators used for estimating the finite population median. We then compare these results with those of our new estimators to identify potential improvements.
The common unbiased estimator of the population median is the median per unit estimator proposed by [3], stated as follows:
$$\hat{M}_{GR} = \hat{M}_y. \tag{1}$$
The variance of $\hat{M}_{GR}$ is given by
$$\mathrm{Var}(\hat{M}_y) = \delta M_y^2 C_{M_y}^2, \tag{2}$$
where $\delta$ and $C_{M_y}$ are defined in Section 2.
This estimator, called the median per unit estimator, uses the sample median to estimate the population median under simple random sampling. It serves as a baseline method due to its simplicity and unbiasedness, but it may be less efficient than other estimators that use auxiliary information. Several authors have proposed alternative estimators that utilize auxiliary information to improve efficiency. Examples of such estimators are presented starting from Equation (3) onward.
A ratio estimator for the median obtained by [6] is defined as
$$\hat{M}_R = \frac{\hat{M}_y}{\hat{M}_x} M_x. \tag{3}$$
The bias and MSE of $\hat{M}_R$ are given by
$$\mathrm{Bias}(\hat{M}_R) \cong \delta M_y \left(C_{M_x}^2 - C_{M_{yx}}\right) \tag{4}$$
and
$$\mathrm{MSE}(\hat{M}_R) \cong \delta M_y^2 \left(C_{M_y}^2 + C_{M_x}^2 - 2C_{M_{yx}}\right). \tag{5}$$
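The ratio estimator and its first-order bias and MSE can be sketched in a few lines of Python. The function names and the numeric inputs in the usage note are hypothetical and chosen only to illustrate the formulas; $C_{M_{yx}}$ is taken as $\rho_{yx} C_{M_y} C_{M_x}$.

```python
def ratio_estimate(m_y_hat: float, m_x_hat: float, m_x: float) -> float:
    """Ratio estimator: sample median of y scaled by M_x / sample median of x."""
    return m_y_hat * m_x / m_x_hat


def sample_median_variance(m_y: float, delta: float, c_my: float) -> float:
    """Variance of the median per unit estimator."""
    return delta * m_y ** 2 * c_my ** 2


def ratio_bias_mse(m_y, delta, c_my, c_mx, rho):
    """First-order bias and MSE of the ratio estimator."""
    c_myx = rho * c_my * c_mx
    bias = delta * m_y * (c_mx ** 2 - c_myx)
    mse = delta * m_y ** 2 * (c_my ** 2 + c_mx ** 2 - 2.0 * c_myx)
    return bias, mse
```

For instance, with the illustrative values `m_y=100`, `delta=0.005`, `c_my=1.2`, `c_mx=1.0` and `rho=0.8`, the ratio estimator's first-order MSE is smaller than the variance of the sample median, as expected when the correlation is high.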
The difference estimator $\hat{M}_D$ introduced by [10] is defined as
$$\hat{M}_D = \hat{M}_y + d\left(M_x - \hat{M}_x\right), \tag{6}$$
where $d$ is an unknown constant whose optimum value is
$$d_{opt} = \frac{\rho_{yx} M_y C_{M_y}}{M_x C_{M_x}}.$$
The minimum MSE of $\hat{M}_D$ is
$$\mathrm{MSE}(\hat{M}_D)_{\min} \cong \delta M_y^2 C_{M_y}^2 \left(1 - \rho_{yx}^2\right). \tag{7}$$
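To see where the optimum constant comes from, note that to first order $\hat{M}_D - M_y = M_y e_0 - d M_x e_1$, so the MSE is a quadratic in $d$. The sketch below (hypothetical function names, illustrative inputs) evaluates that quadratic and checks numerically that $d_{opt}$ attains the stated minimum; the quadratic form is a derived intermediate step, not a formula quoted from the text.

```python
def difference_mse(d, m_y, m_x, delta, c_my, c_mx, rho):
    """First-order MSE of M_y_hat + d * (M_x - M_x_hat) for a given d."""
    c_myx = rho * c_my * c_mx
    return delta * (
        m_y ** 2 * c_my ** 2
        + d ** 2 * m_x ** 2 * c_mx ** 2
        - 2.0 * d * m_y * m_x * c_myx
    )


def d_opt(m_y, m_x, c_my, c_mx, rho):
    """Optimum constant of the difference estimator."""
    return rho * m_y * c_my / (m_x * c_mx)


def difference_mse_min(m_y, delta, c_my, rho):
    """Minimum MSE: delta * M_y^2 * C_My^2 * (1 - rho^2)."""
    return delta * m_y ** 2 * c_my ** 2 * (1.0 - rho ** 2)
```

Evaluating `difference_mse` at `d_opt(...)` and at nearby values of `d` is a quick way to confirm that the closed-form minimum is not a transcription error.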
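Using the concept given by [28], we consider the following exponential ratio-type and product-type estimators of the median:
$$\hat{M}_{Re} = \hat{M}_y \exp\left(\frac{M_x - \hat{M}_x}{M_x + \hat{M}_x}\right) \tag{8}$$
and
$$\hat{M}_{Pe} = \hat{M}_y \exp\left(\frac{\hat{M}_x - M_x}{M_x + \hat{M}_x}\right). \tag{9}$$
The biases and MSEs of $(\hat{M}_{Re}, \hat{M}_{Pe})$ are as follows:
$$\mathrm{Bias}(\hat{M}_{Re}) \cong \delta M_y \left(\frac{3}{8} C_{M_x}^2 - \frac{1}{2} C_{M_{yx}}\right), \tag{10}$$
$$\mathrm{Bias}(\hat{M}_{Pe}) \cong \delta M_y \left(\frac{1}{2} C_{M_{yx}} - \frac{3}{8} C_{M_x}^2\right), \tag{11}$$
$$\mathrm{MSE}(\hat{M}_{Re}) \cong \delta M_y^2 \left(C_{M_y}^2 + \frac{1}{4} C_{M_x}^2 - C_{M_{yx}}\right) \tag{12}$$
and
$$\mathrm{MSE}(\hat{M}_{Pe}) \cong \delta M_y^2 \left(C_{M_y}^2 + \frac{1}{4} C_{M_x}^2 + C_{M_{yx}}\right). \tag{13}$$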
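A minimal Python sketch of the two exponential estimators and their first-order MSEs follows. Function names and test values are hypothetical; the only structural fact used is that the two exponents are negatives of each other, so the product of the two estimates equals $\hat{M}_y^2$.

```python
import math


def exp_ratio_estimate(m_y_hat, m_x_hat, m_x):
    """Exponential ratio-type estimator."""
    return m_y_hat * math.exp((m_x - m_x_hat) / (m_x + m_x_hat))


def exp_product_estimate(m_y_hat, m_x_hat, m_x):
    """Exponential product-type estimator."""
    return m_y_hat * math.exp((m_x_hat - m_x) / (m_x + m_x_hat))


def exp_mses(m_y, delta, c_my, c_mx, rho):
    """First-order MSEs of the exponential ratio and product estimators."""
    c_myx = rho * c_my * c_mx
    base = c_my ** 2 + 0.25 * c_mx ** 2
    mse_re = delta * m_y ** 2 * (base - c_myx)
    mse_pe = delta * m_y ** 2 * (base + c_myx)
    return mse_re, mse_pe
```

The two MSEs differ only in the sign of the $C_{M_{yx}}$ term, which is why the ratio-type form helps under positive correlation and the product-type form under negative correlation.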
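The estimators of the median introduced by [7,11] are defined as follows:
$$\hat{M}_{D_1} = d_1 \hat{M}_y + d_2\left(M_x - \hat{M}_x\right), \tag{14}$$
$$\hat{M}_{D_2} = \left[d_3 \hat{M}_y + d_4\left(M_x - \hat{M}_x\right)\right] \frac{M_x}{\hat{M}_x}, \tag{15}$$
$$\hat{M}_{D_3} = \left[d_5 \hat{M}_y + d_6\left(M_x - \hat{M}_x\right)\right] \exp\left(\frac{M_x - \hat{M}_x}{M_x + \hat{M}_x}\right). \tag{16}$$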
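The optimum values of the unknown constants $d_i$ $(i = 1, 2, \ldots, 6)$ are given below:
$$d_{1(opt)} = \frac{1}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)},$$
$$d_{2(opt)} = \frac{M_y}{M_x} \cdot \frac{\rho_{yx} C_{M_y}}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)},$$
$$d_{3(opt)} = \frac{1 - \delta C_{M_y}^2}{1 - \delta C_{M_y}^2 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)},$$
$$d_{4(opt)} = \frac{M_y}{M_x}\left[1 + d_{3(opt)}\left(\frac{\rho_{yx} C_{M_y}}{C_{M_x}} - 2\right)\right],$$
$$d_{5(opt)} = \frac{1}{8} \cdot \frac{8 - \delta C_{M_x}^2}{1 + \delta C_{M_x}^2\left(1 - \rho_{yx}^2\right)}$$
and
$$d_{6(opt)} = \frac{M_y}{M_x}\left[\frac{1}{2} + d_{5(opt)}\left(\frac{\rho_{yx} C_{M_y}}{C_{M_x}} - 1\right)\right].$$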
The following expressions give the minimum biases and mean square errors of $\hat{M}_{D_i}$ $(i = 1, 2, 3)$:
$$\mathrm{Bias}(\hat{M}_{D_1})_{\min} \cong -\frac{\delta M_y C_{M_y}^2\left(1 - \rho_{yx}^2\right)}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)}, \tag{17}$$
B i a s M ^ D 2 m i n δ M y C M y 2 ( 1 ρ y x 2 ) δ C M y 2 C M x 2 + C M y ρ y x C M x 1 δ C M y 2 C M y x 1 δ C M y 2 ρ y x 2 ,
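$$\mathrm{Bias}(\hat{M}_{D_3})_{\min} \cong \frac{M_y}{1 + \delta C_{M_x}^2\left(1 - \rho_{yx}^2\right)}\left[-\frac{1}{8}\delta C_{M_x}^2\left\{1 + 8\left(1 - \rho_{yx}^2\right)\right\} + \frac{\delta}{8}\left(8 - \delta C_{M_x}^2\right)\left(\frac{3}{8} C_{M_x}^2 - \frac{1}{2} C_{M_{yx}}\right) + \frac{\delta}{2} C_{M_x}^2\left\{\frac{1}{2} + \frac{\rho_{yx} C_{M_y}}{8 C_{M_x}} \cdot \frac{8 - \delta C_{M_x}^2}{1 + \delta C_{M_x}^2\left(1 - \rho_{yx}^2\right)}\right\}\right], \tag{19}$$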
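$$\mathrm{MSE}(\hat{M}_{D_1})_{\min} \cong \frac{\delta M_y^2 C_{M_y}^2\left(1 - \rho_{yx}^2\right)}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)}, \tag{20}$$
$$\mathrm{MSE}(\hat{M}_{D_2})_{\min} \cong \frac{\delta M_y^2\left(1 - \delta C_{M_x}^2\right) C_{M_y}^2\left(1 - \rho_{yx}^2\right)}{1 - \delta C_{M_x}^2 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)} \tag{21}$$
and
$$\mathrm{MSE}(\hat{M}_{D_3})_{\min} \cong \frac{\delta M_y^2\left[C_{M_x}^2\left(1 - \rho_{yx}^2\right) - \frac{\delta}{4} C_{M_x}^2\left(\frac{1}{16} C_{M_x}^2 + C_{M_y}^2\left(1 - \rho_{yx}^2\right)\right)\right]}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)}. \tag{22}$$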
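The effect of the extra constant $d_1$ can be illustrated for $\hat{M}_{D_1}$ alone: its minimum MSE equals the minimum MSE of the difference estimator divided by one plus that MSE relative to $M_y^2$, so it can never be larger. The sketch below uses hypothetical names and inputs and covers only $d_{1(opt)}$ and the minimum MSE of $\hat{M}_{D_1}$.

```python
def d1_opt_and_min_mse(m_y, delta, c_my, rho):
    """Optimum d1 and minimum first-order MSE of the estimator M_D1."""
    t = delta * c_my ** 2 * (1.0 - rho ** 2)  # relative MSE of the difference estimator
    d1 = 1.0 / (1.0 + t)
    mse_min = m_y ** 2 * t / (1.0 + t)
    return d1, mse_min
```

Since `t` is positive whenever the correlation is below one in absolute value, `d1` is slightly below one, which shrinks the estimate toward zero in exchange for a lower MSE.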

4. Proposed Family of Estimators

The fact that most existing estimators are based on traditional measures is one of their main drawbacks. When extreme values appear in the data set, the effectiveness of the estimation methods is uncertain. This section introduces a new family of estimators that use unique transformation techniques that include the interquartile range, mid-range, quartile average, quartile deviation, and robust measures (trimean and decile mean) on auxiliary variables to improve precision. These improved estimators reduce bias and MSE, making median estimates more accurate and reliable. The suggested estimators are defined as follows:
$$\hat{M}_U = \left[k_1 \hat{M}_y \left(\frac{M_x}{\hat{M}_x}\right)^{\alpha_1} + k_2\left(M_x - \hat{M}_x\right)\left(\frac{M_x}{\hat{M}_x}\right)^{\alpha_2}\right] \exp\left(\frac{a_1\left(M_x - \hat{M}_x\right)}{a_1\left(\hat{M}_x + M_x\right) + 2a_2}\right), \tag{23}$$
where the scalars $(\alpha_1, \alpha_2)$ take values in $\{0, 1, -1\}$, while the constants $k_1$ and $k_2$ are chosen to minimize the bias and mean squared error. Sub-classes of the suggested estimator are derived from Equation (23) by choosing the auxiliary variable parameters $a_1$ and $a_2$, as outlined in Table 1. Note that $a_1$ and $a_2$ can be either constants or functions of known robust parameters and unconventional measures of the variable $X$.
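As a reading aid for Equation (23), the following Python sketch evaluates the proposed estimator for given constants. The function name and all numeric inputs are hypothetical; the grouping of terms (a bracketed linear part multiplied by an exponential factor) is the reading assumed here. Two special cases give a sanity check: with $k_1 = 1$, $k_2 = 0$, $\alpha_1 = 1$, $a_1 = 0$ the expression reduces to the ratio estimator, and with $k_1 = 1$, $k_2 = 0$, $\alpha_1 = 0$, $a_1 = 1$, $a_2 = 0$ it reduces to the exponential ratio-type estimator.

```python
import math


def proposed_estimate(m_y_hat, m_x_hat, m_x, k1, k2, alpha1, alpha2, a1, a2):
    """Evaluate the proposed estimator for given constants.

    a1 and a2 must not both be zero, otherwise the exponent is undefined.
    """
    ratio = m_x / m_x_hat
    linear = k1 * m_y_hat * ratio ** alpha1 + k2 * (m_x - m_x_hat) * ratio ** alpha2
    exponent = a1 * (m_x - m_x_hat) / (a1 * (m_x_hat + m_x) + 2.0 * a2)
    return linear * math.exp(exponent)
```

In practice $k_1$ and $k_2$ would be replaced by their optimum values, and $a_1$, $a_2$ by the chosen robust measures of $X$.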

Properties of the Proposed Estimator

To investigate the behavior of the proposed estimator, we rewrite Equation (23) in terms of the relative errors. This facilitates the calculation of the bias and mean squared error (MSE) of $\hat{M}_U$, i.e.,
$$\hat{M}_U = \left[k_1 M_y\left(1 + e_0\right)\left(1 + e_1\right)^{-\alpha_1} - k_2 M_x e_1\left(1 + e_1\right)^{-\alpha_2}\right] \exp\left\{-\frac{g_4 e_1}{2}\left(1 + \frac{g_4}{2} e_1\right)^{-1}\right\}, \tag{24}$$
where
$$g_4 = \frac{a_1 M_x}{a_1 M_x + a_2}.$$
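Expanding the right-hand side of (24) in a Taylor series and disregarding terms in $e_i$ of degree greater than two, we obtain the following result:
$$\hat{M}_U - M_y \cong -M_y + k_1 M_y\left[1 + e_0 - e_1\left(\alpha_1 + \frac{g_4}{2}\right) + e_1^2\left(\frac{\alpha_1 g_4}{2} + \frac{3 g_4^2}{8} + \frac{\alpha_1\left(\alpha_1 + 1\right)}{2}\right) - e_0 e_1\left(\alpha_1 + \frac{g_4}{2}\right)\right] - k_2 M_x\left[e_1 - e_1^2\left(\alpha_2 + \frac{g_4}{2}\right)\right]. \tag{25}$$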
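The expansion can be checked numerically: for small relative errors, the exact error form (24) and the second-order approximation (25) should differ only by third-order terms. The sketch below uses hypothetical function names and illustrative constants, and takes the signs in (24) and (25) as implied by the definitions of $e_0$, $e_1$ and $g_4$.

```python
import math


def exact_error(e0, e1, m_y, m_x, k1, k2, alpha1, alpha2, g4):
    """Estimator minus M_y, written exactly in terms of e0 and e1."""
    half = g4 * e1 / 2.0
    bracket = (
        k1 * m_y * (1.0 + e0) * (1.0 + e1) ** (-alpha1)
        - k2 * m_x * e1 * (1.0 + e1) ** (-alpha2)
    )
    return bracket * math.exp(-half / (1.0 + half)) - m_y


def second_order_error(e0, e1, m_y, m_x, k1, k2, alpha1, alpha2, g4):
    """Second-order Taylor approximation of the same quantity."""
    c1 = alpha1 + g4 / 2.0
    c2 = alpha1 * g4 / 2.0 + 3.0 * g4 ** 2 / 8.0 + alpha1 * (alpha1 + 1.0) / 2.0
    return (
        -m_y
        + k1 * m_y * (1.0 + e0 - e1 * c1 + e1 ** 2 * c2 - e0 * e1 * c1)
        - k2 * m_x * (e1 - e1 ** 2 * (alpha2 + g4 / 2.0))
    )
```

Comparing the two functions at, say, `e0 = 0.001` and `e1 = -0.002` gives a gap that is orders of magnitude smaller than the second-order terms themselves, which supports the coefficients in the expansion.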
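Taking expectations in (25), the bias of $\hat{M}_U$ is given by
$$\mathrm{Bias}(\hat{M}_U) \cong -M_y + k_1 M_y \Delta_D + k_2 \Delta_G, \tag{26}$$
where
$$\Delta_D = 1 + \delta\left\{C_{M_x}^2\, \frac{3 g_4^2 + 4\alpha_1\left(g_4 + \alpha_1 + 1\right)}{8} - C_{M_{yx}}\, \frac{2\alpha_1 + g_4}{2}\right\}$$
and
$$\Delta_G = \delta M_x C_{M_x}^2\, \frac{2\alpha_2 + g_4}{2}.$$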
To find the MSE of $\hat{M}_U$, we square both sides of (25) and take expectations, which leads to the following formula:
$$\mathrm{MSE}(\hat{M}_U) \cong M_y^2 + k_1^2 M_y^2 \Delta_A + k_2^2 \Delta_B - 2 k_1 M_y^2 \Delta_D - 2 k_2 M_y \Delta_G + 2 k_1 k_2 M_y \Delta_F, \tag{27}$$
where
Δ A = 1 + δ C M y 2 + C M x 2 α 1 3 α 1 + 1 + 2 g 4 g 4 + 1 2 2 C M y x 2 2 α 1 + g 4 ,
$$\Delta_B = \delta M_x^2 C_{M_x}^2$$
and
$$\Delta_F = \delta M_x\left\{C_{M_x}^2\left(\alpha_1 + \alpha_2 + g_4\right) - C_{M_{yx}}\right\}.$$
The optimal values of $k_1$ and $k_2$, obtained by minimizing Equation (27), are
$$k_{1(opt)} = \frac{\Delta_B \Delta_D - \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}$$
and
$$k_{2(opt)} = \frac{M_y\left(\Delta_A \Delta_G - \Delta_D \Delta_F\right)}{\Delta_A \Delta_B - \Delta_F^2}.$$
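Inserting the optimal values of $k_1$ and $k_2$ into Equations (26) and (27) gives the corresponding minimum bias and MSE of $\hat{M}_U$:
$$\mathrm{Bias}(\hat{M}_U)_{\min} \cong -M_y\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right] \tag{28}$$
and
$$\mathrm{MSE}(\hat{M}_U)_{\min} \cong M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right]. \tag{29}$$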
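The optimisation step is a two-variable quadratic minimisation, which the sketch below reproduces in Python. The function names and the $\Delta$ values in the usage note are hypothetical inputs (they are not computed from any data set); the signs of the MSE terms are taken as implied by the stated optimum constants. The code checks that the closed-form optimum matches the direct evaluation of the MSE and that perturbing either constant increases it.

```python
def mse_u(k1, k2, m_y, d_a, d_b, d_d, d_f, d_g):
    """First-order MSE of the proposed estimator as a function of k1 and k2."""
    return (
        m_y ** 2
        + k1 ** 2 * m_y ** 2 * d_a
        + k2 ** 2 * d_b
        - 2.0 * k1 * m_y ** 2 * d_d
        - 2.0 * k2 * m_y * d_g
        + 2.0 * k1 * k2 * m_y * d_f
    )


def optimal_constants(m_y, d_a, d_b, d_d, d_f, d_g):
    """Closed-form minimisers of mse_u; needs d_a * d_b > d_f ** 2."""
    den = d_a * d_b - d_f ** 2
    k1 = (d_b * d_d - d_f * d_g) / den
    k2 = m_y * (d_a * d_g - d_d * d_f) / den
    return k1, k2


def mse_u_min(m_y, d_a, d_b, d_d, d_f, d_g):
    """Minimum MSE obtained by substituting the optimal constants."""
    den = d_a * d_b - d_f ** 2
    num = d_a * d_g ** 2 + d_b * d_d ** 2 - 2.0 * d_d * d_f * d_g
    return m_y ** 2 * (1.0 - num / den)
```

For example, `optimal_constants(100.0, 1.02, 4.0, 1.005, 0.3, 0.2)` returns a `k1` slightly below one and a negative `k2`; plugging them back into `mse_u` reproduces `mse_u_min` up to rounding.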

5. Comparison of Estimators

In this section, we derive efficiency conditions by comparing the minimum MSE of the proposed family of estimators with the MSEs of the existing estimators $\hat{M}_y$, $\hat{M}_R$, $\hat{M}_D$, $\hat{M}_{Re}$, $\hat{M}_{Pe}$, $\hat{M}_{D_1}$, $\hat{M}_{D_2}$, and $\hat{M}_{D_3}$.
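(i) The following condition results from (29) and (2):
$$\mathrm{Var}(\hat{M}_y) > \mathrm{MSE}(\hat{M}_U)_{\min} \quad \text{if}$$
$$\delta M_y^2 C_{M_y}^2 > M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right],$$
$$\delta C_{M_y}^2 + \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2} > 1.$$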
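(ii) The following condition results from (29) and (5):
$$\mathrm{MSE}(\hat{M}_R) > \mathrm{MSE}(\hat{M}_U)_{\min} \quad \text{if}$$
$$\delta M_y^2\left(C_{M_y}^2 + C_{M_x}^2 - 2C_{M_{yx}}\right) > M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right],$$
$$\delta\left(C_{M_y}^2 + C_{M_x}^2 - 2C_{M_{yx}}\right) + \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2} > 1.$$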
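(iii) The following condition results from (29) and (7):
$$\mathrm{MSE}(\hat{M}_D)_{\min} > \mathrm{MSE}(\hat{M}_U)_{\min} \quad \text{if}$$
$$\delta M_y^2 C_{M_y}^2\left(1 - \rho_{yx}^2\right) > M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right],$$
$$\delta C_{M_y}^2\left(1 - \rho_{yx}^2\right) + \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2} > 1.$$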
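(iv) The following condition results from (29) and (12):
$$\mathrm{MSE}(\hat{M}_{Re}) > \mathrm{MSE}(\hat{M}_U)_{\min} \quad \text{if}$$
$$\delta M_y^2\left(C_{M_y}^2 + \frac{1}{4} C_{M_x}^2 - C_{M_{yx}}\right) > M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right],$$
$$\delta\left(C_{M_y}^2 + \frac{1}{4} C_{M_x}^2 - C_{M_{yx}}\right) + \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2} > 1.$$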
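(v) The following condition results from (29) and (13):
$$\mathrm{MSE}(\hat{M}_{Pe}) > \mathrm{MSE}(\hat{M}_U)_{\min} \quad \text{if}$$
$$\delta M_y^2\left(C_{M_y}^2 + \frac{1}{4} C_{M_x}^2 + C_{M_{yx}}\right) > M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right],$$
$$\delta\left(C_{M_y}^2 + \frac{1}{4} C_{M_x}^2 + C_{M_{yx}}\right) + \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2} > 1.$$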
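(vi) The following condition results from (29) and (20):
$$\mathrm{MSE}(\hat{M}_{D_1})_{\min} > \mathrm{MSE}(\hat{M}_U)_{\min} \quad \text{if}$$
$$\frac{\delta M_y^2 C_{M_y}^2\left(1 - \rho_{yx}^2\right)}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)} > M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right],$$
$$\frac{\delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)} + \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2} > 1.$$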
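(vii) The following condition results from (29) and (21):
$$\mathrm{MSE}(\hat{M}_{D_2})_{\min} > \mathrm{MSE}(\hat{M}_U)_{\min} \quad \text{if}$$
$$\frac{\delta M_y^2\left(1 - \delta C_{M_x}^2\right) C_{M_y}^2\left(1 - \rho_{yx}^2\right)}{1 - \delta C_{M_x}^2 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)} > M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right],$$
$$\frac{\delta C_{M_y}^2\left(1 - \delta C_{M_x}^2\right)\left(1 - \rho_{yx}^2\right)}{1 - \delta C_{M_x}^2 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)} + \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2} > 1.$$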
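(viii) The following condition results from (29) and (22):
$$\mathrm{MSE}(\hat{M}_{D_3})_{\min} > \mathrm{MSE}(\hat{M}_U)_{\min} \quad \text{if}$$
$$\frac{\delta M_y^2\left[C_{M_x}^2\left(1 - \rho_{yx}^2\right) - \frac{\delta}{4} C_{M_x}^2\left(\frac{1}{16} C_{M_x}^2 + C_{M_y}^2\left(1 - \rho_{yx}^2\right)\right)\right]}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)} > M_y^2\left[1 - \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2}\right],$$
$$\frac{\delta\left[C_{M_x}^2\left(1 - \rho_{yx}^2\right) - \frac{\delta}{4} C_{M_x}^2\left(\frac{1}{16} C_{M_x}^2 + C_{M_y}^2\left(1 - \rho_{yx}^2\right)\right)\right]}{1 + \delta C_{M_y}^2\left(1 - \rho_{yx}^2\right)} + \frac{\Delta_A \Delta_G^2 + \Delta_B \Delta_D^2 - 2\Delta_D \Delta_F \Delta_G}{\Delta_A \Delta_B - \Delta_F^2} > 1.$$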

6. Analysis and Discussion of Findings

In this section, motivated by [26,27], we use some positively skewed distributions to generate simulated populations and compare the proposed family of estimators with several existing estimators. In addition, real data sets are used to further confirm how well the proposed estimators perform.

6.1. Simulation Study

The nature of the data and the statistical characteristics of the distribution determine which distribution is most suited to a median-estimation study; the median is especially useful for data that are skewed, contain outliers, or are otherwise non-normal. Positively skewed distributions were used to evaluate the performance of the proposed estimators under conditions where the data deviate from normality, which is common in many practical survey scenarios. We selected the following five distributions to generate the variable $X$:
  • Population 1: $X \sim$ Gamma$(\alpha = 4, \beta = 1)$, moderate skew and spread, with $\rho_{yx} = 0.5$;
  • Population 2: $X \sim$ Log-Normal$(\mu = 2, \sigma = 1)$, slight skew, with $\rho_{yx} = 0.35$;
  • Population 3: $X \sim$ Cauchy$(\gamma_0 = 5, \gamma = 2)$, heavy tails, with $\rho_{yx} = 0.45$;
  • Population 4: $X \sim$ Uniform$(a = 5, b = 20)$, baseline, with $\rho_{yx} = 0$;
  • Population 5: $X \sim$ Exponential$(\mu = \frac{1}{2})$, high skew, with $\rho_{yx} = 0.75$.
These five distributions were chosen to test the robustness of the suggested estimators under a range of distributional conditions. The variable $Y$ is then generated through the expression
$$Y = r_{yx} \times X + e,$$
where $r_{yx}$ is the correlation and
$$e \sim N(0, 1)$$
represents the error term.
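The population-generation step can be sketched as follows. The study itself was carried out in R; this Python version uses only the standard library and is an illustrative stand-in, not the authors' code. The function name `make_population`, the seed handling, and the reading of the exponential parameter 0.5 as a rate are assumptions; the Cauchy draw uses the inverse-CDF formula $\gamma_0 + \gamma \tan(\pi(u - 1/2))$.

```python
import math
import random


def make_population(dist: str, r_yx: float, size: int = 900, seed: int = 0):
    """Generate (xs, ys) with X from the named distribution and Y = r_yx * X + e."""
    rng = random.Random(seed)
    draws = {
        "gamma": lambda: rng.gammavariate(4.0, 1.0),
        "lognormal": lambda: rng.lognormvariate(2.0, 1.0),
        # Cauchy(location 5, scale 2) via the inverse CDF.
        "cauchy": lambda: 5.0 + 2.0 * math.tan(math.pi * (rng.random() - 0.5)),
        "uniform": lambda: rng.uniform(5.0, 20.0),
        # Assumption: 0.5 is treated as the rate parameter.
        "exponential": lambda: rng.expovariate(0.5),
    }
    if dist not in draws:
        raise ValueError(f"unknown distribution: {dist}")
    xs = [draws[dist]() for _ in range(size)]
    ys = [r_yx * x + rng.gauss(0.0, 1.0) for x in xs]  # e ~ N(0, 1)
    return xs, ys
```

A call such as `make_population("gamma", 0.5)` returns two lists of 900 values that can be passed directly to a sampling routine.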
To evaluate robustness and efficiency, we implemented the following steps in R and examined the MSEs of the suggested estimators and the existing estimators for every distribution and correlation setting.
(1) Generate populations: Using the distributions described above, simulate $N = 900$ observations of $X$ and $Y$.
(2) Sample selection: Draw samples of size $n$ using simple random sampling without replacement (SRSWOR). The sample sizes are $n = 30$, $70$, and $120$.
(3) Optimal parameter estimation: From the sampled data, compute the required statistics (such as the sample mean, median, variance, and covariance) using the procedures outlined above, together with the optimal values of the unknown constants in the enhanced class of estimators.
(4) Performance metrics: For every sample size, compute the percent relative efficiency ($PRE$) of each estimator discussed in this article.
(5) Replications: After 85,000 iterations of steps 3 and 4, compute the mean squared errors and percent relative efficiencies using the formulas below:
$$\mathrm{MSE}(\hat{M}_t)_{\min} = \frac{\sum_{k=1}^{85000}\left(\hat{M}_{t_k} - M_y\right)^2}{85000}$$
and
$$PRE = \frac{\mathrm{Var}(\hat{M}_y)}{\mathrm{MSE}(\hat{M}_t)_{\min}} \times 100,$$
where $t = R, D, Re, Pe, D_1, D_2, D_3, U_1, U_2, \ldots, U_8$.

6.2. Real-Life Application

We now examine the percent relative efficiency (PRE) for four distinct data sets to evaluate the performance of the proposed family of estimators against the other estimators. The data sets and their main summary statistics are described below.
Population 1 
(Source: Singh [10]).
Y: represents the total quantity of fish that was caught in 1995;
X : represents the total quantity of fish caught by marine recreational fishermen in 1994.
$N = 69$, $n = 17$, $M_x = 2011$, $M_y = 2068$, $f_x(M_x) = 0.00014$, $f_y(M_y) = 0.00014$, $\rho_{yx} = 0.151$, $TM = 4043$, $DM = 3853$, $QR = 3936$, $QA = 2956$, $QD = 1968$, $MR = 19019.5$.
Population 2 
(Source: Murthy [29]).
Y : denotes the total number of households;
X : denotes the total area in square miles.
$N = 128$, $n = 45$, $M_x = 4.715$, $M_y = 686$, $f_x(M_x) = 0.1154$, $f_y(M_y) = 0.00092$, $\rho_{yx} = 0.468$, $TM = 5.385$, $DM = 5.378$, $QR = 4.618$, $QA = 5.316$, $QD = 2.309$, $MR = 8.270$.
Population 3 
(Source: Koyuncu and Kadilar [30]).
Y : denotes the total number of teachers in educational institutions;
X : denotes the total number of students enrolled in educational institutions in 2012.
$N = 923$, $n = 180$, $M_x = 4123$, $M_y = 171$, $f_x(M_x) = 0.00009409$, $f_y(M_y) = 0.002676$, $\rho_{yx} = 0.855$, $TM = 7726$, $DM = 7348.3$, $QR = 8283$, $QA = 5870.5$, $QD = 4141.5$, $MR = 89530$.
Population 4 
(Source: Singh [10]).
Y: represents the total quantity of fish that was caught in 1995;
X : represents the total quantity of fish caught by marine recreational fishermen in 1934.
$N = 69$, $n = 17$, $M_x = 2007$, $M_y = 2068$, $f_x(M_x) = 0.00014$, $f_y(M_y) = 0.00014$, $\rho_{yx} = 0.314$, $TM = 3777$, $DM = 3615.2$, $QR = 3936$, $QA = 3002$, $QD = 1975$, $MR = 17030$.
We now calculate the efficiency of each estimator using the formula
$$PRE = \frac{\mathrm{Var}(\hat{M}_y)}{\mathrm{MSE}(\hat{M}_v)} \times 100,$$
where $v = R, D, Re, Pe, D_1, D_2, D_3, U_1, U_2, \ldots, U_8$. The results of this analysis, which highlight the performance of our proposed family of estimators, can be found in Table 5.

6.3. Discussion

To support the median estimations, we conducted simulations using suitable distributions with varying $\rho_{yx}$ values and different sample sizes. We also examined four data sets to assess the performance of the suggested family of estimators. Our main criterion for comparing the estimators was the percent relative efficiency (PRE). Table 2, Table 3 and Table 4 present the PRE values of the new family and the existing estimators for the five simulated distributions. Table 5 displays the results for the real data sets. We draw the following conclusions from these analyses:
  • The $PRE$ values of all new estimators are higher than those of the existing estimators covered in Section 3, for both the simulated and the real data sets, as shown in Table 2, Table 3, Table 4 and Table 5. This illustrates the better performance of the newly proposed estimators over the existing ones.
  • The graph lines in Figure 1 and Figure 2, for the five simulated distributions and the four real data sets, show the same pattern: all new estimators have $PRE$ values that are consistently greater than those of the existing estimators. Because $PRE$ is inversely related to $MSE$, this means that the new family of estimators attains lower $MSE$ values than the existing methods.
  • Based on the simulation results and real-life data analysis presented in Table 2, Table 3, Table 4 and Table 5, the estimator $\hat{M}_{U_8}$ consistently has the highest percent relative efficiency (PRE) among all proposed and existing estimators. We therefore recommend $\hat{M}_{U_8}$ in practice, especially in applications involving skewed data or data affected by extreme values under simple random sampling.
Table 2. Percent relative efficiency (PRE) for $n = 30$.

| Estimator | Gam(4, 1) | LN(2, 1) | C(5, 2) | Uni(5, 20) | Exp(0.5) |
|---|---|---|---|---|---|
| $\hat{M}_y$ | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| $\hat{M}_R$ | 110.54 | 108.31 | 98.70 | 100.29 | 112.39 |
| $\hat{M}_D$ | 124.37 | 126.80 | 115.21 | 118.43 | 124.80 |
| $\hat{M}_{Re}$ | 117.21 | 119.79 | 109.35 | 113.01 | 120.12 |
| $\hat{M}_{Pe}$ | 102.46 | 101.57 | 93.46 | 95.68 | 103.61 |
| $\hat{M}_{D_1}$ | 133.41 | 135.12 | 121.60 | 125.30 | 131.35 |
| $\hat{M}_{D_2}$ | 140.10 | 145.97 | 132.50 | 138.47 | 138.55 |
| $\hat{M}_{D_3}$ | 152.32 | 156.37 | 142.17 | 148.28 | 155.80 |
| $\hat{M}_{U_1}$ | 256.17 | 241.29 | 228.36 | 230.54 | 245.23 |
| $\hat{M}_{U_2}$ | 276.30 | 265.81 | 251.71 | 253.48 | 267.34 |
| $\hat{M}_{U_3}$ | 263.73 | 251.27 | 239.49 | 241.96 | 256.88 |
| $\hat{M}_{U_4}$ | 248.97 | 236.70 | 223.54 | 226.83 | 240.15 |
| $\hat{M}_{U_5}$ | 283.61 | 270.10 | 258.46 | 260.77 | 276.52 |
| $\hat{M}_{U_6}$ | 298.46 | 286.75 | 274.17 | 277.35 | 293.19 |
| $\hat{M}_{U_7}$ | 315.70 | 303.46 | 289.33 | 293.29 | 309.45 |
| $\hat{M}_{U_8}$ | 328.33 | 316.95 | 302.73 | 306.48 | 322.86 |
Table 3. Percent relative efficiency (PRE) for $n = 70$.

| Estimator | Gam(4, 1) | LN(2, 1) | C(5, 2) | Uni(5, 20) | Exp(0.5) |
|---|---|---|---|---|---|
| $\hat{M}_y$ | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| $\hat{M}_R$ | 108.21 | 105.78 | 96.41 | 97.57 | 110.46 |
| $\hat{M}_D$ | 130.15 | 129.42 | 118.41 | 120.82 | 127.41 |
| $\hat{M}_{Re}$ | 115.49 | 116.88 | 105.82 | 109.23 | 119.08 |
| $\hat{M}_{Pe}$ | 91.24 | 90.37 | 86.27 | 88.38 | 91.60 |
| $\hat{M}_{D_1}$ | 141.30 | 141.31 | 128.94 | 132.56 | 139.50 |
| $\hat{M}_{D_2}$ | 158.98 | 158.94 | 141.75 | 147.39 | 156.48 |
| $\hat{M}_{D_3}$ | 174.83 | 174.87 | 156.34 | 161.26 | 172.28 |
| $\hat{M}_{U_1}$ | 284.72 | 270.95 | 259.60 | 261.86 | 276.36 |
| $\hat{M}_{U_2}$ | 305.84 | 294.39 | 283.74 | 285.64 | 298.42 |
| $\hat{M}_{U_3}$ | 293.14 | 280.76 | 271.28 | 272.92 | 287.15 |
| $\hat{M}_{U_4}$ | 278.69 | 265.84 | 256.36 | 259.22 | 273.96 |
| $\hat{M}_{U_5}$ | 314.94 | 301.55 | 289.83 | 292.31 | 308.52 |
| $\hat{M}_{U_6}$ | 329.79 | 317.90 | 304.27 | 307.13 | 324.72 |
| $\hat{M}_{U_7}$ | 348.39 | 336.77 | 320.41 | 323.45 | 342.33 |
| $\hat{M}_{U_8}$ | 360.40 | 349.58 | 333.25 | 337.64 | 355.72 |
Table 4. Percent relative efficiency (PRE) for $n = 120$.

| Estimator | Gam(4, 1) | LN(2, 1) | C(5, 2) | Uni(5, 20) | Exp(0.5) |
|---|---|---|---|---|---|
| $\hat{M}_y$ | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| $\hat{M}_R$ | 104.81 | 103.17 | 93.12 | 95.56 | 106.50 |
| $\hat{M}_D$ | 121.71 | 120.93 | 112.72 | 115.24 | 123.77 |
| $\hat{M}_{Re}$ | 112.33 | 111.97 | 101.59 | 105.70 | 114.77 |
| $\hat{M}_{Pe}$ | 85.66 | 82.68 | 79.81 | 81.28 | 83.51 |
| $\hat{M}_{D_1}$ | 132.79 | 136.11 | 122.30 | 127.84 | 135.13 |
| $\hat{M}_{D_2}$ | 158.91 | 162.72 | 143.18 | 150.27 | 157.07 |
| $\hat{M}_{D_3}$ | 175.80 | 178.90 | 156.85 | 164.76 | 176.41 |
| $\hat{M}_{U_1}$ | 315.48 | 302.18 | 289.43 | 291.64 | 307.25 |
| $\hat{M}_{U_2}$ | 336.86 | 326.38 | 313.78 | 316.23 | 331.14 |
| $\hat{M}_{U_3}$ | 324.14 | 312.44 | 301.65 | 303.52 | 318.65 |
| $\hat{M}_{U_4}$ | 310.53 | 297.81 | 287.23 | 290.16 | 305.44 |
| $\hat{M}_{U_5}$ | 342.82 | 328.78 | 315.48 | 319.80 | 335.31 |
| $\hat{M}_{U_6}$ | 358.27 | 346.74 | 331.29 | 335.44 | 352.61 |
| $\hat{M}_{U_7}$ | 378.44 | 366.92 | 348.93 | 352.75 | 373.17 |
| $\hat{M}_{U_8}$ | 391.20 | 380.18 | 362.10 | 366.54 | 387.48 |
Table 5. Percent relative efficiency (PRE) of different estimators using real populations.

| Estimator | Data 1 | Data 2 | Data 3 | Data 4 |
|---|---|---|---|---|
| $\hat{M}_y$ | 100.00 | 100.00 | 100.00 | 100.00 |
| $\hat{M}_R$ | 110.50 | 108.33 | 98.74 | 100.28 |
| $\hat{M}_D$ | 124.33 | 126.80 | 115.23 | 118.46 |
| $\hat{M}_{Re}$ | 117.29 | 119.71 | 109.35 | 113.03 |
| $\hat{M}_{Pe}$ | 102.48 | 101.52 | 93.46 | 95.64 |
| $\hat{M}_{D_1}$ | 133.47 | 135.13 | 121.67 | 125.35 |
| $\hat{M}_{D_2}$ | 140.16 | 145.24 | 132.58 | 138.46 |
| $\hat{M}_{D_3}$ | 152.35 | 156.35 | 142.19 | 148.27 |
| $\hat{M}_{U_1}$ | 256.14 | 241.26 | 228.30 | 230.58 |
| $\hat{M}_{U_2}$ | 276.33 | 265.87 | 251.75 | 253.49 |
| $\hat{M}_{U_3}$ | 263.72 | 251.28 | 239.44 | 241.90 |
| $\hat{M}_{U_4}$ | 248.91 | 236.79 | 223.53 | 226.85 |
| $\hat{M}_{U_5}$ | 283.60 | 270.12 | 258.92 | 260.74 |
| $\hat{M}_{U_6}$ | 298.44 | 286.75 | 274.11 | 277.34 |
| $\hat{M}_{U_7}$ | 315.77 | 303.48 | 289.36 | 293.23 |
| $\hat{M}_{U_8}$ | 328.37 | 316.93 | 302.78 | 306.42 |

6.3.1. Limitations of the Proposed Estimator

Although the proposed family of estimators demonstrates improved efficiency in terms of bias and percent relative efficiency (PRE), particularly for skewed distributions, it has some practical limitations:
  • Dependence on auxiliary information: The performance improvements depend strongly on the availability and quality of auxiliary variables. If the auxiliary variable exhibits a weak correlation with the study variable, or if its distributional characteristics are poorly understood, the proposed estimators may not yield significant benefits.
  • Computational complexity: Unlike traditional estimators such as the sample median or ratio-type estimators, the proposed estimators require more elaborate computations. These include transformation functions and the estimation of tuning parameters (such as $k_1$ and $k_2$), which may be less practical in time-constrained or resource-limited survey environments.
  • Assumption of known robust parameters: The proposed estimators use robust statistics like the interquartile range, trimean, or decile mean, assuming that such measures are either available from prior knowledge or can be estimated accurately. In scenarios where this information is unavailable or unreliable, the estimator’s effectiveness could be limited.

6.3.2. Applicability of the Proposed Estimators in Practical Scenarios

The proposed family of estimators is particularly useful in the following applied settings:
  • Skewed distributions: In surveys involving variables such as income, expenditures, or environmental indicators that typically show non-normal or skewed distributions, the proposed estimators outperform traditional alternatives by offering greater robustness.
  • Presence of outliers: In data sets that contain outliers or extreme values, traditional estimators may be heavily influenced, whereas the proposed estimators, which are designed using robust transformations, continue to provide reliable median estimates.
  • Moderate to strong auxiliary variable correlation: When the auxiliary variable is moderately or strongly correlated with the study variable, the proposed estimators achieve high percent relative efficiency (PRE), as supported by both simulation results and real-data applications in the manuscript.
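
The role of auxiliary-variable correlation in a skewed population can be illustrated with a small Monte Carlo sketch. It is not the simulation design used in this article: the lognormal population, the sample size, and the function name `simulate_pre` are illustrative assumptions, and only the sample median and the classical ratio estimator $\hat{M}_R = \hat{M}_y M_x / \hat{M}_x$ are compared.

```python
import numpy as np

def simulate_pre(n=50, reps=2000, pop_size=5000, seed=0):
    """PRE of the ratio median estimator relative to the sample median."""
    rng = np.random.default_rng(seed)

    # Hypothetical skewed finite population: x is lognormal and
    # y is roughly proportional to x (strong positive association).
    x = rng.lognormal(mean=0.0, sigma=1.0, size=pop_size)
    y = 2.0 * x * rng.lognormal(mean=0.0, sigma=0.1, size=pop_size)
    M_x, M_y = np.median(x), np.median(y)  # population medians

    est_median = np.empty(reps)
    est_ratio = np.empty(reps)
    for r in range(reps):
        # Simple random sampling without replacement
        idx = rng.choice(pop_size, size=n, replace=False)
        m_x, m_y = np.median(x[idx]), np.median(y[idx])
        est_median[r] = m_y              # sample median of y
        est_ratio[r] = m_y * M_x / m_x   # classical ratio estimator

    mse_median = np.mean((est_median - M_y) ** 2)
    mse_ratio = np.mean((est_ratio - M_y) ** 2)
    return {"median": 100.0, "ratio": 100.0 * mse_median / mse_ratio}

if __name__ == "__main__":
    print(simulate_pre())
```

Weakening the link between `x` and `y` (for example, by increasing the `sigma=0.1` noise term) should reduce the PRE of the ratio estimator, which mirrors the dependence on auxiliary information noted in the limitations above.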

7. Conclusions and Future Works

In this work, we used robust measures of an auxiliary variable to develop a new family of estimators for the finite population median under simple random sampling. The biases, mean squared errors (MSEs), and minimum MSEs of the new estimators were derived up to the first order of approximation. To compare the new estimators with existing ones under the percent relative efficiency criterion, we carried out a simulation study using five distributions across a range of sample sizes and parameter values, and we analyzed four real data sets. The simulation and real-data results, given in Table 2, Table 3, Table 4 and Table 5, show that the new family of estimators is more efficient than the existing estimators considered. While all of the new estimators have a higher percent relative efficiency than the existing estimators, the proposed estimator $\hat{M}_{U8}$ performs best among the proposed family and is thus strongly recommended.
Our analysis was limited to the properties of the new family of estimators under simple random sampling. These results could be extended to other settings, such as estimating the finite population median under stratified random sampling or ranked set sampling, estimating finite population proportions, and estimating population medians in the presence of non-sampling errors.

Author Contributions

Conceptualization, U.D. and F.A.A.; Methodology, U.D. and F.A.A.; Software, U.D.; Validation, U.D., M.A.A. and A.M.A.; Formal analysis, U.D., F.A.A., M.A.A. and A.M.A.; Investigation, U.D., F.A.A., M.A.A. and A.M.A.; Resources, U.D., F.A.A. and A.M.A.; Data curation, U.D., F.A.A. and A.M.A.; Writing—original draft, U.D.; Writing—review & editing, U.D., F.A.A. and M.A.A.; Visualization, U.D.; Supervision, M.A.A.; Project administration, U.D., M.A.A. and A.M.A.; Funding acquisition, M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU251889].

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
  2. Särndal, C.E. Sample survey theory vs. general statistical theory: Estimation of the population mean. Int. Stat. Rev. 1972, 40, 1–12.
  3. Gross, S. Median estimation in sample surveys. In Proceedings of the Section on Survey Research Methods; American Statistical Association Ithaca: Alexandria, VA, USA, 1980.
  4. Sedransk, J.; Meyer, J. Confidence intervals for the quantiles of a finite population: Simple random and stratified simple random sampling. J. R. Stat. Soc. Ser. B 1978, 40, 239–252.
  5. Philip, S.; Sedransk, J. Lower bounds for confidence coefficients for confidence intervals for finite population quantiles. Commun. Stat. Theory Methods 1983, 12, 1329–1344.
  6. Kuk, A.Y.C.; Mak, T.K. Median estimation in the presence of auxiliary information. J. R. Stat. Soc. Ser. B 1989, 51, 261–269.
  7. Rao, T.J. On certain methods of improving ratio and regression estimators. Commun. Stat. Theory Methods 1991, 20, 3325–3340.
  8. Singh, S.; Joarder, A.H.; Tracy, D.S. Median estimation using double sampling. Aust. N. Z. J. Stat. 2001, 43, 33–46.
  9. Khoshnevisan, M.; Singh, H.P.; Singh, S.; Smarandache, F. A General Class of Estimators of Population Median Using Two Auxiliary Variables in Double Sampling; Virginia Polytechnic Institute and State University: Blacksburg, VA, USA, 2002.
  10. Singh, S. Advanced Sampling Theory with Applications: How Michael Selected Amy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; Volume 2.
  11. Gupta, S.; Shabbir, J.; Ahmad, S. Estimation of median in two-phase sampling using two auxiliary variables. Commun. Stat. Theory Methods 2008, 37, 1815–1822.
  12. Aladag, S.; Cingi, H. Improvement in estimating the population median in simple random sampling and stratified random sampling using auxiliary information. Commun. Stat. Theory Methods 2015, 44, 1013–1032.
  13. Solanki, R.S.; Singh, H.P. Some classes of estimators for median estimation in survey sampling. Commun. Stat. Theory Methods 2015, 44, 1450–1465.
  14. Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
  15. Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry 2024, 16, 957.
  16. Ding, F. Least squares parameter estimation and multi-innovation least squares methods for linear fitting problems from noisy data. J. Comput. Appl. Math. 2023, 426, 115107.
  17. Shabbir, J.; Gupta, S. A generalized class of difference type estimators for population median in survey sampling. Hacet. J. Math. Stat. 2017, 46, 1015–1028.
  18. Irfan, M.; Maria, J.; Shongwe, S.C.; Zohaib, M.; Bhatti, S.H. Estimation of population median under robust measures of an auxiliary variable. Math. Probl. Eng. 2021, 2021, 4839077.
  19. Shabbir, J.; Gupta, S.; Narjis, G. On improved class of difference type estimators for population median in survey sampling. Commun. Stat. Theory Methods 2022, 51, 3334–3354.
  20. Subzar, M.; Lone, S.A.; Ekpenyong, E.J.; Salam, A.; Aslam, M.; Raja, T.A.; Almutlak, S.A. Efficient class of ratio cum median estimators for estimating the population median. PLoS ONE 2023, 18, e0274690.
  21. Iseh, M.J. Model formulation on efficiency for median estimation under a fixed cost in survey sampling. Model Assist. Stat. Appl. 2023, 18, 373–385.
  22. Hussain, M.A.; Javed, M.; Zohaib, M.; Shongwe, S.C.; Awais, M.; Zaagan, A.A.; Irfan, M. Estimation of population median using bivariate auxiliary information in simple random sampling. Heliyon 2024, 10, e28891.
  23. Almulhim, F.A.; Alghamdi, A.S. Simulation-based evaluation of robust transformation techniques for median estimation under simple random sampling. Axioms 2025, 14, 301.
  24. Baig, A.; Masood, S.; Ahmed Tarray, T. Improved class of difference-type estimators for population median in survey sampling. Commun. Stat. Theory Methods 2019, 49, 5778–5793.
  25. Masood, S.; Ibrar, B.; Shabbir, J.; Movaheedil, Z. Estimating neutrosophic finite median employing robust measures of the auxiliary variable. Sci. Rep. 2024, 14, 10255.
  26. Daraz, U.; Alomair, M.A.; Albalawi, O.; Al Naim, A.S. New Techniques for Estimating Finite Population Variance Using Ranks of Auxiliary Variable in Two-Stage Sampling. Mathematics 2024, 12, 2741.
  27. Alomair, M.A.; Daraz, U. Dual transformation of auxiliary variables by using outliers in stratified random sampling. Mathematics 2024, 12, 2829.
  28. Singh, H.P.; Vishwakarma, G.K. Modified exponential ratio and product estimators for finite population mean in double sampling. Austrian J. Stat. 2007, 36, 217–225.
  29. Murthy, M.N. Sampling Theory and Methods; Statistical Publishing Society: Calcutta, India, 1967.
  30. Koyuncu, N.; Kadilar, C. Family of estimators of population mean using two auxiliary variables in stratified random sampling. Commun. Stat. Theory Methods 2009, 38, 2398–2417.
Figure 1. A graphical display of the findings utilizing information obtained from different population distributions.
Figure 2. A graphical display of the findings utilizing information obtained from different data sets. (a) Source: [10]. (b) Source: [29]. (c) Source: [30]. (d) Source: [10].
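Table 1. A family of new proposed estimators.
Different classes of $\hat{M}_U$    $\alpha_1$    $\alpha_2$    $a_1$    $a_2$
$\hat{M}_{U1} = \left[ k_1 \hat{M}_y \frac{M_x}{\hat{M}_x} + k_2 (M_x - \hat{M}_x) \frac{\hat{M}_x}{M_x} \right] L$    $1$    $-1$    $Q_R$    $D_M$
$\hat{M}_{U2} = \left[ k_1 \hat{M}_y \frac{M_x}{\hat{M}_x} + k_2 (M_x - \hat{M}_x) \right] L$    $1$    $0$    $M_R$    $T_M$
$\hat{M}_{U3} = \left[ k_1 \hat{M}_y \frac{\hat{M}_x}{M_x} + k_2 (M_x - \hat{M}_x) \frac{\hat{M}_x}{M_x} \right] L$    $-1$    $-1$    $Q_A$    $Q_D$
$\hat{M}_{U4} = \left[ k_1 \hat{M}_y + k_2 (M_x - \hat{M}_x) \frac{M_x}{\hat{M}_x} \right] L$    $0$    $1$    $Q_D$    $Q_A$
$\hat{M}_{U5} = \left[ k_1 \hat{M}_y + k_2 (M_x - \hat{M}_x) \frac{\hat{M}_x}{M_x} \right] L$    $0$    $-1$    $T_M$    $M_R$
$\hat{M}_{U6} = \left[ k_1 s_y^2 \frac{M_x}{\hat{M}_x} + k_2 (M_x - \hat{M}_x) \frac{M_x}{\hat{M}_x} \right] L$    $1$    $1$    $D_M$    $Q_R$
$\hat{M}_{U7} = \left[ k_1 \hat{M}_y \frac{\hat{M}_x}{M_x} + k_2 (M_x - \hat{M}_x) \frac{M_x}{\hat{M}_x} \right] L$    $-1$    $1$    $M_R$    $Q_A$
$\hat{M}_{U8} = \left[ k_1 \hat{M}_y \frac{\hat{M}_x}{M_x} + k_2 (M_x - \hat{M}_x) \right] L$    $-1$    $0$    $Q_D$    $T_M$
where $L = \exp\left[ \frac{a_1 (\hat{M}_x - M_x)}{a_1 (\hat{M}_x + M_x) + 2 a_2} \right]$; interquartile range: $Q_R = Q_3 - Q_1$; midrange: $M_R = \frac{x_{\min} + x_{\max}}{2}$; quartile average: $Q_A = \frac{Q_3 + Q_1}{2}$; quartile deviation: $Q_D = \frac{Q_3 - Q_1}{2}$; trimean: $T_M = \frac{Q_1 + 2 Q_2 + Q_3}{4}$; and decile mean: $D_M = \frac{1}{9} \sum_{i=1}^{9} D_i$.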
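
The robust measures defined in the note to Table 1 can be computed from the auxiliary variable as in the following minimal sketch. The function name `robust_measures` and the dictionary keys are hypothetical, and the quantiles use NumPy's default linear-interpolation rule, which is an assumption; the article may rely on a different quantile convention.

```python
import numpy as np

def robust_measures(x):
    """Robust measures of an auxiliary variable x, as defined under Table 1."""
    x = np.asarray(x, dtype=float)
    # Quartiles and deciles via NumPy's default (linear interpolation) rule
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    deciles = np.percentile(x, np.arange(10, 100, 10))  # D_1, ..., D_9
    return {
        "QR": q3 - q1,                  # interquartile range
        "MR": (x.min() + x.max()) / 2,  # midrange
        "QA": (q3 + q1) / 2,            # quartile average
        "QD": (q3 - q1) / 2,            # quartile deviation
        "TM": (q1 + 2 * q2 + q3) / 4,   # trimean
        "DM": deciles.mean(),           # decile mean
    }

if __name__ == "__main__":
    print(robust_measures(range(1, 12)))
```

For the values 1, 2, ..., 11 this gives $Q_R = 5$, $Q_D = 2.5$, and $M_R = Q_A = T_M = D_M = 6$ (up to floating-point rounding), as expected for a symmetric set of values.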
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Daraz, U.; Almulhim, F.A.; Alomair, M.A.; Alomair, A.M. Population Median Estimation Using Auxiliary Variables: A Simulation Study with Real Data Across Sample Sizes and Parameters. Mathematics 2025, 13, 1660. https://doi.org/10.3390/math13101660
