Article

Optimal Transformation-Based Median Estimation Under Stratified Double Sampling with Limited Auxiliary Information

1
Department of Management Sciences, College of Business Administration, Hunan University, Changsha 410082, China
2
Department of Mathematics and Statistics, College of Science, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
3
Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
*
Author to whom correspondence should be addressed.
Symmetry 2026, 18(6), 933; https://doi.org/10.3390/sym18060933 (registering DOI)
Submission received: 16 April 2026 / Revised: 25 May 2026 / Accepted: 27 May 2026 / Published: 29 May 2026
(This article belongs to the Special Issue Unlocking the Power of Probability and Statistics for Symmetry)

Abstract

This study develops a new class of transformation-based estimators for estimating the population median within a stratified two-phase sampling framework. The proposed approach is designed to improve estimation accuracy while reducing survey costs, particularly in situations where auxiliary information is partially available or expensive to collect. By using suitable transformations of the auxiliary variable, the estimators achieve greater stability and robustness in the presence of skewed distributions and extreme observations. Theoretical properties of the proposed estimators are established using first-order approximations, leading to explicit expressions for bias and mean squared error. Optimal conditions are also derived to ensure improved efficiency. To evaluate performance, both simulation experiments and real-world datasets are considered under a variety of distributional settings and correlation structures. The results consistently show that the proposed estimators outperform conventional approaches, including ratio, regression, and exponential-type estimators, in terms of efficiency. In particular, notable improvements are observed for skewed and heavy-tailed populations, where traditional methods often perform poorly. These findings highlight the practical usefulness of the proposed methodology for survey applications in fields such as economics, public health, and social sciences, where reliable median estimation is essential.

1. Introduction

Obtaining accurate estimates in survey research can be particularly costly, especially when the population is skewed or contains extreme values. In such cases, the median is a more appropriate measure of central tendency than the arithmetic mean, which may produce unreliable results. Median-based analysis is widely used in many fields, including monitoring computer network traffic, studying species distribution in ecology, and evaluating public policies, where extreme responses can significantly influence average outcomes. As a practical extension of stratification methods, stratified two-phase sampling balances estimation accuracy with survey economy. In economic surveys, stratifying by employment status provides reliable estimates of median income, while grouping individuals by age or disease severity in health studies improves the estimation of median recovery times. These applications illustrate the effectiveness of stratified two-phase sampling as a reliable survey method. Further details regarding the use of auxiliary information can be found in [1,2].
The problem of population median estimation has attracted considerable attention in survey sampling, particularly when dealing with skewed populations and extreme observations. Important theoretical results on linear functions of order statistics were presented in [3], which later provided the basis for many developments in median estimation. The practical importance of median estimation in sample surveys was discussed in [4], while confidence interval procedures for finite population quantiles under simple and stratified random sampling were developed in [5]. Further investigations on lower bounds for confidence coefficients associated with interval estimation of finite population quantiles were carried out in [6]. The role of auxiliary information in improving population median estimation was later demonstrated in [7]. Improved ratio and regression estimation methods were introduced in [8], motivating several subsequent contributions in survey sampling theory. Extensions of median estimation to double sampling frameworks were studied in [9], whereas a generalized class of estimators using two auxiliary variables in double sampling was proposed in [10]. Additional theoretical developments and applications in sampling theory were presented in [11]. Later, median estimation procedures under two-phase sampling using two auxiliary variables were proposed in [12], leading to substantial improvements in estimation efficiency.
Further advancements in population median estimation have focused on improving efficiency and robustness through the effective use of auxiliary information. Improvements in estimating the population median under both simple random and stratified random sampling were presented in [13]. Several flexible classes of estimators for median estimation in survey sampling were later proposed in [14], providing alternative approaches for achieving higher estimation accuracy. More recently, simulation-based studies on auxiliary variable methods for population median estimation using real data across different sample sizes were conducted in [15]. A family of estimators utilizing auxiliary information for improved population median estimation was introduced in [16], whereas improved difference-type estimators for survey sampling applications were proposed in [17]. Generalized classes of median estimators based on auxiliary variables were further developed in [18], while robust auxiliary measures for neutrosophic finite median estimation were explored in [19]. The balance between estimation accuracy and survey cost under stratified random sampling was emphasized in [20]. In addition, improved and robust difference-type estimators for population median estimation were discussed in [21,22,23]. Efficient ratio-cum-median estimators were proposed in [24], whereas efficiency considerations under fixed survey cost conditions were examined in [25]. Furthermore, robust stratified median estimation procedures based on auxiliary transformations and nontraditional auxiliary measures for asymmetric populations were introduced in [26].
For skewed or non-normal populations, conventional estimators such as ratio, regression, exponential, and product estimators often become unreliable because they rely on distributional symmetry and are highly sensitive to extreme observations. To overcome the limitations highlighted in [15], the present study introduces transformation-based estimators within a stratified two-phase random sampling framework to improve estimation accuracy and stability. Since the median provides a more robust measure of central tendency and the auxiliary information collected during the first phase enhances estimation efficiency, the proposed procedures are particularly useful in variable situations such as weather-dependent agricultural yields, manufacturing processes involving defective items, and highly skewed educational outcomes.
The proposed class of estimators offers several distinct contributions to survey methodology. Its distributional adaptability ensures reliable performance in skewed or non-normal populations, where conventional estimators often fail. The proposed approach is also resistant to outliers, making it less vulnerable to extreme values and thereby improving estimation accuracy in real-world data settings. The stratified two-phase sampling design reduces overall survey cost without compromising precision by effectively utilizing supplementary information collected during the first phase. In addition, the transformation-enhanced approach outperforms traditional ratio and regression estimators in terms of relative efficiency. The generalizability of the proposed method further extends its applicability to various fields where accurate median estimation is important, including income analysis, healthcare, and agriculture. Furthermore, both simulation studies and real datasets demonstrate cross-distributional consistency, confirming that the proposed estimators are reliable and stable under different statistical conditions.
Although the present work is motivated by earlier transformation-based median estimation procedures, the current study extends the methodology to a substantially more complex stratified two-phase sampling framework. Unlike conventional single-stage or simple two-phase settings, the proposed approach adopts independent first-phase and second-phase sampling within each stratum, requiring separate derivations of bias, mean squared error, and optimal parameter conditions under stratified double sampling.
The proposed framework further combines stratum-specific auxiliary transformations to improve robustness and efficiency in heterogeneous populations where auxiliary information may be partially available or expensive to obtain. In addition, the study develops a generalized family of estimators containing several transformation subclasses and evaluates their performance under skewed and heavy-tailed distributions as well as real population datasets. These extensions considerably broaden the practical applicability of transformation-based median estimation under complex survey designs.

Practical Implementation and Applicability of the Proposed Estimators

The proposed class of transformation-based estimators is designed not only to improve theoretical efficiency but also to provide practical applicability in real survey situations where auxiliary information may be partially available or costly to obtain. The estimators can be implemented using standard statistical software such as R, MATLAB, SPSS, or Python without requiring complicated computational procedures. Since the methodology is based on sample medians, auxiliary transformations, and first-order approximations, the implementation process remains computationally manageable even for moderately large populations.
In practical applications, the first-phase sample can be used to collect inexpensive auxiliary information, while the second-phase sample is utilized for obtaining both study and auxiliary variables. This structure is especially useful in large-scale surveys where complete information on the study variable is difficult, time-consuming, or expensive to obtain. The proposed estimators therefore offer an effective balance between estimation precision and survey cost.
An additional advantage of the proposed methodology is its flexibility under different population structures. The transformation parameters introduced in the estimator family allow the method to adapt efficiently to skewed, heavy-tailed, and heterogeneous populations where traditional ratio and regression estimators may lose efficiency. The simulation and empirical analyses presented in this study demonstrate that the estimators maintain stable performance across different correlation levels and distributional settings.
Furthermore, the proposed estimators can be applied in a wide range of practical fields including health surveys, economic investigations, agricultural studies, educational assessments, and social science research, where robust estimation of the population median is often more appropriate than mean-based estimation. The improved efficiency and stability observed in both simulated and real datasets indicate that the proposed methodology can serve as a reliable alternative to existing median estimators under stratified two-phase sampling designs.

2. Notation and Symbols

This section presents the notation and general framework used for median estimation. Consider a finite population of size $N$, denoted by
$$\Omega = \{\Omega_1, \Omega_2, \Omega_3, \ldots, \Omega_N\}.$$
The population is subdivided into $L$ non-overlapping and exhaustive strata, where stratum $h$ contains $N_h$ units, so that
$$\sum_{h=1}^{L} N_h = N.$$
Let $Y$ denote the study variable of interest and $X$ an auxiliary variable used to improve the accuracy of the estimates. In phase I, simple random samples of size $m_h$ are drawn independently and without replacement (SRSWOR) from each stratum,
$$X_{1h} = \{x_{1h}, x_{2h}, \ldots, x_{m_h h}\},$$
and together they give the total first-phase sample size
$$\sum_{h=1}^{L} m_h = m.$$
Only the auxiliary variable $X$ is measured in the first phase. These observations are then used to estimate the population median of $X$, denoted by $M_x$.
The phase II sample,
$$X_{2h} = \{x_{1h}, x_{2h}, \ldots, x_{n_h h}\},$$
is then selected by drawing $n_h$ units from the first-phase sample within each stratum using SRSWOR, where
$$\sum_{h=1}^{L} n_h = n.$$
At this phase of the survey, information is gathered simultaneously for the study variable $Y$ and the auxiliary variable $X$.
Let the population medians of $Y$ and $X$ in the $h$th stratum be denoted by $M_{yh}$ and $M_{xh}$, respectively. Their two-phase sample estimates are $\hat{M}_{x1h}$ (first phase), $\hat{M}_{x2h}$, and $\hat{M}_{yh}$ (second phase). The probability densities at the respective medians in the $h$th stratum are represented by $f_{yh}(M_{yh})$ and $f_{xh}(M_{xh})$. The relationship between these medians is defined by
$$\rho_{x_h y_h}(M_{xh}, M_{yh}) = 4\,Q_{11h}(x_h, y_h) - 1,$$
with $Q_{11h}$ representing the joint probability
$$Q_{11h}(x_h, y_h) = Q(x_h \le M_{xh},\ y_h \le M_{yh}).$$
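As a rough illustration of the definition above, the following Python sketch estimates $Q_{11h}$ as the share of paired observations lying at or below both sample medians and then forms $4Q_{11h} - 1$. The function name `median_concordance` and the toy data are assumptions introduced here for illustration only; they are not part of the original study.

```python
import numpy as np

def median_concordance(x, y):
    """Estimate Q11 and rho = 4*Q11 - 1 from paired observations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Share of units at or below both sample medians.
    q11 = np.mean((x <= np.median(x)) & (y <= np.median(y)))
    return float(q11), float(4.0 * q11 - 1.0)

# Toy data (hypothetical): ten distinct values of X.
x = np.arange(1.0, 11.0)
q_pos, rho_pos = median_concordance(x, 2.0 * x)  # Y increases with X
q_neg, rho_neg = median_concordance(x, -x)       # Y decreases with X
print(rho_pos, rho_neg)
```

With perfectly concordant data half of the units fall below both medians, so the measure reaches its upper limit; with perfectly discordant data no unit does, and the measure reaches its lower limit.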
A basic framework for determining bias and mean squared error is provided by the following relative error terms and their expected values. These formulations are important for evaluating the accuracy and reliability of estimators, as they allow researchers to approximate the expected deviation from the true population parameter while accounting for sampling variability.
$$\Delta_{0h} = \frac{\hat{M}_{yh} - M_{yh}}{M_{yh}}, \qquad \Delta_{1h} = \frac{\hat{M}_{x1h} - M_{xh}}{M_{xh}}, \qquad \Delta_{2h} = \frac{\hat{M}_{x2h} - M_{xh}}{M_{xh}},$$
such that $E(\Delta_{ih}) = 0$ for $i = 0, 1, 2$.
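Further,
$$E(\Delta_{0h}^2) = \delta_{1h} C_{Myh}^2, \qquad E(\Delta_{1h}^2) = \delta_{2h} C_{Mxh}^2, \qquad E(\Delta_{2h}^2) = \delta_{1h} C_{Mxh}^2,$$
$$E(\Delta_{0h}\Delta_{1h}) = \delta_{2h} C_{Myxh}, \qquad E(\Delta_{0h}\Delta_{2h}) = \delta_{1h} C_{Myxh}, \qquad E(\Delta_{1h}\Delta_{2h}) = \delta_{2h} C_{Mxh}^2,$$
where
$$C_{Myxh} = \rho_{yxh}\, C_{Myh}\, C_{Mxh}, \qquad C_{Myh} = \frac{1}{M_{yh}\, f_{yh}(M_{yh})}, \qquad C_{Mxh} = \frac{1}{M_{xh}\, f_{xh}(M_{xh})},$$
$$\delta_{1h} = \frac{1}{4}\left(\frac{1}{n_h} - \frac{1}{N_h}\right), \qquad \delta_{2h} = \frac{1}{4}\left(\frac{1}{m_h} - \frac{1}{N_h}\right), \qquad \delta_{3h} = \frac{1}{4}\left(\frac{1}{n_h} - \frac{1}{m_h}\right),$$
and $\rho_{yxh}$ denotes the correlation $\rho_{x_h y_h}(M_{xh}, M_{yh})$ between the medians in the $h$th stratum.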
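The sampling constants above are simple to compute. The sketch below is a minimal Python illustration; the function names `delta_terms` and `c_median`, the stratum sizes, and the exponential example are assumptions chosen for demonstration rather than values taken from the study.

```python
import math

def delta_terms(N_h, m_h, n_h):
    """Return (delta_1h, delta_2h, delta_3h) for one stratum."""
    if not (0 < n_h <= m_h <= N_h):
        raise ValueError("sizes must satisfy 0 < n_h <= m_h <= N_h")
    d1 = 0.25 * (1.0 / n_h - 1.0 / N_h)
    d2 = 0.25 * (1.0 / m_h - 1.0 / N_h)
    d3 = 0.25 * (1.0 / n_h - 1.0 / m_h)
    return d1, d2, d3

def c_median(median, density_at_median):
    """Return C_M = 1 / (M * f(M))."""
    return 1.0 / (median * density_at_median)

# Hypothetical stratum: N_h = 400, m_h = 100, n_h = 40.
d1, d2, d3 = delta_terms(400, 100, 40)

# Hypothetical check with an exponential variable of mean theta:
# the median is theta*ln(2) and the density there is 1/(2*theta).
theta = 0.40
c_exp = c_median(theta * math.log(2.0), 1.0 / (2.0 * theta))
print(d1, d2, d3, c_exp)
```

The three constants satisfy $\delta_{3h} = \delta_{1h} - \delta_{2h}$, and for the exponential case $C_M = 2/\ln 2$ regardless of the mean, which gives a quick sanity check on any implementation.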

3. Review of Conventional Estimators

The median estimators developed in the literature perform differently depending on the sampling design and population structure. This section provides a thorough analysis of these estimators, emphasizing their theoretical properties such as variance, bias, and mean squared error. Examining these methods not only provides a basis for comparison but also highlights the limitations of existing approaches, thereby laying the groundwork for the development of improved and more efficient estimation methods.
Under stratified two-phase sampling, the conventional sample median estimator, initially introduced by [4], and its variance are
$$\hat{D}_{1st} = \sum_{h=1}^{L} W_h \hat{M}_{yh}$$
and
$$\mathrm{Var}(\hat{D}_{1st}) = \sum_{h=1}^{L} \delta_{1h} W_h^2 M_{yh}^2 C_{Myh}^2, \tag{2}$$
where $W_h = N_h/N$ is the stratum weight.
In this research, we modify the ratio estimator proposed by [9] for double sampling to accommodate stratified double sampling:
$$\hat{D}_{2st} = \sum_{h=1}^{L} W_h \frac{\hat{M}_{yh}}{\hat{M}_{x2h}}\, \hat{M}_{x1h}.$$
To the first order of approximation, the bias and MSE of $\hat{D}_{2st}$ are
$$\mathrm{Bias}(\hat{D}_{2st}) \approx \sum_{h=1}^{L} W_h\, \delta_{3h} M_{yh} \left(C_{Mxh}^2 - C_{Myxh}\right)$$
and
$$\mathrm{MSE}(\hat{D}_{2st}) \approx \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[\delta_{1h} C_{Myh}^2 + \delta_{3h}\left(C_{Mxh}^2 - 2 C_{Myxh}\right)\right]. \tag{5}$$
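The MSE of the ratio estimator can be traced back to the expectations listed in Section 2. The following is a brief sketch of that step for a single stratum, written under the usual first-order assumption that products of three or more error terms are negligible; it is offered as a reading aid rather than as the authors' own derivation.

```latex
\begin{aligned}
\frac{\hat{M}_{yh}\,\hat{M}_{x1h}}{\hat{M}_{x2h}} - M_{yh}
  &= M_{yh}\,(1+\Delta_{0h})(1+\Delta_{1h})(1+\Delta_{2h})^{-1} - M_{yh}
   \approx M_{yh}\,(\Delta_{0h} + \Delta_{1h} - \Delta_{2h}), \\
E\big[(\Delta_{0h} + \Delta_{1h} - \Delta_{2h})^2\big]
  &= \delta_{1h} C_{Myh}^2 + \delta_{2h} C_{Mxh}^2 + \delta_{1h} C_{Mxh}^2
     + 2\delta_{2h} C_{Myxh} - 2\delta_{1h} C_{Myxh} - 2\delta_{2h} C_{Mxh}^2 \\
  &= \delta_{1h} C_{Myh}^2 + (\delta_{1h} - \delta_{2h})\big(C_{Mxh}^2 - 2 C_{Myxh}\big)
   = \delta_{1h} C_{Myh}^2 + \delta_{3h}\big(C_{Mxh}^2 - 2 C_{Myxh}\big).
\end{aligned}
```

Multiplying by $W_h^2 M_{yh}^2$ and summing over independent strata gives the stated MSE.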
The difference-type estimator is rewritten for stratified two-phase sampling using the method of [11]:
$$\hat{D}_{3st} = \sum_{h=1}^{L} W_h \left[\hat{M}_{yh} + d_{1h}\left(\hat{M}_{x1h} - \hat{M}_{x2h}\right)\right].$$
The minimum MSE of $\hat{D}_{3st}$, attained at the optimal value of $d_{1h}$ under the first-order approximation, is
$$\mathrm{MSE}(\hat{D}_{3st})_{\min} \approx \sum_{h=1}^{L} W_h^2 M_{yh}^2 C_{Myh}^2 \left(\delta_{1h} - \delta_{3h}\, \rho_{yxh}^2\right), \tag{7}$$
where
$$d_{1h(\mathrm{opt})} = \frac{\rho_{yxh}\, M_{yh}\, C_{Myh}}{M_{xh}\, C_{Mxh}}.$$
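In this work, the exponential estimators proposed by [27] under double sampling are generalized to stratified double sampling as described below:
$$\hat{D}_{4st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left(\frac{\hat{M}_{x1h} - \hat{M}_{x2h}}{\hat{M}_{x1h} + \hat{M}_{x2h}}\right)$$
and
$$\hat{D}_{5st} = \sum_{h=1}^{L} W_h \hat{M}_{yh} \exp\left(\frac{\hat{M}_{x2h} - \hat{M}_{x1h}}{\hat{M}_{x1h} + \hat{M}_{x2h}}\right).$$
The corresponding formulas are valid under a first-order expansion and can be written as follows:
$$\mathrm{Bias}(\hat{D}_{4st}) \approx \frac{1}{2} \sum_{h=1}^{L} W_h\, \delta_{3h} M_{yh} \left(\frac{3}{4} C_{Mxh}^2 - C_{Myxh}\right),$$
$$\mathrm{Bias}(\hat{D}_{5st}) \approx \frac{1}{2} \sum_{h=1}^{L} W_h\, \delta_{3h} M_{yh} \left(\frac{3}{4} C_{Mxh}^2 + C_{Myxh}\right),$$
$$\mathrm{MSE}(\hat{D}_{4st}) \approx \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[\delta_{1h} C_{Myh}^2 + \delta_{3h} C_{Mxh}^2 \left(\frac{1}{4} - T_h\right)\right] \tag{12}$$
and
$$\mathrm{MSE}(\hat{D}_{5st}) \approx \sum_{h=1}^{L} W_h^2 M_{yh}^2 \left[\delta_{1h} C_{Myh}^2 + \delta_{3h} C_{Mxh}^2 \left(\frac{1}{4} + T_h\right)\right], \tag{13}$$
where
$$T_h = \rho_{yxh}\, \frac{C_{Myh}}{C_{Mxh}}.$$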
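The benchmark estimators reviewed so far only require the three sample medians per stratum. The following Python sketch shows one way they could be computed; the function name `benchmark_estimators`, the dictionary layout, and the toy data are hypothetical choices made for illustration. The constant `d1` of the difference estimator must be supplied by the user and defaults to zero here, since its optimal value depends on population quantities.

```python
import numpy as np

def benchmark_estimators(strata):
    """Compute D1..D5 from stratified two-phase data.

    Each stratum is a dict with keys:
      W  : stratum weight N_h / N
      x1 : phase-I observations of X
      x2 : phase-II observations of X
      y  : phase-II observations of Y
      d1 : optional constant for the difference estimator (default 0)
    """
    est = {"D1": 0.0, "D2": 0.0, "D3": 0.0, "D4": 0.0, "D5": 0.0}
    for s in strata:
        my = np.median(s["y"])
        mx1 = np.median(s["x1"])
        mx2 = np.median(s["x2"])
        u = (mx1 - mx2) / (mx1 + mx2)
        est["D1"] += s["W"] * my                      # simple median
        est["D2"] += s["W"] * my * mx1 / mx2          # ratio
        est["D3"] += s["W"] * (my + s.get("d1", 0.0) * (mx1 - mx2))
        est["D4"] += s["W"] * my * np.exp(u)          # exponential ratio type
        est["D5"] += s["W"] * my * np.exp(-u)         # exponential product type
    return est

# Toy single-stratum example (hypothetical numbers).
toy = [{"W": 1.0,
        "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
        "x2": [1.0, 2.0, 3.0],
        "y": [5.0, 10.0, 15.0]}]
res = benchmark_estimators(toy)
print(res)
```

In this toy case the first-phase median of X is twice the second-phase median, so the ratio estimator doubles the sample median of Y, while the two exponential estimators adjust it in opposite directions by the same factor.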
Following [8,12], whose estimators were proposed under the two-phase sampling scenario, the stratified versions can be expressed as follows:
$$\hat{D}_{6st} = \sum_{h=1}^{L} W_h \left[d_{2h} \hat{M}_{yh} + d_{3h}\left(\hat{M}_{x1h} - \hat{M}_{x2h}\right)\right],$$
$$\hat{D}_{7st} = \sum_{h=1}^{L} W_h \left[d_{4h} \hat{M}_{yh} + d_{5h}\left(\hat{M}_{x1h} - \hat{M}_{x2h}\right)\right] \frac{\hat{M}_{x1h}}{\hat{M}_{x2h}},$$
$$\hat{D}_{8st} = \sum_{h=1}^{L} W_h \left[d_{6h} \hat{M}_{yh} + d_{7h}\left(\hat{M}_{x1h} - \hat{M}_{x2h}\right)\right] \exp\left(\frac{\hat{M}_{x1h} - \hat{M}_{x2h}}{\hat{M}_{x1h} + \hat{M}_{x2h}}\right).$$
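At the optimal values of the constants, the biases and MSEs of these estimators under the stratified two-phase design can be expressed as follows:
$$\mathrm{Bias}(\hat{D}_{6st})_{\min} \approx -\sum_{h=1}^{L} W_h M_{yh}\, \frac{\delta_{1h} C_{Myh}^2 C_{Mxh}^2 - \delta_{3h} C_{Myxh}^2}{C_{Mxh}^2\left(1 + \delta_{1h} C_{Myh}^2\right) - \delta_{3h} C_{Myxh}^2},$$
$$\mathrm{Bias}(\hat{D}_{7st})_{\min} \approx \sum_{h=1}^{L} W_h \left[M_{yh}\left(d_{4h(\mathrm{opt})} - 1\right) + \delta_{3h}\left\{d_{4h(\mathrm{opt})} M_{yh}\left(C_{Mxh}^2 - C_{Myxh}\right) + d_{5h(\mathrm{opt})} M_{xh} C_{Mxh}^2\right\}\right],$$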
B i a s D ^ 8 s t m i n h = 1 L W h M y h 1 + δ 3 h 4 C M x 2 1 + d 6 h ( o p t ) M x h + 8 ρ y x h M y h M x h ,
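$$\mathrm{MSE}(\hat{D}_{6st})_{\min} \approx \sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\delta_{1h} C_{Myh}^2 C_{Mxh}^2 - \delta_{3h} C_{Myxh}^2}{C_{Mxh}^2\left(1 + \delta_{1h} C_{Myh}^2\right) - \delta_{3h} C_{Myxh}^2}, \tag{20}$$
$$\mathrm{MSE}(\hat{D}_{7st})_{\min} \approx \sum_{h=1}^{L} W_h^2 \Big[\mathrm{Bias}(\hat{D}_{7st})_{\min}^2 + \delta_{1h} M_{yh}^2 d_{4h(\mathrm{opt})}^2 C_{Myh}^2 + \delta_{3h}\Big\{C_{Mxh}^2\left(d_{4h(\mathrm{opt})} M_{yh} + d_{5h(\mathrm{opt})} M_{xh}\right)^2 - 2 d_{4h(\mathrm{opt})} M_{yh} C_{Myxh}\left(d_{4h(\mathrm{opt})} M_{yh} + d_{5h(\mathrm{opt})} M_{xh}\right)\Big\}\Big], \tag{21}$$
$$\mathrm{MSE}(\hat{D}_{8st})_{\min} \approx \sum_{h=1}^{L} W_h^2 \left[\mathrm{Bias}(\hat{D}_{8st})_{\min} + \frac{\delta_{3h}}{4}\, d_{6h(\mathrm{opt})}^2 M_{yh}^2 C_{Mxh}^2\right], \tag{22}$$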
where
$$d_{2h(\mathrm{opt})} = \frac{C_{Mxh}^2}{C_{Mxh}^2\left(1 + \delta_{1h} C_{Myh}^2\right) - \delta_{3h} C_{Myxh}^2},$$
$$d_{3h(\mathrm{opt})} = \frac{M_{yh}\, C_{Myxh}}{M_{xh}\left[C_{Mxh}^2\left(1 + \delta_{1h} C_{Myh}^2\right) - \delta_{3h} C_{Myxh}^2\right]},$$
d 4 h ( o p t ) = C M x h 2 C M x h 2 1 + δ 1 h C M y h 2 1 + δ 3 h C M x h 2 + δ 3 h C M y x h 2 1 + C M x h 2 ,
d 5 h ( o p t ) = M y h C M x h 2 δ 1 h C M y h 2 1 + C M y x h 1 + δ 3 h C M y x h C M x h 2 1 + δ 1 h C M y h 2 1 + δ 3 h C M x h 2 + δ 3 h C M y x h 2 1 + C M x h 2 ,
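$$d_{6h(\mathrm{opt})} = \frac{1}{8}\cdot\frac{8 - \delta_{2h} C_{Mxh}^2}{1 + \delta_{1h} C_{Mxh}^2\left(1 - \rho_{yxh}^2\right)},$$
$$d_{7h(\mathrm{opt})} = \frac{M_{yh}}{M_{xh}}\left[\frac{1}{2} + d_{6h(\mathrm{opt})}\left(\rho_{yxh}\frac{M_{yh}}{M_{xh}} - 1\right)\right].$$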

4. Order-Statistic-Based Estimators

In this section, we propose a new class of median estimators for stratified double sampling, building on the work of [15]. The method incorporates transformations of auxiliary information to improve the estimation of the population median. The specific form of the estimator is given below:
$$\hat{Z}_{st} = \sum_{h=1}^{L} W_h \left[\alpha_{1h} \hat{M}_{yh}\left(\frac{\hat{M}_{x1h}}{\hat{M}_{x2h}}\right)^{p_{1h}} + \alpha_{2h}\left(\frac{\hat{M}_{x1h}}{\hat{M}_{x2h}}\right)^{p_{2h}}\right] \exp\left(\frac{\phi_{1h}\left(\hat{M}_{x1h} - \hat{M}_{x2h}\right)}{\phi_{1h}\left(\hat{M}_{x1h} + \hat{M}_{x2h}\right) + 2\phi_{2h}}\right), \tag{23}$$
where $p_{1h}, p_{2h} \in \{-1, 0, 1\}$. Efficiency is improved by optimally selecting the constants $(\alpha_{1h}, \alpha_{2h})$. Distinct subclasses of (23) are obtained when $\phi_{1h}$ and $\phi_{2h}$ are specified either as constants or as functions derived from robust statistics of $X$ (see Table 1). The parameters $\phi_{1h}$ and $\phi_{2h}$ thus act as adjustment components.
  • where
    $$DM_h = \frac{1}{9}\sum_{i=1}^{9} D_{ih}, \qquad QD_h = \frac{Q_{3h} - Q_{1h}}{2}, \qquad TM_h = \frac{Q_{1h} + 2Q_{2h} + Q_{3h}}{4},$$
    $$QR_h = Q_{3h} - Q_{1h}, \qquad QA_h = \frac{Q_{3h} + Q_{1h}}{2}, \qquad MR_h = \frac{X_{h\,\min} + X_{h\,\max}}{2}$$
    represent the decile mean, quartile deviation, tri-mean, quartile range, quartile average, and mid-range of $X$ in the $h$th stratum, respectively.
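The robust measures listed above can be obtained directly from sample quantiles. The Python sketch below is one possible implementation; the function name `robust_measures` is hypothetical, and the quantile rule (NumPy's default linear interpolation) is an assumption, since the study does not state which quantile definition was used.

```python
import numpy as np

def robust_measures(x):
    """Return the robust location and spread measures of X for one stratum."""
    x = np.asarray(x, dtype=float)
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    deciles = np.percentile(x, np.arange(10, 100, 10))  # D_1, ..., D_9
    return {
        "DM": float(deciles.mean()),          # decile mean
        "QD": float((q3 - q1) / 2.0),         # quartile deviation
        "TM": float((q1 + 2.0 * q2 + q3) / 4.0),  # tri-mean
        "QR": float(q3 - q1),                 # quartile range
        "QA": float((q3 + q1) / 2.0),         # quartile average
        "MR": float((x.min() + x.max()) / 2.0),   # mid-range
    }

# Toy data (hypothetical): the integers 0, 1, ..., 100.
measures = robust_measures(np.arange(101.0))
print(measures)
```

Different quantile conventions give slightly different values in small samples, so the rule used should be fixed before comparing subclasses.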
The proposed subclasses provide a flexible estimation framework capable of adapting to different population structures and characteristics of auxiliary information. Since different transformation forms respond differently to variations in skewness, heterogeneity, and correlation structures, the availability of multiple subclasses increases the practical applicability of the proposed methodology. In practical survey settings, the choice of a particular subclass is guided by the nature of the auxiliary information, the distributional properties of the population, and computational convenience. Overall, the proposed family of estimators offers a broad and adaptable framework for efficient median estimation under stratified two-phase sampling designs.
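To make the structure of the proposed class concrete, the following Python sketch evaluates Equation (23) from stratum-level sample medians. The function name `z_st`, the dictionary keys, and the numerical values are hypothetical; in practice the constants would be replaced by their optimal values and the adjustment components by the chosen robust measures.

```python
import math

def z_st(strata):
    """Evaluate the proposed estimator from stratum-level inputs.

    Each stratum is a dict with keys W, my, mx1, mx2 (weight and sample
    medians), a1, a2 (constants alpha_1h, alpha_2h), p1, p2 (powers in
    {-1, 0, 1}), and phi1, phi2 (adjustment components).
    """
    total = 0.0
    for s in strata:
        ratio = s["mx1"] / s["mx2"]
        expo = (s["phi1"] * (s["mx1"] - s["mx2"])
                / (s["phi1"] * (s["mx1"] + s["mx2"]) + 2.0 * s["phi2"]))
        core = s["a1"] * s["my"] * ratio ** s["p1"] + s["a2"] * ratio ** s["p2"]
        total += s["W"] * core * math.exp(expo)
    return total

# Hypothetical single stratum with medians 10, 4 and 2.
base = {"W": 1.0, "my": 10.0, "mx1": 4.0, "mx2": 2.0,
        "a1": 1.0, "a2": 0.0, "p2": 0}
ratio_case = z_st([dict(base, p1=1, phi1=0.0, phi2=1.0)])  # ratio form
exp_case = z_st([dict(base, p1=0, phi1=1.0, phi2=0.0)])    # exponential form
print(ratio_case, exp_case)
```

The two calls illustrate that, with $\alpha_{1h} = 1$ and $\alpha_{2h} = 0$, the class reduces to the ratio estimator when $p_{1h} = 1$ and $\phi_{1h} = 0$, and to the exponential ratio-type estimator when $p_{1h} = 0$, $\phi_{1h} = 1$, and $\phi_{2h} = 0$.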
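Reformulating Equation (23) through relative errors yields the following expression:
$$\hat{Z}_{st} = \sum_{h=1}^{L} W_h \left[\alpha_{1h} M_{yh}(1 + \Delta_{0h})(1 + \Delta_{1h})^{p_{1h}}(1 + \Delta_{2h})^{-p_{1h}} + \alpha_{2h}(1 + \Delta_{1h})^{p_{2h}}(1 + \Delta_{2h})^{-p_{2h}}\right] \times \exp\left\{\frac{g_h(\Delta_{1h} - \Delta_{2h})}{2}\left[1 + \frac{g_h}{2}(\Delta_{1h} + \Delta_{2h})\right]^{-1}\right\}, \tag{24}$$
where
$$g_h = \frac{\phi_{1h} M_{xh}}{\phi_{1h} M_{xh} + \phi_{2h}}.$$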
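Expanding Equation (24) in a Taylor series to the first order of approximation, that is, retaining terms up to the second degree in $\Delta_{ih}$ and neglecting all higher-degree terms because their contributions are negligible, we obtain:
$$
\begin{aligned}
\hat{Z}_{st} - \sum_{h=1}^{L} W_h M_{yh} \approx \sum_{h=1}^{L} W_h \Big[ & -M_{yh} + \alpha_{1h} M_{yh} \Big\{ 1 + \Delta_{0h} + \Delta_{1h}\Big(p_{1h} + \frac{g_h}{2}\Big) - \Delta_{2h}\Big(p_{1h} + \frac{g_h}{2}\Big) \\
& + \Delta_{1h}^2 \Big( \frac{p_{1h} g_h}{2} - \frac{g_h^2}{8} + \frac{p_{1h}(p_{1h} - 1)}{2} \Big) + \Delta_{2h}^2 \Big( \frac{p_{1h} g_h}{2} + \frac{3 g_h^2}{8} + \frac{p_{1h}(p_{1h} + 1)}{2} \Big) \\
& + \Delta_{0h}\Delta_{1h}\Big(p_{1h} + \frac{g_h}{2}\Big) - \Delta_{0h}\Delta_{2h}\Big(p_{1h} + \frac{g_h}{2}\Big) - \Delta_{1h}\Delta_{2h}\Big(p_{1h} + \frac{g_h}{2}\Big)^2 \Big\} \\
& + \alpha_{2h} \Big\{ 1 + \Delta_{1h}\Big(p_{2h} + \frac{g_h}{2}\Big) - \Delta_{2h}\Big(p_{2h} + \frac{g_h}{2}\Big) + \Delta_{1h}^2 \Big( \frac{p_{2h} g_h}{2} - \frac{g_h^2}{8} + \frac{p_{2h}(p_{2h} - 1)}{2} \Big) \\
& + \Delta_{2h}^2 \Big( \frac{p_{2h} g_h}{2} + \frac{3 g_h^2}{8} + \frac{p_{2h}(p_{2h} + 1)}{2} \Big) - \Delta_{1h}\Delta_{2h}\Big(p_{2h} + \frac{g_h}{2}\Big)^2 \Big\} \Big].
\end{aligned} \tag{25}
$$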
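The coefficients in the expansion can be checked by expanding the power factor and the exponential factor of Equation (24) separately and then multiplying them. The sketch below does this for a generic power $p$ (standing for $p_{1h}$ or $p_{2h}$), keeping terms up to the second degree; it is a reading aid under the stated approximation, not a reproduction of the authors' working.

```latex
\begin{aligned}
(1+\Delta_{1h})^{p}(1+\Delta_{2h})^{-p}
  &\approx 1 + p\,\Delta_{1h} - p\,\Delta_{2h}
     + \tfrac{p(p-1)}{2}\Delta_{1h}^2 + \tfrac{p(p+1)}{2}\Delta_{2h}^2
     - p^2\,\Delta_{1h}\Delta_{2h}, \\
\exp\Big\{\tfrac{g_h}{2}(\Delta_{1h}-\Delta_{2h})
     \big[1+\tfrac{g_h}{2}(\Delta_{1h}+\Delta_{2h})\big]^{-1}\Big\}
  &\approx 1 + \tfrac{g_h}{2}(\Delta_{1h}-\Delta_{2h})
     - \tfrac{g_h^2}{8}\Delta_{1h}^2 + \tfrac{3g_h^2}{8}\Delta_{2h}^2
     - \tfrac{g_h^2}{4}\Delta_{1h}\Delta_{2h}, \\
\text{product}
  &\approx 1 + \big(p+\tfrac{g_h}{2}\big)(\Delta_{1h}-\Delta_{2h})
     + \Big(\tfrac{p\,g_h}{2} - \tfrac{g_h^2}{8} + \tfrac{p(p-1)}{2}\Big)\Delta_{1h}^2 \\
  &\quad + \Big(\tfrac{p\,g_h}{2} + \tfrac{3g_h^2}{8} + \tfrac{p(p+1)}{2}\Big)\Delta_{2h}^2
     - \big(p+\tfrac{g_h}{2}\big)^2\Delta_{1h}\Delta_{2h}.
\end{aligned}
```

Multiplying the product by $(1 + \Delta_{0h})$ adds the terms $\Delta_{0h}$ and $\big(p + \tfrac{g_h}{2}\big)(\Delta_{0h}\Delta_{1h} - \Delta_{0h}\Delta_{2h})$, which gives the bracket attached to $\alpha_{1h}$.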
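To evaluate the bias of $\hat{Z}_{st}$, Equation (25) is reformulated in expectation form, where each term $\Delta_{0h}$, $\Delta_{1h}$, $\Delta_{2h}$, $\Delta_{0h}^2$, $\Delta_{1h}^2$, $\Delta_{2h}^2$, $\Delta_{0h}\Delta_{1h}$, $\Delta_{0h}\Delta_{2h}$, $\Delta_{1h}\Delta_{2h}$ is replaced with its expected value. This yields:
$$\mathrm{Bias}(\hat{Z}_{st}) \approx -\sum_{h=1}^{L} W_h \left[M_{yh} - \alpha_{1h} M_{yh} \Gamma_{3h} - \alpha_{2h} \Gamma_{5h}\right], \tag{26}$$
$$\Gamma_{3h} = 1 + \left(\delta_{1h} - \delta_{2h}\right)\left[\frac{C_{Mxh}^2}{8}\left\{4 p_{1h}\left(p_{1h} + 1 + g_h\right) + 3 g_h^2\right\} - \frac{C_{Myxh}}{2}\left(2 p_{1h} + g_h\right)\right],$$
and
$$\Gamma_{5h} = 1 + \delta_{3h}\, \frac{C_{Mxh}^2}{8}\left\{4 p_{2h}\left(p_{2h} + g_h + 1\right) + 3 g_h^2\right\},$$
where
$$\delta_{3h} = \delta_{1h} - \delta_{2h}.$$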
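We calculate the mean squared error of $\hat{Z}_{st}$ by squaring both sides of Equation (25), applying expectation, and neglecting terms of degree higher than two in $\Delta_{ih}$. Substituting each error term with its expected value gives:
$$\mathrm{MSE}(\hat{Z}_{st}) \approx \sum_{h=1}^{L} W_h^2 \left[M_{yh}^2 + \alpha_{1h}^2 M_{yh}^2 \Gamma_{1h} + \alpha_{2h}^2 \Gamma_{2h} - 2\alpha_{1h} M_{yh}^2 \Gamma_{3h} - 2\alpha_{2h} M_{yh} \Gamma_{5h} + 2\alpha_{1h}\alpha_{2h} M_{yh} \Gamma_{4h}\right], \tag{27}$$
where
$$
\begin{aligned}
\Gamma_{1h} = 1 &+ \delta_{1h}\Big[C_{Myh}^2 + C_{Mxh}^2\Big\{\Big(p_{1h} + \frac{g_h}{2}\Big)^2 + p_{1h} g_h + \frac{3 g_h^2}{4} + p_{1h}(p_{1h} + 1)\Big\} - 4 C_{Myxh}\Big(p_{1h} + \frac{g_h}{2}\Big)\Big] \\
&+ \delta_{2h}\Big[C_{Mxh}^2\Big\{\Big(p_{1h} + \frac{g_h}{2}\Big)^2 + p_{1h} g_h - \frac{g_h^2}{4} + p_{1h}(p_{1h} - 1) - 4\Big(p_{1h} + \frac{g_h}{2}\Big)^2\Big\} + 4 C_{Myxh}\Big(p_{1h} + \frac{g_h}{2}\Big)\Big],
\end{aligned}
$$
$$
\begin{aligned}
\Gamma_{2h} = 1 &+ \delta_{1h} C_{Mxh}^2\Big\{\Big(p_{2h} + \frac{g_h}{2}\Big)^2 + p_{2h} g_h + \frac{3 g_h^2}{4} + p_{2h}(p_{2h} + 1)\Big\} \\
&+ \delta_{2h} C_{Mxh}^2\Big\{\Big(p_{2h} + \frac{g_h}{2}\Big)^2 + p_{2h} g_h - \frac{g_h^2}{4} + p_{2h}(p_{2h} - 1) - 4\Big(p_{2h} + \frac{g_h}{2}\Big)^2\Big\},
\end{aligned}
$$
and
$$
\begin{aligned}
\Gamma_{4h} = 1 &+ \delta_{1h}\Big[C_{Mxh}^2\Big\{\frac{p_{1h} g_h}{2} + \frac{3 g_h^2}{8} + \frac{p_{1h}(p_{1h} + 1)}{2} + \Big(p_{1h} + \frac{g_h}{2}\Big)\Big(p_{2h} + \frac{g_h}{2}\Big) + \frac{p_{2h} g_h}{2} + \frac{3 g_h^2}{8} + \frac{p_{2h}(p_{2h} + 1)}{2}\Big\} - C_{Myxh}\left(p_{1h} + p_{2h} + g_h\right)\Big] \\
&+ \delta_{2h}\Big[C_{Mxh}^2\Big\{\frac{p_{1h} g_h}{2} - \frac{g_h^2}{8} + \frac{p_{1h}(p_{1h} - 1)}{2} - \Big(p_{1h} + \frac{g_h}{2}\Big)\Big(p_{2h} + \frac{g_h}{2}\Big) + \frac{p_{2h} g_h}{2} - \frac{g_h^2}{8} + \frac{p_{2h}(p_{2h} - 1)}{2} \\
&\qquad - \Big(p_{1h} + \frac{g_h}{2}\Big)^2 - \Big(p_{2h} + \frac{g_h}{2}\Big)^2\Big\} + C_{Myxh}\left(p_{1h} + p_{2h} + g_h\right)\Big].
\end{aligned}
$$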
The optimal values of $\alpha_{1h}$ and $\alpha_{2h}$, which minimize Equation (27) and ensure the best performance, are given by:
$$\alpha_{1h(\mathrm{opt})} = \frac{\Gamma_{2h}\Gamma_{3h} - \Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2}$$
and
$$\alpha_{2h(\mathrm{opt})} = \frac{M_{yh}\left(\Gamma_{1h}\Gamma_{5h} - \Gamma_{3h}\Gamma_{4h}\right)}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2}.$$
The optimal constants are obtained under the regularity condition
$$\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2 \neq 0,$$
and the denominator is assumed to remain sufficiently bounded away from zero to ensure numerical stability of the proposed estimators. In the simulation and empirical investigations considered in this study, this condition was satisfied for all populations and parameter settings examined.
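Substituting the optimal values of $\alpha_{1h}$ and $\alpha_{2h}$ into Equations (26) and (27) yields the optimum expressions for the bias and mean squared error of $\hat{Z}_{st}$:
$$\mathrm{Bias}(\hat{Z}_{st})_{\min} \approx -\sum_{h=1}^{L} W_h M_{yh}\left[1 - \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2}\right]$$
and
$$\mathrm{MSE}(\hat{Z}_{st})_{\min} \approx \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2}\right]. \tag{29}$$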
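The optimal constants solve a two-by-two linear system obtained by setting the partial derivatives of Equation (27) to zero. The Python sketch below evaluates the closed forms for one stratum and checks numerically that the MSE at the optimum agrees with the minimum-MSE expression. The function names and the numerical values of the $\Gamma$ terms are hypothetical and chosen only to give a well-conditioned example; they are not results from the study.

```python
def optimal_alphas(g1, g2, g3, g4, g5, my, tol=1e-12):
    """Return (alpha_1h, alpha_2h) at the optimum for one stratum."""
    det = g1 * g2 - g4 ** 2
    if abs(det) < tol:
        raise ValueError("regularity condition violated: denominator near zero")
    a1 = (g2 * g3 - g4 * g5) / det
    a2 = my * (g1 * g5 - g3 * g4) / det
    return a1, a2

def stratum_mse(a1, a2, g1, g2, g3, g4, g5, my):
    """Stratum-level bracket of the MSE expression for given constants."""
    return (my ** 2 + a1 ** 2 * my ** 2 * g1 + a2 ** 2 * g2
            - 2.0 * a1 * my ** 2 * g3 - 2.0 * a2 * my * g5
            + 2.0 * a1 * a2 * my * g4)

def stratum_min_mse(g1, g2, g3, g4, g5, my):
    """Stratum-level bracket of the minimum MSE expression."""
    det = g1 * g2 - g4 ** 2
    num = g1 * g5 ** 2 + g2 * g3 ** 2 - 2.0 * g3 * g4 * g5
    return my ** 2 * (1.0 - num / det)

# Hypothetical Gamma values and stratum median.
gam = dict(g1=1.0701, g2=1.04, g3=1.01, g4=1.03, g5=1.00, my=50.0)
a1, a2 = optimal_alphas(**gam)
mse_opt = stratum_mse(a1, a2, **gam)
mse_min = stratum_min_mse(**gam)
mse_off = stratum_mse(a1 + 0.05, a2, **gam)  # perturbed constant
print(a1, a2, mse_opt, mse_min)
```

Explicitly guarding the denominator mirrors the regularity condition stated above; when it is close to zero, the optimal constants become numerically unstable and the subclass should be treated with caution.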
The theoretical derivations presented in this study are based on first-order approximation techniques commonly used in survey sampling theory. Although such approximations may become less accurate under extremely skewed or heavy-tailed populations, they continue to provide analytically tractable and practically useful measures of estimator performance. To support the theoretical findings, extensive simulation experiments and empirical analyses were additionally conducted under different distributional settings, where the proposed estimators demonstrated stable and consistent behavior.

5. Comparative Evaluation of Estimator Properties

To assess efficiency, the minimum MSE of the proposed estimator is compared with the variance or MSE of eight benchmark estimators, namely $\hat{D}_{1st}$, $\hat{D}_{2st}$, $\hat{D}_{3st}$, $\hat{D}_{4st}$, $\hat{D}_{5st}$, $\hat{D}_{6st}$, $\hat{D}_{7st}$, and $\hat{D}_{8st}$.
(i) The inequality is obtained by combining the optimality expression in (29) with the variance relation given in (2): $\mathrm{Var}(\hat{D}_{1st}) > \mathrm{MSE}(\hat{Z}_{st})_{\min}$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2} > \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - \delta_{1h} C_{Myh}^2\right].$$
(ii) This condition emerges from the comparison between (29) and the error formulation in (5): $\mathrm{MSE}(\hat{D}_{2st}) > \mathrm{MSE}(\hat{Z}_{st})_{\min}$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2} > \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - \delta_{1h} C_{Myh}^2 - \delta_{3h}\left(C_{Mxh}^2 - 2 C_{Myxh}\right)\right].$$
(iii) By jointly considering (29) and the corresponding MSE representation in (7), the following inequality is established: $\mathrm{MSE}(\hat{D}_{3st})_{\min} > \mathrm{MSE}(\hat{Z}_{st})_{\min}$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2} > \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - C_{Myh}^2\left(\delta_{1h} - \delta_{3h}\, \rho_{yxh}^2\right)\right].$$
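(iv) The relationship stated below is a direct consequence of (29) along with Equation (12): $\mathrm{MSE}(\hat{D}_{4st}) > \mathrm{MSE}(\hat{Z}_{st})_{\min}$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2} > \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - \delta_{1h} C_{Myh}^2 - \delta_{3h} C_{Mxh}^2\left(\frac{1}{4} - T_h\right)\right].$$
(v) From the combination of (29) and the expression in (13), the subsequent condition is deduced: $\mathrm{MSE}(\hat{D}_{5st}) > \mathrm{MSE}(\hat{Z}_{st})_{\min}$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2} > \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - \delta_{1h} C_{Myh}^2 - \delta_{3h} C_{Mxh}^2\left(\frac{1}{4} + T_h\right)\right].$$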
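(vi) On the basis of (29) together with the formulation in (20), the following inequality holds: $\mathrm{MSE}(\hat{D}_{6st})_{\min} > \mathrm{MSE}(\hat{Z}_{st})_{\min}$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2} > \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - \frac{\delta_{1h} C_{Myh}^2 C_{Mxh}^2 - \delta_{3h} C_{Myxh}^2}{C_{Mxh}^2\left(1 + \delta_{1h} C_{Myh}^2\right) - \delta_{3h} C_{Myxh}^2}\right].$$
(vii) The relationship stated below is a direct consequence of (29) along with Equation (21): $\mathrm{MSE}(\hat{D}_{7st})_{\min} > \mathrm{MSE}(\hat{Z}_{st})_{\min}$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2} > \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - \frac{\mathrm{MSE}(\hat{D}_{7st})_{\min}}{M_{yh}^2}\right].$$
(viii) The relationship stated below is a direct consequence of (29) along with Equation (22): $\mathrm{MSE}(\hat{D}_{8st})_{\min} > \mathrm{MSE}(\hat{Z}_{st})_{\min}$ if
$$\sum_{h=1}^{L} W_h^2 M_{yh}^2\, \frac{\Gamma_{1h}\Gamma_{5h}^2 + \Gamma_{2h}\Gamma_{3h}^2 - 2\Gamma_{3h}\Gamma_{4h}\Gamma_{5h}}{\Gamma_{1h}\Gamma_{2h} - \Gamma_{4h}^2} > \sum_{h=1}^{L} W_h^2 M_{yh}^2\left[1 - \left(\frac{\mathrm{Bias}(\hat{D}_{8st})_{\min}}{M_{yh}^2} + \frac{\delta_{3h}}{4}\, d_{6h(\mathrm{opt})}^2 C_{Mxh}^2\right)\right].$$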

6. Simulation Design and Numerical Assessment

This section evaluates the efficiency of the proposed class of estimators relative to existing methods. The study is based on five simulated populations generated from suitably chosen positively skewed distributions, along with three real datasets used to assess their practical performance.

6.1. Simulation Study

In median-based investigations, the choice of distributions must reflect both the data structure and statistical properties. The median, being less sensitive to skewness, extreme observations, and non-normality, remains a reliable measure. An artificial finite population of size N = 1600 is generated as the basis for the subsequent stratified double sampling procedure. The auxiliary variable X is drawn from several distributions representing varied real-world conditions, while the study variable Y is generated to exhibit a specified correlation structure with X. Five distinct populations are considered to capture a wide range of distributional shapes, tail behaviors, and association strengths between X and Y.
For all simulated populations, the random error term $e$ was generated independently from a normal distribution with mean zero and population-specific variance, i.e.,
$$e \sim N(0, \sigma_{e,i}^2), \qquad i = 1, 2, \ldots, 5,$$
where the variance of the error component was selected to maintain the desired correlation structure between the study and auxiliary variables under different population settings.
  • Population 1 (Exponential Distribution A). The auxiliary variable follows $X \sim \mathrm{Exponential}(\lambda = 1/0.40)$, showing strong right skewness. The relationship is defined by $Y = 0.50X + e$, where $e \sim N(0, \sigma_{e,1}^2)$. This construction produces an approximate correlation of $\rho_{yx} = 0.50$ between $X$ and $Y$.
  • Population 2 (Exponential Distribution B). The auxiliary variable follows $X \sim \mathrm{Exponential}(\lambda = 1/0.75)$, showing a more moderate degree of skewness. The corresponding study variable is $Y = 0.35X + e$, where $e \sim N(0, \sigma_{e,2}^2)$, resulting in a correlation of $\rho_{yx} \approx 0.35$.
  • Population 3 (Gamma Distribution). The auxiliary variable follows a Gamma distribution with parameters $a_1 = 10$ and $a_2 = 12$, showing moderate right skewness. The study variable is generated using the relationship $Y = 0.70X + e$, where $e \sim N(0, \sigma_{e,3}^2)$, producing an approximate correlation of $\rho_{yx} \approx 0.50$.
  • Population 4 (Cauchy Distribution). The auxiliary variable follows a heavy-tailed distribution, $X \sim \mathrm{Cauchy}(t_0 = 10, t_1 = 6)$. The study variable is modeled as $Y = -0.30X + e$, where $e \sim N(0, \sigma_{e,4}^2)$, indicating a negative linear association of $\rho_{yx} = -0.30$.
  • Population 5 (Lognormal Distribution). The auxiliary variable $X$ follows a lognormal distribution with parameters $\mu_1 = 8.2$ and $\mu_2 = 4.0$, indicating mild skewness. The study variable $Y$ is defined as $Y = 0.60X + e$, where $e \sim N(0, \sigma_{e,5}^2)$. This specification produces an approximate correlation of $\rho_{yx} = 0.60$ between $X$ and $Y$.
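The populations described above can be generated along the following lines. This Python sketch is a hedged illustration only: the study reports using R, the helper names are hypothetical, and the rule used to pick the error variance (solving for the noise level that gives the target Pearson correlation in the generated finite population) is an assumption, since the study does not state how the variance was selected. The sketch also assumes that $\lambda$ is a rate, so that the exponential mean is 0.40, and that the Cauchy parameters are location and scale. Populations 3 and 5 are omitted because their parameterizations are not stated unambiguously.

```python
import numpy as np

def noise_sd_for_correlation(x, slope, rho):
    """Noise sd such that corr(X, slope*X + e) is approximately rho.

    Assumes slope and rho share the same sign and 0 < |rho| < 1.
    """
    return abs(slope) * np.sqrt(np.var(x) * (1.0 / rho ** 2 - 1.0))

def make_study_variable(x, slope, rho, rng):
    """Generate Y = slope*X + e with normally distributed noise e."""
    sd = noise_sd_for_correlation(x, slope, rho)
    return slope * x + rng.normal(0.0, sd, size=x.size)

rng = np.random.default_rng(2026)   # arbitrary seed
N = 1600

# Population 1: exponential auxiliary variable (assumed mean 0.40).
x1 = rng.exponential(scale=0.40, size=N)
y1 = make_study_variable(x1, 0.50, 0.50, rng)

# Population 4: Cauchy auxiliary variable (assumed location 10, scale 6).
x4 = 10.0 + 6.0 * rng.standard_cauchy(N)
y4 = make_study_variable(x4, -0.30, -0.30, rng)

r1 = np.corrcoef(x1, y1)[0, 1]
r4 = np.corrcoef(x4, y4)[0, 1]
print(round(r1, 3), round(r4, 3))
```

Because the noise level is calibrated to the generated finite population, the rule still works for the Cauchy case, where the theoretical variance does not exist but the finite-population variance does.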
The selection of distributions was made to ensure coverage of distributional features frequently encountered in practice. These distributions together represent symmetric and asymmetric patterns, phenomena with light and heavy tails, and extreme value situations. This wide range provides a sound basis for simulation research, allowing a thorough assessment of estimator stability and behavior in various population scenarios. The PREs of the proposed and existing estimators are examined under different distributional settings and correlation patterns, in line with the approach of [15]. The following experimental steps were conducted in R (v. 4.4.0) to evaluate efficiency and robustness.
1. Population generation under stratification. Partition the finite population of size N into L nonoverlapping strata with sizes $N_h$ and weights $W_h = N_h / N$ (with $\sum_{h=1}^{L} N_h = 1600$ and $\sum_{h} W_h = 1$). In each stratum h, draw the auxiliary variable $X_h$ from a specified distribution (with stratum-specific parameters), and generate the study variable from
$$Y_h = \rho_{yxh} X_h + e_h, \qquad e_h \sim N(0, \sigma_{e,h}^2),$$
allowing the association between X and Y and the noise level to vary by stratum.
2. Phase-I (first-phase sampling on X only). Within each stratum h, select an SRSWOR of size $m_h$ with $\sum_{h=1}^{L} m_h = m$. Record only the auxiliary measurements X.
3. Phase-II (second-phase subsampling on X and Y). From the phase-I sample in each stratum h, select a subsample of size $n_h < m_h$ (SRSWOR; $\sum_{h=1}^{L} n_h = n$). For these units, observe both X and Y.
4. Estimator construction (median-based, stratified). Combine phase-I information on X with the paired $(X, Y)$ from phase-II to compute median-based estimators under stratification (e.g., simple, ratio, difference, exponential, Rao–Gupta, and the proposed transformation class). The finite-population target is defined at the stratum level and aggregated using stratum weights, e.g., the stratified median target $M_y = \sum_{h=1}^{L} W_h M_{yh}$.
5. Monte Carlo evaluation. Repeat the two-phase stratified design (Steps 1–4) for a large number of replications (e.g., 25,000) to study sampling behavior.
6. Performance summaries. For each estimator and each combination of m and n, the empirical MSE is computed as the average squared deviation from $M_y$ over the 25,000 replications. The percent relative efficiency (PRE) of each estimator is then obtained relative to $\hat{D}_{1st}$:
$$\mathrm{MSE}(\hat{R}_E)_{\min} = \frac{\sum_{v=1}^{25{,}000} \left( \hat{R}_{Ev} - M_y \right)^2}{25{,}000}$$
and
$$\mathrm{PRE} = \frac{\mathrm{Var}(\hat{D}_{1st})}{\mathrm{MSE}(\hat{R}_E)_{\min}} \times 100,$$
where $\hat{R}_E = \hat{D}_{1st}, \hat{D}_{2st}, \ldots, \hat{D}_{8st}, \hat{Z}_{1st}, \hat{Z}_{2st}, \ldots, \hat{Z}_{8st}$. The results obtained from the simulation are given in Table 2.
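The six steps above can be sketched in a few lines. This is an illustrative outline in Python with NumPy, not the authors' R implementation. The two estimators are stand-ins: a simple stratified sample median and a textbook two-phase ratio-type median estimator, neither of which is claimed to match the paper's $\hat{D}$ or $\hat{Z}$ estimators. The function names, the two-stratum Gamma population, the slopes, the sample sizes, the seed, and the reduced replication count are all assumptions, and the empirical MSE of the simple estimator is used in place of $\mathrm{Var}(\hat{D}_{1st})$.

```python
import numpy as np

def two_phase_indices(size, m, n, rng):
    """Phase I: SRSWOR of m units; phase II: SRSWOR of n of those m units."""
    phase1 = rng.choice(size, size=m, replace=False)
    phase2 = rng.choice(phase1, size=n, replace=False)
    return phase1, phase2

def simulate_pre(strata, m_h, n_h, reps, rng):
    """Return (MSE of simple, MSE of ratio-type, PRE of ratio-type vs simple)."""
    N = sum(x.size for x, _ in strata)
    weights = [x.size / N for x, _ in strata]
    # Target as in Step 4: weighted sum of the stratum medians of Y.
    target = sum(w * np.median(y) for w, (_, y) in zip(weights, strata))
    simple = np.zeros(reps)
    ratio = np.zeros(reps)
    for v in range(reps):
        for w, (x, y), m, n in zip(weights, strata, m_h, n_h):
            p1, p2 = two_phase_indices(x.size, m, n, rng)
            my_hat = np.median(y[p2])  # phase-II median of Y
            simple[v] += w * my_hat
            # Ratio-type adjustment: phase-I median of X over phase-II median of X.
            ratio[v] += w * my_hat * np.median(x[p1]) / np.median(x[p2])
    mse_simple = np.mean((simple - target) ** 2)
    mse_ratio = np.mean((ratio - target) ** 2)
    return mse_simple, mse_ratio, 100.0 * mse_simple / mse_ratio

rng = np.random.default_rng(7)  # arbitrary seed
strata = []
for slope in (0.9, 0.8):  # hypothetical stratum-specific slopes
    x = rng.gamma(shape=10.0, scale=12.0, size=800)
    y = slope * x + rng.normal(0.0, 10.0, size=800)
    strata.append((x, y))

mse_simple, mse_ratio, pre = simulate_pre(
    strata, m_h=(200, 200), n_h=(50, 50), reps=2000, rng=rng
)
print(f"MSE simple = {mse_simple:.3f}, MSE ratio = {mse_ratio:.3f}, PRE = {pre:.1f}")
```

A PRE above 100 indicates that the ratio-type stand-in has a smaller empirical MSE than the simple median under this particular toy population. Any other estimator can be compared in the same way by adding one accumulator per estimator inside the replication loop.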

6.2. Empirical Performance Evaluation

An empirical analysis is carried out using three real population data sets described in this subsection. These data sets were selected from different practical fields, including employment and industrial activity, educational statistics, and household economic data, to evaluate the performance and applicability of the proposed estimators under diverse real-life situations. The populations provide different correlation structures and variability patterns between the study and auxiliary variables, thereby offering a suitable framework for comparative efficiency analysis. The corresponding summary statistics for the considered populations are presented in Table 3, Table 4 and Table 5.
Population 1. 
This data set, taken from the 2013 edition of Punjab Development Statistics (p. 226) [28], describes employment and industrial activity at two time points. $Y_1$ represents employment in each district in 2010, while $X_1$ represents the number of registered factories in each district. The total district-level employment for 2012 is represented by $Y_2$, while the number of factories registered in the same year is denoted by $X_2$. Together, these variables offer a consistent framework for relating district-level employment to the number of industrial establishments. The data can be downloaded from: http://repository.lahoreschool.edu.pk/xmlui/bitstream/handle/123456789/13023/2013.pdf?sequence=1&isAllowed=y (accessed on 13 May 2026). The summary statistics are presented in Table 3.
Population 2. 
The data set used for this empirical evaluation is taken from Punjab Development Statistics 2014 (p. 135) [29], which reports statistics on government schools for the 2012–2013 school year, including primary and middle school enrollment by gender. In this context, $X_1$ denotes the number of government primary schools (boys and girls combined), while $Y_1$ indicates the total number of registered students. Similarly, $X_2$ denotes the number of government middle schools for both sexes, while $Y_2$ records the total number of middle-level students enrolled. The data can be downloaded from: http://repository.lahoreschool.edu.pk/xmlui/bitstream/handle/123456789/13900/Dev-2014.pdf?sequence=1&isAllowed=y (accessed on 13 May 2025). The summary measures are given in Table 4.
Population 3. 
This data set, taken from [1], combines household income and food expenditure. X is defined as weekly income, reflecting economic position, and Y is defined as household food spending, which is affected by employment-related factors. The summary statistics are presented in Table 5.

6.3. Discussion and Interpretation of Findings

The suggested transformation-based estimator class clearly outperforms the traditional approaches on both simulated and real-world datasets. Figure 1 and Figure 2, as well as Table 2 and Table 6, consistently show that the new group of estimators achieves markedly higher efficiency values, demonstrating its robustness in various statistical applications.
  • Performance on simulated populations: All transformation-based estimators perform better than the traditional ratio, regression, exponential, and difference-type estimators across the five simulated populations, as shown in Table 2 and Figure 1. The PRE values increase steadily from $\hat{Z}_{1st}$ to $\hat{Z}_{8st}$. The largest PREs are recorded by $\hat{Z}_{6st}$, $\hat{Z}_{7st}$, and $\hat{Z}_{8st}$, which exceed 300 percent in all five populations, whereas the conventional estimators do not exceed about 181 percent. This outcome highlights the significant advantages of using phase-wise stratification and auxiliary transformations, which improve stability even in distributions that are heavy-tailed or strongly skewed.
  • The PRE patterns observed for Populations 3 and 5 are relatively similar because both populations exhibit moderate positive correlation structures and comparatively stable distributional behavior despite originating from different distributions. In both cases, the transformation-based estimators effectively utilize auxiliary information under positively skewed population settings, resulting in closely related efficiency trends. This similarity further indicates that the proposed estimator family maintains stable performance across different distributional forms when the underlying correlation structure between the study and auxiliary variables is reasonably comparable.
  • Validation on real populations: Table 6 and Figure 2 confirm that the same efficiency trends hold for empirical data. In all three real-life populations, the transformation-based estimators continue to produce better PRE values than the existing estimators. The best overall performance is given by $\hat{Z}_{8st}$, which attains the highest PRE for Population 1 and stays close to the best value for Populations 2 and 3. In each dataset, the suggested approaches show their dependability outside of controlled simulation conditions through their resistance to distributional asymmetry and to moderate correlations between the variables Y and X.
  • It is important to note that the PRE values obtained from the simulation study are comparatively larger than those observed for the real population datasets. This behavior is expected because the simulated populations were generated under controlled distributional environments specifically designed to evaluate estimator performance under skewed and heavy-tailed conditions. Under such settings, the transformation structure can more effectively utilize auxiliary information, leading to larger relative efficiency improvements.
  • In contrast, real population data naturally contain additional variability arising from heterogeneous population structures, irregular correlations, measurement fluctuations, and practical survey complexities. These factors generally reduce the magnitude of achievable efficiency improvements. Nevertheless, the proposed estimators continue to demonstrate clear superiority over the existing estimators in both simulated and empirical analyses, indicating that the observed efficiency improvements are not simulation-driven but reflect the overall robustness and practical applicability of the proposed methodology.
  • The derivation of the proposed estimators is based on first-order Taylor series approximation techniques commonly adopted in survey sampling theory. These approximations are developed under the assumption that the sampling errors are sufficiently small so that higher-order terms may be neglected for analytical tractability. It is acknowledged that, under extremely heavy-tailed distributions such as the Cauchy distribution, higher-order moments may not exist and the approximation accuracy may therefore be affected. However, the inclusion of the Cauchy population in the simulation study was intended to examine the practical robustness and stability of the proposed estimators under challenging distributional settings. The simulation and empirical findings indicate that the proposed estimators continue to exhibit stable and improved performance even in such heavy-tailed situations.
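To make the efficiency comparison in the discussion above concrete, the short sketch below converts a few PRE values from Table 2 into MSE terms. It relies only on the definition PRE = 100 × Var($\hat{D}_{1st}$) / MSE, so that 100 / PRE is the MSE expressed as a fraction of the baseline variance, and the ratio of two PREs is the ratio of the corresponding MSEs. The variable and function names are illustrative choices, and only three rows of the table are copied in.

```python
# PRE values copied from Table 2 (Populations 1-5).
pre_d8 = [181.27, 172.39, 166.92, 170.42, 175.11]  # largest conventional PRE per column
pre_z1 = [262.83, 278.65, 286.17, 295.54, 281.63]  # smallest proposed PRE per column
pre_z8 = [313.92, 331.87, 341.52, 354.86, 340.24]  # largest proposed PRE per column

def mse_fraction(pre):
    """MSE as a fraction of the baseline variance, from PRE = 100 * Var / MSE."""
    return 100.0 / pre

def relative_gain(pre_new, pre_old):
    """Ratio MSE(old) / MSE(new); the common baseline variance cancels."""
    return pre_new / pre_old

for k, (d8, z1, z8) in enumerate(zip(pre_d8, pre_z1, pre_z8), start=1):
    print(
        f"Population {k}: MSE(Z8)/Var(D1) = {mse_fraction(z8):.3f}, "
        f"Z1 vs D8 = {relative_gain(z1, d8):.2f}x, "
        f"Z8 vs D8 = {relative_gain(z8, d8):.2f}x"
    )
```

Because even the weakest proposed estimator in each column has a larger PRE than the strongest conventional one, every `relative_gain` printed here is above 1.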

7. Conclusions and Future Perspectives

The study develops a transformation-based family of estimators for estimating the population median under a stratified two-phase sampling design. Analytical expressions for optimal bias and mean squared error are derived and supported by an extensive simulation study as well as real population data. The results clearly indicate that the proposed estimators consistently outperform conventional ratio, regression, difference, and exponential estimators in terms of percent relative efficiency. The improvements in efficiency are especially apparent for skewed, heavy-tailed populations or those influenced by outliers. These findings confirm that using transformations of auxiliary information under a two-phase design enhances both the stability and precision of median estimation while reducing the overall cost of data collection.
The transformation-based approach also demonstrates adaptability when the second-phase sample size is considerably smaller than that of the first phase. This property makes the estimators particularly valuable for large-scale surveys or field studies where complete auxiliary information cannot be obtained due to budgetary or logistical limitations. The ability to achieve higher precision without additional data collection enhances their appeal in socioeconomic, agricultural, and health statistics applications, where cost efficiency and robustness are equally important.
Future research may expand this framework in several meaningful directions. One promising line of inquiry is the extension of the proposed estimators to multistage and spatial sampling designs, which frequently occur in environmental and regional studies. Another important development would be to incorporate mechanisms for non-response and missing data within the same transformation structure to preserve estimator efficiency under incomplete observations. Additionally, integrating multiple auxiliary variables and exploring adaptive or nonlinear transformation functions could further enhance estimator flexibility. The inclusion of computational tools or open-source implementations would also support broader use of these methods in practice. Overall, the proposed transformation-based estimators provide a foundation for continued research on robust, cost-effective, and adaptable estimation techniques in modern survey methodology.
The simulation study is conducted under linear relationships between the study and auxiliary variables, which is a commonly adopted framework in survey sampling investigations. While the proposed estimators perform efficiently under different distributional settings, future research may further examine their behavior under nonlinear population structures and more complex dependency patterns between the study and auxiliary variables. This work mainly focuses on point estimation and efficiency improvement; however, the development of standard error estimation procedures and confidence interval methods for the proposed family of estimators may further improve its practical applicability. Therefore, future research may focus on inferential procedures and interval estimation under stratified two-phase sampling schemes.

Author Contributions

Conceptualization, U.D. and H.M.A.; Methodology, U.D. and H.M.A.; Software, U.D., H.M.A. and F.A.A.; Validation, U.D., H.M.A. and F.A.A.; Formal analysis, U.D., H.M.A. and F.A.A.; Investigation, U.D., H.M.A. and F.A.A.; Resources, U.D., H.M.A. and F.A.A.; Data curation, U.D., H.M.A. and F.A.A.; Writing—original draft, U.D. and H.M.A.; Writing—review & editing, U.D., H.M.A. and F.A.A.; Visualization, U.D., H.M.A. and F.A.A.; Supervision, F.A.A.; Project administration, U.D., H.M.A. and F.A.A.; Funding acquisition, F.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R515), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank and extend their appreciation to the funder of this work. This work was supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2026R515), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
  2. Särndal, C.E. Sample survey theory vs. general statistical theory: Estimation of the population mean. Int. Stat. Rev./Rev. Int. Stat. 1972, 40, 1–12.
  3. Stigler, S.M. Linear functions of order statistics. Ann. Math. Stat. 1969, 40, 770–788.
  4. Gross, S. Median estimation in sample surveys. In Proceedings of the Section on Survey Research Methods; American Statistical Association Ithaca: Alexandria, VA, USA, 1980.
  5. Sedransk, J.; Meyer, J. Confidence intervals for the quantiles of a finite population: Simple random and stratified simple random sampling. J. R. Stat. Soc. Ser. B (Methodol.) 1978, 40, 239–252.
  6. Philip, S.; Sedransk, J. Lower bounds for confidence coefficients for confidence intervals for finite population quantiles. Commun. Stat.-Theory Methods 1983, 12, 1329–1344.
  7. Kuk, Y.C.A.; Mak, T.K. Median estimation in the presence of auxiliary information. J. R. Stat. Soc. Ser. B 1989, 51, 261–269.
  8. Rao, T.J. On certain methods of improving ratio and regression estimators. Commun. Stat.-Theory Methods 1991, 20, 3325–3340.
  9. Singh, S.; Joarder, A.H.; Tracy, D.S. Median estimation using double sampling. Aust. N. Z. J. Stat. 2001, 43, 33–46.
  10. Khoshnevisan, M.; Singh, H.P.; Singh, S.; Smarandache, F. A General Class of Estimators of Population Median Using Two Auxiliary Variables in Double Sampling; Virginia Polytechnic Institute and State University: Blacksburg, VA, USA, 2002.
  11. Singh, S. Advanced Sampling Theory with Applications: How Michael Selected Amy; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; Volume 2.
  12. Gupta, S.; Shabbir, J.; Ahmad, S. Estimation of median in two-phase sampling using two auxiliary variables. Commun. Stat.-Theory Methods 2008, 37, 1815–1822.
  13. Aladag, S.; Cingi, H. Improvement in estimating the population median in simple random sampling and stratified random sampling using auxiliary information. Commun. Stat.-Theory Methods 2015, 44, 1013–1032.
  14. Solanki, R.S.; Singh, H.P. Some classes of estimators for median estimation in survey sampling. Commun. Stat.-Theory Methods 2015, 44, 1450–1465.
  15. Daraz, U.; Almulhim, F.A.; Alomair, M.A.; Alomair, A.M. Population median estimation using auxiliary variables: A simulation study with real data across sample sizes and parameters. Mathematics 2025, 13, 1660.
  16. Sharma, P.; Lata, A.; Yadav, S.K.; Noor-ul-Amin, M. Family of estimators for estimating population median using auxiliary information in survey sampling. J. Reliab. Stat. Stud. 2025, 18, 343–370.
  17. Baig, A.; Masood, S.; Ahmed Tarray, T. Improved class of difference-type estimators for population median in survey sampling. Commun. Stat.-Theory Methods 2019, 49, 5778–5793.
  18. Sharma, P.; Singh, R. Generalized class of estimators for population median using auxiliary information. Hacet. J. Math. Stat. 2015, 44, 443–453.
  19. Masood, S.; Ibrar, B.; Shabbir, J.; Movaheedil, Z. Estimating neutrosophic finite median employing robust measures of the auxiliary variable. Sci. Rep. 2024, 14, 10255.
  20. Sharma, P.; Pusa, N.; Kumari, M.; Singh, P. Balancing accuracy and cost: A new estimator for stratified random sampling. Commun. Stat.-Simul. Comput. 2025, 1–17.
  21. Shabbir, J.; Gupta, S. A generalized class of difference type estimators for population median in survey sampling. Hacet. J. Math. Stat. 2017, 46, 1015–1028.
  22. Irfan, M.; Maria, J.; Shongwe, S.C.; Zohaib, M.; Bhatti, S.H. Estimation of population median under robust measures of an auxiliary variable. Math. Probl. Eng. 2021, 2021, 4839077.
  23. Shabbir, J.; Gupta, S.; Narjis, G. On improved class of difference type estimators for population median in survey sampling. Commun. Stat.-Theory Methods 2022, 51, 3334–3354.
  24. Subzar, M.; Lone, S.A.; Ekpenyong, E.J.; Salam, A.; Aslam, M.; Raja, T.A.; Almutlak, S.A. Efficient class of ratio cum median estimators for estimating the population median. PLoS ONE 2023, 18, e0274690.
  25. Iseh, M.J. Model formulation on efficiency for median estimation under a fixed cost in survey sampling. Model Assist. Stat. Appl. 2023, 18, 373–385.
  26. Alghamdi, A.S.; Almulhim, F.A. Stratified median estimation using auxiliary transformations: A robust and efficient approach in asymmetric populations. Axioms 2025, 17, 1127.
  27. Singh, H.P.; Vishwakarma, G.K. Modified exponential ratio and product estimators for finite population mean in double sampling. Austrian J. Stat. 2007, 36, 217–225.
  28. Bureau of Statistics. Punjab Development Statistics Government of the Punjab, Lahore, Pakistan; Bureau of Statistics: Islamabad, Pakistan, 2013.
  29. Bureau of Statistics. Punjab Development Statistics Government of the Punjab, Lahore, Pakistan; Bureau of Statistics: Islamabad, Pakistan, 2014.
Figure 1. The graph shows how well the new class of estimators works compared to the previous ones by showing their percent relative efficiency on simulated data.
Figure 2. The graph compares the efficiency of a new group of estimators and previous estimators through their PRE on real data.
Table 1. Subfamilies of the proposed estimator corresponding to distinct parameter values.
Table 1. Subfamilies of the proposed estimator corresponding to distinct parameter values.
Different Estimators of Z ^ st ϕ 1 h ϕ 2 h p 1 h p 2 h
Z ^ 1 = h = 1 L W h α 1 h M ^ y h M ^ x 1 h M ^ x 2 h + α 2 h M ^ x 2 h M ^ x 1 h exp ϕ 1 h ( M ^ x 1 h M ^ x 2 h ) ϕ 1 h ( M ^ x 1 h + M ^ x 2 h ) + 2 ϕ 2 h Q D h T M h 1 1
Z ^ 2 = h = 1 L W h α 1 h M ^ y h M ^ x 1 h M ^ x 2 h + α 2 h exp ϕ 1 h ( M ^ x 1 h M ^ x 2 h ) ϕ 1 h ( M ^ x 1 h + M ^ x 2 h ) + 2 ϕ 2 h M R h T M h 10
Z ^ 3 = h = 1 L W h α 1 h M ^ y h M ^ x 2 h M ^ x 1 h + α 2 h M ^ x 2 h M ^ x 1 h exp ϕ 1 h ( M ^ x 1 h M ^ x 2 h ) ϕ 1 h ( M ^ x 1 h + M ^ x 2 h ) + 2 ϕ 2 h Q A h Q D h 1 1
Z ^ 4 = h = 1 L W h α 1 h M ^ y h + α 2 h M ^ x 1 h M ^ x 2 h exp ϕ 1 h ( M ^ x 1 h M ^ x 2 h ) ϕ 1 h ( M ^ x 1 h + M ^ x 2 h ) + 2 ϕ 2 h Q D h Q A h 01
Z ^ 5 = h = 1 L W h α 1 h M ^ y h + α 2 h M ^ x 2 h M ^ x 1 h exp ϕ 1 h ( M ^ x 1 h M ^ x 2 h ) ϕ 1 ( M ^ x 1 h + M ^ x 2 h ) + 2 ϕ 2 h T M h M R h 0 1
Z ^ 6 = h = 1 L W h α 1 h M y h M ^ x 1 h M ^ x 2 h + α 2 h M ^ x 1 h M ^ x 2 h exp ϕ 1 h ( M ^ x 1 h M ^ x 2 h ) ϕ 1 h ( M ^ x 1 h + M ^ x 2 h ) + 2 ϕ 2 h D M h Q R h 11
Z ^ 7 = h = 1 L W h α 1 h M ^ y h M ^ x 2 h M ^ x 1 h + α 2 h M ^ x 1 h M ^ x 2 h exp ϕ 1 h ( M ^ x 1 h M ^ x 2 h ) ϕ 1 h ( M ^ x 1 h + M ^ x 2 h ) + 2 ϕ 2 h M R h Q A h 1 1
Z ^ 8 = h = 1 L W h α 1 h M ^ y h M ^ x 2 h M ^ x 1 h + α 2 h exp ϕ 1 h ( M ^ x 1 h M ^ x 2 h ) ϕ 1 h ( M ^ x 1 h + M ^ x 2 h ) + 2 ϕ 2 h Q R h D M h 1 0
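The rows of Table 1 can be read as special cases of one parameterized expression. The sketch below evaluates a single member of the family, assuming the general form is the bracketed sum $\left[ \alpha_{1h} \hat{M}_{yh} (\hat{M}_{x1h}/\hat{M}_{x2h})^{p_{1h}} + \alpha_{2h} (\hat{M}_{x1h}/\hat{M}_{x2h})^{p_{2h}} \right]$ multiplied by the exponential term. That grouping is an assumption, because the extracted table has lost its large brackets. The function names, the sample medians, and the $\alpha$ values are hypothetical placeholders and are not the optimal values derived in the paper. Only the $\phi$ values are taken from the article (Table 3, for the $\hat{Z}_1$ row).

```python
import math

def z_stratum(my_hat, mx1_hat, mx2_hat, alpha1, alpha2, phi1, phi2, p1, p2):
    """One stratum's term under the ASSUMED bracketed form of the family."""
    ratio = mx1_hat / mx2_hat
    bracket = alpha1 * my_hat * ratio**p1 + alpha2 * ratio**p2
    exponent = phi1 * (mx1_hat - mx2_hat) / (phi1 * (mx1_hat + mx2_hat) + 2.0 * phi2)
    return bracket * math.exp(exponent)

def z_stratified(weights, stratum_terms):
    """Weighted sum over strata, as in the leading sum over h."""
    return sum(w * t for w, t in zip(weights, stratum_terms))

# Z_1 row of Table 1: phi1 = QD_h, phi2 = TM_h, p1 = 1, p2 = -1.
# QD_h and TM_h are the Population-1 values from Table 3; everything else
# (sample medians, alpha1 = 1, alpha2 = 0) is a hypothetical placeholder.
terms = [
    z_stratum(10500.0, 170.0, 165.0, 1.0, 0.0, phi1=127.125, phi2=193.438, p1=1, p2=-1),
    z_stratum(10450.0, 172.0, 175.0, 1.0, 0.0, phi1=132.500, phi2=195.750, p1=1, p2=-1),
]
estimate = z_stratified([0.5, 0.5], terms)
print(f"Illustrative Z_1-type estimate: {estimate:.2f}")
```

With $\alpha_{1h} = 1$, $\alpha_{2h} = 0$ and equal auxiliary medians, the expression reduces to the plain sample median $\hat{M}_{yh}$, which gives a quick sanity check on any implementation.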
Table 2. Simulation analysis results based on PRE values.

| Estimator | PRE-1 | PRE-2 | PRE-3 | PRE-4 | PRE-5 |
|---|---|---|---|---|---|
| $\hat{D}_{1st}$ | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| $\hat{D}_{2st}$ | 146.72 | 136.45 | 128.93 | 131.52 | 134.08 |
| $\hat{D}_{3st}$ | 158.46 | 150.21 | 142.80 | 147.62 | 152.09 |
| $\hat{D}_{4st}$ | 164.18 | 157.94 | 150.31 | 154.08 | 159.73 |
| $\hat{D}_{5st}$ | 110.92 | 111.87 | 114.76 | 118.32 | 114.57 |
| $\hat{D}_{6st}$ | 172.65 | 165.32 | 159.29 | 162.46 | 168.22 |
| $\hat{D}_{7st}$ | 177.84 | 169.04 | 163.44 | 166.87 | 171.95 |
| $\hat{D}_{8st}$ | 181.27 | 172.39 | 166.92 | 170.42 | 175.11 |
| $\hat{Z}_{1st}$ | 262.83 | 278.65 | 286.17 | 295.54 | 281.63 |
| $\hat{Z}_{2st}$ | 273.19 | 289.72 | 297.34 | 307.29 | 292.77 |
| $\hat{Z}_{3st}$ | 281.48 | 298.46 | 306.88 | 316.58 | 302.11 |
| $\hat{Z}_{4st}$ | 288.95 | 306.11 | 314.62 | 324.97 | 310.38 |
| $\hat{Z}_{5st}$ | 294.63 | 312.72 | 321.39 | 332.81 | 318.12 |
| $\hat{Z}_{6st}$ | 301.52 | 318.93 | 328.22 | 340.95 | 325.64 |
| $\hat{Z}_{7st}$ | 307.74 | 325.48 | 334.76 | 347.92 | 333.71 |
| $\hat{Z}_{8st}$ | 313.92 | 331.87 | 341.52 | 354.86 | 340.24 |
Table 3. Summary statistics for Population-1 under stratified structure.

| Parameter | Stratum-1 | Stratum-2 |
|---|---|---|
| $N_h$ | 36 | 36 |
| $m_h$ | 18 | 18 |
| $n_h$ | 9 | 9 |
| $X_{h\,\min}$ | 24 | 24 |
| $X_{h\,\max}$ | 1986 | 2055 |
| $M_{xh}$ | 168.500 | 171.500 |
| $M_{yh}$ | 10,484.500 | 10,494.500 |
| $\sigma_h$ | 438.519 | 452.713 |
| $f_{xh}(M_{xh})$ | 0.002463666 | 0.002315051 |
| $f_{yh}(M_{yh})$ | 0.00004033736 | 0.00004086913 |
| $\rho_{y_h x_h}$ | 0.912 | 0.519 |
| $TM_h$ | 193.438 | 195.750 |
| $DM_h$ | 432.500 | 431.500 |
| $Sk_h$ | 2.106 | 2.345 |
| $QA_h$ | 218.375 | 220 |
| $QD_h$ | 127.125 | 132.500 |
| $MAD_h$ | 193.438 | 99 |
| $QR_h$ | 252.25 | 265 |
| $MR_h$ | 1005 | 1039.500 |
Table 4. Summary statistics for Population-2 under stratified structure.

| Parameter | Stratum-1 | Stratum-2 |
|---|---|---|
| $N_h$ | 36 | 36 |
| $m_h$ | 18 | 18 |
| $n_h$ | 9 | 9 |
| $X_{h\,\min}$ | 388 | 84 |
| $X_{h\,\max}$ | 1534 | 478 |
| $M_{xh}$ | 1016.500 | 206 |
| $M_{yh}$ | 116,230 | 49,661 |
| $\sigma_{xh}$ | 402.609 | 424.937 |
| $f_{xh}(M_{xh})$ | 0.000951993 | 0.004094403 |
| $f_{yh}(M_{yh})$ | 0.00000835 | 0.0000143374 |
| $\rho_{y_h x_h}$ | 0.784 | 0.875 |
| $TM_h$ | 891.188 | 210.688 |
| $DM_h$ | 982.650 | 231 |
| $Sk_h$ | 1.008 | 1.023 |
| $QA_h$ | 891.875 | 215.375 |
| $QD_h$ | 982.650 | 62.875 |
| $MAD_h$ | 289 | 267 |
| $QR_h$ | 378.250 | 125.750 |
| $MR_h$ | 961 | 281 |
Table 5. Summary statistics for Population 3 under stratified structure.

| Parameter | Stratum-1 | Stratum-2 |
|---|---|---|
| $N_h$ | 18 | 18 |
| $m_h$ | 9 | 9 |
| $n_h$ | 5 | 5 |
| $X_{h\,\min}$ | 28 | 15 |
| $X_{h\,\max}$ | 95 | 75 |
| $M_{xh}$ | 60.664 | 44.205 |
| $M_{yh}$ | 52.728 | 46.339 |
| $\sigma_h$ | 10.130 | 12.750 |
| $f_{xh}(M_{xh})$ | 0.0004729 | 0.0003791 |
| $f_{yh}(M_{yh})$ | 0.0000219 | 0.0000481 |
| $\rho_{y_h x_h}$ | 0.337 | 0.496 |
| $TM_h$ | 61.623 | 45.287 |
| $DM_h$ | 61.600 | 45.242 |
| $Sk_h$ | 0.300 | 0.500 |
| $QA_h$ | 54.287 | 48.940 |
| $QD_h$ | 6.840 | 8.614 |
| $MAD_h$ | 8.100 | 10.200 |
| $QR_h$ | 13.721 | 17.201 |
| $MR_h$ | 61.500 | 45 |
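As a small consistency check on Tables 3, 4 and 5, the sketch below recomputes one summary measure from the reported extremes. It assumes that $MR_h$ denotes the midrange, $(X_{h\,\min} + X_{h\,\max})/2$. That definition is not restated in this section, so it should be read as an assumption. The dictionary layout and names are illustrative, and the numbers are copied from the three tables.

```python
# (X_h min, X_h max, reported MR_h) per stratum, copied from Tables 3-5.
reported = {
    ("Population 1", 1): (24, 1986, 1005.0),
    ("Population 1", 2): (24, 2055, 1039.5),
    ("Population 2", 1): (388, 1534, 961.0),
    ("Population 2", 2): (84, 478, 281.0),
    ("Population 3", 1): (28, 95, 61.5),
    ("Population 3", 2): (15, 75, 45.0),
}

def midrange(x_min, x_max):
    """Midrange of the auxiliary variable (assumed definition of MR_h)."""
    return (x_min + x_max) / 2.0

checks = {
    key: midrange(x_min, x_max) == mr
    for key, (x_min, x_max, mr) in reported.items()
}

for (population, stratum), ok in checks.items():
    status = "matches" if ok else "differs from"
    print(f"{population}, stratum {stratum}: midrange {status} the reported MR_h")
```

Under this reading, all six reported $MR_h$ values agree with the tabulated minima and maxima.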
Table 6. Comparison of the PRE values based on actual population data.

| Estimator | PRE-1 | PRE-2 | PRE-3 |
|---|---|---|---|
| $\hat{D}_{1st}$ | 100.00 | 100.00 | 100.00 |
| $\hat{D}_{2st}$ | 148.51 | 157.36 | 116.52 |
| $\hat{D}_{3st}$ | 160.24 | 169.78 | 122.08 |
| $\hat{D}_{4st}$ | 167.91 | 176.53 | 124.05 |
| $\hat{D}_{5st}$ | 163.65 | 172.35 | 123.11 |
| $\hat{D}_{6st}$ | 169.13 | 180.24 | 124.01 |
| $\hat{D}_{7st}$ | 170.29 | 180.83 | 125.49 |
| $\hat{D}_{8st}$ | 170.78 | 181.37 | 125.63 |
| $\hat{Z}_{1st}$ | 184.56 | 195.54 | 155.16 |
| $\hat{Z}_{2st}$ | 185.62 | 196.90 | 155.59 |
| $\hat{Z}_{3st}$ | 186.74 | 197.68 | 157.65 |
| $\hat{Z}_{4st}$ | 185.95 | 199.03 | 156.85 |
| $\hat{Z}_{5st}$ | 187.21 | 198.14 | 153.18 |
| $\hat{Z}_{6st}$ | 188.27 | 197.77 | 153.83 |
| $\hat{Z}_{7st}$ | 188.13 | 198.15 | 154.77 |
| $\hat{Z}_{8st}$ | 189.67 | 197.92 | 157.64 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Daraz, U.; M. Aljohani, H.; Almulhim, F.A. Optimal Transformation-Based Median Estimation Under Stratified Double Sampling with Limited Auxiliary Information. Symmetry 2026, 18, 933. https://doi.org/10.3390/sym18060933

AMA Style

Daraz U, M. Aljohani H, Almulhim FA. Optimal Transformation-Based Median Estimation Under Stratified Double Sampling with Limited Auxiliary Information. Symmetry. 2026; 18(6):933. https://doi.org/10.3390/sym18060933

Chicago/Turabian Style

Daraz, Umer, Hassan M. Aljohani, and Fatimah A. Almulhim. 2026. "Optimal Transformation-Based Median Estimation Under Stratified Double Sampling with Limited Auxiliary Information" Symmetry 18, no. 6: 933. https://doi.org/10.3390/sym18060933

APA Style

Daraz, U., M. Aljohani, H., & Almulhim, F. A. (2026). Optimal Transformation-Based Median Estimation Under Stratified Double Sampling with Limited Auxiliary Information. Symmetry, 18(6), 933. https://doi.org/10.3390/sym18060933
