Article

Twofold Auxiliary Information Under Two-Phase Sampling: An Improved Family of Double-Transformed Variance Estimators

1
School of Mathematics and Statistics, Central South University, Changsha 410017, China
2
School of Metallurgy and Environment, Central South University, Changsha 410083, China
3
Department of Statistics and Operations Research, Faculty of Science, King Saud University, P.O. Box 2455, Riyadh 11451, Saudi Arabia
*
Author to whom correspondence should be addressed.
Axioms 2025, 14(1), 64; https://doi.org/10.3390/axioms14010064
Submission received: 16 December 2024 / Revised: 9 January 2025 / Accepted: 13 January 2025 / Published: 16 January 2025

Abstract

Outlier values and rankings are important for emphasizing data distribution variability, which improves the accuracy and effectiveness of variance estimations. To enhance the estimation of finite population variance in a two-phase sampling framework, this study presents an improved class of double exponential-type estimators by utilizing the outlier values and ranks of an auxiliary variable. A theoretical analysis is conducted to derive the biases and mean squared errors (MSEs) of these estimators using first-order approximations. A comprehensive simulation study is then performed to analyze the performance of the proposed estimators. The results clearly show that the new estimators provide more precise estimates, achieving a higher percentage relative efficiency (PRE) across all simulated scenarios. Furthermore, three data sets are analyzed to further confirm the efficiency of the proposed estimators as compared to other existing estimators. These results emphasize the potential of the proposed class of estimators to optimize variance estimation techniques, making it a more cost-effective and accurate choice for researchers using two-phase sampling in a variety of domains.

1. Introduction

In sampling theory, auxiliary variables are commonly used alongside the study variable to improve the sampling design and to increase the efficiency of estimators by exploiting the relationship between the auxiliary and study variables. In certain cases, data on many auxiliary variables are readily available. For example, to examine the public health and welfare situation in a whole country or state, it may be necessary to know the number of beds in various hospitals, the number of physicians and other staff members, and the total amount of money available for medical treatment. When this type of information is lacking, two-phase sampling is preferred: a large preliminary sample is drawn first, and the auxiliary information is estimated from it. This strategy is frequently employed when it is more cost-effective or efficient to undertake an initial preparation phase prior to the main phase. The two samples are referred to as the first-phase and second-phase samples, respectively. Two-phase sampling was initially suggested by [1], with further investigations conducted by [2], who proposed a general ratio-type estimator, and this was later expanded upon by [3,4,5]. For more details about two-phase sampling, readers can refer to [6,7].
A major challenge in measuring variability is the unpredictable nature of the results, which necessitates the careful selection and control of auxiliary data to improve the accuracy of the estimators. The estimation of population variance has been extensively studied and improved over time, with many estimators introduced by different researchers. The discussion of auxiliary information in population variance estimation was initiated by [4]. The use of ratio and product type exponential estimators was suggested by [8] for improving estimation. By using a transformation, Ref. [9] proposed an improved variance estimator. As discussed in various studies [10,11,12,13,14,15], researchers have introduced different variance estimators to provide better estimates of the population variance.
The presence of outlier (extreme) values can greatly affect the accuracy of estimations and lead to significantly underestimated outcomes in populations. The work by [16] presented two estimators that employed a transformation based on the known largest and smallest observations of the auxiliary variable. Nevertheless, this method was not investigated further until the study by [17], who used the idea of the largest and smallest values for various estimators. By including extreme values in various transformations, the estimation of the finite population mean by [18] was significantly improved. To estimate the population variance with the least mean squared error, Refs. [19,20] presented a novel family of estimators based on extreme values. To estimate the population variance, Ref. [21] recently introduced a number of effective estimators by transforming extreme values. To improve mean estimation methods under extreme values, Refs. [22,23,24] introduced different transformed estimators.
Recently, some researchers obtained different classes of estimators by applying a transformation to a single auxiliary variable. In order to provide more precise estimators while taking extreme values into consideration, we introduce a new family of estimators by employing the double transformation method on two auxiliary variables and show that our new double-transformed exponential estimators provide more precise results than other variance estimators under two-phase sampling. These double-transformed estimators are based on the largest and smallest observations of the auxiliary variable as well as the ranks of the smallest and largest observations of the auxiliary variable. This use of ranks can therefore help improve the performance and precision of the estimators.
Practical applications of the proposed estimators:
The improved estimators have significant potential for application in various fields and specific domains, such as medicine, industry, retail, and transportation. These estimators can be directly applied to real-world scenarios where accurate predictions are crucial.
  • Medicine: In the medical field, we illustrate how our proposed method can be applied to improve the accuracy of early-stage cancer detection, particularly in the analysis of medical imaging data. By using our method with existing diagnostic tools, such as MRI or CT scans, it can help healthcare professionals more accurately identify malignant tissues, enabling earlier intervention and improving patient outcomes. For instance, the method could enhance the detection of tumors in breast cancer patients by analyzing patterns in mammography images, leading to fewer false negatives and earlier treatment.
  • Industry: In the industrial sector, we explore the application of our method in optimizing predictive maintenance within manufacturing processes. Specifically, our approach can be used to monitor and predict equipment failure in an automotive manufacturing plant. By analyzing sensor data from machinery, our method can predict when a component is likely to fail, allowing for pre-emptive maintenance that minimizes downtime and reduces repair costs. This can lead to increased operational efficiency and cost savings, as well as fewer disruptions in the production line.
  • Retail: In retail, our method can enhance inventory management by predicting product demand more accurately. This would allow retailers to stock the right amount of products at the right time, reducing overstocking and understocking issues.
  • Transportation: In the transportation sector, our method can improve route optimization for delivery services. By analyzing traffic patterns and predicting delays, it can help logistics companies plan the most efficient routes, reducing fuel consumption and delivery times.
We believe that these practical examples not only demonstrate the versatility of our proposed method but also illustrate its potential to drive meaningful improvements in both healthcare and industrial operations.
The remainder of this article is organized as follows. The notations and an overview of various existing estimators are presented in Section 2. We explore our double-transformed family of estimators in depth in Section 3. A mathematical comparison of these estimators is given in Section 4. Section 5 presents a simulation analysis that validates the theoretical results from Section 4 by generating simulated populations based on various probability distributions; it also provides numerical examples that illustrate our theoretical findings. In Section 6, we provide conclusions and discuss future research directions.

2. Notations and Existing Estimators

A finite population of $N$ units is represented as $\Psi = \{\Psi_1, \Psi_2, \ldots, \Psi_N\}$. The study variable is denoted by $Y$, the auxiliary variable by $X$, and the ranks of the auxiliary variable by $R$, with values $y_i$, $x_i$, and $r_i$ on the $i$th unit. The population means are defined as
$$\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i, \qquad \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \bar{R} = \frac{1}{N}\sum_{i=1}^{N} R_i.$$
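Additionally, the population variances are defined as follows (sampling without replacement):
$$S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^2, \qquad S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2$$
and
$$S_r^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(R_i - \bar{R}\right)^2.$$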
Let $C_y$, $C_x$, and $C_r$ be the population coefficients of variation, which are calculated as
$$C_y = \frac{S_y}{\bar{Y}}, \qquad C_x = \frac{S_x}{\bar{X}} \qquad \text{and} \qquad C_r = \frac{S_r}{\bar{R}},$$
respectively.
The population correlation coefficients between $(Y, X)$, $(Y, R)$, and $(X, R)$ are defined as
$$\rho_{yx} = \frac{S_{yx}}{S_y S_x}, \qquad \rho_{yr} = \frac{S_{yr}}{S_y S_r} \qquad \text{and} \qquad \rho_{xr} = \frac{S_{xr}}{S_x S_r},$$
respectively.
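Consider the following sample variances:
$$s_y^2 = \hat{S}_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{y}\right)^2, \qquad s_x^2 = \hat{S}_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{x}\right)^2, \qquad s_r^2 = \hat{S}_r^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(R_i - \bar{r}\right)^2,$$
where
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad \text{and} \qquad \bar{r} = \frac{1}{n}\sum_{i=1}^{n} R_i$$
are the sample means of $Y$, $X$, and $R$. Further, the sample coefficients of variation are defined as follows:
$$c_y = \frac{s_y}{\bar{y}}, \qquad c_x = \frac{s_x}{\bar{x}} \qquad \text{and} \qquad c_r = \frac{s_r}{\bar{r}}.$$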
This study examines a double exponential family of estimators that use different transformations to estimate the finite population variance of the study variable $Y$ when the auxiliary variable $X$ is present. The two-phase sampling strategy is defined as follows:
  • To obtain an accurate estimate of the population variance $S_x^2$, a first-phase sample of size $m$ ($m < N$) is drawn from the population of size $N$, and only the auxiliary variable is measured on it.
  • A second-phase sample of size $n$ ($n < m$) is then drawn from the first-phase sample of size $m$, and both $y$ and $x$ are observed on it.
Moreover, let $s_x'^{2}$ and $s_r'^{2}$ represent the sample variances obtained from the first-phase sample of size $m$, while $s_y^2$, $s_x^2$, and $s_r^2$ denote the sample variances derived from the second-phase sample of size $n$.
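The two-phase design described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `two_phase_sample`, the toy population, and the sizes `m=60`, `n=20` are assumptions made for the example.

```python
import random
import statistics


def two_phase_sample(N, m, n, seed=None):
    """Return (first, second): unit indices of a first-phase SRSWOR sample
    of size m from 0..N-1 and a second-phase SRSWOR subsample of size n."""
    if not (0 < n <= m <= N):
        raise ValueError("sizes must satisfy 0 < n <= m <= N")
    rng = random.Random(seed)
    first = rng.sample(range(N), m)   # phase 1: drawn from the population
    second = rng.sample(first, n)     # phase 2: drawn from the phase-1 units
    return first, second


# Toy population (illustrative values only): x is auxiliary, y is the study variable.
rng = random.Random(1)
x = [rng.uniform(3, 5) for _ in range(200)]
y = [0.7 * xi + rng.gauss(0, 1) for xi in x]

first, second = two_phase_sample(len(x), m=60, n=20, seed=2)

# Phase 1 observes x only; phase 2 observes both y and x.
s_x2_first = statistics.variance([x[i] for i in first])    # s'_x^2
s_x2_second = statistics.variance([x[i] for i in second])  # s_x^2
s_y2_second = statistics.variance([y[i] for i in second])  # s_y^2
print(s_x2_first, s_x2_second, s_y2_second)
```

Drawing the second-phase units from the first-phase indices, rather than from the whole population, is what makes the two samples nested.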
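We now construct the error terms for calculating the biases and mean squared errors of the various existing estimators that are used for comparison in this study:
$$e_0 = \frac{s_y^2 - S_y^2}{S_y^2}, \quad e_1 = \frac{s_x^2 - S_x^2}{S_x^2}, \quad e_2 = \frac{s_x'^{2} - S_x^2}{S_x^2}, \quad e_3 = \frac{s_r^2 - S_r^2}{S_r^2}, \quad e_4 = \frac{s_r'^{2} - S_r^2}{S_r^2},$$
such that $E(e_i) = 0$ for $i = 0, 1, 2, 3, 4$, and
$$E(e_0^2) = \theta\lambda^*_{400}, \quad E(e_1^2) = \theta\lambda^*_{040}, \quad E(e_2^2) = \theta'\lambda^*_{040},$$
$$E(e_3^2) = \theta\lambda^*_{004}, \quad E(e_4^2) = \theta'\lambda^*_{004}, \quad E(e_0 e_1) = \theta\lambda^*_{220},$$
$$E(e_0 e_2) = \theta'\lambda^*_{220}, \quad E(e_0 e_3) = \theta\lambda^*_{202}, \quad E(e_0 e_4) = \theta'\lambda^*_{202},$$
$$E(e_1 e_2) = \theta'\lambda^*_{040}, \quad E(e_1 e_3) = \theta\lambda^*_{022}, \quad E(e_1 e_4) = \theta'\lambda^*_{022},$$
$$E(e_2 e_3) = \theta'\lambda^*_{022}, \quad E(e_2 e_4) = \theta'\lambda^*_{022}, \quad E(e_3 e_4) = \theta'\lambda^*_{004},$$
where
$$\lambda^*_{400} = (\lambda_{400} - 1), \quad \lambda^*_{040} = (\lambda_{040} - 1), \quad \lambda^*_{004} = (\lambda_{004} - 1),$$
$$\lambda^*_{220} = (\lambda_{220} - 1), \quad \lambda^*_{202} = (\lambda_{202} - 1), \quad \lambda^*_{022} = (\lambda_{022} - 1)$$
and
$$\theta = \frac{1}{n} - \frac{1}{N}, \qquad \theta' = \frac{1}{m} - \frac{1}{N}, \qquad \theta'' = \frac{1}{n} - \frac{1}{m}.$$
Here $\theta$, $\theta'$, and $\theta''$ are sampling correction terms that adjust for the sample size at each phase: $\theta$ for the second phase, $\theta'$ for the first phase, and $\theta''$ for the inter-phase. They are important in designing and analyzing estimators to ensure unbiasedness and efficiency in complex survey sampling, particularly in two-phase sampling schemes.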
Further,
$$\lambda_{lqs} = \frac{\mu_{lqs}}{\mu_{200}^{l/2}\,\mu_{020}^{q/2}\,\mu_{002}^{s/2}}$$
and
$$\mu_{lqs} = \frac{\sum_{i=1}^{N}\left(Y_i - \bar{Y}\right)^{l}\left(X_i - \bar{X}\right)^{q}\left(R_i - \bar{R}\right)^{s}}{N-1},$$
where $\mu_{lqs}$ is the population central moment of order $(l, q, s)$, $\lambda_{lqs}$ is its standardized form, and $(\mu_{200}, \mu_{020}, \mu_{002})$ are the variances of $(Y, X, R)$. Here, $\lambda_{400} = \beta_2(y)$, $\lambda_{040} = \beta_2(x)$, and $\lambda_{004} = \beta_2(r)$ are the population coefficients of kurtosis.
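The moment ratios $\lambda_{lqs}$ and the correction terms $\theta$, $\theta'$, $\theta''$ defined above can be computed directly from a population. The following is a minimal sketch under the definitions given in this section; the helper names `central_moment`, `lam`, and `thetas`, and the small data vectors, are hypothetical and chosen only for illustration.

```python
def central_moment(y, x, r, l, q, s):
    """mu_lqs: mixed central moment of (Y, X, R) with divisor N - 1."""
    N = len(y)
    yb, xb, rb = sum(y) / N, sum(x) / N, sum(r) / N
    total = sum((yi - yb) ** l * (xi - xb) ** q * (ri - rb) ** s
                for yi, xi, ri in zip(y, x, r))
    return total / (N - 1)


def lam(y, x, r, l, q, s):
    """lambda_lqs = mu_lqs / (mu_200^(l/2) * mu_020^(q/2) * mu_002^(s/2))."""
    mu200 = central_moment(y, x, r, 2, 0, 0)
    mu020 = central_moment(y, x, r, 0, 2, 0)
    mu002 = central_moment(y, x, r, 0, 0, 2)
    scale = mu200 ** (l / 2) * mu020 ** (q / 2) * mu002 ** (s / 2)
    return central_moment(y, x, r, l, q, s) / scale


def thetas(N, m, n):
    """Return (theta, theta', theta'') for population size N and phase sizes m, n."""
    return 1 / n - 1 / N, 1 / m - 1 / N, 1 / n - 1 / m


# Illustrative data only: r holds the ranks of x.
y = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3]
x = [3.0, 4.1, 2.5, 6.2, 5.0, 3.8]
r = [2, 4, 1, 6, 5, 3]

lam_400_star = lam(y, x, r, 4, 0, 0) - 1   # lambda*_400
lam_220_star = lam(y, x, r, 2, 2, 0) - 1   # lambda*_220
theta, theta1, theta2 = thetas(N=1500, m=300, n=100)
print(lam_400_star, lam_220_star, theta, theta1, theta2)
```

Note that $\theta'' = \theta - \theta'$ holds identically, which is the relation used later when the bias and MSE expressions are simplified.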
The usual estimator of the population variance is
$$T_1 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2.$$
The variance of $T_1$ is
$$Var(T_1) = \theta S_y^4 \lambda^*_{400}. \tag{1}$$
The ratio estimator introduced by [4], denoted by $T_2$, is given by
$$T_2 = s_y^2\,\frac{s_x'^{2}}{s_x^2}.$$
For $T_2$, the bias and MSE expressions are
$$Bias(T_2) \approx \theta'' S_y^2\left(\lambda^*_{040} - \lambda^*_{220}\right)$$
and
$$MSE(T_2) \approx S_y^4\left[\theta\lambda^*_{400} + \theta''\lambda^*_{040} - 2\theta''\lambda^*_{220}\right]. \tag{4}$$
We rewrite the linear regression estimator proposed by [25] under two-phase sampling, denoted by $T_3$, as
$$\hat{S}_{lr}^2 = s_y^2 + b_{(s_y^2, s_x^2)}\left(s_x'^{2} - s_x^2\right),$$
where
$$b_{(s_y^2, s_x^2)} = \frac{s_y^2\,\hat{\lambda}^*_{220}}{s_x^2\,\hat{\lambda}^*_{040}}$$
represents a sample regression coefficient.
For $T_3$, the MSE expression is
$$MSE(T_3) \approx S_y^4\lambda^*_{400}\left[\theta - \theta''\rho_{yx}^2\right], \tag{6}$$
where
$$\rho_{yx} = \frac{\lambda^*_{220}}{\sqrt{\lambda^*_{400}\,\lambda^*_{040}}}.$$
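We rewrite the exponential ratio-type estimator $T_4$ proposed by [8] under two-phase sampling as follows:
$$T_4 = s_y^2\exp\left(\frac{s_x'^{2} - s_x^2}{s_x'^{2} + s_x^2}\right).$$
For $T_4$, the bias and MSE expressions are
$$Bias(T_4) \approx \frac{1}{2}\theta'' S_y^2\left(\frac{3\lambda^*_{040}}{4} - \lambda^*_{220}\right)$$
and
$$MSE(T_4) \approx S_y^4\left[\theta\lambda^*_{400} + \theta''\left(\frac{\lambda^*_{040}}{4} - \lambda^*_{220}\right)\right]. \tag{9}$$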
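We rewrite the improved ratio-type estimator $T_5$ proposed by [9] under two-phase sampling as follows:
$$T_5 = s_y^2\left(\frac{s_x'^{2} + \lambda_{040}}{s_x^2 + \lambda_{040}}\right).$$
For $T_5$, the bias and MSE expressions are
$$Bias(T_5) \approx \theta'' g S_y^2\left(g\lambda^*_{040} - \lambda^*_{220}\right)$$
and
$$MSE(T_5) \approx S_y^4\left[\theta\lambda^*_{400} + \theta''\left(g^2\lambda^*_{040} - 2g\lambda^*_{220}\right)\right], \tag{12}$$
where
$$g = \frac{S_x^2}{S_x^2 + \lambda_{040}}.$$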
We rewrite the ratio-type estimators introduced by [11] under two-phase sampling as follows:
$$T_6 = s_y^2\left(\frac{s_x'^{2} + C_x}{s_x^2 + C_x}\right),$$
$$T_7 = s_y^2\left(\frac{\lambda_{040}s_x'^{2} + C_x}{\lambda_{040}s_x^2 + C_x}\right)$$
and
$$T_8 = s_y^2\left(\frac{C_x s_x'^{2} + \lambda_{040}}{C_x s_x^2 + \lambda_{040}}\right).$$
For $T_i$ $(i = 6, 7, 8)$, with $j = i - 6 = 0, 1, 2$, the bias and MSE expressions are
$$Bias(T_i) \approx \theta'' t_j S_y^2\left(t_j\lambda^*_{040} - \lambda^*_{220}\right)$$
and
$$MSE(T_i) \approx S_y^4\left[\theta\lambda^*_{400} + \theta''\left(t_j^2\lambda^*_{040} - 2t_j\lambda^*_{220}\right)\right], \tag{17}$$
where
$$t_0 = \frac{S_x^2}{S_x^2 + C_x}, \qquad t_1 = \frac{\lambda_{040}S_x^2}{\lambda_{040}S_x^2 + C_x}, \qquad t_2 = \frac{C_x S_x^2}{C_x S_x^2 + \lambda_{040}}.$$
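The first-order MSE expressions of $T_1$ to $T_8$ share one pattern: a ratio-type estimator with weight $t$ has MSE $S_y^4[\theta\lambda^*_{400} + \theta''(t^2\lambda^*_{040} - 2t\lambda^*_{220})]$, with $t = 1$ for $T_2$, $t = 1/2$ for $T_4$, $t = g$ for $T_5$, and $t = t_j$ for $T_6$ to $T_8$. The sketch below evaluates them side by side. The function name `existing_mses` and all numerical inputs are hypothetical; the inputs are the unstarred moment ratios, and the starred versions are formed inside the function.

```python
def existing_mses(S_y2, S_x2, C_x, lam400, lam040, lam220, theta, theta2):
    """First-order MSEs of T1..T8 from the expressions in Section 2.

    lam400, lam040, lam220 are the unstarred moment ratios;
    theta2 stands for theta'' = 1/n - 1/m.
    """
    l400, l040, l220 = lam400 - 1, lam040 - 1, lam220 - 1  # starred versions
    S4 = S_y2 ** 2

    def ratio_type(t):
        # Common form shared by T2, T4, T5, T6, T7, T8 with weight t.
        return S4 * (theta * l400 + theta2 * (t * t * l040 - 2 * t * l220))

    rho2 = l220 ** 2 / (l400 * l040)
    g = S_x2 / (S_x2 + lam040)
    t0 = S_x2 / (S_x2 + C_x)
    t1 = lam040 * S_x2 / (lam040 * S_x2 + C_x)
    t2 = C_x * S_x2 / (C_x * S_x2 + lam040)
    return {
        "T1": S4 * theta * l400,
        "T2": ratio_type(1.0),
        "T3": S4 * l400 * (theta - theta2 * rho2),
        "T4": ratio_type(0.5),
        "T5": ratio_type(g),
        "T6": ratio_type(t0),
        "T7": ratio_type(t1),
        "T8": ratio_type(t2),
    }


# Hypothetical population quantities, for illustration only.
mses = existing_mses(S_y2=4.0, S_x2=2.0, C_x=0.3,
                     lam400=3.2, lam040=2.8, lam220=1.9,
                     theta=1 / 50 - 1 / 1000, theta2=1 / 50 - 1 / 200)
for name, value in mses.items():
    print(name, round(value, 6))
```

Writing the ratio-type cases through one helper makes it easy to see that the regression estimator $T_3$ corresponds to the weight $t$ that minimizes this quadratic, so its first-order MSE is never larger than that of the other ratio-type estimators.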

3. Proposed Estimator

In this section, a class of double-transformed exponential estimators is established by utilizing the known maximum and minimum values of the auxiliary variable along with the ranks of the auxiliary variable in a two-phase sampling strategy. The suggested estimator is defined as
$$\hat{S}_H^2 = s_y^2\exp\left[v_1\,\frac{d_1\left(s_x'^{2} - s_x^2\right)}{d_1\left(s_x'^{2} + s_x^2\right) + 2d_2}\right]\exp\left[v_2\,\frac{d_3\left(s_r'^{2} - s_r^2\right)}{d_3\left(s_r'^{2} + s_r^2\right) + 2d_4}\right], \tag{18}$$
where $d_i$, $i = 1, 2, 3, 4$, are known parameters of the variable $X$, and $v_1$, $v_2$ are constants equal to either 1 or 2. The notations $(X_m, X_M)$ indicate the minimum and maximum values of the auxiliary variable, whereas $(R_m, R_M)$ represent the minimum and maximum values of the ranks of the auxiliary variable. The transformation values of $d_i$, $i = 1, 2, 3, 4$, are described in Table 1, while Table 2 presents the estimators obtained from (18).
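To make the structure of (18) concrete, the following sketch evaluates the estimator from the five sample variances. It is an illustration only: the function name `proposed_estimator` and the numerical inputs are hypothetical, and the constants $d_1, \ldots, d_4$ are passed in by the caller because their actual choices come from Table 1.

```python
import math


def proposed_estimator(s_y2, s_x2, s_x2_first, s_r2, s_r2_first,
                       d1, d2, d3, d4, v1=1, v2=1):
    """Evaluate the double-transformed exponential estimator in (18).

    s_y2, s_x2, s_r2        : second-phase sample variances
    s_x2_first, s_r2_first  : first-phase sample variances of X and its ranks
    d1..d4                  : known transformation constants (caller-supplied)
    v1, v2                  : constants equal to 1 or 2
    """
    x_part = v1 * d1 * (s_x2_first - s_x2) / (d1 * (s_x2_first + s_x2) + 2 * d2)
    r_part = v2 * d3 * (s_r2_first - s_r2) / (d3 * (s_r2_first + s_r2) + 2 * d4)
    return s_y2 * math.exp(x_part) * math.exp(r_part)


# Hypothetical inputs: the constants below are placeholders, not values from Table 1.
estimate = proposed_estimator(s_y2=4.0, s_x2=1.8, s_x2_first=2.1,
                              s_r2=95.0, s_r2_first=101.0,
                              d1=1.0, d2=2.0, d3=1.0, d4=30.0)
print(estimate)
```

When the first-phase and second-phase variances coincide, both exponents vanish and the function returns $s_y^2$ unchanged, which is the behaviour one expects from a ratio-type adjustment.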

3.1. Properties of the Proposed Estimator

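The bias and mean squared error (MSE) of $\hat{S}_H^2$ are obtained by rewriting (18) in terms of the error terms in order to describe the characteristics of the double exponential class of estimators. For example,
$$\hat{S}_H^2 = S_y^2\left(1 + e_0\right)\exp\left[\frac{t_4\left(e_2 - e_1\right)}{2}\left(1 + \frac{t_4}{2}\left(e_1 + e_2\right)\right)^{-1}\right]\exp\left[\frac{t_5\left(e_4 - e_3\right)}{2}\left(1 + \frac{t_5}{2}\left(e_3 + e_4\right)\right)^{-1}\right],$$
where
$$v_1 = v_2 = 1,$$
$$t_4 = \frac{d_1 S_x^2}{d_1 S_x^2 + d_2}$$
and
$$t_5 = \frac{d_3 S_r^2}{d_3 S_r^2 + d_4}.$$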
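By using the Taylor series expansion and retaining terms up to the second degree in the errors (the first order of approximation), we obtain
$$\hat{S}_H^2 - S_y^2 \approx S_y^2\Big[e_0 - \frac{t_4}{2}\left(e_1 - e_2\right) - \frac{t_5}{2}\left(e_3 - e_4\right) + \frac{3t_4^2}{8}e_1^2 - \frac{t_4^2}{8}e_2^2 + \frac{3t_5^2}{8}e_3^2 - \frac{t_5^2}{8}e_4^2 - \frac{t_4}{2}e_0e_1 + \frac{t_4}{2}e_0e_2 - \frac{t_5}{2}e_0e_3 + \frac{t_5}{2}e_0e_4 - \frac{t_4^2}{2}e_1e_2 + \frac{t_4t_5}{4}e_1e_3 - \frac{t_4t_5}{4}e_1e_4 - \frac{t_4t_5}{4}e_2e_3 + \frac{t_4t_5}{4}e_2e_4 - \frac{t_5^2}{2}e_3e_4\Big]. \tag{20}$$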
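Using (20), the bias of $\hat{S}_H^2$ is given by
$$Bias(\hat{S}_H^2) \approx \theta S_y^2\left[\frac{3t_4^2}{8}\lambda^*_{040} + \frac{3t_5^2}{8}\lambda^*_{004} - \frac{t_4}{2}\lambda^*_{220} - \frac{t_5}{2}\lambda^*_{202} + \frac{t_4t_5}{2}\lambda^*_{022}\right] - \theta' S_y^2\left[\frac{3t_4^2}{8}\lambda^*_{040} + \frac{3t_5^2}{8}\lambda^*_{004} - \frac{t_4}{2}\lambda^*_{220} - \frac{t_5}{2}\lambda^*_{202} + \frac{t_4t_5}{2}\lambda^*_{022}\right].$$
After some simplifications, we obtain
$$Bias(\hat{S}_H^2) \approx \theta'' S_y^2\left[\frac{3}{8}\left(t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004}\right) - \frac{1}{2}\left(t_4\lambda^*_{220} + t_5\lambda^*_{202} - t_4t_5\lambda^*_{022}\right)\right],$$
where
$$\theta'' = \theta - \theta'.$$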
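We derive the MSE of $\hat{S}_H^2$ by squaring both sides of (20) and taking the expectation, which gives
$$MSE(\hat{S}_H^2) \approx \theta S_y^4\left[\lambda^*_{400} + \frac{t_4^2}{4}\lambda^*_{040} + \frac{t_5^2}{4}\lambda^*_{004} - t_4\lambda^*_{220} - t_5\lambda^*_{202} + \frac{t_4t_5}{2}\lambda^*_{022}\right] - \theta' S_y^4\left[\frac{t_4^2}{4}\lambda^*_{040} + \frac{t_5^2}{4}\lambda^*_{004} - t_4\lambda^*_{220} - t_5\lambda^*_{202} + \frac{t_4t_5}{2}\lambda^*_{022}\right].$$
After some simplifications, we obtain
$$MSE(\hat{S}_H^2) \approx S_y^4\left[\theta\lambda^*_{400} + \frac{\theta''}{4}\left(t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022}\right)\right]. \tag{23}$$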
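The simplification from the two-bracket MSE expression to (23) rests only on $\theta'' = \theta - \theta'$. The sketch below evaluates both forms and confirms numerically that they agree. The function names and all numerical inputs are hypothetical; the `star` dictionary holds the starred moment ratios $\lambda^*_{lqs}$.

```python
import math


def mse_long_form(S_y2, t4, t5, star, theta, theta1):
    """MSE of the proposed estimator before simplification (theta1 = theta')."""
    common = (t4 ** 2 / 4 * star["040"] + t5 ** 2 / 4 * star["004"]
              - t4 * star["220"] - t5 * star["202"]
              + t4 * t5 / 2 * star["022"])
    S4 = S_y2 ** 2
    return theta * S4 * (star["400"] + common) - theta1 * S4 * common


def mse_simplified(S_y2, t4, t5, star, theta, theta2):
    """Equation (23), with theta2 = theta'' = theta - theta'."""
    bracket = (t4 ** 2 * star["040"] + t5 ** 2 * star["004"]
               - 4 * t4 * star["220"] - 4 * t5 * star["202"]
               + 2 * t4 * t5 * star["022"])
    return S_y2 ** 2 * (theta * star["400"] + theta2 / 4 * bracket)


# Hypothetical values for illustration only.
star = {"400": 2.2, "040": 1.8, "004": 0.8, "220": 0.9, "202": 0.6, "022": 0.7}
N, m, n = 1500, 300, 100
theta, theta1 = 1 / n - 1 / N, 1 / m - 1 / N
theta2 = theta - theta1

long_value = mse_long_form(4.0, 0.8, 0.9, star, theta, theta1)
short_value = mse_simplified(4.0, 0.8, 0.9, star, theta, theta2)
print(long_value, short_value)
```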

3.2. Analysis of Continuity, Boundedness, and Partial Derivative Properties of the Proposed Estimator

To check the continuity, the boundedness, and the existence and continuity of the first- and second-order partial derivatives of the proposed estimator (18), we need to carefully analyze the properties of the estimator function.
Step 1: Restating the proposed estimator
The proposed estimator for the finite population variance $S_y^2$ is given below:
$$\hat{S}_H^2 = s_y^2\exp\left[v_1\,\frac{d_1\left(s_x'^{2} - s_x^2\right)}{d_1\left(s_x'^{2} + s_x^2\right) + 2d_2}\right]\exp\left[v_2\,\frac{d_3\left(s_r'^{2} - s_r^2\right)}{d_3\left(s_r'^{2} + s_r^2\right) + 2d_4}\right],$$
where
  • $s_y^2$, $s_x^2$, and $s_r^2$ are sample variances from the second-phase sample.
  • $s_x'^{2}$ and $s_r'^{2}$ are sample variances from the first-phase sample.
  • $d_1, d_2, d_3, d_4$ are known constants, while $v_1, v_2$ are constants equal to either 1 or 2.
Step 2: Continuity of the function
Exponential function behavior: The exponential function $\exp(x)$ is continuous for all real values of $x$. This property holds for the exponential factors of the proposed estimator,
$$\exp\left[v_1\,\frac{d_1\left(s_x'^{2} - s_x^2\right)}{d_1\left(s_x'^{2} + s_x^2\right) + 2d_2}\right]$$
and
$$\exp\left[v_2\,\frac{d_3\left(s_r'^{2} - s_r^2\right)}{d_3\left(s_r'^{2} + s_r^2\right) + 2d_4}\right].$$
Since the exponential function is continuous for any real-valued input, we only need to ensure that the terms inside the exponents are continuous functions.
Continuity of the expressions inside the exponentials: Each term inside the exponentials has the following form:
$$\frac{d_1\left(s_x'^{2} - s_x^2\right)}{d_1\left(s_x'^{2} + s_x^2\right) + 2d_2}, \qquad \frac{d_3\left(s_r'^{2} - s_r^2\right)}{d_3\left(s_r'^{2} + s_r^2\right) + 2d_4}.$$
These are rational functions, and rational functions are continuous wherever the denominator is non-zero. The denominators are $d_1\left(s_x'^{2} + s_x^2\right) + 2d_2$ and $d_3\left(s_r'^{2} + s_r^2\right) + 2d_4$, which are always positive (since the sample variances are non-negative) as long as $d_1, d_3$ are non-negative and $d_2, d_4$ are positive constants. Thus, the rational functions inside the exponentials are continuous. Therefore, the function $\hat{S}_H^2$ is continuous.
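Step 3: Boundedness of the function
For boundedness, we need to ensure that $\hat{S}_H^2$ does not approach infinity for any valid values of the sample variances $s_x^2$, $s_y^2$, and $s_r^2$.
1. Exponential terms: The exponential terms are bounded because of the following points:
  • The ratios $\frac{d_1\left(s_x'^{2} - s_x^2\right)}{d_1\left(s_x'^{2} + s_x^2\right) + 2d_2}$ and $\frac{d_3\left(s_r'^{2} - s_r^2\right)}{d_3\left(s_r'^{2} + s_r^2\right) + 2d_4}$ are finite as long as the sample variances are finite, which is guaranteed for a finite population. Moreover, when $d_1, d_2, d_3, d_4$ are positive, $\left|d_1\left(s_x'^{2} - s_x^2\right)\right| \le d_1\left(s_x'^{2} + s_x^2\right) < d_1\left(s_x'^{2} + s_x^2\right) + 2d_2$, and similarly for the rank term, so each ratio lies in $(-1, 1)$.
  • The exponential of a finite real number is finite, so each exponential factor lies between $e^{-v}$ and $e^{v}$ for the corresponding $v \in \{v_1, v_2\}$.
2. Multiplying by $s_y^2$: Since $s_y^2$ is a sample variance, it is also finite and non-negative.
Thus, the estimator $\hat{S}_H^2$ is bounded.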
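The boundedness argument in Step 3 can be checked numerically. The sketch below draws random non-negative variances and random constants and records the ratio that appears inside each exponential. It assumes positive constants (an assumption of this example, not a statement from Table 1), and the function name `exponent_ratio` and the sampling ranges are hypothetical.

```python
import random


def exponent_ratio(s_first, s_second, d_a, d_b):
    """Ratio inside one exponential of the proposed estimator:
    d_a (s' - s) / (d_a (s' + s) + 2 d_b)."""
    return d_a * (s_first - s_second) / (d_a * (s_first + s_second) + 2 * d_b)


rng = random.Random(0)
ratios = []
for _ in range(10000):
    s_first = rng.uniform(0.0, 1e6)    # first-phase variance, non-negative
    s_second = rng.uniform(0.0, 1e6)   # second-phase variance, non-negative
    d_a = rng.uniform(1e-3, 10.0)      # assumed positive constant (d1 or d3)
    d_b = rng.uniform(1e-3, 10.0)      # assumed positive constant (d2 or d4)
    ratios.append(exponent_ratio(s_first, s_second, d_a, d_b))

# Under these assumptions every ratio stays strictly inside (-1, 1),
# so each exponential factor stays between exp(-v) and exp(v).
print(min(ratios), max(ratios))
```

A random check of this kind does not replace the inequality argument, but it is a quick way to catch a sign or denominator mistake when the estimator is implemented.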
Step 4: Existence and continuity of partial derivatives
  • First-order partial derivatives: The first-order partial derivatives of $\hat{S}_H^2$ with respect to $s_x^2$, $s_y^2$, and $s_r^2$ exist because the exponential function and the rational functions inside the exponents are smooth wherever the denominators are non-zero. Thus, these partial derivatives are continuous.
  • Second-order partial derivatives: Similarly, the second-order partial derivatives exist and are continuous because the functions involved (exponentials and rational functions) are smooth and their derivatives are bounded.
Therefore, we can conclude that the function $\hat{S}_H^2$ is continuous and bounded. Moreover, the first- and second-order partial derivatives of $\hat{S}_H^2$ with respect to the sample variances exist, are continuous, and are bounded.

4. Mathematical Comparison

The suggested double exponential class of estimators $\hat{S}_H^2$ is compared in this section with a number of other estimators, including $T_1$, $T_2$, $T_3$, $T_4$, $T_5$, and $T_i$ $(i = 6, 7, 8)$.
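Condition (1): By (1) and (23),
$Var(T_1) > MSE(\hat{S}_H^2)$ if
$$\left(\theta - \theta'\right)\left[t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022}\right] < 0.$$
For $\theta - \theta' > 0$, that is, $\theta > \theta'$:
$$t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} < 0. \tag{24}$$
Similarly, for $\theta - \theta' < 0$, that is, $\theta < \theta'$:
$$t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} > 0. \tag{25}$$
When (24) or (25) is satisfied, the estimator $\hat{S}_H^2$ performs better than $T_1$.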
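Condition (2): By (4) and (23),
$MSE(T_2) > MSE(\hat{S}_H^2)$ if
$$\left(\theta - \theta'\right)\left[\lambda^*_{040} - 2\lambda^*_{220} - \frac{1}{4}\left(t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022}\right)\right] > 0.$$
For $\theta' - \theta < 0$, that is, $\theta' < \theta$:
$$\lambda^*_{040} - 2\lambda^*_{220} - \frac{1}{4}\left(t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022}\right) > 0. \tag{26}$$
Similarly, for $\theta' - \theta > 0$, that is, $\theta' > \theta$:
$$\lambda^*_{040} - 2\lambda^*_{220} - \frac{1}{4}\left(t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022}\right) < 0. \tag{27}$$
When (26) or (27) is satisfied, the estimator $\hat{S}_H^2$ performs better than $T_2$.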
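Condition (3): By (6) and (23),
$MSE(T_3) > MSE(\hat{S}_H^2)$ if
$$\left(\theta - \theta'\right)\left[\rho_{yx}^2 + \frac{1}{4}\left(t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022}\right)\right] < 0.$$
For $\theta - \theta' > 0$, that is, $\theta > \theta'$:
$$\rho_{yx}^2 + \frac{1}{4}\left(t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022}\right) < 0. \tag{28}$$
Similarly, for $\theta - \theta' < 0$, that is, $\theta < \theta'$:
$$\rho_{yx}^2 + \frac{1}{4}\left(t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022}\right) > 0. \tag{29}$$
When (28) or (29) is satisfied, the estimator $\hat{S}_H^2$ performs better than $T_3$.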
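Condition (4): By (9) and (23),
$MSE(T_4) > MSE(\hat{S}_H^2)$ if
$$\left(\theta - \theta'\right)\left[t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(\lambda^*_{040} - 4\lambda^*_{220}\right)\right] < 0.$$
For $\theta - \theta' > 0$, that is, $\theta > \theta'$:
$$t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(\lambda^*_{040} - 4\lambda^*_{220}\right) < 0. \tag{30}$$
Similarly, for $\theta - \theta' < 0$, that is, $\theta < \theta'$:
$$t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(\lambda^*_{040} - 4\lambda^*_{220}\right) > 0. \tag{31}$$
When (30) or (31) is satisfied, the estimator $\hat{S}_H^2$ performs better than $T_4$.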
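Condition (5): By (12) and (23),
$MSE(T_5) > MSE(\hat{S}_H^2)$ if
$$\left(\theta - \theta'\right)\left[t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(4g^2\lambda^*_{040} - 4g\lambda^*_{220}\right)\right] < 0.$$
For $\theta - \theta' > 0$, that is, $\theta > \theta'$:
$$t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(4g^2\lambda^*_{040} - 4g\lambda^*_{220}\right) < 0. \tag{32}$$
Similarly, for $\theta - \theta' < 0$, that is, $\theta < \theta'$:
$$t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(4g^2\lambda^*_{040} - 4g\lambda^*_{220}\right) > 0. \tag{33}$$
When (32) or (33) is satisfied, the estimator $\hat{S}_H^2$ performs better than $T_5$.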
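Condition (6): By (17) and (23), with $j = i - 6$ for $i = 6, 7, 8$,
$MSE(T_i) > MSE(\hat{S}_H^2)$ if
$$\left(\theta - \theta'\right)\left[t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(4t_j^2\lambda^*_{040} - 4t_j\lambda^*_{220}\right)\right] < 0.$$
For $\theta - \theta' > 0$, that is, $\theta > \theta'$:
$$t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(4t_j^2\lambda^*_{040} - 4t_j\lambda^*_{220}\right) < 0. \tag{34}$$
Similarly, for $\theta - \theta' < 0$, that is, $\theta < \theta'$:
$$t_4^2\lambda^*_{040} + t_5^2\lambda^*_{004} - 4t_4\lambda^*_{220} - 4t_5\lambda^*_{202} + 2t_4t_5\lambda^*_{022} - \left(4t_j^2\lambda^*_{040} - 4t_j\lambda^*_{220}\right) > 0. \tag{35}$$
When (34) or (35) is satisfied, the estimator $\hat{S}_H^2$ performs better than $T_i$.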

5. Numerical Analysis

In this section, we compare our proposed double exponential class of estimators with existing ones in terms of the percent relative efficiency (PRE), using populations obtained by simulation and three different real data sets. Our goal is to provide a comprehensive evaluation of the effectiveness and reliability of the suggested class of estimators in various practical situations.

5.1. Simulation Study

We perform a simulation analysis to investigate the performance of the suggested double exponential estimators, which are based on the known largest and smallest values of the auxiliary variable and its ranks in a two-phase sampling framework, in order to validate the theoretical findings of Section 4. The auxiliary variable $X$ is generated artificially for six populations, each of which corresponds to one of the probability distributions listed below.
  • Data 1: $X \sim Uniform(3, 5)$.
  • Data 2: $X \sim Uniform(6, 9)$.
  • Data 3: $X \sim Exponential(5)$.
  • Data 4: $X \sim Exponential(8)$.
  • Data 5: $X \sim Gamma(2, 6)$.
  • Data 6: $X \sim Gamma(7, 10)$.
For each distribution, the dependent variable $Y$ is obtained from the formula
$$Y = r_{yx} \times X + e,$$
where $X$ is drawn from the corresponding population distribution, $r_{yx} = 0.70$ is a correlation coefficient, and $e \sim N(0, 1)$ denotes a random error term.
Using R (version 4.4.0), we developed the following procedure to calculate the mean squared error (MSE) and the percent relative efficiency (PRE) of the suggested class of estimators (Algorithm 1):
Algorithm 1: Procedure for Calculating the Mean Squared Error (MSE) and Percent Relative Efficiency (PRE)
Step 1: Using the given probability distributions, generate a population $\Psi$ of size $N = 1500$.
Step 2: Choose a first-phase sample $S_1$ of size $m$ from $\Psi$ by simple random sampling without replacement (SRSWOR).
Step 3: Using SRSWOR, choose a second-phase sample $S_2$ of size $n$ from $S_1$.
Step 4: From the above steps, find the optimal values of the unknown constants in the estimators by computing the population total, $X_M$, $X_m$, $R_M$, and $R_m$.
Step 5: Apply the two-phase sampling design described in Step 2 and Step 3.
Step 6: For each sample size, compute PRE$(\hat{S}_i^2)$ for each described estimator $\hat{S}_i^2$.
Step 7: Repeat Steps 5 and 6 a total of 40,000 times, and then use the following formulas to calculate the percent relative efficiency values for each estimator. The findings for the simulated populations are shown in Table 3.
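$$MSE(\hat{S}_i^2)_{\min} = \frac{\sum_{j=1}^{40000}\left(\hat{S}_{ij}^2 - S_y^2\right)^2}{40000}$$
and
$$PRE = \frac{V(T_1)}{MSE(\hat{S}_i^2)_{\min}} \times 100,$$
where $\hat{S}_{ij}^2$ is the value of estimator $\hat{S}_i^2$ in the $j$th replication, and $\hat{S}_i^2$ is one of $T_1, T_2, T_3, T_4, T_5, T_6, T_7, T_8, H_k$ $(k = 1, 2, \ldots, 8)$.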
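The steps of Algorithm 1 can be sketched in Python as follows. This is a reduced illustration, not a reproduction of the authors' R code: it covers only $T_1$, $T_2$, $T_4$, and a single member of the proposed class, it uses far fewer replications than 40,000, it takes the empirical MSE of $T_1$ as the baseline $V(T_1)$, and the constants $d_1 = d_3 = 1$, $d_2 = X_M - X_m$, $d_4 = R_M - R_m$ as well as the sizes `m` and `n` are assumptions made for this example rather than the choices listed in Table 1.

```python
import math
import random


def var(values):
    """Sample variance with the n - 1 divisor."""
    k = len(values)
    mean = sum(values) / k
    return sum((v - mean) ** 2 for v in values) / (k - 1)


def simulate(N=1500, m=300, n=100, reps=1000, seed=0):
    """Monte Carlo MSE and PRE for T1, T2, T4 and one proposed-type estimator H."""
    rng = random.Random(seed)
    # Step 1: population with X ~ Uniform(3, 5) and Y = 0.70 X + e, e ~ N(0, 1).
    x = [rng.uniform(3, 5) for _ in range(N)]
    y = [0.70 * xi + rng.gauss(0, 1) for xi in x]
    order = sorted(range(N), key=x.__getitem__)
    r = [0] * N
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank                      # ranks of the auxiliary variable
    S_y2 = var(y)
    # Step 4 (assumed constants, for illustration only).
    d1, d2 = 1.0, max(x) - min(x)          # d2 = X_M - X_m
    d3, d4 = 1.0, float(max(r) - min(r))   # d4 = R_M - R_m
    sq_err = {"T1": 0.0, "T2": 0.0, "T4": 0.0, "H": 0.0}
    for _ in range(reps):
        first = rng.sample(range(N), m)    # Step 2: first-phase SRSWOR
        second = rng.sample(first, n)      # Step 3: second-phase SRSWOR
        sx1 = var([x[i] for i in first])
        sr1 = var([r[i] for i in first])
        sy = var([y[i] for i in second])
        sx = var([x[i] for i in second])
        sr = var([r[i] for i in second])
        ex = d1 * (sx1 - sx) / (d1 * (sx1 + sx) + 2 * d2)
        er = d3 * (sr1 - sr) / (d3 * (sr1 + sr) + 2 * d4)
        estimates = {
            "T1": sy,
            "T2": sy * sx1 / sx,
            "T4": sy * math.exp((sx1 - sx) / (sx1 + sx)),
            "H": sy * math.exp(ex) * math.exp(er),
        }
        for name, value in estimates.items():
            sq_err[name] += (value - S_y2) ** 2
    mse = {name: total / reps for name, total in sq_err.items()}
    pre = {name: 100 * mse["T1"] / mse[name] for name in mse}
    return mse, pre


mse, pre = simulate(reps=200)
for name in mse:
    print(name, mse[name], pre[name])
```

Swapping the generator in Step 1 for an exponential or gamma draw (for example `rng.expovariate` or `rng.gammavariate`) gives the other simulated populations. The parameterization of those distributions (rate versus scale) is not stated in the text, so it would need to be confirmed before comparing numbers with Table 3.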

5.2. Numerical Examples

Using different data sets, we computed the percent relative efficiency of all the estimators in order to compare their performance with that of our suggested double exponential class of estimators. Detailed descriptions of the data sets are given below, with descriptive statistics given in Table 4, Table 5 and Table 6.
Data 1. (Source: [26], p. 226)
Y: The employment levels that each department records, which indicate the total number of employees;
X: Information on industrial activities, including the total number of factories in which these departments are formally registered;
R: A comparative perspective of industrial participation across departments is provided by the rankings given to each department based on the total number of factories they registered in 2012.
Data 2. (Source: [3], p. 24)
Y: The family’s food bills, which have a direct connection to their job;
X: The entire weekly revenue received by the family, which represents their available funds during that period of time;
R: Ranking of families based on their weekly income.
Data 3. (Source: [26], p. 135)
Y: Total student enrollment at institutions in 2012;
X: Number of government institutions in 2012;
R: Ranking of government institutions for 2012.
Finally, the following formula is used to obtain the percent relative efficiency values for the different estimators using the above data sets:
$$PRE = \frac{V(T_1)}{MSE(\hat{S}_k^2)} \times 100,$$
where $\hat{S}_k^2$ is one of $T_1, T_2, T_3, T_4, T_5, T_6, T_7, T_8, H_i$ $(i = 1, 2, \ldots, 8)$.

6. Conclusions

In this research, we established an improved and effective double exponential estimator for the population variance that uses the ranks of the auxiliary variable and information on its minimum and maximum values. The theoretical conditions detailed in Section 4 make a comparison with existing estimators straightforward and show when the proposed estimators are more efficient. To verify these theoretical conditions, we conducted simulation studies and analyzed various real data sets. According to the simulation results, which are shown in Table 3, the new estimators are consistently better than existing approaches in terms of PREs. These conclusions and the theoretical ideas from Section 4 are supported by the empirical results displayed in Table 7.
We can conclude from the simulation and empirical results that the proposed estimators $\hat{S}_{H_k}^2$ $(k = 1, 2, \ldots, 8)$ are more precise and efficient than the other estimators discussed in this paper. Among the proposed estimators, $\hat{S}_{H_8}^2$ is the most effective choice and is therefore highly recommended.
We discussed the performance and properties of the new recommended double exponential family of estimators under a double-phase sampling method. Furthermore, the systematic sampling approach may be used to propose novel estimators, and our findings can help in the development of estimators with minimized MSE. This presents an intriguing direction for future research.

Author Contributions

Conceptualization, U.D., D.A., J.W. and W.E.; methodology, U.D., D.A., J.W. and W.E.; software, U.D., D.A. and J.W.; validation, U.D., D.A., J.W. and W.E.; formal analysis, U.D. and D.A.; investigation, U.D., D.A. and J.W.; resources, U.D., D.A. and J.W.; data curation, U.D., D.A., J.W. and W.E.; writing—original draft, U.D. and D.A.; writing—review & editing, U.D., D.A. and J.W.; visualization, U.D., D.A. and J.W.; supervision, U.D.; project administration, U.D., D.A. and W.E.; funding acquisition, W.E. All authors have read and agreed to the published version of the manuscript.

Funding

The study was funded by the Researchers Supporting Project number (RSPD2025R749), King Saud University, Riyadh, Saudi Arabia.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Neyman, J. Contribution to the theory of sampling human population. J. Am. Stat. Assoc. 1938, 33, 101–116.
  2. Sukhatme, B.V. Some ratio-type estimators in two-phase sampling. J. Am. Stat. Assoc. 1962, 57, 628–632.
  3. Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
  4. Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123.
  5. Rao, J.N.K. On double sampling for stratification and analytical surveys. Biometrika 1973, 60, 125–133.
  6. Vishwakarma, G.K.; Zeeshan, S.M. Generalized ratio-cum-product estimator for finite population mean under two-phase sampling scheme. J. Mod. Appl. Stat. Methods 2020, 19, 7.
  7. Zaman, T.; Kadilar, C. New class of exponential estimators for finite population mean in two-phase sampling. Commun. Stat. Theory Methods 2021, 50, 874–889.
  8. Bahl, S.; Tuteja, R. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
  9. Upadhyaya, L.; Singh, H. An estimator for population variance that utilizes the kurtosis of an auxiliary variable in sample surveys. Vikram Math. J. 1999, 19, 14–17.
  10. Dubey, V.; Sharma, H. On estimating population variance using auxiliary information. Stat. Transit. New Ser. 2008, 9, 7–18.
  11. Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
  12. Singh, H.; Chandra, P. An alternative to ratio estimator of the population variance in sample surveys. J. Transp. Stat. 2008, 9, 89–103.
  13. Singh, H.P.; Solanki, R.S. A new procedure for variance estimation in simple random sampling using auxiliary information. J. Stat. Pap. 2013, 54, 479–497.
  14. Singh, H.P.; Pal, S.K. An efficient class of estimators of finite population variance using quartiles. J. Appl. Stat. 2016, 43, 1945–1958.
  15. Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat. Theory Methods 2023, 52, 2610–2624.
  16. Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhyā Indian J. Stat. Ser. B 1995, 57, 93–102.
  17. Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
  18. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
  19. Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
  20. Alomair, M.A.; Daraz, U. Dual transformation of auxiliary variables by using outliers in stratified random sampling. Mathematics 2024, 12, 2829.
  21. Daraz, U.; Alomair, M.A.; Albalawi, O.; Al Naim, A.S. New Techniques for Estimating Finite Population Variance Using Ranks of Auxiliary Variable in Two-Stage Sampling. Mathematics 2024, 12, 2741.
  22. Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
  23. Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
  24. Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
  25. Watson, D.J. The estimation of leaf area in field crops. J. Agric. Sci. 1937, 27, 474–483.
  26. Bureau of Statistics. Punjab Development Statistics; Government of the Punjab: Lahore, Pakistan, 2013.

Table 1. Different known parameters of the auxiliary variables.

| $d_1$ | $d_2$ | $d_3$ | $d_4$ |
|---|---|---|---|
| $1$ | $X_M - X_m$ | $1$ | $R_M - R_m$ |
| $\rho_{yx}$ | $X_M - X_m$ | $\rho_{yr}$ | $R_M - R_m$ |
| $X_M - X_m$ | $\rho_{yx}$ | $R_M - R_m$ | $\rho_{yr}$ |
| $X_M - X_m$ | $\beta_2(x)$ | $R_M - R_m$ | $\beta_2(r)$ |
| $X_M - X_m$ | $1$ | $R_M - R_m$ | $1$ |
| $\beta_2(x)$ | $X_M - X_m$ | $\beta_2(r)$ | $R_M - R_m$ |
| $X_M - X_m$ | $C_x$ | $R_M - R_m$ | $C_r$ |
| $C_x$ | $X_M - X_m$ | $C_r$ | $R_M - R_m$ |

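Table 2. Different classes of double exponential estimators.

$$\hat{S}^2_{H_1} = s_y^2 \exp\left[\frac{v_1 \left(s_x^{\prime 2} - s_x^2\right)}{s_x^{\prime 2} + s_x^2 + 2\left(X_M - X_m\right)}\right] \exp\left[\frac{v_2 \left(s_r^{\prime 2} - s_r^2\right)}{s_r^{\prime 2} + s_r^2 + 2\left(R_M - R_m\right)}\right]$$

$$\hat{S}^2_{H_2} = s_y^2 \exp\left[\frac{v_1 \rho_{yx} \left(s_x^{\prime 2} - s_x^2\right)}{\rho_{yx} \left(s_x^{\prime 2} + s_x^2\right) + 2\left(X_M - X_m\right)}\right] \exp\left[\frac{v_2 \rho_{yr} \left(s_r^{\prime 2} - s_r^2\right)}{\rho_{yr} \left(s_r^{\prime 2} + s_r^2\right) + 2\left(R_M - R_m\right)}\right]$$

$$\hat{S}^2_{H_3} = s_y^2 \exp\left[\frac{v_1 \left(X_M - X_m\right) \left(s_x^{\prime 2} - s_x^2\right)}{\left(X_M - X_m\right) \left(s_x^{\prime 2} + s_x^2\right) + 2\rho_{yx}}\right] \exp\left[\frac{v_2 \left(R_M - R_m\right) \left(s_r^{\prime 2} - s_r^2\right)}{\left(R_M - R_m\right) \left(s_r^{\prime 2} + s_r^2\right) + 2\rho_{yr}}\right]$$

$$\hat{S}^2_{H_4} = s_y^2 \exp\left[\frac{v_1 \left(X_M - X_m\right) \left(s_x^{\prime 2} - s_x^2\right)}{\left(X_M - X_m\right) \left(s_x^{\prime 2} + s_x^2\right) + 2\beta_2(x)}\right] \exp\left[\frac{v_2 \left(R_M - R_m\right) \left(s_r^{\prime 2} - s_r^2\right)}{\left(R_M - R_m\right) \left(s_r^{\prime 2} + s_r^2\right) + 2\beta_2(r)}\right]$$

$$\hat{S}^2_{H_5} = s_y^2 \exp\left[\frac{v_1 \left(X_M - X_m\right) \left(s_x^{\prime 2} - s_x^2\right)}{\left(X_M - X_m\right) \left(s_x^{\prime 2} + s_x^2\right) + 2}\right] \exp\left[\frac{v_2 \left(R_M - R_m\right) \left(s_r^{\prime 2} - s_r^2\right)}{\left(R_M - R_m\right) \left(s_r^{\prime 2} + s_r^2\right) + 2}\right]$$

$$\hat{S}^2_{H_6} = s_y^2 \exp\left[\frac{v_1 \beta_2(x) \left(s_x^{\prime 2} - s_x^2\right)}{\beta_2(x) \left(s_x^{\prime 2} + s_x^2\right) + 2\left(X_M - X_m\right)}\right] \exp\left[\frac{v_2 \beta_2(r) \left(s_r^{\prime 2} - s_r^2\right)}{\beta_2(r) \left(s_r^{\prime 2} + s_r^2\right) + 2\left(R_M - R_m\right)}\right]$$

$$\hat{S}^2_{H_7} = s_y^2 \exp\left[\frac{v_1 \left(X_M - X_m\right) \left(s_x^{\prime 2} - s_x^2\right)}{\left(X_M - X_m\right) \left(s_x^{\prime 2} + s_x^2\right) + 2C_x}\right] \exp\left[\frac{v_2 \left(R_M - R_m\right) \left(s_r^{\prime 2} - s_r^2\right)}{\left(R_M - R_m\right) \left(s_r^{\prime 2} + s_r^2\right) + 2C_r}\right]$$

$$\hat{S}^2_{H_8} = s_y^2 \exp\left[\frac{v_1 C_x \left(s_x^{\prime 2} - s_x^2\right)}{C_x \left(s_x^{\prime 2} + s_x^2\right) + 2\left(X_M - X_m\right)}\right] \exp\left[\frac{v_2 C_r \left(s_r^{\prime 2} - s_r^2\right)}{C_r \left(s_r^{\prime 2} + s_r^2\right) + 2\left(R_M - R_m\right)}\right]$$
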
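The eight estimators in Table 2 appear to share one template, with the constants (d1, d2, d3, d4) taken row by row from Table 1. The following is a minimal sketch of that template, not the authors' code. The function name `double_exp_variance`, its argument names, and the sample variances in the usage lines are hypothetical; the minus signs in the ranges (X_M − X_m and R_M − R_m) are assumed, and v1 and v2 are left as user-supplied constants.

```python
import math


def double_exp_variance(s2_y, s2_x_first, s2_x, s2_r_first, s2_r, d, v1=1.0, v2=1.0):
    """Sketch of the double exponential template suggested by Tables 1 and 2.

    s2_y                : second-phase sample variance of the study variable
    s2_x_first, s2_x    : first- and second-phase sample variances of x
    s2_r_first, s2_r    : first- and second-phase sample variances of the ranks
    d                   : tuple (d1, d2, d3, d4), one row of Table 1 (assumed mapping)
    v1, v2              : constants of the family (values here are placeholders)
    """
    d1, d2, d3, d4 = d
    # Exponent built from the auxiliary variable x
    term_x = v1 * d1 * (s2_x_first - s2_x) / (d1 * (s2_x_first + s2_x) + 2.0 * d2)
    # Exponent built from the ranks of the auxiliary variable
    term_r = v2 * d3 * (s2_r_first - s2_r) / (d3 * (s2_r_first + s2_r) + 2.0 * d4)
    return s2_y * math.exp(term_x) * math.exp(term_r)


# First row of Table 1 with the extremes reported for data 1
# (X_M = 2379, X_m = 39, R_M = 36, R_m = 1); the variances are made up.
d_h1 = (1.0, 2379.0 - 39.0, 1.0, 36.0 - 1.0)
estimate = double_exp_variance(
    s2_y=50.0, s2_x_first=12.0, s2_x=10.0, s2_r_first=9.0, s2_r=8.5, d=d_h1
)
print(estimate)
```

When the first-phase and second-phase variances coincide, both exponents vanish and the sketch returns the plain sample variance of the study variable.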
Table 3. PRE values using different artificial populations.

| Estimator | Uni(3,5) | Uni(6,9) | Exp(5) | Exp(8) | Gam(2,6) | Gam(7,10) |
|---|---|---|---|---|---|---|
| $T_1$ | 100 | 100 | 100 | 100 | 100 | 100 |
| $T_2$ | 136.645 | 133.902 | 143.443 | 138.506 | 150.209 | 137.278 |
| $T_3$ | 145.523 | 144.483 | 165.609 | 152.625 | 152.889 | 138.745 |
| $T_4$ | 152.289 | 156.120 | 158.701 | 154.200 | 151.774 | 136.285 |
| $T_5$ | 153.448 | 157.392 | 162.209 | 156.526 | 159.130 | 139.133 |
| $T_6$ | 155.552 | 159.409 | 166.989 | 158.789 | 154.651 | 132.807 |
| $T_7$ | 155.526 | 158.762 | 165.335 | 158.002 | 153.299 | 130.218 |
| $T_8$ | 154.798 | 153.580 | 163.290 | 159.328 | 151.416 | 136.809 |
| $H_1$ | 214.203 | 200.120 | 180.309 | 195.139 | 187.505 | 170.274 |
| $H_2$ | 289.387 | 262.560 | 210.279 | 210.508 | 179.035 | 163.549 |
| $H_3$ | 250.770 | 260.293 | 223.031 | 225.625 | 161.281 | 155.770 |
| $H_4$ | 205.560 | 205.449 | 203.971 | 209.663 | 160.441 | 153.939 |
| $H_5$ | 219.654 | 222.517 | 240.351 | 230.323 | 165.710 | 160.171 |
| $H_6$ | 230.921 | 250.810 | 220.091 | 227.688 | 169.930 | 166.529 |
| $H_7$ | 283.228 | 283.941 | 243.339 | 233.167 | 162.019 | 164.905 |
| $H_8$ | 294.497 | 282.445 | 288.167 | 289.910 | 190.583 | 183.270 |

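Table 4. Descriptive statistics for data 1.

| Parameter | Value | Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|---|---|
| $N$ | 36 | $m$ | 15 | $n$ | 6 | $X_m$ | 39 |
| $X_M$ | 2379 | $R_m$ | 1 | $R_M$ | 36 | $S_y$ | 8512.266 |
| $S_x$ | 3410.428 | $S_r$ | 10.53 | $\bar{X}$ | 1659.58 | $\bar{Y}$ | 4015.22 |
| $\bar{R}$ | 18.50 | $C_y$ | 2.12 | $C_x$ | 2.06 | $C_r$ | 0.57 |
| $\rho_{yx}$ | 0.65 | $\rho_{yr}$ | 0.36 | $\rho_{xr}$ | 0.75 | $\lambda_{400}$ | 2051 |
| $\lambda_{040}$ | 2237 | $\lambda_{004}$ | 3297 | $\lambda_{220}$ | 1542 | $\lambda_{202}$ | 2698 |
| $\lambda_{022}$ | 3698 | $\theta$ | 0.14 | $\theta$ | 0.04 | $\theta$ | 0.10 |
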
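The three θ entries in the descriptive tables are consistent with the usual two-phase sampling factors 1/n − 1/N, 1/m − 1/N and 1/n − 1/m. This reading is an assumption, since the distinguishing marks on the three symbols did not survive extraction. The short check below uses hypothetical names (`two_phase_factors`, `theta`, `theta_first`, `theta_diff`) and only the population and sample sizes reported for data 1 and data 2.

```python
def two_phase_factors(N, m, n):
    """Return the three sampling factors assumed to lie behind the theta entries.

    N : population size
    m : first-phase sample size
    n : second-phase sample size
    """
    theta = 1.0 / n - 1.0 / N        # second-phase sample vs. population
    theta_first = 1.0 / m - 1.0 / N  # first-phase sample vs. population
    theta_diff = 1.0 / n - 1.0 / m   # second-phase vs. first-phase sample
    return theta, theta_first, theta_diff


# Sizes reported in the descriptive tables for data 1 and data 2.
sizes = {"data 1": (36, 15, 6), "data 2": (33, 15, 6)}
rounded = {
    label: tuple(round(t, 2) for t in two_phase_factors(N, m, n))
    for label, (N, m, n) in sizes.items()
}
for label, values in rounded.items():
    print(label, values)
```

Rounded to two decimals, both size configurations reproduce the tabulated 0.14, 0.04 and 0.10.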
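Table 5. Descriptive statistics for data 2.

| Parameter | Value | Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|---|---|
| $N$ | 33 | $m$ | 15 | $n$ | 6 | $X_m$ | 58 |
| $X_M$ | 95 | $R_m$ | 1.5 | $R_M$ | 33 | $S_y$ | 10.13 |
| $S_r$ | 9.64 | $S_x$ | 10.58 | $\bar{X}$ | 72.55 | $\bar{Y}$ | 27.49 |
| $\bar{R}$ | 17 | $C_y$ | 0.37 | $C_x$ | 0.14 | $C_r$ | 0.57 |
| $\rho_{yx}$ | 0.25 | $\rho_{yr}$ | 0.20 | $\rho_{xr}$ | 0.98 | $\lambda_{400}$ | 5.55 |
| $\lambda_{040}$ | 2.08 | $\lambda_{004}$ | 1.10 | $\lambda_{220}$ | 2.22 | $\lambda_{202}$ | 0.94 |
| $\lambda_{022}$ | 1.54 | $\theta$ | 0.14 | $\theta$ | 0.04 | $\theta$ | 0.10 |
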
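Table 6. Descriptive statistics for data 3.

| Parameter | Value | Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|---|---|
| $N$ | 36 | $m$ | 15 | $n$ | 6 | $X_m$ | 39 |
| $X_M$ | 2370 | $R_m$ | 1.5 | $R_M$ | 36 | $S_y$ | 32,601.14 |
| $S_r$ | 10.54 | $S_x$ | 2214.219 | $\bar{X}$ | 1054.39 | $\bar{Y}$ | 14,818.70 |
| $\bar{R}$ | 18.50 | $C_y$ | 22.07 | $C_x$ | 2.10 | $C_r$ | 0.56 |
| $\rho_{yx}$ | 0.69 | $\rho_{yr}$ | 0.39 | $\rho_{xr}$ | 0.84 | $\lambda_{400}$ | 236.50 |
| $\lambda_{040}$ | 209.80 | $\lambda_{004}$ | 329.70 | $\lambda_{220}$ | 397.50 | $\lambda_{202}$ | 365.80 |
| $\lambda_{022}$ | 469.80 | $\theta$ | 0.14 | $\theta$ | 0.04 | $\theta$ | 0.10 |
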
Table 7. PRE values using real data sets.

| Estimator | Pop-1 | Pop-2 | Pop-3 |
|---|---|---|---|
| $T_1$ | 100 | 100 | 100 |
| $T_2$ | 141.830 | 143.664 | 121.469 |
| $T_3$ | 143.225 | 104.673 | 125.532 |
| $T_4$ | 152.034 | 128.010 | 115.634 |
| $T_5$ | 141.858 | 143.509 | 121.380 |
| $T_6$ | 141.839 | 143.655 | 121.460 |
| $T_7$ | 141.839 | 143.660 | 121.460 |
| $T_8$ | 141.849 | 142.044 | 121.469 |
| $H_1$ | 167.220 | 146.790 | 165.289 |
| $H_2$ | 170.456 | 150.281 | 172.520 |
| $H_3$ | 169.723 | 147.00 | 168.254 |
| $H_4$ | 167.489 | 146.520 | 166.768 |
| $H_5$ | 164.056 | 146.526 | 165.879 |
| $H_6$ | 158.309 | 146.531 | 160.446 |
| $H_7$ | 166.615 | 146.923 | 170.236 |
| $H_8$ | 172.869 | 151.402 | 177.216 |

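To help read the PRE tables, the sketch below assumes the usual definition PRE = 100 × MSE(baseline) / MSE(estimator), with T1 as the baseline because its PRE is 100 in every column. Under that assumption a PRE above 100 translates directly into a relative MSE reduction. The helper names `pre` and `mse_reduction` are hypothetical; the numbers are copied from Table 7.

```python
def pre(mse_baseline, mse_estimator):
    """Percentage relative efficiency against a baseline (assumed definition)."""
    return 100.0 * mse_baseline / mse_estimator


def mse_reduction(pre_value):
    """Fraction by which the MSE falls relative to the baseline, given a PRE."""
    return 1.0 - 100.0 / pre_value


# PRE values from Table 7, ordered as (Pop-1, Pop-2, Pop-3).
table7 = {
    "T1": (100.0, 100.0, 100.0),
    "T2": (141.830, 143.664, 121.469),
    "T3": (143.225, 104.673, 125.532),
    "T4": (152.034, 128.010, 115.634),
    "T5": (141.858, 143.509, 121.380),
    "T6": (141.839, 143.655, 121.460),
    "T7": (141.839, 143.660, 121.460),
    "T8": (141.849, 142.044, 121.469),
    "H1": (167.220, 146.790, 165.289),
    "H2": (170.456, 150.281, 172.520),
    "H3": (169.723, 147.00, 168.254),
    "H4": (167.489, 146.520, 166.768),
    "H5": (164.056, 146.526, 165.879),
    "H6": (158.309, 146.531, 160.446),
    "H7": (166.615, 146.923, 170.236),
    "H8": (172.869, 151.402, 177.216),
}

# Estimator with the largest PRE in each real population.
best = [max(table7, key=lambda name: table7[name][j]) for j in range(3)]
print(best)

# Implied MSE reduction of H8 relative to T1 in Pop-1.
print(round(mse_reduction(table7["H8"][0]), 3))
```

On these figures H8 has the largest PRE in all three real populations, and its Pop-1 value corresponds to an MSE roughly 42% below that of the baseline under the assumed definition.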
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Daraz, U.; Agustiana, D.; Wu, J.; Emam, W. Twofold Auxiliary Information Under Two-Phase Sampling: An Improved Family of Double-Transformed Variance Estimators. Axioms 2025, 14, 64. https://doi.org/10.3390/axioms14010064
