New Class of Estimators for Finite Population Mean Under Stratified Double Phase Sampling with Simulation and Real-Life Application

Alghamdi, Abdulaziz S.; Alrweili, Hleil

doi:10.3390/math13030329


by Abdulaziz S. Alghamdi ¹ and Hleil Alrweili ²,*

¹ Department of Mathematics, College of Science & Arts, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia

² Department of Mathematics, College of Science, Northern Border University, Arar 91431, Saudi Arabia

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(3), 329; https://doi.org/10.3390/math13030329

Submission received: 20 November 2024 / Revised: 30 December 2024 / Accepted: 15 January 2025 / Published: 21 January 2025

(This article belongs to the Special Issue Statistical Simulation and Computation: 3rd Edition)


Abstract

Survey sample data can sometimes contain outlying observations. When extreme values are present in the sample, the mean estimator can be distorted and the results biased. Outliers are therefore often removed from sample data. However, such removal can reduce the accuracy of conventional estimation techniques, particularly with regard to the mean squared error (MSE). To increase the accuracy of population mean estimation while taking extreme values into consideration, this study presents an enhanced class of estimators. Rather than eliminating outliers, the method uses the extreme values of an auxiliary variable as a source of information. The properties of the suggested class of estimators are derived up to a first-order approximation under stratified two-phase sampling. A simulation study is conducted to examine the practical performance of these estimators and to validate the theoretical conclusions. In addition, an analysis of three different datasets shows that the suggested estimators consistently provide higher percent relative efficiency (PRE) than existing estimators.

Keywords:

mean estimation; stratified double phase sampling; auxiliary information; outliers; bias; percent relative efficiency

MSC:

62D05

1. Introduction

In survey sampling, supplementary information is an effective way to improve estimator accuracy. Estimators such as the ratio, regression, and product types use information on one or more auxiliary variables, in addition to information on the study variable, to increase their relative efficiency. For example, two auxiliary variables, the number of family members and total expenditure, can be used to estimate total household income. Extensive research has been devoted to new and improved estimators of population parameters such as the mean, total, and median. For additional information on more effective estimators and their characteristics, see [1,2,3,4,5] and the references therein.

In many cases, the population mean of the auxiliary variable is unknown before the survey is conducted. In such situations, a two-phase sampling approach is recommended. Two-phase sampling, also referred to as double sampling, draws the sample from the population in two stages. This method is commonly used when it is more cost-effective or efficient to conduct an initial phase before the main phase. First, a large sample (the first-phase sample) is selected from the population and data on the auxiliary variables are gathered. Then, a second-phase sample is chosen, either from the first-phase sample or independently from the population, in which the target variable is observed along with the auxiliary data. Two-phase sampling is therefore an economical approach when supplementary information is unavailable before the survey is conducted. Two-phase sampling was first introduced by [6]. The technique has recently gained importance because it allows variables to be screened at low cost. Several studies on two-phase sampling have been conducted [7,8,9,10,11,12,13,14,15].

Sample survey data may contain unexpected observations. When extreme values are present in the sample, the mean estimator is distorted, leading to biased results. Ref. [16] first used the known minimum and maximum values of an auxiliary variable to suggest two estimators based on a linear transformation. These ideas were not investigated further until the work of [17], who used known information on extreme values to propose various improved ratio, product, and regression estimators of the population mean. Several later contributions made significant progress in estimating the finite population mean when extreme values are present. To increase the accuracy of this estimation, ref. [18] proposed a two-phase sampling strategy. Expanding upon this, ref. [19] developed a transformation method that incorporates the extreme values of a supplementary variable to enhance the accuracy of the population mean estimate. Furthermore, ref. [20] used a stratified random sampling technique that incorporates extreme values and thereby improves the estimation process. In addition, ref. [21] used extreme values to develop a new family of estimators of the population variance with a low mean squared error. More recently, ref. [22] produced several efficient classes of estimators of the population variance by transforming extreme values. More information can be found in [23,24,25,26,27,28,29].

The mean squared error (MSE) performance of classical estimators frequently deteriorates when datasets contain extreme values. Because of this loss of accuracy, researchers may be tempted to exclude outlying data points from their samples. However, doing so can result in biased or inadequate estimates of population parameters. Extreme observations must be retained so that the estimates reflect the variability and characteristics of the population. To address this issue, we retain the extreme (minimum and maximum) values of the auxiliary variable and use them as supplementary information to enhance the efficiency of the proposed class of estimators. In this paper, we provide a more efficient class of estimators of the finite population mean under stratified two-phase sampling that uses the minimum and maximum values of the auxiliary variable as auxiliary information. The suggested method is particularly useful in economic surveys, public health examinations, and environmental evaluations, where similar sampling strategies are often used. The new estimators are also well suited to disciplines such as market research and agricultural surveys, which frequently encounter extreme values, because they incorporate outlier information without distorting the results.

The following key factors motivated the development of a new approach for estimating the finite population mean:

Traditional estimators of the finite population mean often disregard extreme values, which are typically viewed as problematic because they can produce misleading results or inflated mean squared errors (MSEs). The resulting inefficiency under stratified two-phase sampling designs highlights the need for a more robust approach.
The complex data structure of stratified two-phase sampling creates significant challenges for existing estimators. This highlights the importance of developing estimators that are more reliable and efficient.
Two-phase sampling allows researchers to select specific clusters or strata that accurately represent the entire population, ensuring that different sub-populations are properly represented.
Two-phase sampling is a useful method in a variety of research situations because it offers more accurate representation and control of variability along with cost savings, enhanced precision, and flexibility.
This paper introduces a new class of estimators for the finite population mean, using extreme values of auxiliary variables to enhance accuracy and stability, particularly in the presence of outliers.
The new estimators improve bias and MSE performance compared to existing methods by using extreme values, leading to more accurate finite population mean estimates, particularly with outliers or skewed data.
The study includes theoretical analysis of the biases and MSE of the estimators, along with Monte Carlo simulations and real-life applications. These show that the proposed estimators outperform traditional methods in percent relative efficiency (PRE), confirming the theoretical improvements with practical results.
The estimators presented in this paper are mathematically well founded and provide practical methods for real-life applications in which auxiliary data are frequently available, such as economic data analysis, public health, and education.

The remainder of this paper is organized as follows. Section 2 presents the study methodology and notation. Section 3 provides an overview of the existing estimators. The suggested estimators are covered in detail in Section 4. In Section 5, these estimators are compared mathematically. Section 6 describes a simulation study that generates eight artificial populations from different probability distributions to confirm the theoretical results of Section 5, and provides numerical examples that illustrate the practical use of the theoretical findings. Section 7 discusses the results. Lastly, a summary of the main findings and some recommendations for further study are provided in Section 8.

2. Methodology and Notation

Consider a finite population $Γ = (Γ_{1}, Γ_{2}, Γ_{3}, \dots, Γ_{N})$ consisting of N units. The population is divided into L strata, where the h-th stratum has size $N_{h}$ for $h = 1, 2, \dots, L$, such that the stratum sizes sum to the population size N:

\sum_{h = 1}^{L} N_{h} = N .

(1)

Let $v_{h i}$ and $u_{h i}$ denote the values of the study variable V and the auxiliary variable U, respectively, for the i-th unit $(i = 1, 2, \dots, N_{h})$ in the h-th stratum.

In this article, we provide an enhanced class of estimators of the finite population mean $\bar{V}$ of V when an auxiliary variable U is available. The two-phase sampling scheme is defined as follows:

In the first phase, a sample of size $m_{h}$ $(m_{h} < N_{h})$ is selected from the h-th stratum to estimate the population mean $\bar{U}$.
In the second phase, a sample of size $n_{h}$ $(n_{h} < m_{h})$ is selected to observe both v and u.
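
The two-phase scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the second-phase sample is taken as a subsample of the first-phase sample, and the function name `two_phase_sample`, the toy population, and the sample sizes are hypothetical, not taken from the paper.

```python
import random


def two_phase_sample(stratum, m_h, n_h, rng):
    """Two-phase SRSWOR within one stratum.

    `stratum` is a list of (v, u) pairs. Phase 1 draws m_h units and keeps
    only u; phase 2 subsamples n_h of those units and keeps both v and u.
    """
    if not (n_h < m_h < len(stratum)):
        raise ValueError("need n_h < m_h < N_h")
    first = rng.sample(stratum, m_h)    # phase 1: without replacement
    second = rng.sample(first, n_h)     # phase 2: subsample of phase 1
    u_first = [u for _, u in first]
    return u_first, second


rng = random.Random(1)

# Hypothetical population with two strata of (v, u) pairs.
population = {}
for h, (low, high, size) in {1: (3, 5, 60), 2: (4, 7, 40)}.items():
    units = []
    for _ in range(size):
        u = rng.uniform(low, high)
        units.append((2.0 * u + rng.gauss(0, 1), u))
    population[h] = units

samples = {h: two_phase_sample(units, m_h=20, n_h=8, rng=rng)
           for h, units in population.items()}
```

The first-phase values of u feed the first-phase mean, while the second-phase pairs supply the sample means of both variables.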

The population means of V and U are given by

\bar{V} = \sum_{h = 1}^{L} W_{h} \bar{V_{h}},

(2)

\bar{U} = \sum_{h = 1}^{L} W_{h} \bar{U_{h}},

(3)

where

\bar{V_{h}} = \sum_{i = 1}^{N_{h}} \frac{V_{h i}}{N_{h}}

(4)

and

\bar{U_{h}} = \sum_{i = 1}^{N_{h}} \frac{U_{h i}}{N_{h}}

(5)

are the population means of V and U in the h-th stratum, and $W_{h}$ is the stratum weight, defined by

W_{h} = \frac{N_{h}}{N} .

(6)

Let the population variances (without replacement sampling) of V and U in the h-th stratum be defined as follows:

ω_{v_{h}}^{2} = \sum_{i = 1}^{N_{h}} \frac{{(V_{h i} - \bar{V_{h}})}^{2}}{N_{h} - 1}

(7)

and

ω_{u_{h}}^{2} = \sum_{i = 1}^{N_{h}} \frac{{(U_{h i} - \bar{U_{h}})}^{2}}{N_{h} - 1} .

(8)

The population coefficients of variation for these variables in the h-th stratum are defined as follows:

τ_{v_{h}} = \frac{ω_{v_{h}}}{\bar{V_{h}}},

(9)

τ_{u_{h}} = \frac{ω_{u_{h}}}{\bar{U_{h}}} .

(10)

Furthermore, the population correlation coefficient between V and U in the h-th stratum is defined as

ρ_{v_{h} u_{h}} = \frac{ω_{v_{h} u_{h}}}{ω_{v_{h}} ω_{u_{h}}},

(11)

where $ω_{v_{h} u_{h}}$ denotes the population covariance between V and U in the h-th stratum.

A sample of size $n_{h}$ is chosen from the h-th stratum by simple random sampling without replacement. Let the sample variances of the study variable and the auxiliary variable in the h-th stratum be defined as follows:

{\hat{ω}}_{v_{h}}^{2} = \sum_{i = 1}^{n_{h}} \frac{{(v_{h i} - \bar{v_{h}})}^{2}}{n_{h} - 1},

(12)

{\hat{ω}}_{u_{h}}^{2} = \sum_{i = 1}^{n_{h}} \frac{{(u_{h i} - \bar{u_{h}})}^{2}}{n_{h} - 1},

(13)

where

{\bar{v}}_{h} = \frac{1}{n_{h}} \sum_{i = 1}^{n_{h}} v_{h i}

(14)

and

{\bar{u}}_{h} = \frac{1}{n_{h}} \sum_{i = 1}^{n_{h}} u_{h i},

(15)

represent the sample means of V and U in the h-th stratum, respectively. Let the sample coefficients of variation for variables V and U in the h-th stratum be defined as

{\hat{τ}}_{v_{h}} = \frac{{\hat{ω}}_{v_{h}}}{\bar{v_{h}}}

(16)

and

{\hat{τ}}_{u_{h}} = \frac{{\hat{ω}}_{u_{h}}}{\bar{u_{h}}} .

(17)

Moreover, let ${\bar{u}}_{h}^{'}$ be the sample mean of U computed from the first-phase sample of size $m_{h}$, while $\bar{v_{h}}$ and $\bar{u_{h}}$ denote the sample means obtained from the second-phase sample of size $n_{h}$.
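
To make the notation concrete, the following sketch computes the stratum-level quantities of this section for a small made-up dataset. It is an illustration only: the data, the function name `stratum_summary`, and the use of the divisor $N_{h} - 1$ for the covariance are assumptions introduced here.

```python
import math


def stratum_summary(v, u):
    """Population summaries for one stratum, using the divisor N_h - 1
    for the variances as in Equations (7) and (8)."""
    N_h = len(v)
    v_bar = sum(v) / N_h
    u_bar = sum(u) / N_h
    var_v = sum((x - v_bar) ** 2 for x in v) / (N_h - 1)
    var_u = sum((x - u_bar) ** 2 for x in u) / (N_h - 1)
    # Covariance with the same divisor (an assumption; not spelled out in the text).
    cov_vu = sum((a - v_bar) * (b - u_bar) for a, b in zip(v, u)) / (N_h - 1)
    return {
        "N": N_h,
        "v_bar": v_bar,
        "u_bar": u_bar,
        "tau_v": math.sqrt(var_v) / v_bar,           # coefficient of variation of V
        "tau_u": math.sqrt(var_u) / u_bar,           # coefficient of variation of U
        "rho": cov_vu / math.sqrt(var_v * var_u),    # correlation of V and U
    }


# Toy data (hypothetical): two strata, each given as (v values, u values).
strata = {
    1: ([10.0, 12.0, 14.0, 16.0], [1.0, 2.0, 3.0, 4.0]),
    2: ([20.0, 26.0, 23.0, 31.0, 30.0, 32.0], [5.0, 7.0, 6.0, 9.0, 8.0, 10.0]),
}
summaries = {h: stratum_summary(v, u) for h, (v, u) in strata.items()}
N = sum(s["N"] for s in summaries.values())
weights = {h: s["N"] / N for h, s in summaries.items()}          # W_h = N_h / N
V_bar = sum(weights[h] * summaries[h]["v_bar"] for h in summaries)
```

With these weights, the weighted sum of stratum means coincides with the plain mean of all ten values of V.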

3. Existing Estimators

In this section, we review several existing estimators of the finite population mean under stratified two-phase sampling, together with their biases and mean squared errors, for later comparison with the proposed class of estimators.

The usual unbiased estimator to estimate the population mean is given by the following:

{\bar{v}}_{t_{1}} = \sum_{h = 1}^{L} W_{h} \bar{v_{h}} .

(18)

The variance of (18) is defined as follows:

V a r ({\bar{v}}_{t_{1}}) = \sum_{h = 1}^{L} δ_{h} W_{h}^{2} {\bar{V_{h}}}^{2} τ_{v_{h}}^{2},

(19)

where

δ_{h} = (\frac{1}{n_{h}} - \frac{1}{N_{h}}) .

Under stratified two-phase sampling, ref. [6] defined a ratio-type estimator of $\bar{V}$, given by

{\bar{v}}_{t_{2}} = \sum_{h = 1}^{L} W_{h} \bar{v_{h}} (\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}}) .

(20)

The bias and MSE of ${\bar{v}}_{t_{2}}$ are given by Equations (21) and (22), respectively:

Bias ({\bar{v}}_{t_{2}}) ≅ \sum_{h = 1}^{L} δ_{h}^{″} W_{h} \bar{V_{h}} (τ_{u_{h}}^{2} - τ_{v u_{h}})

(21)

and

MSE ({\bar{v}}_{t_{2}}) ≅ \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} (δ_{h} τ_{v_{h}}^{2} + δ_{h}^{″} τ_{u_{h}}^{2} - 2 δ_{h}^{″} τ_{v u_{h}}),

(22)

where $τ_{v u_{h}} = ρ_{v u_{h}} τ_{v_{h}} τ_{u_{h}}$,

δ_{h}^{'} = (\frac{1}{m_{h}} - \frac{1}{N_{h}})

and

δ_{h}^{″} = (\frac{1}{n_{h}} - \frac{1}{m_{h}}) .

Under stratified two-phase sampling, the product estimator of $\bar{V}$ is defined as

{\bar{v}}_{t_{3}} = \sum_{h = 1}^{L} W_{h} \bar{v_{h}} (\frac{\bar{u_{h}}}{{\bar{u}}_{h}^{'}}) .

(23)

The bias and MSE of ${\bar{v}}_{t_{3}}$ are given by Equations (24) and (25), respectively:

B i a s ({\bar{v}}_{t_{3}}) ≅ \sum_{h = 1}^{L} δ_{h}^{″} W_{h} \bar{V_{h}} τ_{v u_{h}}

(24)

and

M S E ({\bar{v}}_{t_{3}}) ≅ \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} (δ_{h} τ_{v_{h}}^{2} + δ_{h}^{″} τ_{u_{h}}^{2} + 2 δ_{h}^{″} τ_{v u_{h}}) .

(25)

Under stratified two-phase sampling, the classical regression estimator ${\bar{v}}_{t_{4}}$ is defined as

{\bar{v}}_{t_{4}} = \sum_{h = 1}^{L} W_{h} [{\bar{v}}_{h} + b_{v u_{h}} ({\bar{u}}_{h}^{'} - {\bar{u}}_{h})],

(26)

where $b_{v u_{h}}$ is the sample regression coefficient.

The bias and MSE of ${\bar{v}}_{t_{4}}$ are given by Equations (27) and (28), respectively:

B i a s ({\bar{v}}_{t_{4}}) ≅ - \sum_{h = 1}^{L} δ_{h}^{″} W_{h} β_{h} (\frac{μ_{12 h}}{τ_{v u_{h}}} - \frac{μ_{03 h}}{τ_{u_{h}}^{2}})

(27)

and

M S E ({\bar{v}}_{t_{4}}) ≅ \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} τ_{v_{h}}^{2} (δ_{h} - δ_{h}^{″} ρ_{v u_{h}}^{2}),

(28)

where

β_{h} = \frac{τ_{v u_{h}}}{τ_{u_{h}}^{2}}

and

μ_{r s h} = \frac{\sum_{i = 1}^{N_{h}} {(V_{h i} - \bar{V_{h}})}^{r} {(U_{h i} - \bar{U_{h}})}^{s}}{N_{h}} .

According to [12], the exponential ratio and product estimators are defined as follows:

{\bar{v}}_{t_{5}} = \sum_{h = 1}^{L} W_{h} \bar{v_{h}} exp (\frac{{\bar{u}}_{h}^{'} - {\bar{u}}_{h}}{{\bar{u}}_{h}^{'} + {\bar{u}}_{h}})

(29)

and

{\bar{v}}_{t_{6}} = \sum_{h = 1}^{L} W_{h} \bar{v_{h}} exp (\frac{\bar{u_{h}} - {\bar{u}}_{h}^{'}}{\bar{u_{h}} + {\bar{u}}_{h}^{'}}) .

(30)
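
As a rough illustration of how these point estimators are evaluated, the sketch below computes the unbiased, ratio, product, and exponential ratio and product estimators from stratum-level sample means. The ratio estimator is taken in its usual two-phase form (first-phase mean of u divided by second-phase mean of u), which is an assumption made here; the dictionary keys and the numbers are hypothetical.

```python
import math


def point_estimates(strata):
    """Each stratum dict holds W (weight), v and u (second-phase means),
    and u1 (first-phase mean of the auxiliary variable)."""
    est = {"t1": 0.0, "t2": 0.0, "t3": 0.0, "t5": 0.0, "t6": 0.0}
    for s in strata:
        W, v, u, u1 = s["W"], s["v"], s["u"], s["u1"]
        est["t1"] += W * v                                   # usual unbiased estimator
        est["t2"] += W * v * u1 / u                          # ratio (assumed form)
        est["t3"] += W * v * u / u1                          # product
        est["t5"] += W * v * math.exp((u1 - u) / (u1 + u))   # exponential ratio
        est["t6"] += W * v * math.exp((u - u1) / (u + u1))   # exponential product
    return est


# Hypothetical stratum-level sample means.
strata = [
    {"W": 0.4, "v": 13.0, "u": 2.5, "u1": 2.6},
    {"W": 0.6, "v": 27.0, "u": 7.5, "u1": 7.2},
]
estimates = point_estimates(strata)
```

When the first-phase and second-phase means of u coincide in every stratum, all five estimators reduce to the usual unbiased estimator.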

The MSEs of ${\bar{v}}_{t_{5}}$ and ${\bar{v}}_{t_{6}}$ are given by Equations (31) and (32), respectively:

M S E ({\bar{v}}_{t_{5}}) ≅ \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [δ_{h} τ_{v_{h}}^{2} + δ_{h}^{″} τ_{u_{h}}^{2} (\frac{1}{4} - C_{h})]

(31)

and

M S E ({\bar{v}}_{t_{6}}) ≅ \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [δ_{h} τ_{v_{h}}^{2} + δ_{h}^{″} τ_{u_{h}}^{2} (\frac{1}{4} + C_{h})],

(32)

where

C_{h} = (\frac{ρ_{v u_{h}}}{τ_{u_{h}}}) τ_{v_{h}} .

The estimator suggested by [13] for stratified two-phase sampling is defined as

{\bar{v}}_{t_{7}} = \sum_{h = 1}^{L} W_{h} {\bar{v}}_{h} \{k_{h} \frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}} + (1 - k_{h}) \frac{\bar{u_{h}}}{{\bar{u}}_{h}^{'}}\},

(33)

where

k_{h} = \frac{1 + C_{h}}{2} .

The minimum bias and MSE of ${\bar{v}}_{t_{7}}$, attained at this value of $k_{h}$, are given by Equations (34) and (35), respectively:

B i a s {({\bar{v}}_{t_{7}})}_{min} ≅ \frac{1}{2} \sum_{h = 1}^{L} δ_{h}^{″} W_{h} \bar{V_{h}} τ_{u_{h}}^{2} [1 + C_{h} (1 - 2 C_{h})]

(34)

and

M S E {({\bar{v}}_{t_{7}})}_{min} ≅ \sum_{h = 1}^{L} W_{h}^{2} {\bar{V_{h}}}^{2} τ_{v_{h}}^{2} [δ_{h} (1 - ρ_{v u_{h}}^{2}) + δ_{h}^{'} ρ_{v u_{h}}^{2}] .

(35)
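
The first-order MSE expressions of this section can be evaluated side by side with a short script. The sketch below is illustrative only: the stratum parameters are invented, and $τ_{v u_{h}}$ is assumed to equal $ρ_{v u_{h}} τ_{v_{h}} τ_{u_{h}}$, which is consistent with the definition of $C_{h}$ but is not stated explicitly in the text.

```python
def first_order_mse(strata):
    """First-order MSEs of t1, ..., t7 following Equations (19)-(35).

    Each stratum dict holds W, V (mean of the study variable), tau_v,
    tau_u, rho, and the sizes N (stratum), m (phase 1), n (phase 2).
    """
    out = dict.fromkeys(("t1", "t2", "t3", "t4", "t5", "t6", "t7"), 0.0)
    for s in strata:
        d = 1 / s["n"] - 1 / s["N"]     # delta_h
        d1 = 1 / s["m"] - 1 / s["N"]    # delta'_h
        d2 = 1 / s["n"] - 1 / s["m"]    # delta''_h
        tv2, tu2, r2 = s["tau_v"] ** 2, s["tau_u"] ** 2, s["rho"] ** 2
        tvu = s["rho"] * s["tau_v"] * s["tau_u"]   # assumed definition of tau_vu
        C = s["rho"] * s["tau_v"] / s["tau_u"]
        k = (s["W"] * s["V"]) ** 2
        out["t1"] += k * d * tv2
        out["t2"] += k * (d * tv2 + d2 * tu2 - 2 * d2 * tvu)
        out["t3"] += k * (d * tv2 + d2 * tu2 + 2 * d2 * tvu)
        out["t4"] += k * tv2 * (d - d2 * r2)
        out["t5"] += k * (d * tv2 + d2 * tu2 * (0.25 - C))
        out["t6"] += k * (d * tv2 + d2 * tu2 * (0.25 + C))
        out["t7"] += k * tv2 * (d * (1 - r2) + d1 * r2)
    return out


# Hypothetical stratum parameters.
strata = [
    {"W": 0.4, "V": 13.0, "tau_v": 0.30, "tau_u": 0.25, "rho": 0.8,
     "N": 400, "m": 120, "n": 40},
    {"W": 0.6, "V": 27.0, "tau_v": 0.20, "tau_u": 0.22, "rho": 0.7,
     "N": 600, "m": 180, "n": 60},
]
mse = first_order_mse(strata)
pre = {name: 100 * mse["t1"] / value for name, value in mse.items()}
```

Because $δ_{h} - δ_{h}^{″} = δ_{h}^{'}$, the expressions in (28) and (35) agree term by term, so in this sketch the regression estimator and the estimator of [13] share the same first-order MSE.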

4. Proposed Generalized Estimators

In this section, we propose an improved class of estimators of the finite population mean. The new class is defined under stratified two-phase sampling and uses the known minimum and maximum values of the auxiliary variable to increase the accuracy of estimation. The suggested estimator is defined as follows:

{\bar{v}}_{G_{s t}} = \sum_{h = 1}^{L} W_{h} [s_{1 h} \bar{v_{h}} {(\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}})}^{α_{1}} + s_{2 h} {(\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}})}^{α_{2}}] exp [\frac{t_{1 h} ({\bar{u}}_{h}^{'} - \bar{u_{h}})}{t_{1 h} ({\bar{u}}_{h}^{'} + \bar{u_{h}}) + 2 t_{2 h}}],

(36)

where the scalars $(α_{1}, α_{2})$ can each take the values 0, -1, or 1, and $(s_{1 h}, s_{2 h})$ are unknown constants to be determined so as to minimize the bias and mean squared error. The quantities $t_{1 h}$ and $t_{2 h}$ denote parameters of the auxiliary variable.

The sub-classes of the proposed estimator obtained from Equation (36) are detailed in Table 1.

Properties of the Suggested Estimator

We first define all the required error terms before analyzing the bias and mean squared error of the proposed class of estimators. By using these concepts, we can measure the precision and reliability of the estimators while obtaining a better understanding of how well they work in real-life situations.

Let

e_{0 h} = (\frac{\bar{v_{h}} - \bar{V_{h}}}{\bar{V_{h}}}), e_{1 h} = (\frac{\bar{u_{h}} - \bar{U_{h}}}{\bar{U_{h}}}), e_{2 h} = (\frac{{\bar{u}}_{h}^{'} - \bar{U_{h}}}{\bar{U_{h}}})

such that $E (e_{i h}) = 0$ for $i = 0, 1, 2$.

Also,

E (e_{0 h}^{2}) = δ_{h} τ_{v_{h}}^{2}, E (e_{1 h}^{2}) = δ_{h} τ_{u_{h}}^{2}, E (e_{2 h}^{2}) = δ_{h}^{'} τ_{u_{h}}^{2},

E (e_{0 h} e_{1 h}) = δ_{h} τ_{v u_{h}}, E (e_{0 h} e_{2 h}) = δ_{h}^{'} τ_{v u_{h}}, E (e_{1 h} e_{2 h}) = δ_{h}^{'} τ_{u_{h}}^{2} .
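
The cross-moment $E (e_{1 h} e_{2 h})$ can be motivated by a short derivation. The sketch below assumes that the second-phase sample is an SRSWOR subsample of the first-phase sample, so that the conditional expectation of $\bar{u_{h}}$ given the first-phase sample is ${\bar{u}}_{h}^{'}$. It also shows why the difference $δ_{h} - δ_{h}^{'} = δ_{h}^{″}$ appears in the two-phase MSE expressions.

```latex
\begin{aligned}
E (e_{1 h} e_{2 h})
  &= \frac{\operatorname{Cov} (\bar{u}_{h}, \bar{u}_{h}^{'})}{\bar{U}_{h}^{2}}
   = \frac{\operatorname{Var} (\bar{u}_{h}^{'})}{\bar{U}_{h}^{2}}
   = δ_{h}^{'} τ_{u_{h}}^{2}, \\
E \{ (e_{1 h} - e_{2 h})^{2} \}
  &= δ_{h} τ_{u_{h}}^{2} - 2 δ_{h}^{'} τ_{u_{h}}^{2} + δ_{h}^{'} τ_{u_{h}}^{2}
   = (δ_{h} - δ_{h}^{'}) τ_{u_{h}}^{2}
   = δ_{h}^{″} τ_{u_{h}}^{2} .
\end{aligned}
```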

To investigate the properties of the new family of estimators, we rewrite (36) in terms of the error terms:

\begin{matrix} {\bar{v}}_{G_{s t}} = \sum_{h = 1}^{L} W_{h} [s_{1 h} \bar{V_{h}} (1 + e_{0 h}) {(1 + e_{1 h})}^{- α_{1}} {(1 + e_{2 h})}^{α_{1}} + s_{2 h} {(1 + e_{1 h})}^{- α_{2}} {(1 + e_{2 h})}^{α_{2}}] \times \\ exp [\frac{t_{3 h} (e_{2 h} - e_{1 h})}{2} {(1 + \frac{t_{3 h}}{2} (e_{2 h} + e_{1 h}))}^{- 1}], \end{matrix}

where

t_{3 h} = \frac{t_{1 h} \bar{U_{h}}}{t_{1 h} \bar{U_{h}} + t_{2 h}} .
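
The form of the exponential factor follows from substituting $\bar{u_{h}} = \bar{U_{h}} (1 + e_{1 h})$ and ${\bar{u}}_{h}^{'} = \bar{U_{h}} (1 + e_{2 h})$ into the exponent of (36). The worked steps below are a sketch of that substitution, followed by the expansion of the exponential factor up to second-degree terms in the errors.

```latex
\begin{aligned}
t_{1 h} (\bar{u}_{h}^{'} - \bar{u}_{h})
  &= t_{1 h} \bar{U}_{h} (e_{2 h} - e_{1 h}), \\
t_{1 h} (\bar{u}_{h}^{'} + \bar{u}_{h}) + 2 t_{2 h}
  &= 2 (t_{1 h} \bar{U}_{h} + t_{2 h})
     \{ 1 + \frac{t_{3 h}}{2} (e_{1 h} + e_{2 h}) \}, \\
\exp [\cdot]
  &≅ 1 + \frac{t_{3 h}}{2} (e_{2 h} - e_{1 h})
     + \frac{3 t_{3 h}^{2}}{8} e_{1 h}^{2}
     - \frac{t_{3 h}^{2}}{8} e_{2 h}^{2}
     - \frac{t_{3 h}^{2}}{4} e_{1 h} e_{2 h} .
\end{aligned}
```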

Using a Taylor series expansion and retaining terms up to the second degree in the error terms (the first order of approximation), we obtain

\begin{matrix} {\bar{v}}_{G_{s t}} - \bar{V} ≅ \sum_{h = 1}^{L} W_{h} [- \bar{V_{h}} + s_{1 h} \bar{V_{h}} \{1 + e_{0 h} - e_{1 h} (α_{1} + \frac{t_{3 h}}{2}) + e_{2 h} (α_{1} + \frac{t_{3 h}}{2}) + e_{1 h}^{2} (\frac{α_{1} t_{3 h}}{2} + \frac{3 t_{3 h}^{2}}{8} + \frac{α_{1} (α_{1} + 1)}{2}) \\ + e_{2 h}^{2} (\frac{α_{1} t_{3 h}}{2} - \frac{t_{3 h}^{2}}{8} + \frac{α_{1} (α_{1} - 1)}{2}) - e_{0 h} e_{1 h} (α_{1} + \frac{t_{3 h}}{2}) + e_{0 h} e_{2 h} (α_{1} + \frac{t_{3 h}}{2}) - e_{1 h} e_{2 h} {(α_{1} + \frac{t_{3 h}}{2})}^{2}\} \\ + s_{2 h} \{1 - e_{1 h} (α_{2} + \frac{t_{3 h}}{2}) + e_{2 h} (α_{2} + \frac{t_{3 h}}{2}) + e_{1 h}^{2} (\frac{α_{2} t_{3 h}}{2} + \frac{3 t_{3 h}^{2}}{8} + \frac{α_{2} (α_{2} + 1)}{2}) \\ + e_{2 h}^{2} (\frac{α_{2} t_{3 h}}{2} - \frac{t_{3 h}^{2}}{8} + \frac{α_{2} (α_{2} - 1)}{2}) - e_{1 h} e_{2 h} {(α_{2} + \frac{t_{3 h}}{2})}^{2}\}] . \end{matrix}

(37)

Taking expectations in (37), the bias of ${\bar{v}}_{G_{s t}}$ is given by

Bias ({\bar{v}}_{G_{s t}}) ≅ \sum_{h = 1}^{L} W_{h} [- \bar{V_{h}} + s_{1 h} \bar{V_{h}} Ω_{3} + s_{2 h} Ω_{5}],

(38)

where

\begin{matrix} Ω_{3} = [1 + δ_{h} \{τ_{u_{h}}^{2} (\frac{4 α_{1} (α_{1} + t_{3 h} + 1) + 3 t_{3 h}^{2}}{8}) - τ_{v u_{h}} (\frac{2 α_{1} + t_{3 h}}{2})\} \\ + δ_{h}^{'} \{τ_{u_{h}}^{2} (\frac{- 4 α_{1} (α_{1} + t_{3 h} + 1) - 3 t_{3 h}^{2}}{8}) + τ_{v u_{h}} (\frac{2 α_{1} + t_{3 h}}{2})\}] \end{matrix}

and

Ω_{5} = [1 + δ_{h} τ_{u_{h}}^{2} (\frac{4 α_{2} (α_{2} + t_{3 h} + 1) + 3 t_{3 h}^{2}}{8}) + δ_{h}^{'} τ_{u_{h}}^{2} (\frac{- 4 α_{2} (α_{2} + t_{3 h} + 1) - 3 t_{3 h}^{2}}{8})] .

The mean squared error (MSE), to the first order of approximation, is obtained by squaring both sides of (37) and taking expectations:

MSE ({\bar{v}}_{G_{s t}}) ≅ \sum_{h = 1}^{L} W_{h}^{2} [{\bar{V_{h}}}^{2} + s_{1 h}^{2} {\bar{V_{h}}}^{2} Ω_{1} + s_{2 h}^{2} Ω_{2} - 2 s_{1 h} {\bar{V_{h}}}^{2} Ω_{3} - 2 s_{2 h} \bar{V_{h}} Ω_{5} + 2 s_{1 h} s_{2 h} \bar{V_{h}} Ω_{4}],

(39)

where

\begin{matrix} Ω_{1} = [1 + δ_{h} \{τ_{v_{h}}^{2} + τ_{u_{h}}^{2} \{{(α_{1} + \frac{t_{3 h}}{2})}^{2} + (α_{1} t_{3 h} + \frac{3 t_{3 h}^{2}}{4} + α_{1} (α_{1} + 1))\} - 4 τ_{v u_{h}} (α_{1} + \frac{t_{3 h}}{2})\} \\ + δ_{h}^{'} \{τ_{u_{h}}^{2} \{{(α_{1} + \frac{t_{3 h}}{2})}^{2} + (α_{1} t_{3 h} - \frac{t_{3 h}^{2}}{4} + α_{1} (α_{1} - 1)) - 4 {(α_{1} + \frac{t_{3 h}}{2})}^{2}\} + 4 τ_{v u_{h}} (α_{1} + \frac{t_{3 h}}{2})\}], \end{matrix}

\begin{matrix} Ω_{2} = [1 + δ_{h} τ_{u_{h}}^{2} \{{(α_{2} + \frac{t_{3 h}}{2})}^{2} + (α_{2} t_{3 h} + \frac{3 t_{3 h}^{2}}{4} + α_{2} (α_{2} + 1))\} + δ_{h}^{'} τ_{u_{h}}^{2} \{{(α_{2} + \frac{t_{3 h}}{2})}^{2} \\ + (α_{2} t_{3 h} - \frac{t_{3 h}^{2}}{4} + α_{2} (α_{2} - 1)) - 4 {(α_{2} + \frac{t_{3 h}}{2})}^{2}\}] \end{matrix}

and

\begin{matrix} Ω_{4} = [1 + δ_{h} \{τ_{u_{h}}^{2} \{(\frac{α_{1} t_{3 h}}{2} + \frac{3 t_{3 h}^{2}}{8} + \frac{α_{1} (α_{1} + 1)}{2}) + (α_{1} + \frac{t_{3 h}}{2}) (α_{2} + \frac{t_{3 h}}{2}) + (\frac{α_{2} t_{3 h}}{2} + \frac{3 t_{3 h}^{2}}{8} + \frac{α_{2} (α_{2} + 1)}{2})\} \\ - τ_{v u_{h}} (α_{1} + α_{2} + t_{3 h})\} + δ_{h}^{'} \{τ_{u_{h}}^{2} \{(\frac{α_{1} t_{3 h}}{2} - \frac{t_{3 h}^{2}}{8} + \frac{α_{1} (α_{1} - 1)}{2}) - (α_{1} + \frac{t_{3 h}}{2}) (α_{2} + \frac{t_{3 h}}{2}) \\ + (\frac{α_{2} t_{3 h}}{2} - \frac{t_{3 h}^{2}}{8} + \frac{α_{2} (α_{2} - 1)}{2}) - {(α_{1} + \frac{t_{3 h}}{2})}^{2} - {(α_{2} + \frac{t_{3 h}}{2})}^{2}\} + τ_{v u_{h}} (α_{1} + α_{2} + t_{3 h})\}] . \end{matrix}

Minimizing Equation (39) with respect to $s_{1 h}$ and $s_{2 h}$ gives the optimal values of these unknown constants:

s_{1 h (o p t)} = \frac{Ω_{2} Ω_{3} - Ω_{4} Ω_{5}}{Ω_{1} Ω_{2} - Ω_{4}^{2}}

and

s_{2 h (o p t)} = \frac{\bar{V_{h}} (Ω_{1} Ω_{5} - Ω_{3} Ω_{4})}{Ω_{1} Ω_{2} - Ω_{4}^{2}} .
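
The optimal constants can be checked numerically. The sketch below evaluates the per-stratum bracket of Equation (39) at the optimal $s_{1 h}$ and $s_{2 h}$ and at nearby values. The function names and the values of $Ω_{1}, \dots, Ω_{5}$ and $\bar{V_{h}}$ are hypothetical, chosen only so that $Ω_{1} Ω_{2} - Ω_{4}^{2} > 0$.

```python
def mse_bracket(s1, s2, V, O1, O2, O3, O4, O5):
    """Per-stratum bracket of Equation (39), without the W_h^2 factor."""
    return (V ** 2 + s1 ** 2 * V ** 2 * O1 + s2 ** 2 * O2
            - 2 * s1 * V ** 2 * O3 - 2 * s2 * V * O5
            + 2 * s1 * s2 * V * O4)


def optimal_s(V, O1, O2, O3, O4, O5):
    """Optimal s1 and s2 obtained by setting both partial derivatives to zero."""
    den = O1 * O2 - O4 ** 2
    s1 = (O2 * O3 - O4 * O5) / den
    s2 = V * (O1 * O5 - O3 * O4) / den
    return s1, s2


# Hypothetical inputs for a single stratum.
V = 25.0
omegas = (1.08, 1.05, 1.01, 1.02, 1.00)   # Omega_1, ..., Omega_5

s1_opt, s2_opt = optimal_s(V, *omegas)
best = mse_bracket(s1_opt, s2_opt, V, *omegas)

# Moving away from the optimum in any direction should increase the bracket.
nearby = [mse_bracket(s1_opt + a, s2_opt + b, V, *omegas)
          for a, b in ((0.01, 0.0), (-0.01, 0.0), (0.0, 0.5), (0.0, -0.5))]
```

The bracket is a quadratic form in $(s_{1 h}, s_{2 h})$, so the stationary point is a minimum whenever $Ω_{1} > 0$ and $Ω_{1} Ω_{2} - Ω_{4}^{2} > 0$.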

Substituting the optimal values of $s_{1 h}$ and $s_{2 h}$ into (38) and (39) gives the minimum bias and MSE of ${\bar{v}}_{G_{s t}}$:

Bias {({\bar{v}}_{G_{s t}})}_{min} ≅ - \sum_{h = 1}^{L} W_{h} \bar{V_{h}} [1 - \frac{(Ω_{1} Ω_{5}^{2} + Ω_{2} Ω_{3}^{2} - 2 Ω_{3} Ω_{4} Ω_{5})}{Ω_{1} Ω_{2} - Ω_{4}^{2}}]

(40)

and

MSE {({\bar{v}}_{G_{s t}})}_{min} ≅ \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [1 - \frac{(Ω_{1} Ω_{5}^{2} + Ω_{2} Ω_{3}^{2} - 2 Ω_{3} Ω_{4} Ω_{5})}{Ω_{1} Ω_{2} - Ω_{4}^{2}}] .

(41)

5. Mathematical Comparison

This section compares the proposed family of estimators ${\bar{v}}_{G_{s t}}$ with the existing estimators ${\bar{v}}_{t_{1}}$, ${\bar{v}}_{t_{2}}$, ${\bar{v}}_{t_{4}}$, ${\bar{v}}_{t_{5}}$, ${\bar{v}}_{t_{6}}$, and ${\bar{v}}_{t_{7}}$.

Condition (i): A comparison of the estimators stated in (19) and (41)

$Var ({\bar{v}}_{t_{1}}) > MSE {({\bar{v}}_{G_{s t}})}_{min} \text{ if } \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [δ_{h} τ_{v_{h}}^{2} - 1 + \frac{Ω_{1} Ω_{5}^{2} + Ω_{2} Ω_{3}^{2} - 2 Ω_{3} Ω_{4} Ω_{5}}{Ω_{1} Ω_{2} - Ω_{4}^{2}}] > 0 .$
Condition (ii): A comparison of the estimators stated in (22) and (41)

$MSE ({\bar{v}}_{t_{2}}) > MSE {({\bar{v}}_{G_{s t}})}_{min} \text{ if } \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [δ_{h} τ_{v_{h}}^{2} + δ_{h}^{″} τ_{u_{h}}^{2} - 2 δ_{h}^{″} τ_{v u_{h}} - 1 + \frac{Ω_{1} Ω_{5}^{2} + Ω_{2} Ω_{3}^{2} - 2 Ω_{3} Ω_{4} Ω_{5}}{Ω_{1} Ω_{2} - Ω_{4}^{2}}] > 0 .$
Condition (iii): A comparison of the estimators stated in (28) and (41)

$MSE ({\bar{v}}_{t_{4}}) > MSE {({\bar{v}}_{G_{s t}})}_{min} \text{ if } \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [τ_{v_{h}}^{2} (δ_{h} - δ_{h}^{″} ρ_{v u_{h}}^{2}) - 1 + \frac{Ω_{1} Ω_{5}^{2} + Ω_{2} Ω_{3}^{2} - 2 Ω_{3} Ω_{4} Ω_{5}}{Ω_{1} Ω_{2} - Ω_{4}^{2}}] > 0 .$
Condition (iv): A comparison of the estimators stated in (31) and (41)

$MSE ({\bar{v}}_{t_{5}}) > MSE {({\bar{v}}_{G_{s t}})}_{min} \text{ if } \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [δ_{h} τ_{v_{h}}^{2} + δ_{h}^{″} τ_{u_{h}}^{2} (\frac{1}{4} - C_{h}) - 1 + \frac{Ω_{1} Ω_{5}^{2} + Ω_{2} Ω_{3}^{2} - 2 Ω_{3} Ω_{4} Ω_{5}}{Ω_{1} Ω_{2} - Ω_{4}^{2}}] > 0 .$
Condition (v): A comparison of the estimators stated in (32) and (41)

$MSE ({\bar{v}}_{t_{6}}) > MSE {({\bar{v}}_{G_{s t}})}_{min} \text{ if } \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [δ_{h} τ_{v_{h}}^{2} + δ_{h}^{″} τ_{u_{h}}^{2} (\frac{1}{4} + C_{h}) - 1 + \frac{Ω_{1} Ω_{5}^{2} + Ω_{2} Ω_{3}^{2} - 2 Ω_{3} Ω_{4} Ω_{5}}{Ω_{1} Ω_{2} - Ω_{4}^{2}}] > 0 .$
Condition (vi): A comparison of the estimators stated in (35) and (41)

$MSE {({\bar{v}}_{t_{7}})}_{min} > MSE {({\bar{v}}_{G_{s t}})}_{min} \text{ if } \sum_{h = 1}^{L} W_{h}^{2} {\bar{V}}_{h}^{2} [τ_{v_{h}}^{2} (δ_{h} - δ_{h}^{″} ρ_{v u_{h}}^{2}) - 1 + \frac{Ω_{1} Ω_{5}^{2} + Ω_{2} Ω_{3}^{2} - 2 Ω_{3} Ω_{4} Ω_{5}}{Ω_{1} Ω_{2} - Ω_{4}^{2}}] > 0 .$

6. Results and Discussion

This section compares the proposed family of estimators with the existing estimators in terms of MSE and PRE, using both simulated data and three real datasets, in order to assess the effectiveness and reliability of the suggested class across a range of practical scenarios.

6.1. Simulation Study

To verify the theoretical conclusions presented in Section 5, we conduct a simulation study. The auxiliary variable X is generated artificially for eight populations, each based on one of the probability distributions listed below:

Population 1: $X \sim U n i f o r m (k_{1} = 3, k_{2} = 5),$
Population 2: $X \sim U n i f o r m (k_{1} = 4, k_{2} = 7),$
Population 3: $X \sim E x p o n e n t i a l (θ = 6),$
Population 4: $X \sim E x p o n e n t i a l (θ = 4)$ ,
Population 5: $X \sim G a m m a (η_{1} = 2, η_{2} = 5),$
Population 6: $X \sim G a m m a (η_{1} = 7, η_{2} = 12),$
Population 7: $X \sim W e i b u l l (λ_{1} = 2, λ_{2} = 1.5),$
Population 8: $X \sim W e i b u l l (λ_{1} = 3, λ_{2} = 0.8) .$

The dependent variable Y is subsequently calculated as

Y = r_{y x} \times X + e,

where $r_{y x} = 0.80$ indicates the correlation coefficient between the dependent and independent variables, and $e \sim N (0, 1)$ is the error term.

Using R (version 4.4.0), we applied the following procedure to compute the mean squared errors (MSEs) and percent relative efficiencies (PREs) of the suggested class of estimators:

Step 1: Generate a population of 1500 observations from each of the probability distributions defined above.
Step 2: Using simple random sampling without replacement (SRSWOR), draw a first-phase sample of size $m_{h}$ from the $N_{h}$ units of the h-th stratum.
Step 3: Using SRSWOR, draw the second-phase sample of size $n_{h}$ from the first-phase sample.
Step 4: Using the information from the previous phases, estimate the population total, the extreme values of the auxiliary variable, and the extreme values of the ranks of the auxiliary variable.
Step 5: Use SRSWOR to draw samples of different sizes from each stratum in each population. The sample sizes considered are 15%, 30%, and 45%.
Step 6: For each sample size, compute the MSE and PRE values of all the estimators described in this article.
Step 7: Repeat Steps 5 and 6 50,000 times to ensure the reliability of the findings. The results for the simulated populations are reported in Table 2 and Table 3.
Step 8: Finally, obtain the MSEs and PREs of each estimator over all replications using the following formulas:

$M S E {({\bar{v}}_{R})}_{min} = \frac{\sum_{k = 1}^{50000} {({\bar{v}}_{R k} - \bar{V})}^{2}}{50000}$

and

$P R E = \frac{V a r ({\bar{v}}_{t_{1}})}{M S E {({\bar{v}}_{R})}_{min}} \times 100,$

where R is one of $t_{2}, t_{3}, t_{4}, t_{5}, t_{6}, G_{i} (i = 1, 2, \dots, 8) .$
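
The procedure above can be sketched in Python for two of the estimators. This is a scaled-down illustration and not the authors' R code: the split of the 1500 units into two equal strata, the sampling fractions (45% in phase 1, 15% in phase 2), the reduced number of replications, and the use of the empirical MSE of the unbiased estimator as the PRE numerator are all assumptions made here.

```python
import random


def simulate(reps=500, seed=0):
    """Monte Carlo MSE and PRE of the unbiased (t1) and ratio (t2) estimators."""
    rng = random.Random(seed)

    # Step 1: two hypothetical strata of 750 units each, Y = 0.80 * X + e.
    strata = []
    for low, high, size in ((3, 5, 750), (4, 7, 750)):
        x = [rng.uniform(low, high) for _ in range(size)]
        y = [0.80 * xi + rng.gauss(0, 1) for xi in x]
        strata.append((x, y))
    N = sum(len(x) for x, _ in strata)
    Y_bar = sum(sum(y) for _, y in strata) / N

    sq_err = {"t1": 0.0, "t2": 0.0}
    for _ in range(reps):
        est = {"t1": 0.0, "t2": 0.0}
        for x, y in strata:
            N_h = len(x)
            m_h, n_h = int(0.45 * N_h), int(0.15 * N_h)
            first = rng.sample(range(N_h), m_h)    # Step 2: phase-1 SRSWOR
            second = rng.sample(first, n_h)        # Step 3: phase-2 subsample
            x1 = sum(x[i] for i in first) / m_h    # first-phase mean of X
            x2 = sum(x[i] for i in second) / n_h   # second-phase mean of X
            y2 = sum(y[i] for i in second) / n_h   # second-phase mean of Y
            W_h = N_h / N
            est["t1"] += W_h * y2
            est["t2"] += W_h * y2 * x1 / x2
        for name in est:
            sq_err[name] += (est[name] - Y_bar) ** 2

    # Step 8: average squared error over replications, then PRE relative to t1.
    mse = {name: total / reps for name, total in sq_err.items()}
    pre = {name: 100 * mse["t1"] / value for name, value in mse.items()}
    return mse, pre


mse, pre = simulate()
```

Other estimators can be added by accumulating their values inside the inner loop in the same way.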

6.2. Numerical Examples

In order to check the performance of various estimators, we computed the mean squared errors (MSEs) and percent relative efficiencies (PREs) using the three distinct datasets. The objective is to assess the performance of the suggested family of estimators. We describe the datasets in detail below, along with summary statistics:

Data 1. This dataset, which was compiled in Pakistan in 2012 and contains data from various divisions, was taken from page 135 of the Bureau of Statistics [30]. The Pakistan Bureau of Statistics website provides a download link: https://www.pbs.gov.pk/content/microdata (accessed on 1 November 2024). The variables contained in the dataset are described as follows:
Y: This variable provides the total number of students enrolled in schools in 2012. It maintains records of all the registered students across all schools.
X: The total number of public elementary and secondary schools in 2012.

A summary of dataset 1 is given in Table 4. The data are categorized into two groups. The groups are set up as follows:

Group 1: Consists of the divisions of Lahore, Sargodha, Gujranwala, and Rawalpindi; these divisions reflect areas with perhaps different student populations and educational facilities.
Group 2: Consists of the divisions of Multan, Bahawalpur, Faisalabad, D.G. Khan, and Sahiwal. These divisions have distinct educational features from Group 1 and provide a varied viewpoint on the dynamics of regional education.
Data 2. This population, which contains information from different divisions, was selected from page 226 [30] of the Bureau of Statistics, collected in Pakistan in 2012. It is accessible for download on the Pakistan Bureau of Statistics website using the following URL: https://www.pbs.gov.pk/content/microdata (accessed on 1 November 2024).
Y: The variable Y represents the employment levels reported across the various divisions in 2012, and it serves as a measure of the distribution of the workforce.
X: This refers to the number of factories officially registered by the divisions in 2012, which is an indication of the level of industrial activity.

The data are divided into two groups to simplify comparison; the descriptive statistics for dataset 2 are shown in Table 5.

Group 1: Comprises the divisions of Sargodha, Gujranwala, Rawalpindi, and Lahore, which are among the more industrialized and densely populated areas.
Group 2: Comprises the divisions of Multan, Bahawalpur, Faisalabad, D.G. Khan, and Sahiwal, areas known for agricultural activity as well as expanding industrial sectors.
Data 3. This dataset, taken from page 24 of [1], provides data on household food expenditure and weekly income.
Y: The family's food expenditure, which is related to the family's employment status.
X: The family's weekly income, which reflects its financial situation.

For analysis, the dataset is divided into two groups; Table 6 presents the descriptive statistics for dataset 3.

Finally, we use the following formula to calculate the percent relative efficiency (PRE) for each dataset:

$\mathrm{PRE} = \frac{\mathrm{Var}({\bar{v}}_{t_{1}})}{\mathrm{MSE}({\bar{v}}_{K})} \times 100,$

where K is one of $t_{2}, t_{3}, t_{4}, t_{5}, t_{6}, G_{i}$ $(i = 1, 2, \dots, 8)$.
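
As a rough illustration of the PRE formula, the following Python sketch recomputes a few Data 1 entries of Table 8 from the corresponding MSE values in Table 7. The function name `percent_relative_efficiency` and the dictionary layout are assumptions made for this example; only the numbers are taken from the tables.

```python
# MSE values for Data 1, copied from Table 7 (real populations).
# Only a subset of estimators is listed to keep the example short.
mse_data1 = {
    "t1": 2458644474,
    "t2": 2325327219,
    "t3": 2086419318,
    "G8": 1163004043,
}


def percent_relative_efficiency(mse_baseline, mse_estimator):
    """Return PRE = (baseline variance or MSE) / (estimator MSE) * 100."""
    if mse_estimator <= 0:
        raise ValueError("the estimator MSE must be positive")
    return mse_baseline / mse_estimator * 100.0


# The usual estimator t1 is the baseline, so its own PRE is 100 by construction.
pre_data1 = {
    name: round(percent_relative_efficiency(mse_data1["t1"], mse), 3)
    for name, mse in mse_data1.items()
}

for name, pre in pre_data1.items():
    print(f"{name}: PRE = {pre}")
```

A PRE above 100 means the estimator has a smaller MSE than the baseline; the values printed for $t_{2}$, $t_{3}$, and $G_{8}$ should agree with the Data 1 column of Table 8 up to rounding.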

7. Discussion

We conducted a simulation study and analyzed three real datasets to assess the usefulness of the suggested class of estimators. The estimators were compared using the mean squared error (MSE) and percent relative efficiency (PRE) criteria. Table 2 and Table 3 report the MSE and PRE values, respectively, of the suggested and existing estimators in the simulation study, while Table 7 and Table 8 report the corresponding results for the real datasets. These results lead to the following general conclusions:

The MSE values of all suggested estimators are consistently smaller than those of the estimators discussed in Section 3, across all simulated scenarios and real datasets, as shown in Table 2 and Table 7. This indicates that the suggested estimators are more precise than the existing estimators.
Additionally, Table 3 and Table 8 show that every suggested estimator has a higher PRE value than the existing estimators. This indicates that the suggested class of estimators outperforms the existing estimators.

7.1. Limitations and Practical Challenges

The proposed estimator demonstrates significant improvements in efficiency, especially in handling outliers and extreme values. However, several challenges and constraints require attention:

7.1.1. Impact of Small Sample Sizes

The proposed estimator performs well with sufficiently large sample sizes, while smaller samples may increase variability and bias. The real-life datasets used in this study are substantial, minimizing this issue in our analysis. However, this limitation could affect scenarios with smaller datasets or restricted data collection. Ensuring adequate sample sizes is essential for reliable application.

7.1.2. Sensitivity to Highly Skewed Populations

Because the estimator uses extreme values, its robustness may decrease in populations with pronounced skewness or heavy-tailed distributions. These conditions can introduce variability that affects stability. Techniques such as trimming or applying reduced weights to extreme values could improve reliability in these cases.
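
To make the idea of reducing the influence of extreme values concrete, here is a minimal Python sketch of winsorizing, one common alternative to outright trimming. It is not part of the study's method; the function name `winsorize` and the parameter `k` (how many values to pull in from each tail) are assumptions introduced only for this illustration.

```python
def winsorize(values, k=1):
    """Replace the k smallest and k largest values by their nearest retained neighbors.

    This is a generic illustration of down-weighting extremes, not the
    estimator proposed in the paper.
    """
    if k < 0 or 2 * k >= len(values):
        raise ValueError("k must satisfy 0 <= 2k < len(values)")
    ordered = sorted(values)
    lower = ordered[k]        # smallest value that is kept as-is
    upper = ordered[-k - 1]   # largest value that is kept as-is
    # Clip every observation into [lower, upper], preserving the input order.
    return [min(max(x, lower), upper) for x in values]


# Hypothetical auxiliary-variable sample with one extreme value.
sample = [1, 2, 3, 4, 100]
clipped = winsorize(sample, k=1)
print(clipped)
```

Unlike deletion, winsorizing keeps the sample size unchanged, which matters when the sample is already small.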

7.1.3. Dependence on Auxiliary Data

The effectiveness of the estimator relies on the availability and quality of auxiliary variables that are strongly correlated with the target variable. If such auxiliary data are limited or weakly correlated, the efficiency of the estimator may decline. Employing alternative auxiliary variables or conducting sensitivity analyses can help address this issue.

8. Conclusions

This study proposed a new class of estimators for the finite population mean under a stratified double-phase sampling framework. The approach uses the minimum and maximum values of the auxiliary variable to improve the precision of estimates, addressing a limitation of traditional methods, which often discard outliers. The theoretical properties of the proposed estimators were derived and then validated through simulation studies in which several artificial populations were generated to examine performance under different conditions. The findings show that the suggested estimators are consistently more efficient than the existing estimators in terms of percent relative efficiency (PRE), as Table 3 shows. The evaluation on real datasets, reported in Table 8, supported the theoretical findings and highlighted the flexibility of the estimators across practical applications, including educational and industrial surveys. The improvement was thus observed in both simulated and real-world data, demonstrating the practical applicability of the new estimators in handling outliers and complex data structures.

The suggested class of estimators ${\bar{v}}_{i}$ $(i = G_{1}, G_{2}, \dots, G_{8})$ clearly provides better results than the other methods under consideration, as demonstrated by the simulation results and the analysis of the different datasets. Among the suggested estimators, ${\bar{v}}_{G_{8}}$ proved to be the most efficient, yielding the lowest MSE and highest PRE across all scenarios.

Moreover, this study emphasizes the importance of using extreme value information within stratified double phase sampling designs to achieve better estimation accuracy. Future studies can focus on developing improved methods to further minimize MSE and applying these techniques to other sampling frameworks, enabling their use in fields like public health, education, and environmental surveys.

Author Contributions

Conceptualization, A.S.A. and H.A.; Methodology, A.S.A. and H.A.; Software, A.S.A. and H.A.; Validation, A.S.A. and H.A.; Formal analysis, A.S.A. and H.A.; Investigation, A.S.A. and H.A.; Resources, A.S.A. and H.A.; Data curation, A.S.A. and H.A.; Writing—original draft, A.S.A. and H.A.; Writing—review & editing, A.S.A. and H.A.; Visualization, A.S.A. and H.A.; Supervision, H.A.; Project administration, A.S.A. and H.A.; Funding acquisition, H.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Scientific Research at Northern Border University, Arar, KSA for funding this research work through the project number NBU-FPEJ-2025-1107-02.

Data Availability Statement

Data are contained within the article.


Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.
2. Khoshnevisan, M.; Singh, R.; Chauhan, P.; Sawan, N. A general family of estimators for estimating population mean using known value of some population parameter(s). Far East J. Theor. Stat. 2007, 22, 181–191.
3. Rueda, M.M.; Arcos, A.; Martínez-Miranda, M.D.; Román, Y. Some improved estimators of finite population quantile using auxiliary information in sample surveys. Comput. Stat. Data Anal. 2004, 45, 825–848.
4. Särndal, C.E. Sample survey theory vs. general statistical theory: Estimation of the population mean. Int. Stat. Rev. Int. Stat. 1972, 40, 1–12.
5. Tarima, S.; Pavlov, D. Using auxiliary information in statistical function estimation. ESAIM Probab. Stat. 2006, 10, 11–23.
6. Sukhatme, B.V. Some ratio-type estimators in two-phase sampling. J. Am. Stat. Assoc. 1962, 57, 628–632.
7. Erinola, A.Y.; Singh, R.V.K.; Audu, A.; James, T. Modified class of estimator for finite population mean under two-phase sampling using regression estimation approach. Asian J. Probab. Stat. 2021, 4, 52–64.
8. Garg, N.; Srivastava, M. A general class of estimators of a finite population mean using multi-auxiliary information under two stage sampling scheme. J. Reliab. Stat. Stud. 2009, 2, 103–118.
9. Guha, S.; Chandra, H. Improved estimation of finite population mean in two-phase sampling with subsampling of the nonrespondents. Math. Popul. Stud. 2021, 28, 24–44.
10. Kamal, A.; Amir, N.; Dastagir, H. Some exponential type predictive estimators of finite population mean in two-phase sampling. J. Stat. Comput. Interdiscip. Res. 2020, 2, 51–57.
11. Khare, B.B.; Khare, S. Generalized synthetic estimator for domain mean in two phase sampling using single auxiliary character. J. Reliab. Stat. Stud. 2019, 12, 139–151.
12. Singh, H.P.; Vishwakarma, G.K. Modified exponential ratio and product estimators for finite population mean in double sampling. Austrian J. Stat. 2007, 36, 217–225.
13. Singh, H.P.; Espejo, M.R. Double sampling ratio-product estimator of a finite population mean in sample surveys. J. Appl. Stat. 2007, 34, 71–85.
14. Vishwakarma, G.K.; Zeeshan, S.M. Generalized ratio-cum-product estimator for finite population mean under two-phase sampling scheme. J. Mod. Appl. Stat. Methods 2020, 19, 1–16.
15. Zaman, T.; Kadilar, C. New class of exponential estimators for finite population mean in two-phase sampling. Commun. Stat.-Theory Methods 2021, 50, 874–889.
16. Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhyā Indian J. Stat. Ser. B 1995, 57, 93–102.
17. Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
18. Khan, M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J. Stat. Appl. Probab. Lett. 2015, 2, 115–121.
19. Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
20. Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 1–15.
21. Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
22. Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
23. Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
24. Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
25. Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon 2024, 10, e33402.
26. Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry 2024, 16, 957.
27. Dawoud, I.; Awwad, F.A.; Tag Eldin, E.; Abonazel, M.R. New Robust Estimators for Handling Multicollinearity and Outliers in the Poisson Model: Methods, Simulation and Applications. Axioms 2022, 11, 612.
28. Bhushan, S.; Kumar, A.; Alsadat, N.; Mustafa, M.S.; Alsolmi, M.M. Some Optimal Classes of Estimators Based on Multi-Auxiliary Information. Axioms 2023, 12, 515.
29. Ullah, A.; Shabbir, J.; Alomair, A.M.; Alomair, M.A. Ratio-type estimator for estimating the neutrosophic population mean in simple random sampling under intuitionistic fuzzy cost function. Axioms 2023, 12, 890.
30. Bureau of Statistics. Punjab Development Statistics; Government of the Punjab: Lahore, Pakistan, 2013.

Table 1. Some classes of proposed estimator in stratified two-phase sampling.

Subsets of ${\bar{v}}_{G_{st}}^{d}$	$α_{1}$	$α_{2}$	$t_{1 h}$	$t_{2 h}$
${\bar{v}}_{G_{1}} = \sum_{h = 1}^{L} W_{h} [g_{1 h} \bar{v_{h}} (\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}}) + g_{2 h} (\frac{\bar{u_{h}}}{{\bar{u}}_{h}^{'}})] K_{h}$	1	−1	$- β_{2 (u_{h})}$	$u_{M_{h}} - u_{m_{h}}$
${\bar{v}}_{G_{2}} = \sum_{h = 1}^{L} W_{h} [g_{1 h} \bar{v_{h}} (\frac{\bar{u_{h}}}{{\bar{u}}_{h}^{'}}) + g_{2 h} (\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}})] K_{h}$	−1	1	$- {\hat{τ}}_{u_{h}}$	$u_{M_{h}} - u_{m_{h}}$
${\bar{v}}_{G_{3}} = \sum_{h = 1}^{L} W_{h} [g_{1 h} \bar{v_{h}} (\frac{\bar{u_{h}}}{{\bar{u}}_{h}^{'}}) + g_{2 h} (\frac{\bar{u_{h}}}{{\bar{u}}_{h}^{'}})] K_{h}$	−1	−1	$u_{M_{h}} - u_{m_{h}}$	$- {\hat{τ}}_{u_{h}}$
${\bar{v}}_{G_{4}} = \sum_{h = 1}^{L} W_{h} [g_{1 h} \bar{v_{h}} + g_{2 h} (\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}})] K_{h}$	0	1	$u_{M_{h}} - u_{m_{h}}$	$β_{2 (u_{h})}$
${\bar{v}}_{G_{5}} = \sum_{h = 1}^{L} W_{h} [g_{1 h} \bar{v_{h}} (\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}}) + g_{2 h} (\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}})] K_{h}$	1	1	$u_{M_{h}} - u_{m_{h}}$	$- β_{2 (u_{h})}$
${\bar{v}}_{G_{6}} = \sum_{h = 1}^{L} W_{h} [g_{1 h} \bar{v_{h}} + g_{2 h} (\frac{\bar{u_{h}}}{{\bar{u}}_{h}^{'}})] K_{h}$	0	−1	$β_{2 (u_{h})}$	$u_{M_{h}} - u_{m_{h}}$
${\bar{v}}_{G_{7}} = \sum_{h = 1}^{L} W_{h} [g_{1 h} \bar{v_{h}} (\frac{{\bar{u}}_{h}^{'}}{\bar{u_{h}}}) + g_{2 h}] K_{h}$	1	0	$u_{M_{h}} - u_{m_{h}}$	${\hat{τ}}_{u_{h}}$
${\bar{v}}_{G_{8}} = \sum_{h = 1}^{L} W_{h} [g_{1 h} \bar{v_{h}} (\frac{\bar{u_{h}}}{{\bar{u}}_{h}^{'}}) + g_{2 h}] K_{h}$	−1	0	${\hat{τ}}_{u_{h}}$	$u_{M_{h}} - u_{m_{h}}$

where $K_{h} = \exp [\frac{t_{1 h} ({\bar{u}}_{h}^{'} - {\bar{u}}_{h})}{t_{1 h} ({\bar{u}}_{h}^{'} + {\bar{u}}_{h}) + 2 t_{2 h}}]$.
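
The rows of Table 1 can be read as special cases of one expression, $\sum_{h} W_{h} [g_{1 h} {\bar{v}}_{h} ({\bar{u}}_{h}^{'} / {\bar{u}}_{h})^{α_{1}} + g_{2 h} ({\bar{u}}_{h}^{'} / {\bar{u}}_{h})^{α_{2}}] K_{h}$. This general form is inferred here from the table rows rather than quoted from the text. The Python sketch below evaluates it for given stratum-level inputs. The function names, the dictionary keys, and the example numbers are all hypothetical, and the constants $g_{1 h}$ and $g_{2 h}$ are simply taken as inputs.

```python
import math


def exp_factor(t1, t2, u_bar_prime, u_bar):
    """K_h = exp[ t1 (u' - u) / ( t1 (u' + u) + 2 t2 ) ], as given below Table 1."""
    denom = t1 * (u_bar_prime + u_bar) + 2.0 * t2
    if denom == 0:
        raise ZeroDivisionError("denominator of the exponent is zero")
    return math.exp(t1 * (u_bar_prime - u_bar) / denom)


def class_member(strata, alpha1, alpha2):
    """Evaluate one member of the class (form inferred from the rows of Table 1)."""
    total = 0.0
    for s in strata:
        ratio = s["u_bar_prime"] / s["u_bar"]  # primed mean over unprimed mean
        k_h = exp_factor(s["t1"], s["t2"], s["u_bar_prime"], s["u_bar"])
        bracket = s["g1"] * s["v_bar"] * ratio ** alpha1 + s["g2"] * ratio ** alpha2
        total += s["W"] * bracket * k_h
    return total


# Hypothetical two-stratum inputs (illustrative numbers only, not from the paper).
# The two auxiliary means are set equal so that the result is easy to check by hand.
example_strata = [
    {"W": 0.5, "g1": 1.0, "g2": 0.0, "v_bar": 10.0,
     "u_bar_prime": 4.0, "u_bar": 4.0, "t1": 0.3, "t2": 12.0},
    {"W": 0.5, "g1": 1.0, "g2": 0.0, "v_bar": 20.0,
     "u_bar_prime": 5.0, "u_bar": 5.0, "t1": 0.4, "t2": 9.0},
]

# The G_8 row of Table 1 corresponds to alpha1 = -1 and alpha2 = 0.
estimate_g8 = class_member(example_strata, alpha1=-1, alpha2=0)
print(estimate_g8)
```

When the two auxiliary means coincide, the ratio terms and $K_{h}$ all equal 1, so with $g_{1 h} = 1$ and $g_{2 h} = 0$ the expression reduces to the ordinary stratified mean $\sum_{h} W_{h} {\bar{v}}_{h}$, which gives a quick sanity check.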

Table 2. MSE of different estimators using simulation populations.

Estimator	Data-1	Data-2	Data-3	Data-4	Data-5	Data-6	Data-7	Data-8
${\bar{v}}_{t_{1}}$	9.18 × $10^{- 3}$	8.85 × $10^{- 3}$	3.96 × $10^{- 2}$	4.05 × $10^{- 2}$	7.15 × $10^{- 3}$	8.05 × $10^{- 3}$	6.92 × $10^{- 3}$	7.452 × $10^{- 3}$
${\bar{v}}_{t_{2}}$	8.62 × $10^{- 3}$	8.40 × $10^{- 3}$	3.05 × $10^{- 2}$	3.15 × $10^{- 2}$	5.12 × $10^{- 3}$	6.00 × $10^{- 3}$	5.972 × $10^{- 3}$	6.50 × $10^{- 3}$
${\bar{v}}_{t_{3}}$	7.95 × $10^{- 3}$	8.25 × $10^{- 3}$	2.08 × $10^{- 2}$	2.48 × $10^{- 2}$	4.69 × $10^{- 3}$	5.35 × $10^{- 3}$	5.58 × $10^{- 3}$	5.95 × $10^{- 3}$
${\bar{v}}_{t_{4}}$	7.75 × $10^{- 3}$	7.80 × $10^{- 3}$	2.65 × $10^{- 2}$	2.29 × $10^{- 2}$	3.40 × $10^{- 3}$	4.05 × $10^{- 3}$	4.10 × $10^{- 3}$	4.58 × $10^{- 3}$
${\bar{v}}_{t_{5}}$	8.15 × $10^{- 3}$	8.00 × $10^{- 3}$	3.50 × $10^{- 2}$	3.75 × $10^{- 2}$	6.58 × $10^{- 3}$	6.95 × $10^{- 3}$	5.47 × $10^{- 3}$	5.95 × $10^{- 3}$
${\bar{v}}_{t_{6}}$	7.02 × $10^{- 3}$	7.45 × $10^{- 3}$	3.10 × $10^{- 2}$	3.05 × $10^{- 2}$	3.20 × $10^{- 3}$	3.88 × $10^{- 3}$	3.55 × $10^{- 3}$	3.88 × $10^{- 3}$
${\bar{v}}_{G_{1}}$	8.70 × $10^{- 4}$	8.45 × $10^{- 4}$	3.95 × $10^{- 3}$	5.20 × $10^{- 3}$	8.05 × $10^{- 4}$	8.45 × $10^{- 4}$	7.60 × $10^{- 4}$	7.92 × $10^{- 4}$
${\bar{v}}_{G_{2}}$	6.10 × $10^{- 4}$	7.45 × $10^{- 4}$	3.35 × $10^{- 3}$	4.55 × $10^{- 3}$	3.95 × $10^{- 4}$	4.35 × $10^{- 4}$	5.05 × $10^{- 4}$	5.30 × $10^{- 4}$
${\bar{v}}_{G_{3}}$	6.45 × $10^{- 4}$	7.65 × $10^{- 4}$	4.80 × $10^{- 3}$	5.50 × $10^{- 3}$	6.50 × $10^{- 4}$	6.92 × $10^{- 4}$	5.00 × $10^{- 4}$	5.48 × $10^{- 4}$
${\bar{v}}_{G_{4}}$	5.10 × $10^{- 4}$	5.90 × $10^{- 4}$	4.30 × $10^{- 3}$	5.05 × $10^{- 3}$	6.80 × $10^{- 4}$	7.25 × $10^{- 4}$	4.30 × $10^{- 4}$	4.58 × $10^{- 4}$
${\bar{v}}_{G_{5}}$	5.40 × $10^{- 4}$	6.25 × $10^{- 4}$	3.80 × $10^{- 3}$	3.20 × $10^{- 3}$	3.25 × $10^{- 4}$	3.78 × $10^{- 4}$	4.55 × $10^{- 4}$	4.88 × $10^{- 4}$
${\bar{v}}_{G_{6}}$	4.25 × $10^{- 4}$	5.00 × $10^{- 4}$	2.55 × $10^{- 3}$	3.15 × $10^{- 3}$	2.65 × $10^{- 4}$	3.28 × $10^{- 4}$	3.05 × $10^{- 4}$	3.28 × $10^{- 4}$
${\bar{v}}_{G_{7}}$	4.60 × $10^{- 4}$	5.40 × $10^{- 4}$	2.40 × $10^{- 3}$	2.30 × $10^{- 3}$	2.40 × $10^{- 4}$	2.90 × $10^{- 4}$	3.45 × $10^{- 4}$	3.65 × $10^{- 4}$
${\bar{v}}_{G_{8}}$	2.90 × $10^{- 4}$	3.25 × $10^{- 4}$	1.15 × $10^{- 3}$	1.70 × $10^{- 3}$	2.05 × $10^{- 4}$	2.48 × $10^{- 4}$	2.45 × $10^{- 4}$	2.62 × $10^{- 4}$

Table 3. PRE of different estimators using simulation populations.

Estimator	Data-1	Data-2	Data-3	Data-4	Data-5	Data-6	Data-7	Data-8
${\bar{v}}_{t_{1}}$	100.00	100.00	100.00	100.00	100.00	100.00	100.00	100.00
${\bar{v}}_{t_{2}}$	106.49	105.07	129.83	128.57	139.64	132.87	129.50	124.50
${\bar{v}}_{t_{3}}$	115.53	107.27	190.38	163.31	152.48	145.76	141.24	134.32
${\bar{v}}_{t_{4}}$	118.45	113.46	149.43	176.99	208.82	189.32	189.10	175.32
${\bar{v}}_{t_{5}}$	112.64	111.87	113.14	109.06	109.19	145.53	136.42	132.43
${\bar{v}}_{t_{6}}$	130.83	119.73	127.74	133.12	225.11	211.54	222.54	211.20
${\bar{v}}_{G_{1}}$	286.78	604.73	253.16	212.65	243.57	267.14	282.89	271.12
${\bar{v}}_{G_{2}}$	410.73	324.20	319.35	239.87	491.56	462.12	422.43	396.54
${\bar{v}}_{G_{3}}$	389.19	314.87	221.58	201.82	298.51	271.18	427.60	398.90
${\bar{v}}_{G_{4}}$	489.52	407.33	250.84	217.77	285.94	261.34	492.32	468.30
${\bar{v}}_{G_{5}}$	465.64	386.71	263.08	339.89	589.72	523.74	465.99	432.45
${\bar{v}}_{G_{6}}$	584.34	484.15	402.98	348.93	729.62	698.34	700.21	645.18
${\bar{v}}_{G_{7}}$	538.94	446.71	423.96	468.63	797.84	726.12	620.44	578.32
${\bar{v}}_{G_{8}}$	852.26	741.82	869.51	630.85	922.98	898.27	868.23	810.20

Table 4. Summary statistics for data 1.

$N_{1} = 18$	$n_{1} = 5$	$m_{1} = 9$	$W_{1} = 0.500$	$\bar{U_{1}} = 962$
$\bar{V_{1}} = 162979$	$U_{M_{1}} = 1530$	$U_{m_{1}} = 388$	$ω_{u_{1}} = 308$	$ω_{v_{1}} = 255887$
$τ_{u_{1}} = 0.320$	$τ_{v_{1}} = 1.571$	$ρ_{v_{1} u_{1}} = 0.145$	$δ_{1}^{'} = 0.144$	$δ_{1}^{″} = 0.056$
$N_{2} = 18$	$n_{2} = 5$	$m_{2} = 9$	$W_{2} = 0.500$	$\bar{U_{2}} = 1146$
$\bar{V_{2}} = 134458$	$U_{M_{2}} = 2370$	$U_{m_{2}} = 58$	$ω_{u_{2}} = 469.931$	$ω_{v_{2}} = 50236$
$τ_{u_{2}} = 0.409$	$τ_{v_{2}} = 0.374$	$ρ_{v_{2} u_{2}} = 0.787$	$δ_{2}^{'} = 0.144$	$δ_{2}^{″} = 0.056$
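
Several entries of Table 4 can be recovered from the others. The Python sketch below assumes the relations $W_{h} = N_{h} / N$, $δ_{h}^{'} = 1 / n_{h} - 1 / N_{h}$, $δ_{h}^{″} = 1 / m_{h} - 1 / N_{h}$, and $τ_{u_{h}} = ω_{u_{h}} / \bar{U_{h}}$. These relations are assumptions chosen because they reproduce the tabulated values; the formal definitions are those of the notation section. The function name and dictionary keys are hypothetical.

```python
# Stratum-level inputs copied from Table 4 (data 1).
strata = [
    {"N": 18, "n": 5, "m": 9, "U_bar": 962.0, "omega_u": 308.0},
    {"N": 18, "n": 5, "m": 9, "U_bar": 1146.0, "omega_u": 469.931},
]


def derived_quantities(strata):
    """Compute W_h, delta'_h, delta''_h and tau_u_h under the assumed relations."""
    n_total = sum(s["N"] for s in strata)
    rows = []
    for s in strata:
        rows.append({
            "W": s["N"] / n_total,                            # stratum weight
            "delta_prime": 1.0 / s["n"] - 1.0 / s["N"],       # matches 0.144
            "delta_double_prime": 1.0 / s["m"] - 1.0 / s["N"],  # matches 0.056
            "tau_u": s["omega_u"] / s["U_bar"],               # coefficient of variation
        })
    return rows


derived = derived_quantities(strata)
for h, row in enumerate(derived, start=1):
    print(h, {key: round(value, 3) for key, value in row.items()})
```

For stratum 1 the rounded values match the table; for stratum 2 the computed $τ_{u_{2}}$ is about 0.410 against the tabulated 0.409, a difference consistent with rounding of the inputs.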

Table 5. Summary statistics for data 2.

$N_{1} = 18$	$n_{1} = 5$	$m_{1} = 9$	$W_{1} = 0.500$	$\bar{U_{1}} = 415$
$\bar{V_{1}} = 85572$	$U_{M_{1}} = 2055$	$U_{m_{1}} = 24$	$ω_{u_{1}} = 521.675$	$ω_{v_{1}} = 248216$
$τ_{u_{1}} = 1.258$	$τ_{v_{1}} = 2.901$	$ρ_{v_{1} u_{1}} = 0.337$	$δ_{1}^{'} = 0.144$	$δ_{1}^{″} = 0.056$
$N_{2} = 18$	$n_{2} = 5$	$m_{2} = 9$	$W_{2} = 0.500$	$\bar{U_{2}} = 257$
$\bar{V_{2}} = 19294$	$U_{M_{2}} = 1674$	$U_{m_{2}} = 52$	$ω_{u_{2}} = 366$	$ω_{v_{2}} = 37979$
$τ_{u_{2}} = 1.423$	$τ_{v_{2}} = 1.969$	$ρ_{v_{2} u_{2}} = 0.976$	$δ_{2}^{'} = 0.144$	$δ_{2}^{″} = 0.056$

Table 6. Summary statistics for data 3.

$N_{1} = 18$	$n_{1} = 5$	$m_{1} = 9$	$W_{1} = 0.500$	$\bar{U_{1}} = 72.550$
$\bar{V_{1}} = 27.490$	$U_{M_{1}} = 95$	$U_{m_{1}} = 28$	$ω_{u_{1}} = 10.580$	$ω_{v_{1}} = 10.130$
$τ_{u_{1}} = 0.155$	$τ_{v_{1}} = 0.376$	$ρ_{v_{1} u_{1}} = 0.337$	$δ_{1}^{'} = 0.144$	$δ_{1}^{″} = 0.056$
$N_{2} = 18$	$n_{2} = 5$	$m_{2} = 9$	$W_{2} = 0.500$	$\bar{U_{2}} = 60.870$
$\bar{V_{2}} = 20.820$	$U_{M_{2}} = 75$	$U_{m_{2}} = 15$	$ω_{u_{2}} = 8.980$	$ω_{v_{2}} = 12.750$
$τ_{u_{2}} = 0.142$	$τ_{v_{2}} = 0.269$	$ρ_{v_{2} u_{2}} = 0.496$	$δ_{2}^{'} = 0.144$	$δ_{2}^{″} = 0.056$

Table 7. MSE of different estimators using real populations.

Estimator	Data 1	Data 2	Data 3
${\bar{v}}_{t_{1}}$	2458644474	2277471492	4.991
${\bar{v}}_{t_{2}}$	2325327219	2106243059	4.564
${\bar{v}}_{t_{3}}$	2086419318	2005323090	4.879
${\bar{v}}_{t_{4}}$	2360073402	2123289511	4.628
${\bar{v}}_{t_{5}}$	2621040437	2568789003	4.879
${\bar{v}}_{t_{6}}$	2279370249	2091394778	4.540
${\bar{v}}_{G_{1}}$	1685405190	1691442012	3.108
${\bar{v}}_{G_{2}}$	1489232432	1479656632	2.940
${\bar{v}}_{G_{3}}$	1405592234	1378267780	2.728
${\bar{v}}_{G_{4}}$	1482944924	1451874310	2.835
${\bar{v}}_{G_{5}}$	1637713868	1660835587	3.206
${\bar{v}}_{G_{6}}$	1243061255	1290701980	2.501
${\bar{v}}_{G_{7}}$	1553568342	1523808743	2.730
${\bar{v}}_{G_{8}}$	1163004043	1211251209	2.261

Table 8. PRE of different estimators using real populations.

Estimator	Data 1	Data 2	Data 3
${\bar{v}}_{t_{1}}$	100	100	100
${\bar{v}}_{t_{2}}$	105.733	108.120	109.356
${\bar{v}}_{t_{3}}$	117.840	113.571	117.840
${\bar{v}}_{t_{4}}$	104.177	107.262	107.842
${\bar{v}}_{t_{5}}$	103.804	105.659	102.293
${\bar{v}}_{t_{6}}$	107.865	108.897	109.696
${\bar{v}}_{G_{1}}$	145.879	134.647	160.586
${\bar{v}}_{G_{2}}$	165.095	153.919	169.762
${\bar{v}}_{G_{3}}$	174.919	165.242	179.532
${\bar{v}}_{G_{4}}$	165.795	156.864	176.049
${\bar{v}}_{G_{5}}$	150.127	137.128	155.676
${\bar{v}}_{G_{6}}$	197.780	176.452	199.560
${\bar{v}}_{G_{7}}$	158.258	149.459	182.820
${\bar{v}}_{G_{8}}$	211.405	188.026	220.743

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
