Dual Transformation of Auxiliary Variables by Using Outliers in Stratified Random Sampling

Mohammed Ahmed Alomair; Umer Daraz

doi:10.3390/math12182839

¹ Department of Quantitative Methods, School of Business, King Faisal University, Al-Ahsa 31982, Saudi Arabia

² School of Mathematics and Statistics, Central South University, Changsha 410017, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2024, 12(18), 2839; https://doi.org/10.3390/math12182839

This article belongs to the Special Issue Survey Statistics and Survey Sampling: Challenges and Opportunities


Abstract

To estimate the finite population variance of the study variable, this paper proposes an improved class of efficient estimators based on different transformations. These estimators are particularly useful when both the minimum and maximum values of the auxiliary variable are known and the ranks of the auxiliary variable are associated with the study variable, because these ranks can be used to improve the precision of the estimators effectively. We derive the bias and mean squared error (MSE) of the proposed class of estimators to the first order of approximation under stratified random sampling. A simulation study is carried out to assess the performance of the estimators and to validate the theoretical findings. The results show that the proposed class of estimators performs better than the other estimators in terms of MSE and percent relative efficiency (PRE) in all scenarios considered. Furthermore, three real data sets are examined in the application section to demonstrate that the improved class of estimators outperforms the existing estimators.

Keywords:

auxiliary information; study variable; minimum and maximum values; ranks; mean squared error; percent relative efficiency

MSC:

62D05

1. Introduction

Survey sampling aims to gather accurate information about various characteristics of a population while minimizing cost, time, and human resources. Many populations contain a few extreme values, and estimates of unknown population characteristics that ignore such values can be highly sensitive to them, so the results may be overstated or understated in certain cases. Consequently, the accuracy of classical estimators, measured by the mean squared error (MSE), usually decreases when extreme values are present in the data set. Although it may be tempting to eliminate such observations from the sample, they should instead be incorporated into the estimation of population characteristics. Using the known smallest and largest observations of the auxiliary variable, Mohanty and Sahoo [1] proposed two estimators based on linear transformations. This idea received little further attention until Khan and Shabbir [2], who applied the concept of extreme values to a variety of estimators of the finite population mean. Daraz et al. [3] improved the estimation of the finite population mean in the presence of extreme values under stratified random sampling. For more details, see [4,5,6,7] and the references therein.

Estimating the finite population variance is an important problem, and controlling variability is challenging in many applications. The problem arises, for example, in biological and agricultural research, where unexpected variability can signal that the intended results are not being achieved. Careful use of supplementary information can increase the accuracy of the estimators. The use of auxiliary information to estimate the population variance was first discussed by Das and Tripathi [8] and later extended by Isaki [9]. Bahl and Tuteja [10] proposed ratio- and product-type exponential estimators that have been used to estimate the population variance. Through transformations of extreme values, Daraz and Khan [11] proposed several efficient classes of estimators of the population variance. More recently, using the concept of extreme values, Daraz et al. [12] proposed new classes of estimators of the population variance with minimum mean squared errors. Daraz et al. [13] proposed double exponential ratio-type estimators and examined their efficiency for estimating the population variance using the extreme values of the auxiliary variable. Daraz et al. [14] made dual use of the auxiliary variable under simple random sampling to obtain a class of efficient estimators, improving their accuracy through a linear transformation of the extreme values and the ranks of the auxiliary variable. Many other researchers have suggested estimators of the finite population variance; see, for example, [15,16,17,18,19,20,21,22,23,24,25].

The objective of this article is to make effective use of both the extreme values and the ranks of the auxiliary variable in estimating the finite population variance, and to examine the efficiency and accuracy of the resulting estimators under various transformations. When the study and auxiliary variables are related, the ranks of the auxiliary variable are also associated with the study variable, so these ranks can serve as a valuable additional source of information for improving the accuracy of an estimator. Building on [12,14], we introduce a new class of estimators of the finite population variance that uses the known extreme values and the ranks of the auxiliary variable under stratified random sampling.

The remainder of this article is organized as follows. Section 2 presents the concepts and notation. Section 3 reviews some existing estimators. Section 4 describes the proposed class of estimators in detail. Section 5 gives the mathematical comparison. Section 6 presents a simulation study based on six artificial populations generated from different probability distributions to examine the theoretical results of Section 5, together with numerical examples based on real data that illustrate our theoretical conclusions. Finally, Section 7 gives some conclusions and suggestions for further research.

2. Concepts and Notations

Consider a finite population $U = (U_{1}, U_{2}, \dots, U_{N})$ of size $N$, divided into $L$ strata, where the $h$th stratum contains $N_{h}$ units $(h = 1, 2, \dots, L)$ and $\sum_{h = 1}^{L} N_{h} = N$. Let $y_{hi}$, $x_{hi}$, and $r_{hi}$ denote the values of the study variable $(Y)$, the auxiliary variable $(X)$, and the rank $(R)$ of the auxiliary variable, respectively, for the $i$th unit $(i = 1, 2, \dots, N_{h})$ in the $h$th stratum. Let

\bar{Y}_{h} = \frac{1}{N_{h}} \sum_{i = 1}^{N_{h}} Y_{hi}, \quad \bar{X}_{h} = \frac{1}{N_{h}} \sum_{i = 1}^{N_{h}} X_{hi}, \quad \bar{R}_{h} = \frac{1}{N_{h}} \sum_{i = 1}^{N_{h}} R_{hi}

be the population means of the study variable, the auxiliary variable, and the ranks of the auxiliary variable in the $h$th stratum. The corresponding overall population means are

\bar{Y} = \sum_{h = 1}^{L} W_{h} \bar{Y}_{h}, \quad \bar{X} = \sum_{h = 1}^{L} W_{h} \bar{X}_{h}, \quad \bar{R} = \sum_{h = 1}^{L} W_{h} \bar{R}_{h},

where $W_{h} = N_{h}/N$ is the known weight of the $h$th stratum.
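As a quick check of the weighting identity above, the following Python sketch (a minimal illustration with made-up values, not part of the original study) computes $\bar{Y} = \sum_{h} W_{h} \bar{Y}_{h}$ and confirms that it equals the ordinary mean of all $N$ units. The function name `stratified_mean` is hypothetical.

```python
from statistics import fmean


def stratified_mean(strata):
    """Overall mean sum_h W_h * mean_h, with W_h = N_h / N.

    `strata` is a list of lists, one list of unit values per stratum.
    """
    N = sum(len(units) for units in strata)
    return sum((len(units) / N) * fmean(units) for units in strata)


# Illustrative values only: two strata with N_1 = 3 and N_2 = 2.
strata = [[2.0, 4.0, 6.0], [10.0, 12.0]]
pooled = fmean([v for units in strata for v in units])
print(stratified_mean(strata), pooled)  # both agree (6.8) up to floating-point rounding
```

No extra factor $1/N_{h}$ is needed, because the weights $W_{h}$ already sum to one.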

For these variables, the population variances in the $h$th stratum are

S_{yh}^{2} = \frac{1}{N_{h} - 1} \sum_{i = 1}^{N_{h}} (Y_{hi} - \bar{Y}_{h})^{2}, \quad S_{xh}^{2} = \frac{1}{N_{h} - 1} \sum_{i = 1}^{N_{h}} (X_{hi} - \bar{X}_{h})^{2}, \quad S_{rh}^{2} = \frac{1}{N_{h} - 1} \sum_{i = 1}^{N_{h}} (R_{hi} - \bar{R}_{h})^{2},

where $\bar{Y}_{h}$, $\bar{X}_{h}$, and $\bar{R}_{h}$ are defined above. The population coefficients of variation in the $h$th stratum are

C_{yh} = \frac{S_{yh}}{\bar{Y}_{h}}, \quad C_{xh} = \frac{S_{xh}}{\bar{X}_{h}}, \quad C_{rh} = \frac{S_{rh}}{\bar{R}_{h}}.

Further, let

ρ_{yxh} = \frac{S_{yxh}}{S_{yh} S_{xh}}, \quad ρ_{yrh} = \frac{S_{yrh}}{S_{yh} S_{rh}}, \quad ρ_{xrh} = \frac{S_{xrh}}{S_{xh} S_{rh}}

be the population correlation coefficients between $(Y, X)$, $(Y, R)$, and $(X, R)$ in the $h$th stratum, respectively, where $S_{yxh}$, $S_{yrh}$, and $S_{xrh}$ are the corresponding population covariances.
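The stratum-level parameters defined above can be computed directly from the unit values. The Python sketch below is a hedged illustration: it assumes that ranks are assigned within each stratum from 1 to $N_{h}$ with ties broken by position (the text does not state a tie rule), and the helper names are hypothetical.

```python
from statistics import fmean, stdev, variance


def ranks(values):
    """Ranks 1..n of `values`; ties are broken by position (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def covariance(a, b):
    """Covariance with divisor (n - 1), matching the variances in the text."""
    ma, mb = fmean(a), fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def stratum_parameters(y, x):
    """S^2, C and rho for (Y, X, R) in one stratum, with R = ranks of X."""
    r = ranks(x)
    sy, sx, sr = stdev(y), stdev(x), stdev(r)
    return {
        "S2_y": variance(y), "S2_x": variance(x), "S2_r": variance(r),
        "C_y": sy / fmean(y), "C_x": sx / fmean(x), "C_r": sr / fmean(r),
        "rho_yx": covariance(y, x) / (sy * sx),
        "rho_yr": covariance(y, r) / (sy * sr),
        "rho_xr": covariance(x, r) / (sx * sr),
    }


print(stratum_parameters([3.1, 4.0, 6.2, 7.9], [1.0, 2.5, 3.0, 5.5]))
```

`statistics.variance` and `statistics.stdev` use the $(n - 1)$ divisor, which matches the definitions of $S_{yh}^{2}$, $S_{xh}^{2}$, and $S_{rh}^{2}$.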

A sample of size $n_{h}$ is selected from the $h$th stratum by simple random sampling without replacement. Let

\bar{y}_{h} = \frac{1}{n_{h}} \sum_{i = 1}^{n_{h}} y_{hi}, \quad \bar{x}_{h} = \frac{1}{n_{h}} \sum_{i = 1}^{n_{h}} x_{hi}, \quad \bar{r}_{h} = \frac{1}{n_{h}} \sum_{i = 1}^{n_{h}} r_{hi}

be the sample means of the study variable, the auxiliary variable, and the ranks of the auxiliary variable in the $h$th stratum. The sample variances of these variables are

\hat{S}_{yh}^{2} = s_{yh}^{2} = \frac{1}{n_{h} - 1} \sum_{i = 1}^{n_{h}} (y_{hi} - \bar{y}_{h})^{2}, \quad \hat{S}_{xh}^{2} = s_{xh}^{2} = \frac{1}{n_{h} - 1} \sum_{i = 1}^{n_{h}} (x_{hi} - \bar{x}_{h})^{2}, \quad \hat{S}_{rh}^{2} = s_{rh}^{2} = \frac{1}{n_{h} - 1} \sum_{i = 1}^{n_{h}} (r_{hi} - \bar{r}_{h})^{2},

and the sample coefficients of variation of $Y$, $X$, and $R$ in the $h$th stratum are

c_{yh} = \frac{\hat{S}_{yh}}{\bar{y}_{h}}, \quad c_{xh} = \frac{\hat{S}_{xh}}{\bar{x}_{h}}, \quad c_{rh} = \frac{\hat{S}_{rh}}{\bar{r}_{h}}.

3. Existing Estimators

In this section, we review some existing estimators of the finite population variance, which are later compared with the proposed class of estimators. To derive the biases and mean squared errors of the various estimators, we define the relative errors

e_{0h} = \frac{s_{yh}^{2} - S_{yh}^{2}}{S_{yh}^{2}}, \quad e_{1h} = \frac{s_{xh}^{2} - S_{xh}^{2}}{S_{xh}^{2}}, \quad e_{2h} = \frac{s_{rh}^{2} - S_{rh}^{2}}{S_{rh}^{2}},

such that $E(e_{ih}) = 0$ for $i = 0, 1, 2$, and, to the first order of approximation,

E(e_{0h}^{2}) = ϕ_{h} δ_{400h}^{*}, \quad E(e_{1h}^{2}) = ϕ_{h} δ_{040h}^{*}, \quad E(e_{2h}^{2}) = ϕ_{h} δ_{004h}^{*},

E(e_{0h} e_{1h}) = ϕ_{h} δ_{220h}^{*}, \quad E(e_{0h} e_{2h}) = ϕ_{h} δ_{202h}^{*}, \quad E(e_{1h} e_{2h}) = ϕ_{h} δ_{022h}^{*},

where $δ_{400h}^{*} = δ_{400h} - 1$, $δ_{040h}^{*} = δ_{040h} - 1$, $δ_{004h}^{*} = δ_{004h} - 1$, $δ_{220h}^{*} = δ_{220h} - 1$, $δ_{202h}^{*} = δ_{202h} - 1$, $δ_{022h}^{*} = δ_{022h} - 1$, and $ϕ_{h} = \frac{1}{n_{h}} - \frac{1}{N_{h}}$. Also,

δ_{lqsh} = \frac{φ_{lqsh}}{φ_{200h}^{l/2} φ_{020h}^{q/2} φ_{002h}^{s/2}},

where

φ_{lqsh} = \frac{1}{N_{h} - 1} \sum_{i = 1}^{N_{h}} (Y_{hi} - \bar{Y}_{h})^{l} (X_{hi} - \bar{X}_{h})^{q} (R_{hi} - \bar{R}_{h})^{s}.

Here, $δ_{400h} = β_{2(yh)}$, $δ_{040h} = β_{2(xh)}$, and $δ_{004h} = β_{2(rh)}$ are the population coefficients of kurtosis of $Y$, $X$, and $R$ in the $h$th stratum.
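The standardized moments $δ_{lqsh}$ and their starred versions can be computed from the population values of a stratum. This Python sketch uses hypothetical helper names and follows the $(N_{h} - 1)$ divisor in the definition of $φ_{lqsh}$; it also illustrates that $δ_{200h} = 1$ by construction.

```python
from statistics import fmean


def phi(y, x, r, l, q, s):
    """phi_{lqs} with divisor (N_h - 1), as in the text."""
    my, mx, mr = fmean(y), fmean(x), fmean(r)
    total = sum((a - my) ** l * (b - mx) ** q * (c - mr) ** s
                for a, b, c in zip(y, x, r))
    return total / (len(y) - 1)


def delta(y, x, r, l, q, s):
    """delta_{lqs} = phi_{lqs} / (phi_200^(l/2) phi_020^(q/2) phi_002^(s/2))."""
    denom = (phi(y, x, r, 2, 0, 0) ** (l / 2)
             * phi(y, x, r, 0, 2, 0) ** (q / 2)
             * phi(y, x, r, 0, 0, 2) ** (s / 2))
    return phi(y, x, r, l, q, s) / denom


def delta_star(y, x, r, l, q, s):
    """delta*_{lqs} = delta_{lqs} - 1."""
    return delta(y, x, r, l, q, s) - 1


# Illustrative stratum values; r holds the ranks of x.
y = [3.1, 4.0, 6.2, 7.9, 12.5]
x = [1.0, 2.5, 3.0, 5.5, 9.0]
r = [1, 2, 3, 4, 5]
print(delta(y, x, r, 4, 0, 0), delta_star(y, x, r, 2, 2, 0))
```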

In stratified random sampling, the variance of the usual estimator $\bar{y}_{st} = \sum_{h = 1}^{L} W_{h} \bar{y}_{h}$ of the population mean is

\mathrm{Var}(\bar{y}_{st}) = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} S_{yh}^{2} = S_{yst}^{2}.

An unbiased estimator of $S_{yst}^{2}$ is

\hat{S}_{yst}^{2} = s_{yst}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{yh}^{2},

and, to the first order of approximation, its variance is

\mathrm{Var}(\hat{S}_{yst}^{2}) = \sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} δ_{400h}^{*}.

(1)
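The estimator $\hat{S}_{yst}^{2}$ and its variance (1) can be evaluated stratum by stratum. In this minimal Python sketch, the per-stratum dictionary keys (`n`, `N`, `W`, `s2_y` for the sample variance, `S2_y` for the population variance, and `d400s` for $δ_{400h}^{*}$) are hypothetical names chosen for illustration.

```python
def phi_h(n, N):
    """Finite population correction factor phi_h = 1/n_h - 1/N_h."""
    return 1 / n - 1 / N


def usual_estimate(strata):
    """Point estimate hat S^2_yst = sum_h phi_h W_h^2 s^2_yh."""
    return sum(phi_h(st["n"], st["N"]) * st["W"] ** 2 * st["s2_y"] for st in strata)


def var_usual(strata):
    """Eq. (1): sum_h phi_h^3 W_h^4 S_yh^4 delta*_400h."""
    return sum(phi_h(st["n"], st["N"]) ** 3 * st["W"] ** 4 * st["S2_y"] ** 2 * st["d400s"]
               for st in strata)


# Illustrative values for two strata of equal size.
strata = [
    {"n": 120, "N": 600, "W": 0.5, "s2_y": 4.2, "S2_y": 4.0, "d400s": 2.5},
    {"n": 120, "N": 600, "W": 0.5, "s2_y": 8.9, "S2_y": 9.0, "d400s": 3.1},
]
print(usual_estimate(strata), var_usual(strata))
```

Because $S_{yh}^{4} = (S_{yh}^{2})^{2}$, the code squares the stored variance rather than storing a fourth power.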

Isaki [9] suggested the ratio estimator of the population variance

\hat{S}_{tst}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{yh}^{2} \left(\frac{S_{xh}^{2}}{s_{xh}^{2}}\right).

(2)

The bias and MSE of $\hat{S}_{tst}^{2}$ are

\mathrm{Bias}(\hat{S}_{tst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{2} W_{h}^{2} S_{yh}^{2} (δ_{040h}^{*} - δ_{220h}^{*})

(3)

and

\mathrm{MSE}(\hat{S}_{tst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} (δ_{400h}^{*} + δ_{040h}^{*} - 2δ_{220h}^{*}).

(4)

The linear regression estimator $\hat{S}_{lrst}^{2}$ proposed by Watson [26] is

\hat{S}_{lrst}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} [s_{yh}^{2} + b_{(s_{yh}^{2}, s_{xh}^{2})} (S_{xh}^{2} - s_{xh}^{2})],

(5)

where $b_{(s_{yh}^{2}, s_{xh}^{2})} = \frac{s_{yh}^{2} \hat{δ}_{220h}^{*}}{s_{xh}^{2} \hat{δ}_{040h}^{*}}$ is the sample regression coefficient. The MSE of $\hat{S}_{lrst}^{2}$ is

\mathrm{MSE}(\hat{S}_{lrst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} δ_{400h}^{*} (1 - ρ_{h}^{*2}),

(6)

where $ρ_{h}^{*} = \frac{δ_{220h}^{*}}{\sqrt{δ_{400h}^{*}} \sqrt{δ_{040h}^{*}}}$.

For the population variance under stratified random sampling, Bahl and Tuteja [10] introduced the exponential ratio-type estimator

\hat{S}_{btst}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{yh}^{2} \exp\left(\frac{S_{xh}^{2} - s_{xh}^{2}}{S_{xh}^{2} + s_{xh}^{2}}\right).

(7)

The bias and MSE of $\hat{S}_{btst}^{2}$ are

\mathrm{Bias}(\hat{S}_{btst}^{2}) ≅ \frac{1}{2} \sum_{h = 1}^{L} ϕ_{h}^{2} W_{h}^{2} S_{yh}^{2} \left(\frac{3δ_{040h}^{*}}{4} - δ_{220h}^{*}\right)

(8)

and

\mathrm{MSE}(\hat{S}_{btst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \left(δ_{400h}^{*} + \frac{δ_{040h}^{*}}{4} - δ_{220h}^{*}\right).

(9)
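The coefficient $3/8$ behind the bias (8) comes from a second-order Taylor expansion of the exponential factor. Writing $s_{xh}^{2} = S_{xh}^{2}(1 + e_{1h})$, the factor becomes $\exp\{-e_{1h}/(2 + e_{1h})\} ≈ 1 - e_{1h}/2 + 3e_{1h}^{2}/8$. The Python sketch below (hypothetical function names) checks numerically that the remainder shrinks like $e_{1h}^{3}$, as it should for a correct second-order expansion.

```python
import math


def bt_factor(e1):
    """Exact factor exp((S^2 - s^2)/(S^2 + s^2)) with s^2 = S^2 (1 + e1); S^2 cancels."""
    return math.exp(-e1 / (2 + e1))


def bt_second_order(e1):
    """Second-order Taylor approximation 1 - e1/2 + 3 e1^2 / 8."""
    return 1 - e1 / 2 + 3 * e1 ** 2 / 8


for e1 in (0.1, 0.01, 0.001):
    remainder = bt_factor(e1) - bt_second_order(e1)
    # As e1 shrinks, remainder / e1^3 approaches the cubic coefficient -13/48.
    print(e1, remainder, remainder / e1 ** 3)
```

Multiplying this expansion by $(1 + e_{0h})$ and taking expectations gives the terms $\frac{3}{8}δ_{040h}^{*} - \frac{1}{2}δ_{220h}^{*}$ that appear in (8).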

In stratified random sampling, Upadhyaya and Singh [21] proposed a ratio-type estimator $\hat{S}_{usst}^{2}$ that uses the kurtosis of the auxiliary variable:

\hat{S}_{usst}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{yh}^{2} \left(\frac{S_{xh}^{2} + δ_{040h}}{s_{xh}^{2} + δ_{040h}}\right).

(10)

The bias and MSE of $\hat{S}_{usst}^{2}$ are

\mathrm{Bias}(\hat{S}_{usst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{2} W_{h}^{2} S_{yh}^{2} v_{1h} (v_{1h} δ_{040h}^{*} - δ_{220h}^{*})

(11)

and

\mathrm{MSE}(\hat{S}_{usst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} (δ_{400h}^{*} + v_{1h}^{2} δ_{040h}^{*} - 2v_{1h} δ_{220h}^{*}),

(12)

where $v_{1h} = \frac{S_{xh}^{2}}{S_{xh}^{2} + δ_{040h}}$.

Kadilar and Cingi [16] proposed the following ratio estimators:

\hat{S}_{ast}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{yh}^{2} \left(\frac{S_{xh}^{2} + C_{xh}}{s_{xh}^{2} + C_{xh}}\right),

(13)

\hat{S}_{bst}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{yh}^{2} \left(\frac{δ_{040h} S_{xh}^{2} + C_{xh}}{δ_{040h} s_{xh}^{2} + C_{xh}}\right),

(14)

and

\hat{S}_{cst}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{yh}^{2} \left(\frac{C_{xh} S_{xh}^{2} + δ_{040h}}{C_{xh} s_{xh}^{2} + δ_{040h}}\right).

(15)

The biases and MSEs of $\hat{S}_{jst}^{2}$ $(jst = ast, bst, cst)$ are

\mathrm{Bias}(\hat{S}_{jst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{2} W_{h}^{2} S_{yh}^{2} v_{ih} (v_{ih} δ_{040h}^{*} - δ_{220h}^{*}), \quad i = 2, 3, 4,

(16)

and

\mathrm{MSE}(\hat{S}_{jst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} (δ_{400h}^{*} + v_{ih}^{2} δ_{040h}^{*} - 2v_{ih} δ_{220h}^{*}),

(17)

where

v_{2h} = \frac{S_{xh}^{2}}{S_{xh}^{2} + C_{xh}}, \quad v_{3h} = \frac{δ_{040h} S_{xh}^{2}}{δ_{040h} S_{xh}^{2} + C_{xh}}, \quad v_{4h} = \frac{C_{xh} S_{xh}^{2}}{C_{xh} S_{xh}^{2} + δ_{040h}},

with $i = 2, 3, 4$ corresponding to $\hat{S}_{ast}^{2}$, $\hat{S}_{bst}^{2}$, and $\hat{S}_{cst}^{2}$, respectively.
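All of the first-order MSEs above share the factor $ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4}$, so they can be computed together. The Python sketch below is a hedged illustration with hypothetical key names and made-up moment values. Note that $v_{1h}, \dots, v_{4h}$ use the unstarred kurtosis $δ_{040h}$ (key `d040`), whereas the brackets use the starred moments.

```python
def mse_brackets(st):
    """Per-stratum brackets of Eqs. (1), (4), (6), (9), (12) and (17).

    st: dict with S2_x, C_x, d040 (= beta_2(x), unstarred) and the starred
    moments d400s, d040s, d220s (hypothetical key names).
    """
    d400, d040, d220 = st["d400s"], st["d040s"], st["d220s"]
    rho2 = d220 ** 2 / (d400 * d040)  # rho*_h squared
    beta, cx, s2x = st["d040"], st["C_x"], st["S2_x"]
    v = {
        "usst": s2x / (s2x + beta),             # v_1h
        "ast": s2x / (s2x + cx),                # v_2h
        "bst": beta * s2x / (beta * s2x + cx),  # v_3h
        "cst": cx * s2x / (cx * s2x + beta),    # v_4h
    }
    out = {
        "yst": d400,
        "tst": d400 + d040 - 2 * d220,
        "lrst": d400 * (1 - rho2),
        "btst": d400 + d040 / 4 - d220,
    }
    for name, vi in v.items():
        out[name] = d400 + vi ** 2 * d040 - 2 * vi * d220
    return out


def mse_existing(strata):
    """Sum of phi_h^3 W_h^4 S_yh^4 * bracket over strata, for each estimator."""
    totals = {}
    for st in strata:
        ph = 1 / st["n"] - 1 / st["N"]
        scale = ph ** 3 * st["W"] ** 4 * st["S2_y"] ** 2
        for name, bracket in mse_brackets(st).items():
            totals[name] = totals.get(name, 0.0) + scale * bracket
    return totals


stratum = {"n": 20, "N": 100, "W": 0.5, "S2_y": 4.0, "S2_x": 9.0, "C_x": 0.5,
           "d040": 3.2, "d400s": 2.5, "d040s": 2.2, "d220s": 1.5}
for name, value in sorted(mse_existing([stratum]).items(), key=lambda kv: kv[1]):
    print(f"{name:5s} {value:.3e}")
```

Every non-regression bracket has the form $δ_{400h}^{*} + v^{2}δ_{040h}^{*} - 2vδ_{220h}^{*}$ (with $v = 0$, $1$, $1/2$, or $v_{ih}$), and the regression bracket $δ_{400h}^{*}(1 - ρ_{h}^{*2})$ is its minimum over $v$, so the regression estimator is always smallest in this first-order comparison.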

4. Proposed Estimator

Motivated by [12,14], this section introduces an improved class of efficient estimators of the finite population variance that uses the extreme values and the ranks of the auxiliary variable under stratified random sampling:

{\hat{S}}_{m d s t}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{ξ_{1 h} (S_{x h}^{2} - s_{x h}^{2})}{ξ_{1 h} (S_{x h}^{2} + s_{x h}^{2}) + 2 ξ_{2 h}}\}] exp [δ_{2 h} \{\frac{ξ_{3 h} (S_{r h}^{2} - s_{r h}^{2})}{ξ_{3 h} (S_{r h}^{2} + s_{r h}^{2}) + 2 ξ_{4 h}}\}],

(18)

where $δ_{ih}$ $(i = 1, 2)$ are known constants and $ξ_{ih}$ $(i = 1, 2, 3, 4)$ are parameters of the auxiliary variable; Table 1 shows the known values of $ξ_{1h}$, $ξ_{2h}$, $ξ_{3h}$, and $ξ_{4h}$. The largest and smallest values of the auxiliary variable in the $h$th stratum are denoted by $x_{Mh}$ and $x_{mh}$, while the largest and smallest values of the ranks of the auxiliary variable are denoted by $R_{Mh}$ and $R_{mh}$. Table 2 presents the various members of the proposed class of estimators derived from (18) and Table 1.
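A minimal Python sketch of the point estimator (18) follows. Table 1 is not reproduced here, so the values of $ξ_{1h}, \dots, ξ_{4h}$ in the example are hypothetical placeholders (for instance, $ξ_{2h}$ is set to a range $x_{Mh} - x_{mh}$ and $ξ_{4h}$ to $R_{Mh} - R_{mh}$ purely for illustration), and the dictionary keys are assumed names.

```python
import math


def proposed_estimate(strata, d1=1.0, d2=1.0):
    """Eq. (18): sum_h phi_h W_h^2 s^2_yh * exp[d1 * A_h] * exp[d2 * B_h]."""
    total = 0.0
    for st in strata:
        ph = 1 / st["n"] - 1 / st["N"]
        a = st["xi1"] * (st["S2_x"] - st["s2_x"]) / (
            st["xi1"] * (st["S2_x"] + st["s2_x"]) + 2 * st["xi2"])
        b = st["xi3"] * (st["S2_r"] - st["s2_r"]) / (
            st["xi3"] * (st["S2_r"] + st["s2_r"]) + 2 * st["xi4"])
        total += ph * st["W"] ** 2 * st["s2_y"] * math.exp(d1 * a) * math.exp(d2 * b)
    return total


# Hypothetical stratum with N_h = 300 ranks (so S^2_r = 300 * 301 / 12 = 7525);
# the xi values are illustrative only and are not taken from Table 1.
stratum = {"n": 60, "N": 300, "W": 0.5, "s2_y": 5.1, "s2_x": 8.4, "s2_r": 7400.0,
           "S2_x": 9.0, "S2_r": 7525.0, "xi1": 1.0, "xi2": 14.0, "xi3": 1.0, "xi4": 299.0}
print(proposed_estimate([stratum]))
```

When the sample variances of $X$ and $R$ equal their population counterparts, both exponential factors equal 1 and the estimator reduces to $\hat{S}_{yst}^{2}$.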

Table 1. Different parameters of the auxiliary variables.

Table 2. Some classes of the proposed estimator.

Properties of the Proposed Estimator

By rewriting (18) in terms of the relative errors, we can derive the bias and MSE of the suggested estimator $\hat{S}_{mdst}^{2}$:

\hat{S}_{mdst}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} S_{yh}^{2} (1 + e_{0h}) \exp\left[-\frac{δ_{1h} k_{4h} e_{1h}}{2}\left(1 + \frac{k_{4h} e_{1h}}{2}\right)^{-1}\right] \exp\left[-\frac{δ_{2h} k_{5h} e_{2h}}{2}\left(1 + \frac{k_{5h} e_{2h}}{2}\right)^{-1}\right],

(19)

where $k_{4h} = \frac{ξ_{1h} S_{xh}^{2}}{ξ_{1h} S_{xh}^{2} + ξ_{2h}}$ and $k_{5h} = \frac{ξ_{3h} S_{rh}^{2}}{ξ_{3h} S_{rh}^{2} + ξ_{4h}}$.
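Equation (19) is an exact algebraic rewriting of (18), not yet an approximation: substituting $s_{xh}^{2} = S_{xh}^{2}(1 + e_{1h})$ turns the denominator into $2(ξ_{1h}S_{xh}^{2} + ξ_{2h})(1 + k_{4h}e_{1h}/2)$. The short Python check below, with hypothetical function names and illustrative values, confirms that the two forms of each exponential factor agree.

```python
import math


def k_coef(xi_a, xi_b, S2):
    """k_4h (or k_5h) = xi_a S^2 / (xi_a S^2 + xi_b)."""
    return xi_a * S2 / (xi_a * S2 + xi_b)


def factor_eq18(S2, s2, xi_a, xi_b, d):
    """Exponential factor as written in (18)."""
    return math.exp(d * xi_a * (S2 - s2) / (xi_a * (S2 + s2) + 2 * xi_b))


def factor_eq19(S2, e, xi_a, xi_b, d):
    """The same factor written in terms of the relative error e, as in (19)."""
    k = k_coef(xi_a, xi_b, S2)
    return math.exp(-(d * k * e / 2) / (1 + k * e / 2))


S2, e = 9.0, 0.2
print(factor_eq18(S2, S2 * (1 + e), 1.5, 3.0, 1.0), factor_eq19(S2, e, 1.5, 3.0, 1.0))
```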

Expanding the exponential terms in a Taylor series and retaining terms up to the first order of approximation, we obtain

\begin{aligned} \hat{S}_{mdst}^{2} - \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} S_{yh}^{2} ≅ \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} S_{yh}^{2} \Big[ & e_{0h} - \frac{δ_{1h} k_{4h}}{2} e_{1h} - \frac{δ_{2h} k_{5h}}{2} e_{2h} + \left(\frac{δ_{1h} k_{4h}^{2}}{4} + \frac{δ_{1h}^{2} k_{4h}^{2}}{8}\right) e_{1h}^{2} + \left(\frac{δ_{2h} k_{5h}^{2}}{4} + \frac{δ_{2h}^{2} k_{5h}^{2}}{8}\right) e_{2h}^{2} \\ & - \frac{δ_{1h} k_{4h}}{2} e_{0h} e_{1h} - \frac{δ_{2h} k_{5h}}{2} e_{0h} e_{2h} + \frac{δ_{1h} δ_{2h} k_{4h} k_{5h}}{4} e_{1h} e_{2h} \Big]. \end{aligned}

(20)

The coefficient of $e_{1h} e_{2h}$ is the product of the two first-order terms, $(δ_{1h} k_{4h}/2)(δ_{2h} k_{5h}/2)$. Taking expectations in (20), the bias of $\hat{S}_{mdst}^{2}$ is

\begin{aligned} \mathrm{Bias}(\hat{S}_{mdst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{2} W_{h}^{2} S_{yh}^{2} \Big[ & \left(\frac{δ_{1h} k_{4h}^{2}}{4} + \frac{δ_{1h}^{2} k_{4h}^{2}}{8}\right) δ_{040h}^{*} + \left(\frac{δ_{2h} k_{5h}^{2}}{4} + \frac{δ_{2h}^{2} k_{5h}^{2}}{8}\right) δ_{004h}^{*} \\ & - \frac{δ_{1h} k_{4h}}{2} δ_{220h}^{*} - \frac{δ_{2h} k_{5h}}{2} δ_{202h}^{*} + \frac{δ_{1h} δ_{2h} k_{4h} k_{5h}}{4} δ_{022h}^{*} \Big]. \end{aligned}

(21)

By squaring both sides of (20), keeping only the terms of the first order of approximation, and taking expectations, we obtain the first-order MSE

\begin{aligned} \mathrm{MSE}(\hat{S}_{mdst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \Big[ & δ_{400h}^{*} + \frac{δ_{1h}^{2} k_{4h}^{2}}{4} δ_{040h}^{*} + \frac{δ_{2h}^{2} k_{5h}^{2}}{4} δ_{004h}^{*} - δ_{1h} k_{4h} δ_{220h}^{*} \\ & - δ_{2h} k_{5h} δ_{202h}^{*} + \frac{δ_{1h} δ_{2h} k_{4h} k_{5h}}{2} δ_{022h}^{*} \Big]. \end{aligned}

(22)

Substituting the known constants $δ_{1h} = δ_{2h} = 1$ into (21) and (22) and simplifying, the bias and MSE of $\hat{S}_{mdst}^{2}$ become

\begin{aligned} \mathrm{Bias}(\hat{S}_{mdst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{2} W_{h}^{2} S_{yh}^{2} \Big[ & \frac{3}{8}(k_{4h}^{2} δ_{040h}^{*} + k_{5h}^{2} δ_{004h}^{*}) \\ & - \frac{1}{2}(k_{4h} δ_{220h}^{*} + k_{5h} δ_{202h}^{*}) + \frac{1}{4} k_{4h} k_{5h} δ_{022h}^{*} \Big] \end{aligned}

(23)

and

\begin{aligned} \mathrm{MSE}(\hat{S}_{mdst}^{2}) ≅ \sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \Big[ & δ_{400h}^{*} + \frac{1}{4}(k_{4h}^{2} δ_{040h}^{*} + k_{5h}^{2} δ_{004h}^{*}) \\ & - \frac{1}{2}(2k_{4h} δ_{220h}^{*} + 2k_{5h} δ_{202h}^{*} - k_{4h} k_{5h} δ_{022h}^{*}) \Big]. \end{aligned}

(24)
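The second-order expansion of (19) can be checked numerically. If every coefficient of degree two is right, the gap between the exact expression and the expansion shrinks like the cube of the errors, so dividing all errors by 10 divides the gap by roughly 1000. The Python sketch below performs that check with illustrative values of $k_{4h}$ and $k_{5h}$; it uses $1/4$ as the coefficient of $e_{1h}e_{2h}$, which is what the product of the two first-order terms $(δ_{1h}k_{4h}e_{1h}/2)(δ_{2h}k_{5h}e_{2h}/2)$ gives. It also evaluates the first-order MSE (22). The function and key names are hypothetical.

```python
import math


def exact_relative(e0, e1, e2, k4, k5, d1=1.0, d2=1.0):
    """(1 + e0) times both exponential factors of (19), minus 1 (exact)."""
    f1 = math.exp(-(d1 * k4 * e1 / 2) / (1 + k4 * e1 / 2))
    f2 = math.exp(-(d2 * k5 * e2 / 2) / (1 + k5 * e2 / 2))
    return (1 + e0) * f1 * f2 - 1


def second_order(e0, e1, e2, k4, k5, d1=1.0, d2=1.0):
    """All terms of (19) up to degree two in the errors (cf. the bracket of (20))."""
    return (e0 - d1 * k4 / 2 * e1 - d2 * k5 / 2 * e2
            + (d1 * k4 ** 2 / 4 + d1 ** 2 * k4 ** 2 / 8) * e1 ** 2
            + (d2 * k5 ** 2 / 4 + d2 ** 2 * k5 ** 2 / 8) * e2 ** 2
            - d1 * k4 / 2 * e0 * e1 - d2 * k5 / 2 * e0 * e2
            + d1 * d2 * k4 * k5 / 4 * e1 * e2)


def mse_proposed(strata, d1=1.0, d2=1.0):
    """Eq. (22); with d1 = d2 = 1 it reduces to (24)."""
    total = 0.0
    for st in strata:
        ph = 1 / st["n"] - 1 / st["N"]
        k4, k5 = st["k4"], st["k5"]
        bracket = (st["d400s"]
                   + d1 ** 2 * k4 ** 2 / 4 * st["d040s"]
                   + d2 ** 2 * k5 ** 2 / 4 * st["d004s"]
                   - d1 * k4 * st["d220s"] - d2 * k5 * st["d202s"]
                   + d1 * d2 * k4 * k5 / 2 * st["d022s"])
        total += ph ** 3 * st["W"] ** 4 * st["S2_y"] ** 2 * bracket
    return total


def gap(t):
    """Exact minus second-order value when all errors are scaled by t."""
    args = (0.5 * t, -0.8 * t, 0.6 * t, 0.7, 0.4)
    return exact_relative(*args) - second_order(*args)


print(gap(1e-2) / gap(1e-3))  # roughly 1000, i.e. a third-order remainder
```

With a coefficient of $1/2$ on $e_{1h}e_{2h}$ instead, a second-order gap would remain and the printed ratio would fall to about 100.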

5. Mathematical Comparison

In this section, we compare the proposed class of estimators $\hat{S}_{mdst}^{2}$ with the existing estimators $\hat{S}_{yst}^{2}$, $\hat{S}_{tst}^{2}$, $\hat{S}_{lrst}^{2}$, $\hat{S}_{btst}^{2}$, $\hat{S}_{usst}^{2}$, and $\hat{S}_{jst}^{2}$ $(jst = ast, bst, cst)$.

Condition (i): Comparing (1) and (24), $\mathrm{Var}(\hat{S}_{yst}^{2}) > \mathrm{MSE}(\hat{S}_{mdst}^{2})$ if

\sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \left[(2k_{4h} δ_{220h}^{*} + 2k_{5h} δ_{202h}^{*} - k_{4h} k_{5h} δ_{022h}^{*}) - \frac{1}{2}(k_{4h}^{2} δ_{040h}^{*} + k_{5h}^{2} δ_{004h}^{*})\right] > 0.

Condition (ii): Comparing (4) and (24), $\mathrm{MSE}(\hat{S}_{tst}^{2}) > \mathrm{MSE}(\hat{S}_{mdst}^{2})$ if

\sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \left[2δ_{220h}^{*}(k_{4h} - 2) + 2k_{5h} δ_{202h}^{*} - k_{4h} k_{5h} δ_{022h}^{*} - \frac{1}{2}\{δ_{040h}^{*}(k_{4h}^{2} - 4) + k_{5h}^{2} δ_{004h}^{*}\}\right] > 0.

Condition (iii): Comparing (6) and (24), $\mathrm{MSE}(\hat{S}_{lrst}^{2}) > \mathrm{MSE}(\hat{S}_{mdst}^{2})$ if

\sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \left[2k_{4h} δ_{220h}^{*} + 2k_{5h} δ_{202h}^{*} - k_{4h} k_{5h} δ_{022h}^{*} - \frac{1}{2}(4ρ_{h}^{*2} δ_{400h}^{*} + k_{4h}^{2} δ_{040h}^{*} + k_{5h}^{2} δ_{004h}^{*})\right] > 0.

Condition (iv): Comparing (9) and (24), $\mathrm{MSE}(\hat{S}_{btst}^{2}) > \mathrm{MSE}(\hat{S}_{mdst}^{2})$ if

\sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \left[2δ_{220h}^{*}(k_{4h} - 1) + 2k_{5h} δ_{202h}^{*} - k_{4h} k_{5h} δ_{022h}^{*} - \frac{1}{2}\{δ_{040h}^{*}(k_{4h}^{2} - 1) + k_{5h}^{2} δ_{004h}^{*}\}\right] > 0.

Condition (v): Comparing (12) and (24), $\mathrm{MSE}(\hat{S}_{usst}^{2}) > \mathrm{MSE}(\hat{S}_{mdst}^{2})$ if

\sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \left[2δ_{220h}^{*}(k_{4h} - 2v_{1h}) + 2k_{5h} δ_{202h}^{*} - k_{4h} k_{5h} δ_{022h}^{*} - \frac{1}{2}\{δ_{040h}^{*}(k_{4h}^{2} - 4v_{1h}^{2}) + k_{5h}^{2} δ_{004h}^{*}\}\right] > 0.

Condition (vi): Comparing (17) and (24), $\mathrm{MSE}(\hat{S}_{jst}^{2}) > \mathrm{MSE}(\hat{S}_{mdst}^{2})$, for $jst = ast, bst, cst$ with the corresponding $i = 2, 3, 4$, if

\sum_{h = 1}^{L} ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4} \left[2δ_{220h}^{*}(k_{4h} - 2v_{ih}) + 2k_{5h} δ_{202h}^{*} - k_{4h} k_{5h} δ_{022h}^{*} - \frac{1}{2}\{δ_{040h}^{*}(k_{4h}^{2} - 4v_{ih}^{2}) + k_{5h}^{2} δ_{004h}^{*}\}\right] > 0.
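Each condition is twice the difference between two first-order MSEs, with every stratum weighted by $ϕ_{h}^{3} W_{h}^{4} S_{yh}^{4}$, so it can be verified mechanically. The hedged Python sketch below (hypothetical key names and illustrative moment values) evaluates the left-hand side of Condition (ii) and checks that it equals $2\{\mathrm{MSE}(\hat{S}_{tst}^{2}) - \mathrm{MSE}(\hat{S}_{mdst}^{2})\}$ computed from (4) and (24).

```python
def phi_h(n, N):
    return 1 / n - 1 / N


def weight(st):
    """phi_h^3 W_h^4 S_yh^4, the common factor of every first-order MSE term."""
    return phi_h(st["n"], st["N"]) ** 3 * st["W"] ** 4 * st["S2_y"] ** 2


def mse_isaki(strata):
    """Eq. (4)."""
    return sum(weight(st) * (st["d400s"] + st["d040s"] - 2 * st["d220s"]) for st in strata)


def mse_proposed(strata):
    """Eq. (24), i.e. (22) with delta_1h = delta_2h = 1."""
    total = 0.0
    for st in strata:
        k4, k5 = st["k4"], st["k5"]
        total += weight(st) * (st["d400s"] + (k4 ** 2 * st["d040s"] + k5 ** 2 * st["d004s"]) / 4
                               - k4 * st["d220s"] - k5 * st["d202s"] + k4 * k5 * st["d022s"] / 2)
    return total


def condition_ii(strata):
    """Left-hand side of Condition (ii), with the S_yh^4 factor included."""
    total = 0.0
    for st in strata:
        k4, k5 = st["k4"], st["k5"]
        inner = (2 * st["d220s"] * (k4 - 2) + 2 * k5 * st["d202s"] - k4 * k5 * st["d022s"]
                 - 0.5 * (st["d040s"] * (k4 ** 2 - 4) + k5 ** 2 * st["d004s"]))
        total += weight(st) * inner
    return total


strata = [
    {"n": 60, "N": 300, "W": 0.4, "S2_y": 6.0, "k4": 0.8, "k5": 0.95,
     "d400s": 2.4, "d040s": 2.1, "d004s": 0.8, "d220s": 1.6, "d202s": 0.7, "d022s": 0.75},
    {"n": 90, "N": 450, "W": 0.6, "S2_y": 9.5, "k4": 0.7, "k5": 0.9,
     "d400s": 3.0, "d040s": 2.6, "d004s": 0.8, "d220s": 2.0, "d202s": 0.6, "d022s": 0.7},
]
lhs = condition_ii(strata)
print(lhs, 2 * (mse_isaki(strata) - mse_proposed(strata)), lhs > 0)
```

The same pattern applies to the other conditions by swapping in the corresponding existing-estimator MSE.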

6. Numerical Comparison

The objective of this section is to analyze the performance of the various estimators by comparing their mean squared errors (MSEs) on simulated and real data sets. We also calculate the percent relative efficiency (PRE) of the proposed class of estimators relative to the existing estimators.

6.1. Simulation Study

To verify the theoretical findings of Section 5, a simulation study was conducted following the methodology described in [12,14]. Six distinct populations of the auxiliary variable X were generated from the following probability distributions:

Population 1: $X \sim \text{Exponential}(μ = 3)$.
Population 2: $X \sim \text{Exponential}(μ = 8)$.
Population 3: $X \sim \text{Gamma}(γ_{1} = 8, γ_{2} = 12)$.
Population 4: $X \sim \text{Gamma}(γ_{1} = 6, γ_{2} = 8)$.
Population 5: $X \sim \text{Uniform}(α_{1} = 4, α_{2} = 10)$.
Population 6: $X \sim \text{Uniform}(α_{1} = 5, α_{2} = 11)$.

The study variable Y was generated as

Y = r_{yx} X + e,

where the error term is $e \sim N(0, 1)$ and $r_{yx} = 0.82$ is the coefficient that controls the correlation between the study and auxiliary variables.

The chosen value of $r_{yx}$ governs the quality and consistency of the simulated relationship: with $r_{yx} = 0.82$, the relationship between X and Y is strong and consistent throughout the data set, with relatively low noise.
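The data-generating step can be sketched in Python as follows. This is an assumption-laden illustration, not the authors' R code: it reads $μ$ in the exponential case as the mean, treats $(γ_{1}, γ_{2})$ as the shape and scale of the gamma distribution, and uses the standard library's `random` module. The function name, the dictionary of populations, and the seed are hypothetical.

```python
import random


def generate_population(dist, params, size=1200, r_yx=0.82, seed=2024):
    """Return (x, y) with Y = r_yx * X + e and e ~ N(0, 1)."""
    rng = random.Random(seed)
    if dist == "exponential":
        (mu,) = params
        x = [rng.expovariate(1 / mu) for _ in range(size)]  # mean mu (assumed)
    elif dist == "gamma":
        shape, scale = params  # (gamma_1, gamma_2) read as shape, scale (assumed)
        x = [rng.gammavariate(shape, scale) for _ in range(size)]
    elif dist == "uniform":
        low, high = params
        x = [rng.uniform(low, high) for _ in range(size)]
    else:
        raise ValueError(f"unknown distribution: {dist}")
    y = [r_yx * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


populations = {
    1: ("exponential", (3,)), 2: ("exponential", (8,)),
    3: ("gamma", (8, 12)), 4: ("gamma", (6, 8)),
    5: ("uniform", (4, 10)), 6: ("uniform", (5, 11)),
}
x, y = generate_population(*populations[1])
print(len(x), min(x), max(x))
```

Because $Y = r_{yx}X + e$, the realized correlation between X and Y also depends on the variance of X, which differs across the six populations.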

We used R software (version 4.4.0) to estimate the MSEs and PREs of the suggested class of estimators and the existing estimators through the following steps:

Step 1: We generated a population of size 1200 from each of the probability distributions defined above and split it into two strata, so that the existing and recommended estimators could be computed under stratified random sampling.
Step 2: From the population generated in Step 1, we obtained the population total, together with the minimum and maximum values of the auxiliary variable and the minimum and maximum values of its ranks.
Step 3: Using simple random sampling without replacement (SRSWOR), we drew samples of different sizes from each population, namely 20%, 30%, and 40% of the population size $(N)$.
Step 4: For each sample size, we computed the $MSE$ and $PRE$ values of the estimators considered in this article.
Step 5: Steps 3 and 4 were repeated 65,000 times. The results for the artificial populations are shown in Table 3 and Table 4, while the results for the real data sets are shown in Table 5 and Table 6.
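Steps 1 to 5 can be sketched as a short Python loop. The stratification rule is not stated in the text, so this sketch assumes, purely for illustration, that the population is split into two strata at the median of X. It computes only the usual estimator $\hat{S}_{yst}^{2}$ and Isaki's ratio estimator (2), and it uses far fewer than 65,000 replications to keep the example fast. The function name `simulate` and its arguments are hypothetical.

```python
import random
from statistics import variance


def simulate(x, y, frac=0.2, reps=1000, seed=7):
    """Return the target S^2_yst and replicate estimates (usual, Isaki)."""
    rng = random.Random(seed)
    order = sorted(range(len(x)), key=lambda i: x[i])
    half = len(order) // 2
    N = len(x)
    strata = []
    for units in (order[:half], order[half:]):  # assumed median split on X
        Nh = len(units)
        nh = max(2, round(frac * Nh))
        strata.append({
            "units": units, "n": nh,
            "w": (1 / nh - 1 / Nh) * (Nh / N) ** 2,  # phi_h * W_h^2
            "S2_y": variance([y[i] for i in units]),
            "S2_x": variance([x[i] for i in units]),
        })
    target = sum(st["w"] * st["S2_y"] for st in strata)  # S^2_yst
    usual, isaki = [], []
    for _ in range(reps):
        est_u = est_i = 0.0
        for st in strata:
            sample = rng.sample(st["units"], st["n"])  # SRSWOR within the stratum
            s2y = variance([y[i] for i in sample])
            s2x = variance([x[i] for i in sample])
            est_u += st["w"] * s2y
            est_i += st["w"] * s2y * st["S2_x"] / s2x
        usual.append(est_u)
        isaki.append(est_i)
    return target, usual, isaki


# Usage with an illustrative uniform population of size 1200.
rng = random.Random(1)
x = [rng.uniform(4, 10) for _ in range(1200)]
y = [0.82 * v + rng.gauss(0.0, 1.0) for v in x]
target, usual, isaki = simulate(x, y, frac=0.2, reps=200)
print(target, sum(usual) / len(usual))
```

Returning the replicate estimates, rather than summary values, keeps the sampling loop separate from the MSE and PRE calculations.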

Table 3. MSEs of different estimators using the artificial populations.

Table 4. PREs of different estimators using the artificial populations.

Table 5. MSEs using empirical datasets.

Table 6. PREs using empirical datasets.
Finally, the MSE and PRE of each estimator were computed over all replications as

$\mathrm{MSE}(\hat{S}_{D}^{2}) = \frac{1}{65000} \sum_{i = 1}^{65000} (\hat{S}_{D(i)}^{2} - S_{yst}^{2})^{2}$

and

$\mathrm{PRE} = \frac{V(\hat{S}_{yst}^{2})}{\mathrm{MSE}(\hat{S}_{D}^{2})} \times 100,$

where $\hat{S}_{D(i)}^{2}$ is the value of the estimator in the $i$th replication and $D = tst, lrst, btst, usst, ast, bst, cst$, and $md_{i}$ $(i = 1, 2, \dots, 8)$.
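The two formulas above translate directly into code. In this Python sketch the replicate estimates are passed in as lists (hypothetical function names), and $V(\hat{S}_{yst}^{2})$ is assumed to be the simulated MSE of the usual estimator over the same replications.

```python
def simulated_mse(estimates, target):
    """MSE = (1/R) * sum_i (estimate_i - target)^2 over R replications."""
    return sum((e - target) ** 2 for e in estimates) / len(estimates)


def pre(var_usual, mse_d):
    """Percent relative efficiency of estimator D relative to hat S^2_yst."""
    return 100 * var_usual / mse_d


usual = [1.9, 2.2, 2.1, 1.8]  # illustrative replicate values, not study output
ratio = [2.0, 2.1, 2.05, 1.95]
target = 2.0
print(pre(simulated_mse(usual, target), simulated_mse(ratio, target)))
```

A PRE above 100 means that estimator D has a smaller MSE than $\hat{S}_{yst}^{2}$.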

6.2. Numerical Examples

We investigated the mean squared errors (MSEs) and percent relative efficiencies (PREs) of the recommended class of estimators and the existing estimators on three real data sets to assess their performance. The data sets are described below, and their summary statistics are given in Table 7, Table 8 and Table 9.

Table 7. Summary statistics for data set-1.

Table 8. Summary statistics for data set-2.

Table 9. Summary statistics for data set-3.

Data 1. This data set, covering different divisions, was taken from page 135 of the Pakistan Bureau of Statistics publication [27] and was collected in Pakistan in 2012. It can be downloaded from the Pakistan Bureau of Statistics website: https://www.pbs.gov.pk/content/microdata (accessed on 30 July 2024).
Y: The total enrollment of students in 2012.
X: Government elementary and secondary schools in 2012.
R: Ranks of the government elementary and secondary schools in 2012.
Two groups were generated from the data, and the summary statistics of data set-1 are given in Table 7.
Group 1: the divisions of Sargodha, Gujranwala, Rawalpindi, and Lahore.
Group 2: the divisions of Multan, Bahawalpur, Faisalabad, D.G Khan, and Sahiwal.

Data 2. This data set, covering different divisions, was taken from page 226 of the Pakistan Bureau of Statistics publication [27] and was collected in Pakistan in 2012. It can be downloaded from the Pakistan Bureau of Statistics website: https://www.pbs.gov.pk/content/microdata (accessed on 30 July 2024).
Y: Departmental employment levels in 2012.
X: Number of factories the departments registered in 2012.
R: Ranks of the number of factories the departments registered in 2012.
Two groups were generated from the data, and the summary statistics of data set-2 are given in Table 8.
Group 1: the divisions of Sargodha, Gujranwala, Rawalpindi, and Lahore.
Group 2: the divisions of Multan, Bahawalpur, Faisalabad, D.G Khan, and Sahiwal.

Data 3. The third data set was taken from Cochran [28] (page 24) and consists of the food costs and weekly incomes of families.
Y: Food expenses related to the families’ employment.
X: Families’ weekly income.
R: Ranks of the families' weekly income.
Two groups were generated from the data, and the summary statistics of data set-3 are given in Table 9.

Finally, we used the following formula to calculate the percent relative efficiency (PRE) for the different data sets:

\mathrm{PRE} = \frac{V(\hat{S}_{yst}^{2})}{\mathrm{MSE}(\hat{S}_{K}^{2})} \times 100,

where $K$ is one of $tst, lrst, btst, usst, ast, bst, cst$, or $md_{i}$ $(i = 1, 2, \dots, 8)$.

We used a simulation study and three real data sets to assess the performance of the proposed class of estimators, with the PRE criterion used to compare the estimators. The MSE and PRE values of the proposed and existing estimators obtained from the simulation study are given in Table 3 and Table 4, respectively, and the results for the real data sets are presented in Table 5 and Table 6. The general findings are as follows:

For all simulated scenarios and real data sets, Table 3 and Table 5 show that the $MSE$ of each proposed estimator is smaller than that of the existing estimators in the literature, confirming the higher accuracy of the recommended estimators.
Furthermore, Table 4 and Table 6 show that all of the proposed estimators have higher $PRE$ values than the existing estimators, indicating that the proposed class of estimators is preferable to the existing estimators.

7. Conclusions

In this article, we proposed a new class of efficient estimators of the finite population variance based on different transformations. These estimators are particularly useful when both the minimum and maximum values of the auxiliary variable are known and the ranks of the auxiliary variable are associated with the study variable, since this information can effectively improve their precision. To compare the recommended estimators with the existing estimators, we derived in Section 5 the theoretical conditions under which the recommended estimators are more accurate. To verify these conditions, we analyzed several empirical data sets and conducted a simulation study. According to Table 4, the recommended estimators consistently achieve higher PREs than the existing estimators, and the empirical results in Table 6 further confirm the theoretical conclusions of Section 5. The simulation and empirical results lead us to conclude that the suggested estimators $\hat{S}_{md_{i}}^{2}$ $(i = 1, 2, \dots, 8)$ are more efficient than the other estimators under consideration. Since $\hat{S}_{md_{8}}^{2}$ has the lowest MSE among the recommended estimators, it is particularly preferred.

We studied several properties of the recommended class of efficient estimators under stratified random sampling, and our results can help identify more efficient estimators that attain lower MSEs. Novel estimators could also be developed under the two-stage sampling technique, which is a potentially valuable direction for further research.

Author Contributions

Conceptualization, U.D.; Methodology, M.A.A.; Software, M.A.A. and U.D.; Validation, U.D.; Formal analysis, M.A.A. and U.D.; Investigation, M.A.A. and U.D.; Resources, U.D.; Data curation, U.D.; Writing—original draft, M.A.A. and U.D.; Writing—review and editing, U.D.; Visualization, U.D.; Supervision, U.D.; Project administration, M.A.A. and U.D.; Funding acquisition, M.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU241799].

Data Availability Statement

The real data are secondary, and their sources are given in the data section, while the simulated data were generated using R software (version 4.4.0).

Conflicts of Interest

The authors declare no conflict of interest.

References

Mohanty, S.; Sahoo, J. A note on improving the ratio method of estimation through linear transformation using certain known population parameters. Sankhya Indian J. Stat. Ser. B 1995, 57, 93–102.
Khan, M.; Shabbir, J. Some improved ratio, product, and regression estimators of finite population mean when using minimum and maximum values. Sci. World J. 2013, 2013, 431868.
Daraz, U.; Shabbir, J.; Khan, H. Estimation of finite population mean by using minimum and maximum values in stratified random sampling. J. Mod. Appl. Stat. Methods 2018, 17, 20.
Chatterjee, S.; Hadi, A.S. Regression Analysis by Example; John Wiley & Sons: Hoboken, NJ, USA, 2013.
Cekim, H.O.; Cingi, H. Some estimator types for population mean using linear transformation with the help of the minimum and maximum values of the auxiliary variable. Hacet. J. Math. Stat. 2017, 46, 685–694.
Khan, M. Improvement in estimating the finite population mean under maximum and minimum values in double sampling scheme. J. Stat. Appl. Probab. Lett. 2015, 2, 1–7.
Walia, G.S.; Kaur, H.; Sharma, M. Ratio type estimator of population mean through efficient linear transformation. Am. J. Math. Stat. 2015, 5, 144–149.
Das, A.K.; Tripathi, T.P. Use of auxiliary information in estimating the finite population variance. Sankhya Indian J. Stat. Ser. C 1978, 40, 139–148.
Isaki, C.T. Variance estimation using auxiliary information. J. Am. Stat. Assoc. 1983, 78, 117–123.
Bahl, S.; Tuteja, R. Ratio and product type exponential estimators. J. Inf. Optim. Sci. 1991, 12, 159–164.
Daraz, U.; Khan, M. Estimation of variance of the difference-cum-ratio-type exponential estimator in simple random sampling. Res. Math. Stat. 2021, 8, 1899402.
Daraz, U.; Wu, J.; Albalawi, O. Double exponential ratio estimator of a finite population variance under extreme values in simple random sampling. Mathematics 2024, 12, 1737.
Daraz, U.; Wu, J.; Alomair, M.A.; Aldoghan, L.A. New classes of difference cum-ratio-type exponential estimators for a finite population variance in stratified random sampling. Heliyon 2024, 10, e33402.
Daraz, U.; Alomair, M.A.; Albalawi, O. Variance estimation under some transformation for both symmetric and asymmetric data. Symmetry 2024, 16, 957.
Dubey, V.; Sharma, H. On estimating population variance using auxiliary information. Stat. Transit. New Ser. 2008, 9, 7–18.
Kadilar, C.; Cingi, H. Ratio estimators for the population variance in simple and stratified random sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
Shabbir, J.; Gupta, S. Some estimators of finite population variance of stratified sample mean. Commun. Stat. Theory Methods 2010, 39, 3001–3008.
Shabbir, J.; Gupta, S. Using rank of the auxiliary variable in estimating variance of the stratified sample mean. Int. J. Comput. Theor. Stat. 2019, 6, 171–181.
Singh, H.; Chandra, P. An alternative to ratio estimator of the population variance in sample surveys. J. Transp. Stat. 2008, 9, 89–103.
Singh, H.P.; Solanki, R.S. A new procedure for variance estimation in simple random sampling using auxiliary information. J. Stat. Pap. 2013, 54, 479–497.
Upadhyaya, L.; Singh, H. An estimator for population variance that utilizes the kurtosis of an auxiliary variable in sample surveys. Vikram Math. J. 1999, 19, 14–17.
Yadav, S.K.; Kadilar, C.; Shabbir, J.; Gupta, S. Improved family of estimators of population variance in simple random sampling. J. Stat. Theory Pract. 2015, 9, 219–226.
Yasmeen, U.; Noor-ul-Amin, M. Estimation of Finite Population Variance Under Stratified Sampling Technique. J. Reliab. Stat. Stud. 2021, 14, 565–584.
Zaman, T.; Bulut, H. An efficient family of robust-type estimators for the population variance in simple and stratified random sampling. Commun. Stat. Theory Methods 2023, 52, 2610–2624.
Daraz, U.; Alomair, M.A.; Albalawi, O.; Al Naim, A.S. New techniques for estimating finite population variance using ranks of Auxiliary Variable in Two-Stage Sampling. Mathematics 2024, 12, 2741.
Watson, D.J. The estimation of leaf area in field crops. J. Agric. Sci. 1937, 27, 474–483.
Bureau of Statistics. Punjab Development Statistics; Government of the Punjab: Lahore, Pakistan, 2013.
Cochran, W.G. Sampling Techniques; John Wiley and Sons: Hoboken, NJ, USA, 1963.

Table 1. Different parameters of the auxiliary variables.

$ξ_{1 h}$	$ξ_{2 h}$	$ξ_{3 h}$	$ξ_{4 h}$
1	$X_{M h} - X_{m h}$	1	$R_{M h} - R_{m h}$
$X_{M h} - X_{m h}$	$C_{x h}$	$R_{M h} - R_{m h}$	$C_{r h}$
$X_{M h} - X_{m h}$	1	$R_{M h} - R_{m h}$	1
$X_{M h} - X_{m h}$	$β_{2 (x h)}$	$R_{M h} - R_{m h}$	$β_{2 (r h)}$
$β_{2 (x h)}$	$X_{M h} - X_{m h}$	$β_{2 (r h)}$	$R_{M h} - R_{m h}$
$C_{x h}$	$X_{M h} - X_{m h}$	$C_{r h}$	$R_{M h} - R_{m h}$
$ρ_{y h x h}$	$X_{M h} - X_{m h}$	$ρ_{y h r h}$	$R_{M h} - R_{m h}$
$X_{M h} - X_{m h}$	$ρ_{y h x h}$	$R_{M h} - R_{m h}$	$ρ_{y h r h}$
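For readers who want to reuse Table 1 programmatically, the sketch below encodes its eight rows, in the printed row order, as tuples $(ξ_{1h}, ξ_{2h}, ξ_{3h}, ξ_{4h})$ built from stratum-level quantities. The names `table1_rows`, `kurt_x`, and `kurt_r` are hypothetical; `kurt_x` and `kurt_r` stand for $β_{2 (x h)}$ and $β_{2 (r h)}$, which are not listed under that name in Tables 7 to 9, so the example uses placeholder values for them.

```python
from typing import NamedTuple


class Xi(NamedTuple):
    """One row of Table 1: the constants (xi1, xi2, xi3, xi4) for stratum h."""
    xi1: float
    xi2: float
    xi3: float
    xi4: float


def table1_rows(range_x, cv_x, kurt_x, rho_yx, range_r, cv_r, kurt_r, rho_yr):
    """Return the eight rows of Table 1 in printed order.

    range_x = X_Mh - X_mh, range_r = R_Mh - R_mh; cv_* are coefficients of
    variation, kurt_* are kurtosis coefficients, rho_* are correlations with y.
    """
    return [
        Xi(1.0, range_x, 1.0, range_r),
        Xi(range_x, cv_x, range_r, cv_r),
        Xi(range_x, 1.0, range_r, 1.0),
        Xi(range_x, kurt_x, range_r, kurt_r),
        Xi(kurt_x, range_x, kurt_r, range_r),
        Xi(cv_x, range_x, cv_r, range_r),
        Xi(rho_yx, range_x, rho_yr, range_r),
        Xi(range_x, rho_yx, range_r, rho_yr),
    ]


if __name__ == "__main__":
    # Stratum 2 of data set-1 (Table 7); kurtosis values are placeholders.
    rows = table1_rows(range_x=2370 - 58, cv_x=0.409, kurt_x=2.0, rho_yx=0.787,
                       range_r=36 - 19, cv_r=0.194, kurt_r=2.0, rho_yr=0.657)
    for row in rows:
        print(row)
```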

Table 2. Some classes of the proposed estimator.

Subsets of the Proposed Estimator ${\hat{S}}_{mdst}^{2}$
${\hat{S}}_{m d_{1}}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{(S_{x h}^{2} - s_{x h}^{2})}{(S_{x h}^{2} + s_{x h}^{2}) + 2 (X_{M h} - X_{m h})}\}] exp [δ_{2 h} \{\frac{(S_{r h}^{2} - s_{r h}^{2})}{(S_{r h}^{2} + s_{r h}^{2}) + 2 (R_{M h} - R_{m h})}\}]$
${\hat{S}}_{m d_{2}}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{(X_{M h} - X_{m h}) (S_{x h}^{2} - s_{x h}^{2})}{(X_{M h} - X_{m h}) (S_{x h}^{2} + s_{x h}^{2}) + 2 C_{x h}}\}] exp [δ_{2 h} \{\frac{(R_{M h} - R_{m h}) (S_{r h}^{2} - s_{r h}^{2})}{(R_{M h} - R_{m h}) (S_{r h}^{2} + s_{r h}^{2}) + 2 C_{r h}}\}]$
${\hat{S}}_{m d_{3}}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{(X_{M h} - X_{m h}) (S_{x h}^{2} - s_{x h}^{2})}{(X_{M h} - X_{m h}) (S_{x h}^{2} + s_{x h}^{2}) + 2}\}] exp [δ_{2 h} \{\frac{(R_{M h} - R_{m h}) (S_{r h}^{2} - s_{r h}^{2})}{(R_{M h} - R_{m h}) (S_{r h}^{2} + s_{r h}^{2}) + 2}\}]$
${\hat{S}}_{m d_{4}}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{(X_{M h} - X_{m h}) (S_{x h}^{2} - s_{x h}^{2})}{(X_{M h} - X_{m h}) (S_{x h}^{2} + s_{x h}^{2}) + 2 β_{2 (x h)}}\}] exp [δ_{2 h} \{\frac{(R_{M h} - R_{m h}) (S_{r h}^{2} - s_{r h}^{2})}{(R_{M h} - R_{m h}) (S_{r h}^{2} + s_{r h}^{2}) + 2 β_{2 (r h)}}\}]$
${\hat{S}}_{m d_{5}}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{β_{2 (x h)} (S_{x h}^{2} - s_{x h}^{2})}{β_{2 (x h)} (S_{x h}^{2} + s_{x h}^{2}) + 2 (X_{M h} - X_{m h})}\}] exp [δ_{2 h} \{\frac{β_{2 (r h)} (S_{r h}^{2} - s_{r h}^{2})}{β_{2 (r h)} (S_{r h}^{2} + s_{r h}^{2}) + 2 (R_{M h} - R_{m h})}\}]$
${\hat{S}}_{m d_{6}}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{ρ_{y h x h} (S_{x h}^{2} - s_{x h}^{2})}{ρ_{y h x h} (S_{x h}^{2} + s_{x h}^{2}) + 2 (X_{M h} - X_{m h})}\}] exp [δ_{2 h} \{\frac{ρ_{y h r h} (S_{r h}^{2} - s_{r h}^{2})}{ρ_{y h r h} (S_{r h}^{2} + s_{r h}^{2}) + 2 (R_{M h} - R_{m h})}\}]$
${\hat{S}}_{m d_{7}}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{C_{x h} (S_{x h}^{2} - s_{x h}^{2})}{C_{x h} (S_{x h}^{2} + s_{x h}^{2}) + 2 (X_{M h} - X_{m h})}\}] exp [δ_{2 h} \{\frac{C_{r h} (S_{r h}^{2} - s_{r h}^{2})}{C_{r h} (S_{r h}^{2} + s_{r h}^{2}) + 2 (R_{M h} - R_{m h})}\}]$
${\hat{S}}_{m d_{8}}^{2} = \sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} exp [δ_{1 h} \{\frac{(X_{M h} - X_{m h}) (S_{x h}^{2} - s_{x h}^{2})}{(X_{M h} - X_{m h}) (S_{x h}^{2} + s_{x h}^{2}) + 2 ρ_{y h x h}}\}] exp [δ_{2 h} \{\frac{(R_{M h} - R_{m h}) (S_{r h}^{2} - s_{r h}^{2})}{(R_{M h} - R_{m h}) (S_{r h}^{2} + s_{r h}^{2}) + 2 ρ_{y h r h}}\}]$
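Read together with Table 1, each row of Table 2 appears to follow one template, $\sum_{h = 1}^{L} ϕ_{h} W_{h}^{2} s_{y h}^{2} \exp [δ_{1 h} \frac{ξ_{1 h} (S_{x h}^{2} - s_{x h}^{2})}{ξ_{1 h} (S_{x h}^{2} + s_{x h}^{2}) + 2 ξ_{2 h}}] \exp [δ_{2 h} \frac{ξ_{3 h} (S_{r h}^{2} - s_{r h}^{2})}{ξ_{3 h} (S_{r h}^{2} + s_{r h}^{2}) + 2 ξ_{4 h}}]$. For example, $ξ_{1 h} = ξ_{3 h} = 1$, $ξ_{2 h} = X_{M h} - X_{m h}$, and $ξ_{4 h} = R_{M h} - R_{m h}$ give ${\hat{S}}_{m d_{1}}^{2}$. The Python sketch below evaluates that template. The template is a reading of the two tables rather than a formula stated in this section. The constants $δ_{1 h}$ and $δ_{2 h}$ must be supplied by the user, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StratumInputs:
    W: float        # stratum weight W_h
    phi: float      # phi_h (for example 1/n_h - 1/N_h; an assumption here)
    s2_y: float     # sample variance of y in stratum h
    S2_x: float     # known population variance of x in stratum h
    s2_x: float     # sample variance of x in stratum h
    S2_r: float     # known population variance of the ranks of x
    s2_r: float     # sample variance of the ranks of x
    xi: tuple       # (xi1, xi2, xi3, xi4), one row of Table 1
    delta1: float   # delta_{1h}, user-supplied constant
    delta2: float   # delta_{2h}, user-supplied constant


def _exp_factor(S2, s2, a, b, delta):
    """exp[delta * a (S2 - s2) / (a (S2 + s2) + 2 b)]."""
    return math.exp(delta * a * (S2 - s2) / (a * (S2 + s2) + 2.0 * b))


def proposed_variance_estimate(strata):
    """Sum the dual-exponential template of Table 2 over the strata."""
    total = 0.0
    for st in strata:
        xi1, xi2, xi3, xi4 = st.xi
        total += (st.phi * st.W ** 2 * st.s2_y
                  * _exp_factor(st.S2_x, st.s2_x, xi1, xi2, st.delta1)
                  * _exp_factor(st.S2_r, st.s2_r, xi3, xi4, st.delta2))
    return total
```

When the sample variances equal the known population variances, both exponential factors equal 1 and the estimate reduces to $\sum_{h} ϕ_{h} W_{h}^{2} s_{y h}^{2}$. This gives a convenient sanity check.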

Table 3. MSEs of different estimators using the artificial populations.

Estimator	Pop-I	Pop-II	Pop-III	Pop-IV	Pop-V	Pop-VI
[1] ${\hat{S}}_{y s t}^{2}$	6.43e-5	7.78e-5	7.28e-4	7.43e-4	6.74e-3	6.50e-3
[2] ${\hat{S}}_{t s t}^{2}$	4.89e-5	5.09e-5	4.09e-4	4.79e-4	5.00e-3	5.80e-3
[3] ${\hat{S}}_{l r s t}^{2}$	4.09e-5	5.88e-5	4.90e-4	4.60e-4	5.40e-3	5.90e-3
[4] ${\hat{S}}_{b t s t}^{2}$	4.13e-5	5.06e-5	4.08e-4	4.45e-4	5.20e-3	5.00e-3
[5] ${\hat{S}}_{u s s t}^{2}$	4.23e-5	5.85e-5	4.80e-4	4.30e-4	5.00e-3	5.50e-3
[6] ${\hat{S}}_{a s t}^{2}$	4.03e-5	5.04e-5	4.67e-4	4.00e-4	4.98e-3	5.30e-3
[7] ${\hat{S}}_{b s t}^{2}$	4.03e-5	5.04e-5	4.60e-4	4.00e-4	4.98e-3	5.30e-3
[8] ${\hat{S}}_{c s t}^{2}$	4.02e-5	5.03e-5	4.06e-4	3.80e-4	4.70e-3	5.10e-3
[9] ${\hat{S}}_{m d_{1}}^{2}$	2.90e-5	3.87e-5	2.08e-4	2.60e-4	2.57e-3	2.40e-3
[10] ${\hat{S}}_{m d_{2}}^{2}$	2.88e-5	3.29e-5	2.89e-4	2.80e-4	2.76e-3	2.60e-3
[11] ${\hat{S}}_{m d_{3}}^{2}$	2.93e-5	3.93e-5	2.05e-4	2.30e-4	2.40e-3	2.20e-3
[12] ${\hat{S}}_{m d_{4}}^{2}$	2.80e-5	3.53e-5	2.53e-4	2.00e-4	2.30e-3	2.30e-3
[13] ${\hat{S}}_{m d_{5}}^{2}$	2.40e-5	3.06e-5	2.06e-4	2.50e-4	2.60e-3	2.50e-3
[14] ${\hat{S}}_{m d_{6}}^{2}$	2.40e-5	3.16e-5	2.66e-4	2.57e-4	2.80e-3	2.45e-3
[15] ${\hat{S}}_{m d_{7}}^{2}$	2.30e-5	3.52e-5	2.02e-4	1.90e-4	2.15e-3	2.00e-3
[16] ${\hat{S}}_{m d_{8}}^{2}$	2.20e-5	3.02e-5	2.00e-4	1.70e-4	1.79e-3	1.60e-3

Table 4. PREs of different estimators using the artificial populations.

Estimator	Pop-I	Pop-II	Pop-III	Pop-IV	Pop-V	Pop-VI
[1] ${\hat{S}}_{y s t}^{2}$	100	100	100	100	100	100
[2] ${\hat{S}}_{t s t}^{2}$	131.43	152.85	177.99	155.11	134.80	112.00
[3] ${\hat{S}}_{l r s t}^{2}$	157.22	132.32	177.99	161.52	124.81	110.00
[4] ${\hat{S}}_{b t s t}^{2}$	155.69	153.76	178.43	166.96	129.62	130.00
[5] ${\hat{S}}_{u s s t}^{2}$	152.01	132.99	151.67	172.79	134.80	118.19
[6] ${\hat{S}}_{a s t}^{2}$	159.55	154.37	155.89	185.75	135.34	122.64
[7] ${\hat{S}}_{b s t}^{2}$	159.53	154.37	158.26	185.75	135.34	122.64
[8] ${\hat{S}}_{c s t}^{2}$	159.95	154.67	179.31	195.52	143.40	127.45
[9] ${\hat{S}}_{m d_{1}}^{2}$	221.72	201.34	350	285.77	262.26	270.83
[10] ${\hat{S}}_{m d_{2}}^{2}$	223.27	236.47	251.90	265.36	244.20	250.00
[11] ${\hat{S}}_{m d_{3}}^{2}$	219.45	197.96	355.39	323.05	280.83	295.46
[12] ${\hat{S}}_{m d_{4}}^{2}$	229.64	220.30	287.75	371.50	293.04	282.61
[13] ${\hat{S}}_{m d_{5}}^{2}$	267.92	254.25	353.39	297.20	259.230	260.00
[14] ${\hat{S}}_{m d_{6}}^{2}$	267.92	246.21	273.68	289.11	240.71	265.31
[15] ${\hat{S}}_{m d_{7}}^{2}$	279.57	221.027	360.39	391.06	313.49	325.00
[16] ${\hat{S}}_{m d_{8}}^{2}$	292.28	257.62	364.00	437.06	376.54	361.11

Table 5. MSEs using empirical datasets.

Estimator	Data Set-1	Data Set-2	Data Set-3
[1] ${\hat{S}}_{y s t}^{2}$	1.45e+20	2.599e+20	2.853
[2] ${\hat{S}}_{t s t}^{2}$	9.07.e+20	1.446e+20	2.733
[3] ${\hat{S}}_{l r s t}^{2}$	6.36e+20	1.233e+20	2.707
[4] ${\hat{S}}_{b t s t}^{2}$	6.72e+20	1.358e+20	2.520
[5] ${\hat{S}}_{u s s t}^{2}$	8.67e+20	1.428e+20	2.610
[6] ${\hat{S}}_{a s t}^{2}$	9.07e+20	1.446e+20	2.723
[7] ${\hat{S}}_{b s t}^{2}$	9.07e+20	1.446e+20	2.732
[8] ${\hat{S}}_{c s t}^{2}$	8.13e+20	1.431e+20	2.186
[9] ${\hat{S}}_{m d_{1}}^{2}$	1.165e+20	9.648e+19	1.810
[10] ${\hat{S}}_{m d_{2}}^{2}$	8.691+19	7.273e+19	2.114
[11] ${\hat{S}}_{m d_{3}}^{2}$	8.698+19	7.278e+19	2.113
[12] ${\hat{S}}_{m d_{4}}^{2}$	8.700e+19	7.287e+19	1.893
[13] ${\hat{S}}_{m d_{5}}^{2}$	1.079+20	8.909e+19	1.894
[14] ${\hat{S}}_{m d_{6}}^{2}$	1.200e+20	1.023e+20	1.929
[15] ${\hat{S}}_{m d_{7}}^{2}$	1.274e+20	1.054e+20	2.080
[16] ${\hat{S}}_{m d_{8}}^{2}$	8.684e+19	7.269e+19	1.715

Table 6. PREs using empirical datasets.

Estimator	Data Set-1	Data Set-2	Data Set-3
[1] ${\hat{S}}_{y s t}^{2}$	100	100	100
[2] ${\hat{S}}_{t s t}^{2}$	96.354	179.780	104.394
[3] ${\hat{S}}_{l r s t}^{2}$	140.786	210.872	105.399
[4] ${\hat{S}}_{b t s t}^{2}$	140.715	191.442	113.210
[5] ${\hat{S}}_{u s s t}^{2}$	100.2762	182.043	109.301
[6] ${\hat{S}}_{a s t}^{2}$	96.355	179.790	104.576
[7] ${\hat{S}}_{b s t}^{2}$	96.354	179.780	104.439
[8] ${\hat{S}}_{c s t}^{2}$	107.893	181.588	130.526
[9] ${\hat{S}}_{m d_{1}}^{2}$	202.430	269.4001	157.609
[10] ${\hat{S}}_{m d_{2}}^{2}$	271.396	357.403	134.865
[11] ${\hat{S}}_{m d_{3}}^{2}$	271.168	357.124	134.944
[12] ${\hat{S}}_{m d_{4}}^{2}$	270.817	356.684	135.027
[13] ${\hat{S}}_{m d_{5}}^{2}$	218.660	291.734	150.642
[14] ${\hat{S}}_{m d_{6}}^{2}$	196.486	254.020	147.936
[15] ${\hat{S}}_{m d_{7}}^{2}$	185.193	246.657	136.519
[16] ${\hat{S}}_{m d_{8}}^{2}$	271.617	357.568	166.364

Table 7. Summary statistics for data set-1.

$N_{1} = 18$	$n_{1} = 8$	$\bar{X_{1}} = 962$	$\bar{Y_{1}} = 162979$	$\bar{R_{1}} = 9.500$
$X_{M_{1}} = 1530$	$X_{m_{1}} = 388$	$R_{M_{1}} = 18$	$R_{m_{1}} = 1$	$S_{x_{1}} = 308$
$S_{y_{1}} = 255887$	$S_{r_{1}} = 5.338$	$C_{x_{1}} = 0.320$	$C_{y_{1}} = 1.571$	$C_{r_{1}} = 0.562$
$ρ_{y_{1} x_{1}} = 0.145$	$ρ_{y_{1} r_{1}} = 0.135$	$ρ_{x_{1} r_{1}} = 0.802$	$δ_{4001} = 2625$	$δ_{0401} = 3237$
$δ_{0041} = 1.692$	$δ_{2201} = 1568$	$δ_{2021} = 1548$	$δ_{0221} = 1298$	$ϕ_{1} = 0.069$
$N_{2} = 18$	$n_{2} = 8$	$\bar{X_{2}} = 1146$	$\bar{Y_{2}} = 134458$	$\bar{R_{2}} = 27.500$
$X_{M_{2}} = 2370$	$X_{m_{2}} = 58$	$R_{m_{2}} = 19$	$R_{M_{2}} = 36$	$S_{x_{2}} = 469.931$
$S_{y_{2}} = 50236$	$S_{r_{2}} = 5.338$	$C_{x_{2}} = 0.409$	$C_{y_{2}} = 0.374$	$C_{r_{2}} = 0.194$
$ρ_{y_{2} x_{2}} = 0.787$	$ρ_{y_{2} r_{2}} = 0.657$	$ρ_{x_{2} r_{2}} = 0.889$	$δ_{4002} = 2240$	$δ_{0402} = 2558$
$δ_{0042} = 1.622$	$δ_{2202} = 1807$	$δ_{2022} = 2049$	$δ_{0222} = 1200$	$ϕ_{2} = 0.069$
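Several derived entries of Table 7 can be reproduced from the others. The sketch below makes three assumptions: $ϕ_{h} = 1/n_{h} - 1/N_{h}$; the ranks of x are assigned over the whole population, so stratum 1 holds ranks 1 to 18 and stratum 2 holds ranks 19 to 36 (consistent with $\bar{R}_{1} = 9.5$ and $\bar{R}_{2} = 27.5$); and $S_{r}$ uses the $N_{h} - 1$ divisor. The helper names are hypothetical.

```python
import statistics


def phi(n_h, N_h):
    """Assumed finite-population factor phi_h = 1/n_h - 1/N_h."""
    return 1 / n_h - 1 / N_h


def stratum_rank_summary(first_rank, N_h):
    """Mean, standard deviation (N_h - 1 divisor) and CV of consecutive ranks."""
    ranks = list(range(first_rank, first_rank + N_h))
    mean = statistics.mean(ranks)
    sd = statistics.stdev(ranks)
    return mean, sd, sd / mean


print(round(phi(8, 18), 3))            # 0.069
print(stratum_rank_summary(1, 18))     # compare with R_bar_1, S_r1, C_r1
print(stratum_rank_summary(19, 18))    # compare with R_bar_2, S_r2, C_r2
print(round(308 / 962, 3))             # C_x1 = S_x1 / X_bar_1
```

Under these assumptions the outputs agree with the tabulated $ϕ_{h} = 0.069$, $S_{r} = 5.338$, $C_{r_{1}} = 0.562$, $C_{r_{2}} = 0.194$, and $C_{x_{1}} = 0.320$ to the printed precision.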

Table 8. Summary statistics for data set-2.

$N_{1} = 18$	$n_{1} = 8$	$\bar{X_{1}} = 415$	$\bar{Y_{1}} = 85572$	$\bar{R_{1}} = 9.500$
$X_{M_{1}} = 2055$	$X_{m_{1}} = 24$	$R_{M_{1}} = 18$	$R_{m_{1}} = 1$	$S_{x_{1}} = 521.675$
$S_{y_{1}} = 248216$	$S_{r_{1}} = 5.338$	$C_{x_{1}} = 1.258$	$C_{y_{1}} = 2.901$	$C_{r_{1}} = 0.562$
$ρ_{y_{1} x_{1}} = 0.337$	$ρ_{y_{1} r_{1}} = 0.304$	$ρ_{x_{1} r_{1}} = 0.709$	$δ_{4001} = 3270$	$δ_{0401} = 3345$
$δ_{0041} = 1.692$	$δ_{2201} = 2398$	$δ_{2021} = 1267$	$δ_{0221} = 944$	$ϕ_{1} = 0.069$
$N_{2} = 18$	$n_{2} = 8$	$\bar{X_{2}} = 257$	$\bar{Y_{2}} = 19294$	$\bar{R_{2}} = 27.500$
$X_{M_{2}} = 1674$	$X_{m_{2}} = 52$	$R_{m_{2}} = 19$	$R_{M_{2}} = 36$	$S_{x_{2}} = 366$
$S_{y_{2}} = 37979$	$S_{r_{2}} = 5.338$	$C_{x_{2}} = 1.423$	$C_{y_{2}} = 1.969$	$C_{r_{2}} = 0.194$
$ρ_{y_{2} x_{2}} = 0.976$	$ρ_{y_{2} r_{2}} = 0.565$	$ρ_{x_{2} r_{2}} = 0.786$	$δ_{4002} = 2542$	$δ_{0402} = 2388$
$δ_{0042} = 1.622$	$δ_{2202} = 2246$	$δ_{2022} = 739$	$δ_{0222} = 988$	$ϕ_{2} = 0.069$

Table 9. Summary statistics for data set-3.

$N_{1} = 18$	$n_{1} = 8$	$\bar{X_{1}} = 72.550$	$\bar{Y_{1}} = 27.490$	$\bar{R_{1}} = 9.500$
$X_{M_{1}} = 95$	$X_{m_{1}} = 28$	$R_{M_{1}} = 18$	$R_{m_{1}} = 1$	$S_{x_{1}} = 10.580$
$S_{y_{1}} = 10.130$	$S_{r_{1}} = 5.338$	$C_{x_{1}} = 0.155$	$C_{y_{1}} = 0.376$	$C_{r_{1}} = 0.562$
$ρ_{y_{1} x_{1}} = 0.337$	$ρ_{y_{1} r_{1}} = 0.284$	$ρ_{x_{1} r_{1}} = 0.557$	$δ_{4001} = 2.550$	$δ_{0401} = 2.845$
$δ_{0041} = 1.692$	$δ_{2201} = 3.158$	$δ_{2021} = 4.544$	$δ_{0221} = 4.542$	$ϕ_{1} = 0.069$
$N_{2} = 18$	$n_{2} = 8$	$\bar{X_{2}} = 60.870$	$\bar{Y_{2}} = 20.820$	$\bar{R_{2}} = 27.500$
$X_{M_{2}} = 75$	$X_{m_{2}} = 15$	$R_{m_{2}} = 19$	$R_{M_{2}} = 36$	$S_{x_{2}} = 8.980$
$S_{y_{2}} = 12.750$	$S_{r_{2}} = 5.338$	$C_{x_{2}} = 0.142$	$C_{y_{2}} = 0.269$	$C_{r_{2}} = 0.194$
$ρ_{y_{2} x_{2}} = 0.496$	$ρ_{y_{2} r_{2}} = 0.297$	$ρ_{x_{2} r_{2}} = 0.756$	$δ_{4002} = 4.142$	$δ_{0402} = 3.934$
$δ_{0042} = 1.622$	$δ_{2202} = 1.384$	$δ_{2022} = 1.239$	$δ_{0222} = 2.488$	$ϕ_{2} = 0.069$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
