Use of Nonconventional Dispersion Measures to Improve the Efficiency of Ratio-Type Estimators of Variance in the Presence of Outliers

The use of auxiliary information in survey sampling to enhance the efficiency of the estimators of population parameters is common practice. Generally, the ratio and regression estimators are developed by using the known information on conventional parameters of the auxiliary variables, such as variance, coefficient of variation, coefficient of skewness, coefficient of kurtosis, or correlation between the study and auxiliary variable. The efficiency of these estimators is doubtful in the presence of outliers in the data and a nonsymmetrical population. This study presents improved variance estimators under simple random sampling without replacement with the assumption that the information on some nonconventional dispersion measures of the auxiliary variable is readily available. These measures include the inter-decile range, sample inter-quartile range, probability-weighted moment estimator, Gini mean difference estimator, Downton's estimator, median absolute deviation from the median, and so forth. The algebraic expressions for the bias and mean square error of the proposed estimators are obtained and the efficiency conditions are derived to compare with the existing estimators. The percentage relative efficiencies are used to numerically compare the results of the proposed estimators with the existing estimators by using real datasets, indicating the superiority of the suggested estimators.


Introduction
Suppose a finite population $W = \{W_1, W_2, \ldots, W_N\}$ consists of $N$ distinct and identifiable units. Let $Y$ be a measurable variable of interest with value $Y_i$ ascertained on $W_i$, $i = 1, 2, \ldots, N$, resulting in a set of observations $Y = \{Y_1, Y_2, \ldots, Y_N\}$. The purpose of the measurement process is to estimate the population variance $S_Y^2$ by drawing a random sample from the population. Suppose that information on an auxiliary variable $X$, which is correlated with the study variable $Y$, is also available for every unit of the population. The use of auxiliary information to improve the efficiency of the estimators of population parameters is very popular, especially when information on the auxiliary variable is readily available and is highly correlated with the variable of interest. For instance, see [1][2][3] and the references cited therein. The ratio, product, and regression methods of estimation are frequently used to enhance the efficiency of the estimators, depending upon the nature of the relationship between the study and the auxiliary variables. Along with the estimation of the population mean, the estimation of the population variance is of importance in many real-life situations. Generally, estimation of the population variance is dealt with by augmenting the conventional parameters of the auxiliary variable through a ratio or regression method of estimation to achieve greater efficiency. Mostly the coefficient of skewness, coefficient of kurtosis, coefficient of variation, and coefficient of correlation are used in linear combination with some other conventional parameters of the auxiliary variable to estimate the variance. Readers can refer to [4][5][6][7][8][9][10][11][12][13][14][15] and the references therein. The auxiliary measures used in most of the existing ratio-type estimators of variance are not resistant to outliers or nonsymmetrical populations. Therefore, there is a need to develop ratio- or regression-type estimators which are somewhat outlier resistant and more stable in the case of asymmetrical populations.
The present study is focused on estimation of population variance by incorporating information on nonconventional dispersion parameters (detailed in Section 3) of the auxiliary variable. These measures are resistant to outliers and used in a linear combination with the other conventional measures to improve the efficiency of the variance estimator under simple random sampling without replacement (SRSWOR) in the presence of outliers in the target population. The rest of the manuscript is structured as follows. The nomenclature used in the manuscript and background of the existing estimators of population variance is described in Section 2, whereas Section 3 presents the proposed improved families of estimators of population variance. In Section 4 the performance of the proposed families of the estimators is evaluated and compared with the existing estimators of population variance. Finally, some concluding remarks are given in Section 5.

Background of the Ratio-Type Estimators of Variance
This section reviews some existing estimators of population variance under simple random sampling (SRS) that use known information on conventional parameters of the auxiliary variable to enhance the efficiency of the variance estimators. Before going into the details of the existing estimators, the notation used in this manuscript is listed below.

Population median of the auxiliary variable
$Q_{1(X)}$: Population lower quartile of the auxiliary variable
$Q_{3(X)}$: Population upper quartile of the auxiliary variable
$Q_{r(X)}$: Population inter-quartile range of the auxiliary variable
$Q_{a(X)}$: Population inter-quartile average of the auxiliary variable
$Q_{d(X)}$: Population semi inter-quartile range of the auxiliary variable
$D_{i(X)}$: $i$th ($i = 1, 2, \ldots, 10$) population decile of the auxiliary variable
$B(\cdot)$: Bias of the estimator
$MSE(\cdot)$: Mean square error of the estimator
$\hat{S}^2_R$: Traditional ratio estimator
$\hat{S}^2_i$: Existing ratio estimator
$\hat{S}^2_{SK(i)}$: The class of estimators introduced by [16]
$\hat{S}^2_{p1-j}$: Proposed class-I estimators
$\hat{S}^2_{p2-l}$: Proposed class-II estimators

The traditional ratio-type estimator of Isaki [4] uses the known variance $S_X^2$ of the auxiliary variable to estimate the population variance $S_Y^2$. The estimator, whose bias and MSE are available to a first-order approximation, has the form $\hat{S}^2_R = s_y^2 \, S_X^2 / s_x^2$, where $s_y^2$ and $s_x^2$ are the sample variances of the study and auxiliary variables.

Several modifications and improved estimators of variance proposed in the literature make use of different conventional characteristics of the auxiliary variable. All these estimators exhibit superior efficiency as compared to the traditional ratio estimator under certain theoretical conditions. Some of the existing estimators, which use the variance of the auxiliary variable linearly combined with other conventional parameters of the auxiliary variable, are summarized in Table 1 with their respective biases, MSEs, and constants.
The Subramani and Kumarapandiyan [16] class of estimators, $\hat{S}^2_{SK(i)}$, is indexed by $\omega_i$, $i = 1, 2, \ldots, 51$, which are different choices of parameters of the auxiliary variable $X$ and can be found in Appendix A. All these estimators exhibit superior efficiency as compared to the traditional ratio-type estimator suggested by Isaki [4] under certain conditions, but most of them are based on conventional parameters of the auxiliary variable. Although many other estimators of variance are available in the existing literature, they are more complex and involve laborious computation. Therefore, the estimators detailed above were chosen for comparison because of their simple structure and relatively low computational burden for practitioners.
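As a small illustration of the traditional ratio-type estimator discussed above, the following sketch computes $s_y^2 \, S_X^2 / s_x^2$ from one SRSWOR draw. The function name, the synthetic population, and the use of the $N-1$ divisor for the population variance are assumptions made for this example only; none of the numbers come from the paper.

```python
import numpy as np

def isaki_ratio_variance(y_sample, x_sample, S2_X):
    """Ratio-type estimate of the population variance of Y:
    s_y^2 * S_X^2 / s_x^2, where S2_X is the known variance of X."""
    s2_y = np.var(y_sample, ddof=1)  # sample variance of the study variable
    s2_x = np.var(x_sample, ddof=1)  # sample variance of the auxiliary variable
    return s2_y * S2_X / s2_x

# Hypothetical population of N = 34 units (illustration only).
rng = np.random.default_rng(0)
x_pop = rng.gamma(shape=2.0, scale=50.0, size=34)
y_pop = 1.2 * x_pop + rng.normal(0.0, 10.0, size=34)

# One simple random sample without replacement of size n = 10.
idx = rng.choice(34, size=10, replace=False)
S2_X = np.var(x_pop, ddof=1)  # assumed known from the auxiliary information
estimate = isaki_ratio_variance(y_pop[idx], x_pop[idx], S2_X)
```

When the sample variance of $X$ happens to equal $S_X^2$, the estimate reduces to the plain sample variance $s_y^2$, which is the sense in which the ratio term acts as a correction.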

Proposed Estimators of Variance
This section presents two families of ratio estimators of population variance for the case where information on some nonconventional measures of dispersion of the auxiliary variable is readily available. The nonconventional measures used to develop the new ratio estimators of variance include the following.

i. Inter-decile Range: The inter-decile range is the difference between the largest decile $D_{9(X)}$ and the smallest decile $D_{1(X)}$, that is, $D_{9(X)} - D_{1(X)}$.
ii. Sample Inter-quartile Range: The sample inter-quartile range is based on the difference between the upper quartile $Q_{3(X)}$ and the lower quartile $Q_{1(X)}$, as discussed by Riaz [19] and Nazir et al. [20].
iii. Probability Weighted Moment Estimator: The probability weighted moment estimator of dispersion, $S_{PW(X)}$, suggested by Downton [21] is based on the ordered sample statistics $X_{(i)}$, where $X_{(i)}$ denotes the $i$th order statistic of the sample.
iv. Downton's Estimator: Another estimator of dispersion, similar to $S_{PW(X)}$, was proposed by Downton [21].
v. Gini Mean Difference Estimator: Gini [22] introduced a dispersion estimator which is also based on the sample order statistics.
vi. Median Absolute Deviation from Median: Hampel [23] suggested an estimator of dispersion based on the absolute deviations from the median.
vii. The Median of Pairwise Distances: Shamos [24] (p. 260) and Bickel and Lehmann [25] (p. 38) suggested an estimator of dispersion based on the median of pairwise distances, $\mathrm{median}\{|X_i - X_l|;\ i < l\}$. Rousseeuw and Croux [26] suggested premultiplying it by 1.0483 to achieve consistency under the Gaussian distribution, so the resulting estimator is $1.0483 \, \mathrm{median}\{|X_i - X_l|;\ i < l\}$.
viii. Median Absolute Deviation from Mean: Wu et al. [27] defined another estimator which is based on the median of the absolute deviations from the mean.
ix. Mean Absolute Deviation from Mean: Wu et al. [27] suggested an estimator of dispersion which is based on the mean of the absolute deviations from the mean, scaled by the constant 1.2533.
x. Average Absolute Deviation from Median: Wu et al. [27] suggested an estimator of dispersion which is based on the average of the absolute deviations from the median.
xi. The Ordered Statistic of Subranges: A robust estimator of dispersion based on the order statistics of subranges was introduced by Croux and Rousseeuw [28]. In its definition, the symbol $[\cdot]$ represents the integer part of a fraction.
xii. Trimmed Mean of Median of Pairwise Distances: Croux and Rousseeuw [28] defined another robust estimator of dispersion, $T_{n(X)}$, which is based on the trimmed mean of the median of pairwise distances. For each $i$, the median of $|X_i - X_l|$, $l = 1, 2, 3, \ldots, n$, is computed, which yields $n$ values, and the average of the first $h$ order statistics of these values gives the final estimate $T_{n(X)}$, where $h = [n/2] + 1$ is roughly half of the number of observations.
xiii. The 0.25-quantile of Pairwise Distances: Another estimator incorporated in this study as a nonconventional dispersion measure is due to Rousseeuw and Croux [26] and is defined as $d \, \{|X_i - X_l|;\ i < l\}_{(p)}$, where $d$ is a constant factor whose default value, 2.2219, makes the estimator consistent under normality, while $p = \binom{h}{2} \approx \binom{n}{2}/4$ and $h = [n/2] + 1$. Thus, the $p$th order statistic of the $\binom{n}{2}$ interpoint distances yields the desired estimator.
xiv. The Median of the Median of Distances: This study also includes a robust estimator of dispersion defined in Rousseeuw and Croux [26]. It is given as $C \, \mathrm{median}_i\{\mathrm{median}_l |X_i - X_l|\}$, where $C$ is a constant used for consistency; under a normal population its value is usually set to 1.1926.
The abovementioned nonconventional measures are used in conjunction with other conventional measures such as the coefficient of skewness, the coefficient of variation, and the coefficient of correlation in the context of ratio and regression methods of estimation under SRSWOR to propose new estimators of population variance. The detailed properties of the above nonconventional measures can be found in the relevant cited references.
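To make the outlier resistance of these measures concrete, the sketch below implements a few of them with NumPy and compares them with the sample standard deviation on a tiny sample containing one extreme value. The function names are hypothetical. The constant 1.4826 used for the median absolute deviation is the usual normal-consistency factor from the robust statistics literature and is an assumption here, as is the use of the ordinary median (rather than low and high medians) inside the median-of-medians measure.

```python
from math import comb
import numpy as np

def pairwise_distances(x):
    """All |x_i - x_l| with i < l."""
    x = np.asarray(x, dtype=float)
    i, l = np.triu_indices(len(x), k=1)
    return np.abs(x[i] - x[l])

def inter_decile_range(x):
    # Ninth minus first decile, using NumPy's default linear interpolation
    # (other quantile conventions give slightly different values).
    return np.quantile(x, 0.9) - np.quantile(x, 0.1)

def mad_from_median(x, c=1.4826):
    # Scaled median of absolute deviations from the median (c is assumed).
    x = np.asarray(x, dtype=float)
    return c * np.median(np.abs(x - np.median(x)))

def median_pairwise(x, c=1.0483):
    # Scaled median of the pairwise distances (item vii).
    return c * np.median(pairwise_distances(x))

def quantile_pairwise(x, d=2.2219):
    # p-th order statistic of the pairwise distances, p = C(h, 2), h = [n/2] + 1.
    h = len(x) // 2 + 1
    p = comb(h, 2)
    return d * np.sort(pairwise_distances(x))[p - 1]

def median_of_medians(x, c=1.1926):
    # For each i take the median of |x_i - x_l| over l, then the median over i.
    x = np.asarray(x, dtype=float)
    inner = np.median(np.abs(x[:, None] - x[None, :]), axis=1)
    return c * np.median(inner)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data with one outlier
classical_sd = np.std(sample, ddof=1)
robust = {
    "inter-decile range": inter_decile_range(sample),
    "MAD from median": mad_from_median(sample),
    "median of pairwise distances": median_pairwise(sample),
    "0.25-quantile of pairwise distances": quantile_pairwise(sample),
    "median of medians": median_of_medians(sample),
}
```

On this sample the single value 100 inflates the standard deviation, while the median-based measures stay close to the spread of the four regular observations. The inter-decile range is less resistant at this very small sample size because the ninth decile is interpolated towards the outlier.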

The Suggested Estimators of Class-I
Motivated by Abid et al. [29], we propose a new class of ratio estimators of variance under SRS by using the power transformation and the Searls [30] technique as follows:

$\hat{S}^2_{p1-j} = L \, s_y^2 \left( \frac{\phi \bar{X} + \psi}{\phi \bar{x} + \psi} \right)^{\frac{\delta \bar{X}}{\delta \bar{X} + \nu}}$, (4)

where $L$ is the Searls [30] constant, $s_y^2$ is the sample variance of the study variable, and $\bar{X}$ and $\bar{x}$ are the population and sample means of the auxiliary variable, respectively. It is worth mentioning that $\phi \bar{X} + \psi > 0$ and $\phi \bar{x} + \psi > 0$, that $(\phi, \delta)$ can be either known real numbers or known conventional parameters of the auxiliary variable $X$, and that $(\psi, \nu)$ are known nonconventional dispersion parameters of the auxiliary variable $X$.

To obtain the bias and MSE of $\hat{S}^2_{p1-j}$, we work in terms of the relative errors $e_0 = (s_y^2 - S_y^2)/S_y^2$ and $e_1 = (\bar{x} - \bar{X})/\bar{X}$. After putting the values of $e_0$ and $e_1$ into Equation (4), we get

$\hat{S}^2_{p1-j} = L \, S_y^2 (1 + e_0)(1 + \xi_1 e_1)^{-\xi_2}$, (5)

where $\xi_1 = \frac{\phi \bar{X}}{\phi \bar{X} + \psi}$ and $\xi_2 = \frac{\delta \bar{X}}{\delta \bar{X} + \nu}$.

Assuming $|\xi_1 e_1| < 1$ so that $(1 + \xi_1 e_1)^{-\xi_2}$ is expandable, expanding the right-hand side of Equation (5), neglecting the terms in the $e$'s of power greater than two, subtracting $S_y^2$ from both sides, and simplifying, we get

$\hat{S}^2_{p1-j} - S_y^2 = S_y^2 \left[ (L - 1) + L \left\{ e_0 - \xi_1 \xi_2 e_1 - \xi_1 \xi_2 e_0 e_1 + \frac{\xi_2 (\xi_2 + 1)}{2} \xi_1^2 e_1^2 \right\} \right]$. (6)

By taking the expectation on both sides of Equation (6), with $E(e_0) = E(e_1) = 0$, we get the bias of $\hat{S}^2_{p1-j}$ up to the first degree of approximation as

$B(\hat{S}^2_{p1-j}) = S_y^2 \left[ (L - 1) + L \left\{ \frac{\xi_2 (\xi_2 + 1)}{2} \xi_1^2 E(e_1^2) - \xi_1 \xi_2 E(e_0 e_1) \right\} \right]$.

The mean square error of $\hat{S}^2_{p1-j}$ is defined as $MSE(\hat{S}^2_{p1-j}) = E(\hat{S}^2_{p1-j} - S_y^2)^2$. So, squaring both sides of Equation (6), keeping terms in the $e$'s only up to the second order, and applying expectation, the MSE of $\hat{S}^2_{p1-j}$ up to the first degree of approximation is

$MSE(\hat{S}^2_{p1-j}) = S_y^4 \left[ (L - 1)^2 + L^2 \left\{ E(e_0^2) + \xi_1^2 \xi_2^2 E(e_1^2) - 2 \xi_1 \xi_2 E(e_0 e_1) \right\} + 2 L (L - 1) \left\{ \frac{\xi_2 (\xi_2 + 1)}{2} \xi_1^2 E(e_1^2) - \xi_1 \xi_2 E(e_0 e_1) \right\} \right]$. (8)

Differentiating Equation (8) with respect to $L$, equating the result to zero, and simplifying gives the optimum value of $L$ as a ratio of two quantities, $L_1$ and $L_2$, that depend only on $\xi_1$, $\xi_2$, $E(e_0^2)$, $E(e_1^2)$, and $E(e_0 e_1)$. Substituting this optimum into Equation (8) and simplifying gives the minimum MSE of $\hat{S}^2_{p1-j}$. The exact values of $L_1$ and $L_2$ can be obtained by substituting the known results for $E(e_0^2)$, $E(e_1^2)$, and $E(e_0 e_1)$ into their respective expressions.

The proposed class-I encompasses different kinds of existing estimators through particular choices of the constants. For example, if we set $L = \phi = \delta = 1$ and $\psi = \nu = 0$, then the estimator suggested by Upadhyaya and Singh [17] is a member of the proposed $\hat{S}^2_{p1-j}$ class of estimators. Similarly, if we set $L = \phi = \delta = 1$, $\psi = \omega_i$, and $\nu = 0$, then the Subramani and Kumarapandiyan [16] class of estimators becomes a member of the proposed $\hat{S}^2_{p1-j}$ class. Some new members of the proposed class-I estimators, which combine conventional parameters and nonconventional dispersion parameters of the auxiliary variable, are given in Table 2. Many other estimators can be generated from the proposed $\hat{S}^2_{p1-j}$ class, but to conserve space only a few are given.
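The optimization over the Searls constant can be sketched as follows, starting from the form $L S_y^2 (1 + e_0)(1 + \xi_1 e_1)^{-\xi_2}$. This is a hedged reconstruction under the stated second-order truncation: the labels $L_1$ and $L_2$ are an assumed assignment (the original expressions may group the terms differently), and the moments $E(e_0^2)$, $E(e_1^2)$, and $E(e_0 e_1)$ are left symbolic.

```latex
\begin{align*}
% Binomial expansion, valid for |\xi_1 e_1| < 1, truncated at second order
(1+\xi_1 e_1)^{-\xi_2} &\approx 1 - \xi_1\xi_2 e_1
  + \tfrac{\xi_2(\xi_2+1)}{2}\,\xi_1^2 e_1^2, \\
% Estimation error after multiplying by L S_y^2 (1 + e_0)
\hat{S}^2_{p1-j} - S_y^2 &\approx S_y^2\Big[(L-1) + L\big\{e_0 - \xi_1\xi_2 e_1
  - \xi_1\xi_2 e_0 e_1 + \tfrac{\xi_2(\xi_2+1)}{2}\,\xi_1^2 e_1^2\big\}\Big], \\
% Squaring, taking expectations, and collecting powers of L
\mathrm{MSE}(\hat{S}^2_{p1-j}) &\approx S_y^4\big[\,1 + L^2 L_1 - 2 L L_2\,\big], \\
% Assumed labels for the two collected quantities
L_1 &= 1 + E(e_0^2) + \xi_2(2\xi_2+1)\,\xi_1^2 E(e_1^2) - 4\,\xi_1\xi_2 E(e_0 e_1), \\
L_2 &= 1 + \tfrac{\xi_2(\xi_2+1)}{2}\,\xi_1^2 E(e_1^2) - \xi_1\xi_2 E(e_0 e_1), \\
% Setting the derivative 2 L L_1 - 2 L_2 to zero
L_{\mathrm{opt}} &= \frac{L_2}{L_1}, \qquad
\mathrm{MSE}_{\min}(\hat{S}^2_{p1-j}) \approx S_y^4\Big[\,1 - \frac{L_2^2}{L_1}\Big].
\end{align*}
```

As a check on the algebra, setting $L = 1$ gives $S_y^4 [1 + L_1 - 2 L_2] = S_y^4 \{ E(e_0^2) + \xi_1^2 \xi_2^2 E(e_1^2) - 2 \xi_1 \xi_2 E(e_0 e_1) \}$, which is the first-order MSE of the estimator without the Searls constant.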
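Table 2. Example member of the proposed class-I estimators: $s_y^2 \left( \frac{C_X \bar{X} + T_{n(X)}}{C_X \bar{x} + T_{n(X)}} \right)^{\frac{C_X \bar{X}}{C_X \bar{X} + T_{n(X)}}}$, with constants $\phi = C_X$, $\psi = T_{n(X)}$, $\delta = C_X$, and $\nu = T_{n(X)}$.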

Efficiency Conditions for Class-I Estimators
The estimators of class-I perform better than the traditional estimator of Isaki [4] for estimating the population variance if $MSE_{\min}(\hat{S}^2_{p1-j}) < MSE(\hat{S}^2_R)$. The estimators defined in class-I achieve greater efficiency than the estimators defined in Section 2, i.e., $\hat{S}^2_i$, if $MSE_{\min}(\hat{S}^2_{p1-j}) < MSE(\hat{S}^2_i)$. The suggested class-I estimators outperform the Upadhyaya and Singh [17] modified ratio-type estimator of population variance in terms of efficiency if their minimum MSE is smaller than the MSE of that estimator. Finally, the estimators in the proposed class $\hat{S}^2_{p1-j}$ exhibit superior performance as compared to the Subramani and Kumarapandiyan [16] modified class of estimators if $MSE_{\min}(\hat{S}^2_{p1-j}) < MSE(\hat{S}^2_{SK(i)})$.
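An efficiency condition of this kind can also be examined empirically by comparing Monte Carlo MSEs under repeated SRSWOR draws. The sketch below does this for one class-I member and the traditional ratio estimator on a synthetic population with a planted outlier. Everything here is an assumption for illustration: the function names, the population, the choice $\psi = \nu =$ median absolute deviation with $L = \phi = \delta = 1$, and the class-I form itself, which is taken from the definitions of $\xi_1$ and $\xi_2$. The outcome of the comparison depends on the population and is not asserted.

```python
import numpy as np

def class1_estimate(y, x, X_bar, phi=1.0, psi=0.0, delta=1.0, nu=0.0, L=1.0):
    """Assumed class-I form:
    L * s_y^2 * ((phi*X_bar + psi) / (phi*x_bar + psi)) ** xi2,
    with xi2 = delta*X_bar / (delta*X_bar + nu)."""
    s2_y = np.var(y, ddof=1)
    x_bar = np.mean(x)
    xi2 = delta * X_bar / (delta * X_bar + nu)
    return L * s2_y * ((phi * X_bar + psi) / (phi * x_bar + psi)) ** xi2

def isaki_estimate(y, x, S2_X):
    """Traditional ratio-type estimator s_y^2 * S_X^2 / s_x^2."""
    return np.var(y, ddof=1) * S2_X / np.var(x, ddof=1)

def monte_carlo_mse(estimator, y_pop, x_pop, n, reps, rng):
    """Average squared error of `estimator` over repeated SRSWOR samples."""
    S2_Y = np.var(y_pop, ddof=1)
    errors = np.empty(reps)
    for r in range(reps):
        idx = rng.choice(len(y_pop), size=n, replace=False)
        errors[r] = estimator(y_pop[idx], x_pop[idx]) - S2_Y
    return np.mean(errors ** 2)

# Synthetic population (not the paper's data) with one planted outlier.
rng = np.random.default_rng(1)
N, n = 34, 10
x_pop = rng.gamma(2.0, 50.0, size=N)
x_pop[0] *= 6.0
y_pop = 1.1 * x_pop + rng.normal(0.0, 15.0, size=N)

X_bar = x_pop.mean()
S2_X = np.var(x_pop, ddof=1)
mad = 1.4826 * np.median(np.abs(x_pop - np.median(x_pop)))

mse_isaki = monte_carlo_mse(
    lambda y, x: isaki_estimate(y, x, S2_X), y_pop, x_pop, n, 5000, rng)
mse_class1 = monte_carlo_mse(
    lambda y, x: class1_estimate(y, x, X_bar, psi=mad, nu=mad),
    y_pop, x_pop, n, 5000, rng)

# Empirical analogue of the efficiency condition for this one population.
condition_holds = mse_class1 < mse_isaki
```

With the default arguments, `class1_estimate` reduces to $s_y^2 \bar{X} / \bar{x}$, the special case obtained by setting $L = \phi = \delta = 1$ and $\psi = \nu = 0$.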

The Suggested Estimators of Class-II
In this section, we present a new class of regression-type estimators of population variance, denoted $\hat{S}^2_{p2-m}$. In its definition, $M$ is the Searls [30] constant, $\alpha$ and $\tau$ can be real numbers or functions of known conventional parameters of the auxiliary variable $X$, $\zeta$ and $\upsilon$ are known functions of the nonconventional dispersion parameters of the auxiliary variable $X$, and $b$ is the regression coefficient between the study and auxiliary variables. The minimized bias and minimized MSE of the class-II estimators are obtained by adapting the procedure given in Section 3.1.

The estimators envisaged in class-II incorporate many existing estimators of population variance. For instance, if we set $M = \alpha = \tau = 1$ and $\zeta = \upsilon = b = 0$, then the estimator suggested by Upadhyaya and Singh [17] is a member of the $\hat{S}^2_{p2-m}$ class of estimators. Similarly, if we set $M = \alpha = \tau = 1$, $\zeta = \omega_i$, and $\upsilon = b = 0$, then the Subramani and Kumarapandiyan [16] class of estimators becomes a member of the proposed $\hat{S}^2_{p2-m}$ class of estimators. Moreover, if the regression coefficient $b = 0$, the class of estimators defined in Section 3.1 is also a member of the proposed class $\hat{S}^2_{p2-m}$. Table 3 contains some new members of the proposed class-II to estimate the population variance based on auxiliary information.

Table 3. Some new members of proposed class-II estimators.

Efficiency Conditions for Class-II Estimators
The estimators in class-II perform better than the Isaki [4] traditional ratio estimator of population variance if $MSE_{\min}(\hat{S}^2_{p2-m}) < MSE(\hat{S}^2_R)$. The estimators defined in class-II are superior in terms of efficiency to the estimators defined in Section 2, i.e., $\hat{S}^2_i$, if $MSE_{\min}(\hat{S}^2_{p2-m}) < MSE(\hat{S}^2_i)$. The suggested class-II estimators outperform the Upadhyaya and Singh [17] modified ratio-type estimator of population variance in terms of efficiency if their minimum MSE is smaller than the MSE of that estimator. Finally, the estimators in the proposed class-II exhibit superior performance as compared to the Subramani and Kumarapandiyan [16] modified class of estimators if $MSE_{\min}(\hat{S}^2_{p2-m}) < MSE(\hat{S}^2_{SK(i)})$.

Empirical Study
To assess the performance of the proposed classes of estimators in comparison to their competing estimators of variance, two real populations were taken from Singh and Chaudhary [31] (p. 177). These are the same datasets which were considered by Subramani and Kumarapandiyan [16]. In population-I, $Y$ denotes the area under wheat crop (in acres) during 1974 in 34 villages and $X$ denotes the area under wheat crop (in acres) during 1971 in the same villages; in population-II, $Y$ is the same as in population-I and $X$ is the area under wheat crop (in acres) during 1973. As mentioned earlier, the estimators in this study are based on nonconventional and somewhat robust dispersion measures, which perform more efficiently in the presence of outliers than conventional measures. So, it is expected that the proposed classes of estimators will exhibit superior efficiency as compared to the existing and the traditional ratio estimators. The data of both populations contain outliers, as can be seen from the boxplots shown in Figures 1 and 2.

The comparison between the proposed classes of estimators and the existing estimators was made based on their percentage relative efficiencies (PREs) with respect to the traditional ratio estimator of variance suggested by Isaki [4]. The PRE of the proposed estimators relative to the traditional estimator is defined as

$PRE_{(p)} = \frac{MSE_{(Trd)}}{MSE_{(p)}} \times 100$,

where $PRE_{(p)}$ denotes the percentage relative efficiency of the proposed estimator in comparison with the traditional estimator, $MSE_{(Trd)}$ is the mean square error of the traditional estimator, and $MSE_{(p)}$ is the mean square error of the proposed estimator. To keep the study concise, from the class of estimators proposed by Subramani and Kumarapandiyan [16] we took only its most efficient estimator, which is based on $D_{10(X)}$, for comparison purposes. The population characteristics are summarized in Table 4.
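The PRE computation is a one-line ratio, sketched below with a hypothetical function name and made-up MSE values (they are not taken from Tables 5-8). A PRE above 100 means the proposed estimator has a smaller MSE than the traditional one.

```python
def percentage_relative_efficiency(mse_traditional, mse_proposed):
    """PRE of a proposed estimator relative to the traditional one:
    100 * MSE(traditional) / MSE(proposed)."""
    if mse_traditional <= 0 or mse_proposed <= 0:
        raise ValueError("MSE values must be positive")
    return 100.0 * mse_traditional / mse_proposed

# Hypothetical MSE values for illustration only.
pre = percentage_relative_efficiency(150.0, 100.0)
gain_percent = pre - 100.0  # efficiency gain over the traditional estimator
```

Read this way, a statement such as "38% more efficient" corresponds to a PRE of 138.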
The PREs of the existing estimators relative to the traditional ratio estimator by Isaki [4] are shown in Tables 5 and 6 for population-I and population-II, respectively, while the PREs of the proposed estimators relative to the same estimator are given in Tables 7 and 8, respectively. For better understanding, Figures 3 and 4 display the comparative PREs of the existing and proposed estimators for population-I and population-II, respectively, where the best estimator from each of the proposed classes was chosen for a clearer visual display. From the results reported in Tables 5-8, the findings are summarized as follows:

1. The estimators proposed in class-I and class-II have higher PREs than the existing estimators for both populations considered in this study, which reveals the superiority of the proposed classes of estimators in the presence of outliers in the data (cf. Tables 5-8 and Figures 3 and 4). For instance, the suggested estimators of class-I and class-II are at least 38% more efficient than the traditional ratio estimator for population-I. For population-II, the efficiency gain of the suggested estimators exceeds 44%. All existing estimators are at most 11% and 17% more efficient than the traditional ratio estimator for population-I and population-II, respectively.
2. The class-I estimators have higher PREs than the class-II estimators proposed in this study (cf. Tables 7 and 8).
3. The estimators which integrate information on the nonconventional dispersion parameter of the auxiliary variable and the correlation coefficient between the study and auxiliary variables were found to be superior in terms of efficiency to the other estimators (cf. Tables 7 and 8).
4. The estimator which is based on the inter-decile range and the correlation coefficient between the study and auxiliary variables turned out to be the most efficient estimator.
5. The existing estimators show little improvement over the traditional ratio estimator in the presence of outliers in the data (cf. Tables 5 and 6), whereas the suggested estimators perform quite well as compared to the existing and the traditional ratio estimators (cf. Tables 7 and 8). These findings highlight the significance of using nonconventional measures in estimating the population variance in the presence of outliers.

Conclusions
This study introduced two new classes of ratio- and regression-type estimators of population variance under simple random sampling without replacement by integrating information on nonconventional and somewhat robust dispersion measures of an auxiliary variable. The expressions for bias and mean square error were obtained, and the efficiency conditions under which the proposed estimators perform better than the existing estimators were also derived. In support of the theoretical findings, an empirical study was conducted based on two real populations, which revealed that the suggested classes of estimators outperform the existing estimators considered in this study in terms of PREs in the presence of outliers. Based on these findings, it is strongly recommended that the proposed classes of estimators be used instead of the existing estimators to estimate the population variance when the dataset contains outliers. The present study can be extended to the estimation of population variance with two auxiliary variables; moreover, nonconventional measures can be employed under other sampling schemes to enhance the efficiency of the variance estimators.

Funding: There was no funding for this paper.