Article

# On Efficient Estimation of Process Variability

by Tanveer Akhlaq 1, Muhammad Ismail 1 and Muhammad Qaiser Shahbaz 2,*

1 Department of Statistics, COMSATS University Islamabad, Lahore 54900, Pakistan
2 Department of Statistics, King Abdulaziz University, Jeddah 21589, Saudi Arabia
* Author to whom correspondence should be addressed.
Symmetry 2019, 11(4), 554; https://doi.org/10.3390/sym11040554
Submission received: 2 April 2019 / Revised: 15 April 2019 / Accepted: 15 April 2019 / Published: 17 April 2019

## Abstract

Variability or dispersion plays an important role in any process and provides insight into the spread of data from some central point, usually the mean. A process with less spread is preferred over a process in which values differ greatly from the mean. Various methods are available to estimate the process dispersion by using information on the variable of interest. Certain additional variables provide good insight to estimate the process dispersion. In this paper, we propose an efficient method for the estimation of process variability by using the exponential method. The properties of the proposed method were studied. We conducted simulation and empirical studies to compare the proposed method with some existing methods of estimation of variability. The results of the numerical study show that our proposed method is better than the other methods used in the study.

## 1. Introduction

Several authors proposed various methods to estimate the process variation using the information of an additional or auxiliary variable. The method of using the information of an additional variable characterizes a certain estimation method and is very useful in decreasing the variance of estimation.
Several studies were conducted by many researchers to estimate the variability of a process, known as the population variance of the study variable $S_y^2$. The simplest estimator of population variance is the sample variance $s_y^2$, which was proposed by Reference [1]. Some other estimators of variance were proposed in References [2,3,4,5,6,7] and are based on certain information of one or more auxiliary variables (X). The method of using auxiliary information is very useful in proposing efficient estimates of the population variance. Some methods which are useful in the estimation of population variance are given in the following section.

## 2. Materials and Methods

Suppose we have a finite process which can produce N units; let the values of the characteristic of interest be $U_1, U_2, \ldots, U_N$. Suppose that the variability of the process is to be estimated on the basis of a sample of size n. Let Y represent the variable of interest and X represent the auxiliary variable. Also, let $\bar{Y}$ and $S_y^2$ be the mean and variance of the variable of interest, and let $\bar{X}$ and $S_x^2$ be the mean and variance of the auxiliary variable for the whole process. The corresponding sample measures are $\bar{y}$, $s_y^2$, $\bar{x}$, and $s_x^2$, respectively. When information on two variables is available, the strength of interdependence (correlation coefficient) between these variables is of importance and is denoted by $\rho_{xy}$. Also, let $\beta_2'(y)$ and $\beta_2'(x)$ be the coefficients of kurtosis of the study and auxiliary variables, and let $\beta_1(x)$ be the squared coefficient of skewness of the auxiliary variable. Some additional notations that are useful in studying the properties of various estimators of population variance are given below.
.
We also use $s_y^2 = S_y^2 (1 + e_0)$ and $s_x^2 = S_x^2 (1 + e_1)$, such that $E(e_0^2) = \lambda \beta_2'(y)$, $E(e_1^2) = \lambda \beta_2'(x)$, and $E(e_0 e_1) = \lambda h'$, with $\lambda = n^{-1}$.
We will now discuss some available estimators for process variability (the population variance).
The usual unbiased estimator of population variance is $t_0 = s_y^2$ with the mean-square error (MSE) given as
$\mathrm{MSE}(t_0) = \frac{1}{n} S_y^4 \left[ \beta_2'(y) - 1 \right]$
The estimator given above is based on the information of the study variable only. An estimator of variance which utilizes the information of an auxiliary variable was proposed by Reference [1] as
$t_I = s_y^2 \frac{S_x^2}{s_x^2}$
The MSE of $t_I$ up to the first order is
$\mathrm{MSE}(t_I) \approx \frac{1}{n} S_y^4 \left[ \beta_2'(y) + \beta_2'(x) - 2h' \right]$
Two exponential-type estimators of variance were proposed by Reference [8] as
$t_{SR} = s_y^2 \exp\left( \frac{S_x^2 - s_x^2}{S_x^2 + s_x^2} \right)$
and
$t_{SP} = s_y^2 \exp\left( \frac{s_x^2 - S_x^2}{S_x^2 + s_x^2} \right).$
The MSEs of the estimators in Equations (4) and (5) are, respectively,
$\mathrm{MSE}(t_{SR}) = \frac{1}{n} S_y^4 \left[ \beta_2'(y) + \frac{1}{4} \beta_2'(x) - h' \right]$
and
$\mathrm{MSE}(t_{SP}) = \frac{1}{n} S_y^4 \left[ \beta_2'(y) + \frac{1}{4} \beta_2'(x) + h' \right].$
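The estimators $t_0$, $t_I$, $t_{SR}$, and $t_{SP}$ defined above can be computed from a single sample once the population variance of the auxiliary variable is known. The following is a minimal sketch; the function name `variance_estimators` and the argument name `Sx2` are illustrative choices, and the sample variances are assumed to use the divisor n − 1.

```python
import numpy as np

def variance_estimators(y, x, Sx2):
    """Return t_0, t_I, t_SR and t_SP for one sample.

    y, x : sample values of the study and auxiliary variables
    Sx2  : population variance of the auxiliary variable (assumed known)
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    sy2 = y.var(ddof=1)  # sample variance s_y^2 (divisor n - 1, an assumption)
    sx2 = x.var(ddof=1)  # sample variance s_x^2
    return {
        "t_0": sy2,                                        # sample variance only
        "t_I": sy2 * Sx2 / sx2,                            # ratio-type estimator
        "t_SR": sy2 * np.exp((Sx2 - sx2) / (Sx2 + sx2)),   # exponential ratio
        "t_SP": sy2 * np.exp((sx2 - Sx2) / (Sx2 + sx2)),   # exponential product
    }
```

When the sample variance of X happens to equal `Sx2`, all four estimators coincide with $s_y^2$; the auxiliary information only changes the estimate when the sample over- or under-represents the spread of X.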
An exponential-type estimator of variance, proposed by Reference [7], is
$t_A = s_y^2 \exp\left( \frac{\bar{X} - \bar{x}}{\bar{X} + \bar{x}} \right)$
with MSE as
$\mathrm{MSE}(t_A) = \frac{1}{n} S_y^4 \left[ \beta_2'(y) + \frac{1}{4} C_x^2 - \lambda_{21} C_x \right],$
where $C_x = S_x / \bar{X}$ is the coefficient of variation of the auxiliary variable X. An estimator of variance, based on the coefficient of variation, was proposed by Reference [2] as
$t_{DT} = s_y^2 \left( C_x^2 / \hat{C}_x^2 \right)$
with MSE as
$\mathrm{MSE}(t_{DT}) = \frac{1}{n} S_y^4 \left[ \beta_2'(y) + \left\{ \beta_2'(x) + 4 C_x^2 - 4 \beta_1(x) C_x \right\} - 2 \left( h' - 2 \lambda_{21} C_x \right) \right].$
A general procedure for estimating population variance was proposed by Reference [9] as
$t_{RP} = (s_y^2 + d) \left[ \omega \left( C_y^2 / \hat{C}_y^2 \right)^{\gamma} + (1 - \omega) \left( \hat{C}_y^2 / C_y^2 \right)^{\chi} \right] - d$
with MSE as
$\mathrm{MSE}(t_{RP}) = \frac{1}{n} S_y^4 \left[ \beta_2'(y) - \frac{\left( h' - 2 \lambda_{21} C_x \right)^2}{\beta_2'(x) + 4 C_x^2 - 4 \beta_1(x) C_x} \right].$
Various authors have used certain transformations of the auxiliary variable to propose different estimators of variance. A popular transformation is $s_x^{*2} = S_x^2 (1 + g) - g s_x^2$, where $g = n / (N - n)$; under this transformation, $s_x^{*2}$ is an unbiased estimator of $S_x^2$. Following the same transformation, Reference [6] proposed a dual ratio-type estimator for population variance, $t_{YK2}$, with MSE given as
$\mathrm{MSE}(t_{YK2}) = \frac{1}{n} S_y^4 \left[ \beta_2'(y) + g^2 \beta_2'(x) - 2 g h' \right].$
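The explicit form of $t_{YK2}$ is not shown here. One form that is consistent with the transformation and with the MSE above is the dual-to-ratio form below; it is offered as a reconstruction under that assumption, not as a quotation of Reference [6]. The check uses $E(e_0^2) = \lambda \beta_2'(y)$, $E(e_1^2) = \lambda \beta_2'(x)$, and $E(e_0 e_1) = \lambda h'$ with $\lambda = 1/n$.

$t_{YK2} = s_y^2 \frac{s_x^{*2}}{S_x^2} = S_y^2 (1 + e_0)(1 - g e_1)$

$t_{YK2} - S_y^2 \approx S_y^2 (e_0 - g e_1)$

$E\left( t_{YK2} - S_y^2 \right)^2 \approx \frac{1}{n} S_y^4 \left[ \beta_2'(y) + g^2 \beta_2'(x) - 2 g h' \right]$

The last line agrees with the stated MSE, which is the only sense in which the assumed form is supported by the text.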
The use of a transformed auxiliary variable provides more efficient estimators of population variance; see, for example, References [6,7,8,9,10,11].
The main purpose of this paper is to propose a generalized exponential estimator for population variance by utilizing transformed auxiliary information. The estimator is proposed in Section 3.

## 3. Proposed Estimator

Using the concept of Reference [6], we propose two general exponential-type estimators, known as the exponential ratio and exponential product estimators of population variance, by utilizing transformed auxiliary information x. The proposed estimators are as follows:
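$t_{eR} = s_y^2 \exp\left( \frac{s_x^{*2} - S_x^2}{S_x^2 + s_x^{*2}} \right)$
and
$t_{eP} = s_y^2 \exp\left( \frac{S_x^2 - s_x^{*2}}{S_x^2 + s_x^{*2}} \right),$
where $s_x^{*2} = S_x^2 (1 + g) - g s_x^2$. The MSEs of the proposed estimators, up to the first order, are
$\mathrm{MSE}(t_{eR}) \cong \frac{1}{n} S_y^4 \left[ \beta_2'(y) + \frac{1}{4} g^2 \beta_2'(x) - g h' \right]$
and
$\mathrm{MSE}(t_{eP}) \cong \frac{1}{n} S_y^4 \left[ \beta_2'(y) + \frac{1}{4} g^2 \beta_2'(x) + g h' \right].$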
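The first-order MSE of $t_{eR}$ follows from a short expansion in $e_0$ and $e_1$. The steps below are a sketch under the notation of Section 2, assuming $E(e_0^2) = \lambda \beta_2'(y)$, $E(e_1^2) = \lambda \beta_2'(x)$, and $E(e_0 e_1) = \lambda h'$ with $\lambda = 1/n$.

$s_x^{*2} - S_x^2 = -g S_x^2 e_1, \qquad S_x^2 + s_x^{*2} = S_x^2 (2 - g e_1)$

$t_{eR} = S_y^2 (1 + e_0) \exp\left( \frac{-g e_1}{2 - g e_1} \right) \approx S_y^2 \left( 1 + e_0 - \frac{g e_1}{2} \right)$

$\mathrm{MSE}(t_{eR}) \approx S_y^4 \, E\left( e_0 - \frac{g e_1}{2} \right)^2 = \frac{1}{n} S_y^4 \left[ \beta_2'(y) + \frac{1}{4} g^2 \beta_2'(x) - g h' \right]$

The same steps applied to $t_{eP}$ flip the sign of the $e_1$ term, which is why its cross term enters as $+ g h'$.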
We have also proposed another estimator, following Reference [11] as
$t_{eRG} = s_y^2 \exp\left[ \beta_i \left( \frac{s_x^{*2} - S_x^2}{S_x^2 + (a_i - 1) s_x^{*2}} \right) \right]$
or
$t_{eRG} = s_y^2 \exp\left[ \beta \left( \frac{g (S_x^2 - s_x^2)}{S_x^2 + (a - 1) \left\{ S_x^2 + g (S_x^2 - s_x^2) \right\}} \right) \right].$
Using $s_y^2 = S_y^2 (1 + e_0)$ and $s_x^2 = S_x^2 (1 + e_1)$, and using a Taylor series expansion of Equation (21), we have
$t_{eRG} - S_y^2 \cong S_y^2 \left( e_0 - \frac{\beta g e_1}{a} + \frac{\beta g^2 e_1^2 (a - 1)}{a^2} + \frac{\beta^2 g^2 e_1^2}{2 a^2} - \frac{\beta g e_0 e_1}{a} \right).$
The bias of the proposed estimator, up to the first order, is given as
$\mathrm{Bias}(t_{eRG}) \cong \frac{1}{n} S_y^2 \left( \frac{\beta g^2 (a - 1) \beta_2'(x)}{a^2} + \frac{\beta^2 g^2 \beta_2'(x)}{2 a^2} - \frac{\beta g h'}{a} \right).$
The MSE of the estimator in Equation (21) is
$\mathrm{MSE}(t_{eRG}) \cong \frac{1}{n} S_y^4 \left( \beta_2'(y) + \frac{\beta^2 g^2 \beta_2'(x)}{a^2} - \frac{2 \beta g h'}{a} \right).$
The optimum value of $a$ is obtained by minimizing Equation (23) and is given as
$a = \frac{g \beta \beta_2'(x)}{h'}.$
Using the optimum value of $a$ in Equation (23), the minimum value of MSE is
$\mathrm{MSE}_{\min}(t_{eRG}) = \frac{1}{n} S_y^4 \left[ \beta_2'(y) - \frac{h'^2}{\beta_2'(x)} \right].$
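The optimum $a$ and the minimum MSE can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the arguments `b2y`, `b2x`, and `h` stand for $\beta_2'(y)$, $\beta_2'(x)$, and $h'$, and the factor $S_y^4 / n$ is passed as `Sy4` and `n` so that it matches the minimum-MSE expression.

```python
def mse_teRG(a, beta, g, b2y, b2x, h, Sy4=1.0, n=1):
    """First-order MSE of t_eRG for given constants a and beta."""
    theta = beta * g / a  # the MSE depends on a and beta only through this ratio
    return Sy4 / n * (b2y + theta ** 2 * b2x - 2.0 * theta * h)

def optimal_a(beta, g, b2x, h):
    """Value of a that minimizes the first-order MSE (requires h != 0)."""
    return g * beta * b2x / h

def min_mse_teRG(b2y, b2x, h, Sy4=1.0, n=1):
    """Minimum first-order MSE attained at the optimal a."""
    return Sy4 / n * (b2y - h ** 2 / b2x)

# Illustrative inputs only; they are not taken from the paper.
a_opt = optimal_a(beta=1.0, g=0.25, b2x=4.0, h=2.0)
mse_at_opt = mse_teRG(a_opt, beta=1.0, g=0.25, b2y=3.0, b2x=4.0, h=2.0)
```

Because the MSE depends on $a$ and $\beta$ only through $\beta g / a$, any pair with the same ratio gives the same first-order MSE; fixing $\beta$ and solving for $a$ is one convenient parameterization.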

#### 3.1. Special Cases

The special cases of the proposed estimator are obtained using different values of the constants involved and are given in Table 1.
We now compare the proposed estimator with some popular available estimators of population variance. The efficiency comparison is given in Section 3.2.
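The special cases can be verified directly from the definition of the generalized estimator. The following is a small sketch with hypothetical function names; it takes the sample variances and the known $S_x^2$ as plain numbers.

```python
import numpy as np

def transformed_sx2(sx2, Sx2, g):
    """Transformed auxiliary variance s_x^{*2} = (1 + g) S_x^2 - g s_x^2."""
    return (1.0 + g) * Sx2 - g * sx2

def t_eRG(sy2, sx2, Sx2, g, beta, a):
    """Generalized exponential estimator with constants beta and a."""
    sx2_star = transformed_sx2(sx2, Sx2, g)
    return sy2 * np.exp(beta * (sx2_star - Sx2) / (Sx2 + (a - 1.0) * sx2_star))

def t_eR(sy2, sx2, Sx2, g):
    """Exponential ratio estimator based on the transformed variance."""
    sx2_star = transformed_sx2(sx2, Sx2, g)
    return sy2 * np.exp((sx2_star - Sx2) / (Sx2 + sx2_star))

# Illustrative inputs only; they are not taken from the paper.
sy2, sx2, Sx2, g = 2.0, 3.0, 4.0, 0.25
same_as_t0 = t_eRG(sy2, sx2, Sx2, g, beta=0.0, a=5.0)   # beta = 0 gives s_y^2
same_as_teR = t_eRG(sy2, sx2, Sx2, g, beta=1.0, a=2.0)  # beta = 1, a = 2 gives t_eR
```

Setting $\beta = 0$ switches the auxiliary information off entirely, while $\beta = 1$ with $a = 2$ recovers the exponential ratio estimator.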

#### 3.2. Efficiency Comparison of the Generalized Exponential Estimator

The proposed estimator was compared with some available estimators on the basis of mean-square error. We compared our proposed estimator with $t_0$, $t_I$, $t_{SR}$, and $t_{YK2}$ given in Section 2. The mean-square errors of these estimators are given in Equations (1), (3), (6), and (15), respectively.
Each comparison takes the difference between the MSE of the competing estimator and the minimum MSE of $t_{eRG}$; the proposed estimator is more efficient whenever that difference is non-negative.
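Under the first-order MSE expressions given earlier, the four differences can be sketched as follows. This is a reconstruction under two assumptions: that $\beta_2'(x) > 0$, and that $t_0$, being the $\beta = 0$ member of the family, is assigned the first-order MSE $\frac{1}{n} S_y^4 \beta_2'(y)$ implied by the MSE of $t_{eRG}$.

$\mathrm{MSE}(t_0) - \mathrm{MSE}_{\min}(t_{eRG}) = \frac{1}{n} S_y^4 \frac{h'^2}{\beta_2'(x)} \ge 0$

$\mathrm{MSE}(t_I) - \mathrm{MSE}_{\min}(t_{eRG}) = \frac{1}{n} S_y^4 \frac{\left\{ \beta_2'(x) - h' \right\}^2}{\beta_2'(x)} \ge 0$

$\mathrm{MSE}(t_{SR}) - \mathrm{MSE}_{\min}(t_{eRG}) = \frac{1}{n} S_y^4 \frac{\left\{ \beta_2'(x) - 2h' \right\}^2}{4 \beta_2'(x)} \ge 0$

$\mathrm{MSE}(t_{YK2}) - \mathrm{MSE}_{\min}(t_{eRG}) = \frac{1}{n} S_y^4 \frac{\left\{ g \beta_2'(x) - h' \right\}^2}{\beta_2'(x)} \ge 0$

Each difference is a square divided by $\beta_2'(x)$, so under these assumptions $t_{eRG}$ at its optimum is never less efficient to the first order, with equality only when $h' = 0$, $h' = \beta_2'(x)$, $h' = \beta_2'(x)/2$, or $h' = g \beta_2'(x)$, respectively.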
We now give numerical examples for the application of the proposed estimator.

## 4. Results

In this section, we present the numerical studies for the application of the proposed generalized estimator of population variance. The numerical study is twofold. We firstly provide 10 real examples by considering 10 real populations; later, we provide a simulation study to see the performance of the proposed generalized estimator of variance. The results of these studies are given in the sections below.

#### 4.1. Numerical Study

In this section, we present a numerical study to see the performance of the proposed estimator in estimating the variability of processes. These studies were conducted using 10 real populations which were previously used by various authors to see the performance of their proposed estimators. The description of these populations can be found in the source given.
The summary measures for these 10 populations are given in Table 2. The summary measures of two populations indicate that the study variable and auxiliary variable are highly correlated. This is generally an underlying requirement for ratio- and exponential-type estimators for the estimation of population characteristics.
We computed the mean-square error of various estimators using the data of the above populations. We then computed the percent relative efficiency (PRE) of each estimator, relative to $t_0$, using Equation (26).
$\mathrm{PRE} = \frac{\mathrm{MSE}(t_i)}{\mathrm{MSE}(t_0)} \times 100; \quad i = I, SR, A, YK2, rP, eR, eRG.$
Defined this way, PRE can be read as the sample size an estimator needs in order to match the precision that the base estimator $t_0$ attains with a sample of size 100. A smaller value of PRE therefore indicates a more efficient estimator. The results are given in Table 3.
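Because this definition places the MSE of the base estimator in the denominator, values below 100 indicate an improvement over $t_0$. The following is a minimal sketch of the computation and the resulting ranking; the function names and the MSE values are illustrative and are not taken from the paper.

```python
def percent_relative_efficiency(mse_estimator, mse_base):
    """PRE as defined in Equation (26): 100 * MSE(t_i) / MSE(t_0)."""
    if mse_base <= 0:
        raise ValueError("mse_base must be positive")
    return 100.0 * mse_estimator / mse_base

def rank_by_pre(mse_by_name, base="t_0"):
    """Return (name, PRE) pairs sorted from most to least efficient."""
    base_mse = mse_by_name[base]
    pre = {name: percent_relative_efficiency(m, base_mse)
           for name, m in mse_by_name.items()}
    return sorted(pre.items(), key=lambda item: item[1])

# Illustrative MSE values only; they are not taken from the paper.
example = {"t_0": 4.0, "t_I": 5.0, "t_eRG": 1.0}
ranking = rank_by_pre(example)
```

In this toy example the estimator labelled `t_eRG` has PRE 25, meaning it would need roughly a quarter of the observations that `t_0` needs for the same precision.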
We can see, from Table 3, that the proposed estimator $t_{eRG}$ is the most efficient estimator, as it requires the fewest observations to match the result obtained by $t_0$ with a sample of size 100. We then conducted a regression analysis to see the effect of the correlation coefficient and the coefficient of kurtosis of the auxiliary variable on the efficiency of each estimator. The regression model that we built is as follows:
$\mathrm{PRE} = \hat{\beta}_0 + \hat{\beta}_1 \rho_{xy} + \hat{\beta}_2 \left\{ \beta_2(x) \right\}.$
The regression summary alongside the result of the test of significance for each estimator is given in Table 4.
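A regression of this form can be fitted by ordinary least squares. The sketch below uses NumPy and, as an example, the correlation and kurtosis values from Table 2 with the PRE of $t_{eRG}$ from Table 3. The function name is hypothetical, and since the fitting procedure is not described here, the output is not guaranteed to reproduce Table 4.

```python
import numpy as np

def fit_pre_regression(rho, kurt_x, pre):
    """Least-squares fit of PRE = b0 + b1 * rho_xy + b2 * beta_2(x)."""
    rho = np.asarray(rho, dtype=float)
    kurt_x = np.asarray(kurt_x, dtype=float)
    pre = np.asarray(pre, dtype=float)
    # Design matrix: intercept, correlation, kurtosis of the auxiliary variable
    design = np.column_stack([np.ones_like(rho), rho, kurt_x])
    coef, *_ = np.linalg.lstsq(design, pre, rcond=None)
    return coef  # array [b0, b1, b2]

# rho_xy and beta_2(x) from Table 2; PRE of t_eRG from Table 3
rho_xy = [0.995, 0.914, -0.818, 0.446, 0.978, 0.628, 0.775, 0.776, -0.382, 0.222]
beta2_x = [48.157, 3.650, 1.658, 15.231, 6.175, 5.701, 8.526, 19.038, 2.635, 2.019]
pre_eRG = [21.703, 41.038, 68.061, 59.606, 10.052, 11.427, 48.790, 25.314, 65.989, 54.820]
b0, b1, b2 = fit_pre_regression(rho_xy, beta2_x, pre_eRG)
```

With only ten populations and two regressors, the coefficient estimates are sensitive to individual populations, so the significance levels are best read as indicative.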
From Table 4, we can see that the average PRE of our proposed estimator, $t_{eRG}$, is the lowest, as $\hat{\beta}_0$ for this estimator has the smallest value. We can also see that the correlation coefficient has a significant effect on the efficiency of $t_{eRG}$, and the regression model for this estimator is significant at 5%. The value of $\hat{\beta}_0$ for our proposed estimator $t_{eR}$ is the second lowest, which indicates that this estimator is the second best estimator for the estimation of population variance. The regression coefficient $\hat{\beta}_1$ is significant at 10% for this estimator, which indicates that the correlation coefficient between the study and auxiliary variables provides useful information to predict the efficiency of estimator $t_{eR}$. The other significant regression model is for $t_{rP}$; for this estimator, the coefficient $\hat{\beta}_2$ is significant. This indicates that the coefficient of kurtosis of the auxiliary variable has a significant effect on the efficiency of this estimator. We can also see from Table 4 that the estimator $t_A$ showed the worst performance in this study, as coefficient $\hat{\beta}_0$ for this estimator has the highest value.
We also show the efficiencies of various estimators in Figure 1, where estimators are numbered from 1 to 7 in the order in which they appear in Table 4. This means that estimator $t_I$ is numbered 1 and estimator $t_{eRG}$ is numbered 7. The graph also shows that our proposed estimator $t_{eRG}$ is the most efficient estimator in this study.
We now present the results of the simulation study in Section 4.2.

#### 4.2. Simulation Results

In this section, we give the results of the simulation study to see the performance of our proposed estimator. The simulation study was conducted by generating populations of size 500 having a specific correlation structure between the study and auxiliary variables. For each of the populations with a specific correlation coefficient, various estimators were computed. The process was repeated 20,000 times, and we then computed the mean-square error of each estimator alongside the percent relative efficiency as defined in Equation (26). The results of this simulation study are given in Table 5.
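A simulation of this kind can be sketched as follows. Several choices here are assumptions, since they are not specified above: the population is drawn from a bivariate normal distribution with unit variances and means of 10, the sample size is 50, samples are drawn without replacement, and only $t_0$, $t_I$, and $t_{eR}$ are included. The default number of repetitions is reduced to 2000 to keep the run short.

```python
import numpy as np

def simulate_pre(rho, N=500, n=50, reps=2000, seed=0):
    """Monte Carlo PRE (relative to t_0) for a population with correlation rho."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    # Assumed population model: bivariate normal, unit variances
    pop = rng.multivariate_normal([10.0, 10.0], cov, size=N)
    y, x = pop[:, 0], pop[:, 1]
    Sy2, Sx2 = y.var(ddof=1), x.var(ddof=1)  # finite-population variances
    g = n / (N - n)
    estimates = {"t_0": [], "t_I": [], "t_eR": []}
    for _ in range(reps):
        idx = rng.choice(N, size=n, replace=False)  # simple random sample
        sy2, sx2 = y[idx].var(ddof=1), x[idx].var(ddof=1)
        sx2_star = (1.0 + g) * Sx2 - g * sx2  # transformed auxiliary variance
        estimates["t_0"].append(sy2)
        estimates["t_I"].append(sy2 * Sx2 / sx2)
        estimates["t_eR"].append(
            sy2 * np.exp((sx2_star - Sx2) / (Sx2 + sx2_star)))
    mse = {k: np.mean((np.array(v) - Sy2) ** 2) for k, v in estimates.items()}
    return {k: 100.0 * m / mse["t_0"] for k, m in mse.items()}

pre_by_estimator = simulate_pre(rho=0.9)
```

Calling `simulate_pre` over a grid of correlation values, for example from −0.9 to 0.9, gives one row of PRE values per correlation, in the layout of Table 5. The numbers will differ from the published ones because the population model and sample size are assumed.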
The results of the simulation study clearly indicate that our proposed estimator outperformed the other estimators used in the study; hence, we can say that our proposed estimator will estimate process variability with the least error. The efficiency of the various estimators is shown in Figure 2.
From Figure 2, we can see that our proposed estimator is the most efficient for all values of the correlation coefficient.

## 5. Conclusions and Recommendations

In this paper, we proposed a new method for the estimation of process dispersion when the information on a variable of interest and an auxiliary variable is available. We see that the proposed estimator can provide certain other estimators as a special case. We obtained the mean-square error of the proposed estimator and found that the mean-square error of our proposed estimator was less than the mean-square error of other available estimators of variability. We conducted numerical and simulation studies to see the performance of our proposed estimator, and we found that our proposed estimator estimates process variability with the least error. We can, therefore, conclude that our proposed estimator can be used for the estimation of process variability in various areas, including quality control and engineering.

## Author Contributions

Conceptualization, M.I. and M.Q.S.; methodology, M.Q.S.; software, T.A. and M.I.; validation, T.A., M.I. and M.Q.S.; formal analysis, T.A.; investigation, T.A.; resources, T.A.; data curation, M.I.; writing—original draft preparation, T.A.; writing—review and editing, M.I. and M.Q.S.; visualization, T.A.; supervision, M.I. and M.Q.S.; project administration, M.I. and M.Q.S.

## Funding

This research received no external funding.

## Acknowledgments

The authors thank the Department of Statistics, COMSATS University Islamabad, Lahore Campus, Pakistan for supporting this work.

## Conflicts of Interest

The authors declare no conflict of interest.

## References

1. Isaki, C.T. Variance Estimation Using Auxiliary Information. J. Am. Stat. Assoc. 1983, 78, 117–123.
2. Das, A.K.; Tripathi, T. Use of Auxiliary Information in Estimating the Finite Population Variance. Sankhya 1978, 40, 139–148.
3. Upadhyaya, L.; Singh, H. An Estimator for Population Variance That Utilizes the Kurtosis of an Auxiliary Variable in Sample Surveys. Vikram Math. J. 1999, 19, 14–17.
4. Kadilar, C.; Cingi, H. Ratio Estimators for the Population Variance in Simple and Stratified Random Sampling. Appl. Math. Comput. 2006, 173, 1047–1059.
5. Upadhyaya, L.N.; Singh, H.P.; Chatterjee, S.; Yadav, R. A Generalized Family of Transformed Ratio-Product Estimators in Sample Surveys. Model Assist. Stat. Appl. 2011, 6, 137–150.
6. Yadav, S.K.; Kadilar, C. A Class of Ratio-Cum-Dual to Ratio Estimator of Population Variance. J. Reliab. Stat. Stud. 2013, 6, 29–34.
7. Asghar, A.; Sanaullah, A.; Hanif, M. Generalized Exponential Type Estimator for Population Variance in Survey Sampling. Revista Colombiana de Estadística 2014, 37, 213–224.
8. Singh, R.; Chauhan, P.; Sawan, N.; Smarandache, F. Improved Exponential Estimator for Population Variance Using Two Auxiliary Variables. arXiv Preprint, 2009; arXiv:0902.0126.
9. Yadav, R.; Upadhyaya, L.N.; Singh, H.P.; Chatterjee, S. A Generalized Family of Transformed Ratio-Product Estimators for Variance in Sample Surveys. Commun. Stat.-Theory Methods 2013, 42, 1839–1850.
10. Sharma, B.; Tailor, R. A New Ratio-Cum-Dual to Ratio Estimator of Finite Population Mean in Simple Random Sampling. Glob. J. Sci. Front. Res. 2010, 10, 27–31.
11. Sanaullah, A.; Ali, H.A.; ul Amin, M.N.; Hanif, M. Generalized Exponential Chain Ratio Estimators under Stratified Two-Phase Random Sampling. Appl. Math. Comput. 2014, 226, 541–547.
12. Das, A.K. Contributions to the Theory of Sampling Strategies Based on Auxiliary Information. Unpublished. Ph.D. Thesis, B.C.K.V., West Bengal, India, 1980.
13. Murthy, M. Sampling Theory and Methods; Calcutta Statistical Publishing Society: Kolkata, India, 1967.
14. Available online: https://www.statcrunch.com/app/index.php?dataid=285946 (accessed on 26 January 2019).
15. Gujarati, D.N.; Porter, D.C. Basic Econometrics, 5th ed.; McGraw Hill: New York, NY, USA, 2011; p. 189.
16. Cochran, W.G. Sampling Techniques; John Wiley: New York, NY, USA, 1977.
17. Singh, R.K.; Chaudhary, B.D. Biometrical Method in Quantitative Genetics Analysis; Kalyani Publishers: New Delhi, India, 1987.
18. Mukhopadhyay, P. Theory and Methods of Survey Sampling; Prentice Hall: New Delhi, India, 1998.
Figure 1. Population wise efficiency comparison.
Figure 2. Efficiency comparison with respect to correlation coefficient.
Table 1. Special cases of the proposed estimator.

| Estimator | $\beta$ | $a$ | $g$ |
|---|---|---|---|
| $t_0 = s_y^2$ | 0 | $a$ | $g$ |
| $t_{eR1} = s_y^2 \exp\left( \frac{s_x^{*2} - S_x^2}{S_x^2 + (a - 1) s_x^{*2}} \right)$ | 1 | $a$ | $g$ |
| $t_{eR2} = s_y^2 \exp\left( \frac{s_x^{*2} - S_x^2}{S_x^2} \right)$ | 1 | 1 | $g$ |
| $t_{eR3} = s_y^2 \exp\left( \frac{s_x^{*2} - S_x^2}{S_x^2 + s_x^{*2}} \right) = t_{eR}$ | 1 | 2 | $g$ |
Table 2. Summary measures for populations.

| Population | Source | N | n | $\bar{Y}$ | $\bar{X}$ | $C_y$ | $C_x$ |
|---|---|---|---|---|---|---|---|
| I | [12] | 142 | 20 | 4015.218 | 2900.387 | 2.112 | 2.197 |
| II | [13] | 80 | 25 | 5182.638 | 283.875 | 0.352 | 0.943 |
| III | [14] | 64 | 8 | 141.500 | 51.187 | 0.537 | 0.509 |
| IV | [15] | 51 | 7 | 13.067 | 543.373 | 0.323 | 0.684 |
| V | [16] | 58 | 12 | 85.948 | 93.000 | 1.121 | 1.127 |
| VI | [16] | 80 | 23 | 90.813 | 104.575 | 0.392 | 0.379 |
| VII | [17] | 53 | 10 | 917.019 | 1417.245 | 0.402 | 0.682 |
| VIII | [17] | 66 | 14 | 974.424 | 1716.136 | 0.512 | 0.612 |
| IX | [18] | 71 | 12 | 4137.803 | 241.944 | 0.306 | 0.557 |
| X | [18] | 75 | 12 | 1.377 | 6.347 | 1.748 | 0.418 |

| Population | Source | $\rho_{xy}$ | $\beta_2(y)$ | $\beta_2(x)$ | $\beta_1(x)$ | $h$ | $\lambda_{21}$ |
|---|---|---|---|---|---|---|---|
| I | [12] | 0.995 | 40.854 | 48.157 | 40.218 | 43.762 | 5.979 |
| II | [13] | 0.914 | 2.267 | 3.650 | 1.295 | 2.337 | 0.548 |
| III | [14] | −0.818 | 2.378 | 1.658 | 0.006 | 1.438 | −0.146 |
| IV | [15] | 0.446 | 3.653 | 15.231 | 7.636 | 4.472 | 0.942 |
| V | [16] | 0.978 | 24.747 | 6.175 | 3.333 | 5.167 | 155.121 |
| VI | [16] | 0.628 | 4.573 | 5.701 | 0.629 | 10.772 | 96.058 |
| VII | [17] | 0.775 | 5.355 | 8.526 | 5.668 | 4.839 | 562.914 |
| VIII | [17] | 0.776 | 9.128 | 19.038 | 11.960 | 11.210 | 1142.321 |
| IX | [18] | −0.382 | 4.458 | 2.635 | 0.255 | 2.181 | 247.516 |
| X | [18] | 0.222 | 31.544 | 2.019 | 0.004 | 4.271 | 4.027 |
Table 3. Percent relative efficiency (PRE) of various estimators.

| Population Number | $t_0$ | $t_I$ | $t_{SR}$ | $t_A$ | $t_{YK2}$ | $t_{rP}$ | $t_{eR}$ | $t_{eRG}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 100 | 33.731 | 22.284 | 135.988 | 81.601 | 36.427 | 83.205 | 21.703 |
| 2 | 100 | 98.106 | 46.764 | 158.333 | 56.739 | 96.203 | 56.552 | 41.038 |
| 3 | 100 | 84.202 | 80.172 | 99.312 | 110.281 | 95.337 | 82.673 | 68.061 |
| 4 | 100 | 174.642 | 103.215 | 128.675 | 86.316 | 78.946 | 73.917 | 59.606 |
| 5 | 100 | 31.169 | 81.740 | 77.460 | 81.330 | 82.980 | 74.525 | 10.052 |
| 6 | 100 | 37.034 | 63.532 | 85.541 | 83.752 | 57.028 | 77.202 | 11.427 |
| 7 | 100 | 96.512 | 55.054 | 89.234 | 82.016 | 86.182 | 72.476 | 48.790 |
| 8 | 100 | 70.693 | 29.870 | 87.055 | 58.140 | 84.745 | 64.014 | 25.314 |
| 9 | 100 | 78.955 | 77.656 | 74.898 | 72.663 | 75.669 | 82.562 | 65.989 |
| 10 | 100 | 81.918 | 90.125 | 105.364 | 92.336 | 100.016 | 84.913 | 54.820 |
Table 4. Regression summary for various estimators.

| Coefficient | $t_I$ | $t_{SR}$ | $t_A$ | $t_{YK2}$ | $t_{rP}$ | $t_{eR}$ | $t_{eRG}$ |
|---|---|---|---|---|---|---|---|
| $\hat{\beta}_0$ | 88.449 (0.003) | 82.312 (0.000) | 94.493 (0.000) | 86.106 (0.000) | 90.983 (0.000) | 76.948 (0.000) | 53.468 (0.000) |
| $\hat{\beta}_1$ | −11.353 (0.700) | −9.407 (0.474) | 7.131 (0.701) | −16.670 (0.097) | 0.795 (0.930) | −10.860 (0.058) | −26.908 (0.030) |
| $\hat{\beta}_2$ | −0.408 (0.748) | −0.798 (0.181) | 0.573 (0.482) | 0.175 (0.656) | −1.063 (0.026) | 0.282 (0.217) | −0.052 (0.907) |
| F-ratio | 0.251 (0.785) | 2.420 (0.159) | 0.624 (0.482) | 1.946 (0.213) | 4.898 (0.047) | 2.618 (0.142) | 4.849 (0.048) |
Table 5. Percentage relative efficiency (PRE) of existing estimators.

| $\rho$ | $t_0$ | $t_I$ | $t_{SR}$ | $t_A$ | $t_{YK2}$ | $t_{rP}$ | $t_{eR}$ | $t_{eRG}$ |
|---|---|---|---|---|---|---|---|---|
| −0.9 | 100 | 189.134 | 119.577 | 100.037 | 100.031 | 109.366 | 91.881 | 84.899 |
| −0.7 | 100 | 193.632 | 121.793 | 100.038 | 100.518 | 109.607 | 88.730 | 82.266 |
| −0.5 | 100 | 196.582 | 123.296 | 100.042 | 100.857 | 109.711 | 87.862 | 86.497 |
| −0.3 | 100 | 198.862 | 124.449 | 100.047 | 101.115 | 109.763 | 91.445 | 85.654 |
| −0.1 | 100 | 199.721 | 124.892 | 100.040 | 101.216 | 109.776 | 92.079 | 87.677 |
| 0.1 | 100 | 200.035 | 124.974 | 100.038 | 101.221 | 109.772 | 87.667 | 84.565 |
| 0.3 | 100 | 198.773 | 124.377 | 100.040 | 101.094 | 109.762 | 90.272 | 88.006 |
| 0.5 | 100 | 196.642 | 123.334 | 100.043 | 100.866 | 109.715 | 88.728 | 83.709 |
| 0.7 | 100 | 192.997 | 121.545 | 100.045 | 100.475 | 109.584 | 92.261 | 82.935 |
| 0.9 | 100 | 189.171 | 119.600 | 100.043 | 100.037 | 109.366 | 89.811 | 86.255 |

Akhlaq, T.; Ismail, M.; Shahbaz, M.Q. On Efficient Estimation of Process Variability. Symmetry 2019, 11, 554. https://doi.org/10.3390/sym11040554
