On Efficient Estimation of Process Variability

Abstract: Variability or dispersion plays an important role in any process and provides insight into the spread of data around some central point, usually the mean. A process with less spread is preferred over a process in which values differ greatly from the mean. Various methods are available to estimate the process dispersion by using information on the variable of interest. Certain additional variables provide good insight to estimate the process dispersion. In this paper, we propose an efficient method for the estimation of process variability by using the exponential method. The properties of the proposed method were studied. We conducted simulation and empirical studies to compare the proposed method with some existing methods of estimation of variability. The results of the numerical study show that our proposed method is better than the other methods used in the study.


Introduction
Variability, or dispersion, is an important aspect of any process and of any statistical study. It provides information about the spread of data around some central point. In most cases, that central point is the mean of the data; hence, the variance provides insight into the spread. For example, we may be interested in the spread of the length of some electrical components around a certain pre-assigned target value. Methods and procedures which produce a smaller spread are always preferred. The dispersion of estimators in any process or in any sample survey is of utmost importance, as the estimators or processes which provide smaller variability are more efficient and, hence, preferred. The dispersion of the process or estimator depends on the variable under study, the method of estimation, and the information on certain additional variables that can control the dispersion of the process. The information on additional variables can be used to propose methods which reduce the variation of the process or of the estimator. In survey sampling, dispersion can be estimated using methods which range from simple to complex. Some of these methods utilize the information on additional variables and some do not. The popular methods which utilize such information are based on the ratio of the study variable to an additional variable and on the linear regression of the study variable on the additional variable. In some cases, a transformation of the additional variables plays a useful role in decreasing the variability of the estimators; hence, certain methods are based on transformed additional variables.
Several authors proposed various methods to estimate the process variation using the information of an additional or auxiliary variable. The method of using the information of an additional variable characterizes a certain estimation method and is very useful in decreasing the variance of estimation.
Many researchers have studied the estimation of the variability of a process, known as the population variance of the study variable, $S_y^2$. The simplest estimator of population variance is the sample variance $s_y^2$, which was proposed by Reference [1]. Some other estimators of variance were proposed in References [2][3][4][5][6][7] and are based on certain information of one or more auxiliary variables ($X$). The method of using auxiliary information is very useful in proposing efficient estimates of the population variance. Some methods which are useful in the estimation of population variance are given in the following section.
Symmetry 2019, 11, 554

Materials and Methods
Suppose we have a finite process which can produce $N$ units; let the values of the characteristic of interest be $U_1, U_2, \ldots, U_N$. Suppose that the variability of the process is to be estimated on the basis of a sample of size $n$. Let $Y$ represent the variable of interest and $X$ represent the auxiliary variable. Also, let $\bar{Y}$ and $S_y^2$ be the mean and variance of the variable of interest, and let $\bar{X}$ and $S_x^2$ be the mean and variance of the auxiliary variable for the whole process. The corresponding sample measures are $\bar{y}$, $s_y^2$, $\bar{x}$, and $s_x^2$, respectively. When information on two variables is available, the strength of interdependence (correlation coefficient) between these variables is of importance and is denoted by $\rho_{xy}$. Also, let $\beta_2'(y)$ and $\beta_2'(x)$ be the coefficients of kurtosis of the study and auxiliary variables, and let $\beta_1(x)$ be the squared coefficient of skewness of the auxiliary variable. Some additional notations that are useful in studying the properties of various estimators of population variance are given below.
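The population measures named above can be computed directly when the whole finite population is available. The following is a minimal sketch, not the authors' code. It assumes the usual moment-ratio definitions (kurtosis as $\mu_4/\mu_2^2$, squared skewness as $\mu_3^2/\mu_2^3$) and a divisor of $N-1$ for the variances; the function name `population_summary` and the dictionary keys are hypothetical, and any adjustment implied by the primed kurtosis notation is not attempted.

```python
import math


def population_summary(y, x):
    """Summary measures of a finite population of paired values (y_i, x_i).

    Assumed definitions (not confirmed by the text): variances use the
    divisor N - 1; kurtosis and squared skewness are moment ratios.
    """
    N = len(y)
    y_bar = sum(y) / N
    x_bar = sum(x) / N

    def central_moment(r, s):
        # mu_rs = (1/N) * sum (y_i - Ybar)^r * (x_i - Xbar)^s
        return sum((yi - y_bar) ** r * (xi - x_bar) ** s
                   for yi, xi in zip(y, x)) / N

    mu20 = central_moment(2, 0)
    mu02 = central_moment(0, 2)
    mu11 = central_moment(1, 1)
    return {
        "Y_bar": y_bar,
        "X_bar": x_bar,
        "S2_y": mu20 * N / (N - 1),
        "S2_x": mu02 * N / (N - 1),
        "rho_xy": mu11 / math.sqrt(mu20 * mu02),
        "beta2_y": central_moment(4, 0) / mu20 ** 2,       # kurtosis of Y
        "beta2_x": central_moment(0, 4) / mu02 ** 2,       # kurtosis of X
        "beta1_x": central_moment(0, 3) ** 2 / mu02 ** 3,  # squared skewness of X
    }
```

For example, `population_summary([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])` describes a population in which the two variables are perfectly correlated and the auxiliary variable is symmetric.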
We also use $s_y^2 = S_y^2(1 + e_0)$ and $s_x^2 = S_x^2(1 + e_1)$, with $E(e_0 e_1) = \lambda h'$ and $\lambda = N^{-1}$.
We will now discuss some available estimators for process variability (the population variance). The usual unbiased estimator of population variance is $t_0 = s_y^2$, with mean-square error (MSE) given in Equation (1). This estimator is based on the information of the study variable only. An estimator of variance which utilizes the information of an auxiliary variable, $t_I$, was proposed by Reference [1]; its MSE, up to the first order, is given in Equation (3). Two exponential-type estimators of variance, $t_{SR}$ and $t_{SP}$, were proposed by Reference [8] and are given in Equations (4) and (5); their MSEs are given in Equations (6) and (7), respectively. An exponential-type estimator of variance proposed by Reference [7] uses $C_x = S_x/\bar{X}$, the coefficient of variation of the auxiliary variable $X$. An estimator of variance based on the coefficient of variation was proposed by Reference [2], and a general procedure for estimating population variance was proposed by Reference [9]; each is accompanied by its MSE.
Various authors have used certain transformations of the auxiliary variable to propose different estimators of variance. The popular transformation yields a transformed statistic that is an unbiased estimator for $S_x^2$. Following the same transformation, Reference [6] proposed a dual ratio-type estimator for population variance, $t_{YK2}$, with MSE given in Equation (15). The use of a transformed auxiliary variable provides more efficient estimators of population variance; see, for example, References [6][7][8][9][10][11].
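The displayed forms of the estimators are not reproduced in the text above. As a rough illustration, the sketch below implements the forms in which the ratio-type and exponential ratio- and product-type variance estimators are commonly stated in the survey-sampling literature. Treating these as the forms of $t_I$, $t_{SR}$, and $t_{SP}$ is an assumption, and the function names are hypothetical.

```python
import math
from statistics import variance


def t0(y_sample):
    """Usual estimator: the sample variance of the study variable."""
    return variance(y_sample)


def ratio_estimator(y_sample, x_sample, S2_x):
    """Ratio-type estimator (assumed form): s_y^2 * S_x^2 / s_x^2."""
    return variance(y_sample) * S2_x / variance(x_sample)


def exp_ratio_estimator(y_sample, x_sample, S2_x):
    """Exponential ratio-type estimator (assumed form):
    s_y^2 * exp((S_x^2 - s_x^2) / (S_x^2 + s_x^2))."""
    s2_x = variance(x_sample)
    return variance(y_sample) * math.exp((S2_x - s2_x) / (S2_x + s2_x))


def exp_product_estimator(y_sample, x_sample, S2_x):
    """Exponential product-type estimator (assumed form):
    s_y^2 * exp((s_x^2 - S_x^2) / (s_x^2 + S_x^2))."""
    s2_x = variance(x_sample)
    return variance(y_sample) * math.exp((s2_x - S2_x) / (s2_x + S2_x))
```

All three auxiliary-information estimators need the population variance `S2_x` of the auxiliary variable to be known in advance. When the sample variance of the auxiliary variable happens to equal `S2_x`, each of them reduces to the plain sample variance `t0`.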
The main purpose of this paper is to propose a generalized exponential estimator for population variance by utilizing transformed auxiliary information. The estimator is proposed in Section 3.

Proposed Estimator
Using the concept of Reference [6], we propose two general exponential-type estimators of population variance, the exponential ratio and exponential product estimators, which utilize transformed auxiliary information. The MSE of each proposed estimator is obtained up to the first order. Following Reference [11], we also propose another estimator, $t_{eRG}$, given in Equation (21). A Taylor series expansion of Equation (21) gives the bias of this estimator up to the first order, and its MSE is given in Equation (23). The optimum value of the constant $a$ is obtained by minimizing Equation (23); substituting this optimum value into Equation (23) gives the minimum value of the MSE.
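The expressions for the MSE and for the optimum constant are not reproduced in the text above. As a generic illustration of the minimization step only, suppose (an assumption, not the paper's stated form) that the first-order MSE is a quadratic function of a single constant $a$, with hypothetical coefficients $A$, $B$, and $C > 0$. The optimum then follows from setting the derivative to zero:

```latex
\begin{align*}
\mathrm{MSE}(a) &= A - 2aB + a^{2}C, \qquad C > 0,\\
\frac{d\,\mathrm{MSE}(a)}{da} &= -2B + 2aC = 0
  \;\Longrightarrow\; a_{\mathrm{opt}} = \frac{B}{C},\\
\mathrm{MSE}_{\min} &= A - 2\,\frac{B}{C}\,B + \frac{B^{2}}{C^{2}}\,C
  = A - \frac{B^{2}}{C}.
\end{align*}
```

Under this assumed form, the minimum MSE can never exceed $A$, the value obtained at $a = 0$, which is why an estimator with an optimally chosen constant is at least as efficient as its special case without the constant.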

Special Cases
The special cases of the proposed estimator are obtained using different values of the constants involved and are given in Table 1.
Table 1. Special cases of the proposed estimator.
We now compare the proposed estimator with some popular available estimators of population variance. The efficiency comparison is given in Section 3.2.

Efficiency Comparison of the Generalized Exponential Estimator
The proposed estimator was compared with some available estimators in terms of mean-square error. We compared our proposed estimator $t_{eRG}$, in turn, with $t_0$, $t_I$, $t_{SR}$, and $t_{YK2}$, given in Section 2. The mean-square errors of these estimators are given in Equations (1), (3), (6), and (15), respectively. We now give numerical examples for the application of the proposed estimator.

Results
In this section, we present the numerical studies for the application of the proposed generalized estimator of population variance. The numerical study is twofold. We first provide 10 examples based on 10 real populations; we then provide a simulation study to assess the performance of the proposed generalized estimator of variance. The results of these studies are given in the sections below.

Numerical Study
In this section, we present a numerical study to see the performance of the proposed estimator in estimating the variability of processes. These studies were conducted using 10 real populations which were previously used by various authors to see the performance of their proposed estimators. The description of these populations can be found in the source given.
The summary measures for these 10 populations are given in Table 2. The summary measures of two populations indicate that the study variable and auxiliary variable are highly correlated. This is generally an underlying requirement for ratio- and exponential-type estimators for the estimation of population characteristics. We computed the mean-square error of various estimators using the data of the above populations. We then computed the percent relative efficiency (PRE) of the various estimators, relative to $t_0$, using Equation (26).
The PRE indicates the sample size that an estimator requires to achieve the result provided by the base estimator $t_0$ with a sample of size 100. A smaller value of PRE therefore indicates that the estimator is more efficient. The results are given in Table 3.
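Equation (26) is not reproduced in the text above. Because the text reads a smaller PRE as better, on a scale where $t_0$ corresponds to 100, one formula consistent with that reading is the ratio of an estimator's MSE to the MSE of $t_0$, multiplied by 100. The sketch below uses that assumed form; the function name is hypothetical.

```python
def percent_relative_efficiency(mse_estimator, mse_t0):
    """PRE relative to t_0 under the assumed form 100 * MSE(t) / MSE(t_0).

    With this form, t_0 itself scores 100 and smaller values indicate a
    more efficient estimator, matching the interpretation in the text.
    """
    if mse_t0 <= 0:
        raise ValueError("mse_t0 must be positive")
    return 100.0 * mse_estimator / mse_t0
```

Under this reading, an estimator whose MSE is one quarter of that of $t_0$ has a PRE of 25, meaning it needs roughly 25 observations to match $t_0$ with 100.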
We can see, from Table 3, that the proposed estimator $t_{eRG}$ is the most efficient estimator, as it requires the fewest observations to obtain the result obtained by $t_0$ with a sample of size 100. We then conducted a regression analysis to see the effect of the correlation coefficient and the coefficient of skewness of the auxiliary variable on the efficiency of each estimator. The regression summary, alongside the result of the test of significance for each estimator, is given in Table 4. From Table 4, we can see that the average efficiency of our proposed estimator, $t_{eRG}$, is the lowest, as $\hat{\beta}_0$ for this estimator has the smallest value. We can also see that the correlation coefficient has a significant effect on the efficiency of $t_{eRG}$, and the regression model for this estimator is significant at 5%. The value of $\hat{\beta}_0$ for our proposed estimator $t_{eR}$ is the second lowest, which indicates that this estimator is the second best estimator for the estimation of population variance. The regression coefficient $\hat{\beta}_2$ is significant at 10% for this estimator, which indicates that the correlation coefficient between the study and auxiliary variables will provide useful information to predict the efficiency of estimator $t_{eR}$. The other significant regression model is for $t_{rP}$; for this estimator, the coefficient $\hat{\beta}_2$ is significant. This indicates that the coefficient of kurtosis of the auxiliary variable has a significant effect on the efficiency of this estimator. We can also see from Table 4 that the estimator $t_A$ showed the worst performance in this study, as the coefficient $\hat{\beta}_0$ for this estimator has the highest value.
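The regression model itself is not reproduced in the text above. A minimal sketch of how such a fit could be obtained is given below, assuming a linear model with an intercept and two predictors fitted by ordinary least squares. The function name `fit_ols` is hypothetical, the assignment of predictors to coefficients is an assumption, and the significance tests reported in Table 4 are not computed here.

```python
def fit_ols(y, x1, x2):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations.

    Returns [b0, b1, b2]. Here y could hold the PRE of one estimator across
    populations, with x1 and x2 the two population characteristics used as
    predictors (an assumed setup, not taken from the paper).
    """
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Build the augmented 3x4 system [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]
```

In a fit of this kind, the intercept is the fitted PRE when both predictors are zero, and a smaller intercept corresponds to a lower baseline PRE.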
We also show the efficiencies of various estimators in Figure 1, where estimators are numbered from 1 to 7 in the order in which they appear in Table 4. This means that estimator $t_I$ is numbered 1 and estimator $t_{eRG}$ is numbered 7. The graph also shows that our proposed estimator $t_{eRG}$ is the most efficient estimator in this study. We now present the results of the simulation study in Section 4.2.

Simulation Results
In this section, we give the results of the simulation study conducted to assess the performance of our proposed estimator. The simulation study was conducted by generating populations of size 500 having a specific correlation structure between the study and auxiliary variables. For each of the populations with a specific correlation coefficient, various estimators were computed. The process was repeated 20,000 times, and we then computed the mean-square error of each estimator alongside the percent relative efficiency as defined in Equation (26). The results of this simulation study are given in Table 5. They clearly indicate that our proposed estimator outperformed the other estimators used in the study; hence, the proposed estimator estimated process variability with the least error among the estimators compared. The efficiency of various estimators is given in Figure 2. From Figure 2, we can see that our proposed estimator is the most efficient for all values of the correlation coefficient.
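The simulation design described above can be sketched as follows. This is not the authors' code, and several details are assumptions: the population is bivariate normal, a single population is generated and repeated simple random samples of size `n = 50` are drawn from it without replacement, the PRE uses the ratio $100 \times \mathrm{MSE}(t)/\mathrm{MSE}(t_0)$, and the ratio and exponential ratio estimators take the forms commonly stated in the literature. The proposed estimator is omitted because its form is not reproduced in the text. The function name `simulate_pre` and its parameters are hypothetical.

```python
import math
import random
from statistics import variance


def simulate_pre(rho, N=500, n=50, reps=2000, seed=1):
    """Monte Carlo PRE (relative to t_0) for a population with correlation rho."""
    rng = random.Random(seed)
    # Population: X standard normal, Y = rho*X + sqrt(1 - rho^2)*Z.
    xs = [rng.gauss(0, 1) for _ in range(N)]
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    S2_y, S2_x = variance(ys), variance(xs)

    sq_err = {"t0": 0.0, "ratio": 0.0, "exp_ratio": 0.0}
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # simple random sample without replacement
        s2_y = variance([ys[i] for i in idx])
        s2_x = variance([xs[i] for i in idx])
        estimates = {
            "t0": s2_y,
            "ratio": s2_y * S2_x / s2_x,
            "exp_ratio": s2_y * math.exp((S2_x - s2_x) / (S2_x + s2_x)),
        }
        for name, value in estimates.items():
            sq_err[name] += (value - S2_y) ** 2

    mse = {name: total / reps for name, total in sq_err.items()}
    return {name: 100.0 * m / mse["t0"] for name, m in mse.items()}
```

Calling `simulate_pre(0.9, reps=20000)` would match the number of repetitions reported in the text; smaller values of `reps` run faster at the cost of noisier PRE estimates.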

Conclusions and Recommendations
In this paper, we proposed a new method for the estimation of process dispersion when the information on a variable of interest and an auxiliary variable is available. We see that the proposed estimator can provide certain other estimators as a special case. We obtained the mean-square error of the proposed estimator and found that the mean-square error of our proposed estimator was less than the mean-square error of other available estimators of variability. We conducted numerical and simulation studies to see the performance of our proposed estimator, and we found that our proposed estimator estimates process variability with the least error. We can, therefore, conclude that our proposed estimator can be used for the estimation of process variability in various areas, including quality control and engineering.