Estimating Variances in Weighted Least-Squares Estimation of Distributional Parameters

Abstract: Many estimation methods have been proposed for the parameters of statistical distributions. The least squares estimation method, based on a regression model or probability plot, is frequently used by practitioners since its implementation is extremely simple for both complete and censored data. However, the regression model used in this procedure is heteroscedastic and, thus, weighted least squares estimation or alternative methods should be used. This study proposes an alternative method for estimating the variances, based on a dependent variable generated via simulation, in order to estimate distributional parameters using the weighted least squares method. In the estimation procedure, the variances or weights are expressed as a function of the rank of the data point in the sample. The considered weighted estimation method is evaluated for the shape parameter of the log-logistic and Weibull distributions via a simulation study. It is found that the considered weighted estimation method shows better performance than the maximum likelihood, least-squares, and certain other alternative estimation approaches in terms of mean square error for most of the considered sample sizes. In addition, a real-life example from hydrology is provided to demonstrate the performance of the considered method.


Introduction
Statistical distributions have many applications in the areas of engineering, medical sciences, air quality determination, and so forth. Since the parameters of the considered distribution are used for inference regarding topics of interest, their estimation methods have received great interest in the literature. The maximum likelihood estimation (MLE) method has good theoretical properties for large sample sizes (n > 250) and is often preferred [1]. However, it can show poor performance for small samples [1,2], and the MLE also requires an iterative numerical method, such as Newton-Raphson, for most distributions. Additionally, some research has been conducted to find the best linear unbiased estimators for distributions. However, the estimators proposed in the literature are generally computationally intensive and require numerical methods to obtain estimates [1]. On the other hand, a regression procedure based on a probability plot is frequently used by practitioners to estimate the parameters of statistical distributions, since its implementation is simple for both complete and censored data. This procedure is easily performed if the distribution function can be expressed as an explicit function. A linear regression model is obtained whose dependent variable is the nonparametric estimate of the value of the distribution function at the ranked sample. Thus, the least squares (LS) estimates of the parameters of the resulting regression model become the estimates of the parameters of the considered statistical distribution. In the literature, parameters of different statistical distributions have been estimated by the least squares estimation (LSE) method. For example, Altun [3] compares the LSE, the weighted least squares estimation (WLSE), and certain other estimation methods for the Weibull distribution. Refs. [4-9] have used the LSE method for different statistical distributions.
However, heteroscedasticity (non-constant variance) is present in the regression model used and, thus, the LS estimates lose their efficiency, even though most studies do not take this into account [9]. In such cases, the WLSE or alternative methods should be used [10-20]. It is well known that, when conducting the WLSE, the variances of the dependent variable are unknown and must be estimated in order to implement the procedure. With this in mind, [11-16] discuss how the variances should be found in order to perform the WLSE. They generally consider different weights, obtained from large-sample properties of the empirical distribution function or of order statistics, to stabilize the variance when conducting the WLSE. For example, Hung [11] proposes the WLSE for estimating the shape parameter of the Weibull distribution. His simulation results illustrate that the mean squared error of the WLSE is smaller than that of competing procedures. Lu et al. [12] consider the LSE and WLSE for the Weibull distribution and compare their WLSE method with three existing WLSE methods. They find that their WLSE method is more precise and has smaller variance than Bergman's WLS estimators. Lu and Tao [14] consider the LSE and WLSE for the Pareto distribution. Zyl [15] considers a regression procedure for the parameters of the three-parameter generalized Pareto distribution and applies the WLSE with the Box-Cox procedure. Zhang et al. [13] discuss the WLSE methods for the Weibull distribution and related works. They find that the WLSE is an efficient method that makes good use of small datasets. Kantar [20] introduces the generalized least squares and WLSE methods, based on an easily calculated approximation of the covariance matrix, for distributional parameters. They all emphasize that a weight function should be used when performing the regression procedures.
In this article, a different approach is considered for the estimation of the weights for the WLSE. The variances are modeled as a function of the rank of the data point in the sample by using samples that are simulated from the dependent variable of the regression model established for the specified distribution. Thus, the estimate of each variance or weight is expressed as a simple mathematical function of the rank. The proposed WLSE is then applied to the estimation of the parameters of the log-logistic and Weibull distributions. The simulation results show that the considered WLSE for the shape parameter of the log-logistic and Weibull distributions provides better performance than the MLE, LSE, and WLSE (Zyl and Schall) in terms of bias and mean square error for most of the considered sample sizes.
The rest of the paper is organized as follows: the LSE for estimating distributional parameters is given in Section 2; Section 3 introduces an estimation of variances for use in the WLSE, applied to the estimation of the parameters of the log-logistic and Weibull distributions; a simulation study showing the performance of the proposed WLSE is presented in Section 4; a real-life application is provided in Section 5; and, finally, the last section summarizes the conclusions of the study.

LSE Method for Log-Logistic and Weibull Distributions
The distribution function can be transformed into a linear regression model, $Y_i = \beta_0 + \beta_1 X_i$, if it can be written as an explicit function. The LSE method can then be used to estimate the parameters. In estimation, the sum of the squares of the errors, defined below, is minimized:
$$\min_{\beta_0,\beta_1} \sum_{i=1}^{n} \left(Y_i - \beta_0 - \beta_1 X_i\right)^2 \tag{1}$$
The resulting LS estimates are:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{2}$$
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \tag{3}$$
To apply the LSE to distributional parameters, the following procedure is followed for the Weibull distribution.
The cumulative distribution function (cdf) of the Weibull random variable, widely used in engineering fields [1,21,22], is given as follows:
$$F(x) = 1 - \exp\left[-\left(\frac{x}{\lambda}\right)^{\alpha}\right], \quad x > 0 \tag{4}$$
where $\lambda$ is the scale parameter and $\alpha$ is the shape parameter, which is of great importance to the Weibull since it determines the shape of the distribution. $\alpha$ and $\lambda$ are unknown parameters in real applications. After algebraic manipulation, Equation (4) can be linearized as follows:
$$\ln\left[-\ln\left(1 - F(x)\right)\right] = \alpha \ln x - \alpha \ln \lambda \tag{5}$$
For a sample of size $n$ with $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$, the regression model is rewritten as:
$$\ln\left[-\ln\left(1 - F(x_{(i)})\right)\right] = \alpha \ln x_{(i)} - \alpha \ln \lambda \tag{6}$$
If we replace $\ln[-\ln(1 - F(x_{(i)}))]$ with $Y$, $-\alpha \ln \lambda$ with $\beta_0$, $\alpha$ with $\beta_1$, and $\ln x_{(i)}$ with $X$, the regression model with error term becomes:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{7}$$
For the log-logistic distribution, the cdf is given as follows:
$$F(x) = \left[1 + \left(\frac{x}{\gamma}\right)^{-\delta}\right]^{-1}, \quad x > 0 \tag{8}$$
where $\gamma$ is the scale parameter and $\delta$ is the shape parameter. The following linear model is obtained to estimate the parameters of the log-logistic distribution:
$$\ln\left[\left(1 - F(x)\right)^{-1} - 1\right] = \delta \ln x - \delta \ln \gamma \tag{9}$$
For a sample of size $n$, let $x_{(1)}, \ldots, x_{(n)}$ be the order statistics; the regression model can then be expressed as follows:
$$\ln\left[\left(1 - F(x_{(i)})\right)^{-1} - 1\right] = \delta \ln x_{(i)} - \delta \ln \gamma \tag{10}$$
The quantity $(i - a)/(n + b)$, with $0 \le a \le 0.5$ and $0 \le b \le 1$, is used as an estimate of $F(x_{(i)})$, where $i$ is the rank of the data point in the sample in ascending order. Since $(i - 0.3)/(n + 0.4)$ shows better performance than the other considered alternatives, we use it in this study.
Replacing $F(x_{(i)})$ by its estimate, denoted $\hat{F}_i$, Equations (6) and (10) with error terms yield the following equations for estimating the parameters of the Weibull and log-logistic distributions:
$$\ln\left[-\ln\left(1 - \hat{F}_i\right)\right] = \alpha \ln x_{(i)} - \alpha \ln \lambda + \varepsilon_i \tag{11}$$
$$\ln\left[\left(1 - \hat{F}_i\right)^{-1} - 1\right] = \delta \ln x_{(i)} - \delta \ln \gamma + \varepsilon_i \tag{12}$$
It should be noted, however, that the error terms of the models given in Equations (11) and (12) are not identically distributed, as mentioned by [9,15-17,19,20]: since $F(x_{(i)})$ is beta distributed with parameters $i$ and $n - i + 1$, the models (11) and (12) have a different variance for each $i$ [9,19,20]. In other words, the error terms of these models do not have equal variance. This can negatively affect the LSE. In such cases, alternative estimation approaches that stabilize the variances should be used.
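The probability-plot regression described above can be sketched in a few lines. The following is only an illustrative sketch, not the authors' implementation: it assumes the Weibull cdf $1 - \exp[-(x/\lambda)^{\alpha}]$ and the log-logistic cdf $[1 + (x/\gamma)^{-\delta}]^{-1}$, uses the plotting position $(i - 0.3)/(n + 0.4)$, and the function names (`plotting_positions`, `lse_weibull`, `lse_loglogistic`) and the synthetic samples are hypothetical.

```python
import numpy as np


def plotting_positions(n, a=0.3, b=0.4):
    """Return F_hat_i = (i - a) / (n + b) for ranks i = 1, ..., n."""
    i = np.arange(1, n + 1)
    return (i - a) / (n + b)


def lse_weibull(x):
    """Ordinary LS fit of ln[-ln(1 - F_hat_i)] on ln x_(i); returns (shape, scale)."""
    x = np.sort(np.asarray(x, dtype=float))
    F = plotting_positions(x.size)
    y = np.log(-np.log(1.0 - F))
    b1, b0 = np.polyfit(np.log(x), y, 1)  # slope first, then intercept
    # Under the assumed cdf, slope = shape and intercept = -shape * ln(scale).
    return b1, np.exp(-b0 / b1)


def lse_loglogistic(x):
    """Ordinary LS fit of ln[F_hat_i / (1 - F_hat_i)] on ln x_(i); returns (shape, scale)."""
    x = np.sort(np.asarray(x, dtype=float))
    F = plotting_positions(x.size)
    y = np.log(F / (1.0 - F))  # identical to ln[(1 - F)^(-1) - 1]
    b1, b0 = np.polyfit(np.log(x), y, 1)
    return b1, np.exp(-b0 / b1)


# Synthetic samples (assumed values: shape 3, scale 2), for illustration only.
rng = np.random.default_rng(0)
weibull_sample = 2.0 * rng.weibull(3.0, size=200)
u = rng.uniform(size=200)
loglogistic_sample = 2.0 * (u / (1.0 - u)) ** (1.0 / 3.0)  # inverse-cdf sampling

print("Weibull (shape, scale):", lse_weibull(weibull_sample))
print("Log-logistic (shape, scale):", lse_loglogistic(loglogistic_sample))
```

Both fits should return values near the shape and scale used to generate the samples, although the exact figures depend on the random seed.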

Estimating Weights in the Weighted Least-Squares Estimation for Parameters of the Log-Logistic and Weibull Distributions
If we take into account the models given in Equations (11) and (12), it can be seen that the estimate of $F(x_{(i)})$ is a function of $(i - a, n + b)$. Thus, the estimate of the variance of $\ln[-\ln(1 - F(x_{(i)}))]$ can also be expressed as a function of $(i - a, n + b)$. That is, the following equation can be written for the variance of the dependent variable or error term of the model in Equation (11):
$$\mathrm{Var}\left(\ln\left[-\ln\left(1 - F(x_{(i)})\right)\right]\right) = G(i - a, n + b) + \varepsilon_i \tag{13}$$
where Var denotes the variance, $G$ is a differentiable function, and $\varepsilon_i$ is an error term. If we replace $\mathrm{Var}(\ln[-\ln(1 - F(x_{(i)}))])$ by $Z_i$, the model becomes:
$$Z_i = G(i - a, n + b) + \varepsilon_i \tag{14}$$
Replacing $Z_i$ by its estimate, $\hat{Z}_i$, and taking $\hat{G}$ as the estimate of $G$ yields the following model:
$$\hat{Z}_i = \hat{G}(i - a, n + b) \tag{15}$$
It is known that the cdf of any continuous random variable, evaluated at that variable, is uniformly distributed on $(0, 1)$; thus, $F(x_{(i)})$ behaves as the $i$-th order statistic of a $U(0, 1)$ sample of size $n$. Using this fact, the distribution of $\ln[-\ln(1 - F(x_{(i)}))]$ for each $i = 1, \ldots, n$ can be generated via simulation, and its variance can then be estimated. Figure 1a-d shows scatter plots of the variance of $\ln[-\ln(1 - F(x_{(i)}))]$, estimated from 5000 simulated samples, versus the rank $i$, $i = 1, \ldots, n$. It can be seen from the figures that the variances $Z_i$ tend to decrease as the rank $i$ increases. In other words, the variance is inversely related to the rank, so rank 1 receives the lowest weight and rank $n$ the highest.
Thereby, the variance of $\ln[-\ln(1 - F(x_{(i)}))]$ can be modeled by a decreasing function of the rank $i$.
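The simulation step described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' code: uniform order statistics stand in for $F(x_{(i)})$, the replication count of 5000 follows the text, and the function name `simulated_variances`, the seed, and the choice n = 20 are hypothetical.

```python
import numpy as np


def simulated_variances(n, reps=5000, seed=0):
    """Estimate Var(ln[-ln(1 - U_(i))]) for each rank i = 1, ..., n by simulation.

    U_(i) is the i-th order statistic of n independent U(0, 1) draws,
    which stands in for F(x_(i)).
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated sample, sorted so that column i-1 holds U_(i).
    u = np.sort(rng.uniform(size=(reps, n)), axis=1)
    z = np.log(-np.log(1.0 - u))
    # Sample variance across replications, one value per rank.
    return z.var(axis=0, ddof=1)


var_by_rank = simulated_variances(n=20)
for rank, v in enumerate(var_by_rank, start=1):
    print(rank, round(float(v), 3))
```

Plotting `var_by_rank` against the rank gives the kind of scatter plot the text refers to. For rank 1 the simulated variance should be close to $\pi^2/6 \approx 1.645$, the variance of the logarithm of an exponential random variable.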

As a simple model for $G(\cdot)$, we take $a = 0$ and $b = 0$. Table 1 shows the considered models and the $R^2$ obtained from these models. It can be deduced from Table 1 that $G(i, n) = \alpha/i$, with a coefficient of determination of 0.98 and an estimated $\alpha = 1.51$, may be a suitable function to model the variance (here $\alpha$ denotes the coefficient of the variance model, not the Weibull shape parameter). Since the weight is the inverse of the variance, $w_i = i/\alpha$, $i = 1, 2, \ldots, n$. Hence, the weights are expressed as a simple mathematical function of the rank.
For the log-logistic distribution case, Figure 2a-d shows scatter plots of the variance of $\ln[(1 - F(x_{(i)}))^{-1} - 1]$, which is estimated from 5000 simulated samples, versus rank $i$, $i = 1, \ldots, n$. It can be observed from the figures that the variances $Z_i$ tend to decrease as the rank $i$ increases to $n/2$ and to increase as it goes from $n/2$ to $n$. Therefore, we can use the functions given in Table 1 to model the variance of $\ln[(1 - F(x_{(i)}))^{-1} - 1]$ for $i = 1, \ldots, n/2$, and the determined function can be used for $i = n/2 + 1, \ldots, n$. Alternatively, U-shaped functions, which are more expensive than those given in Table 2, may be used. Parallel to the Weibull distribution case, it can be concluded from Table 2 that $G(i, n) = \alpha/i$, with a coefficient of determination of 0.98 and an estimated $\alpha = 1.43$, can be a plausible model for the variance and, thus, $w_i = i/\alpha$.
Therefore, we suggest a simple formula to calculate the weights to be used in WLS for estimating the parameters of the Weibull and log-logistic distributions. Similar formulae for the weights of the Pareto, Logistic, Gumbel, Burr, and similar random variables, whose cumulative distribution functions have an explicit functional form, can easily be obtained by means of the simulation study described above.
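One way to obtain such a weight formula is to regress the simulated variances on $1/i$ without an intercept. The sketch below illustrates this for the Weibull-type dependent variable. It is an assumption-laden illustration: the through-the-origin least-squares fit, the sample size n = 20, the seed, and the name `fit_inverse_rank_model` are not taken from the source, so the fitted coefficient need not reproduce the reported 1.51 exactly.

```python
import numpy as np


def fit_inverse_rank_model(z):
    """Fit Z_i = c / i by least squares through the origin; return (c, R^2)."""
    i = np.arange(1, z.size + 1)
    g = 1.0 / i                                   # single regressor, no intercept
    c = np.sum(g * z) / np.sum(g * g)             # closed-form LS coefficient
    resid = z - c * g
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((z - z.mean()) ** 2)
    return c, r2


# Simulated variances of ln[-ln(1 - U_(i))] by rank (assumed n = 20, 5000 replications).
rng = np.random.default_rng(0)
n, reps = 20, 5000
u = np.sort(rng.uniform(size=(reps, n)), axis=1)
z = np.log(-np.log(1.0 - u)).var(axis=0, ddof=1)

c_hat, r2 = fit_inverse_rank_model(z)
weights = np.arange(1, n + 1) / c_hat             # w_i = i / c, inverse of fitted variance
print("fitted coefficient:", c_hat, "R^2:", r2)
```

The same pattern applies to any distribution with an explicit cdf: simulate the dependent variable of its linearized model from uniform order statistics, estimate the variance at each rank, and fit a simple function of the rank.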
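The WLSE procedure for the Weibull and log-logistic distributions can be carried out by minimizing the weighted sum of squares with respect to the unknown shape and scale parameters. For the Weibull distribution with scale parameter $\lambda$, this is:
$$\sum_{i=1}^{n} w_i \left(\ln\left[-\ln\left(1 - \hat{F}_i\right)\right] + \alpha \ln \lambda - \alpha \ln x_{(i)}\right)^2 \tag{16}$$
In matrix notation for the Weibull distribution, let $y = \left(\ln[-\ln(1 - \hat{F}_1)], \ldots, \ln[-\ln(1 - \hat{F}_n)]\right)'$, let $X$ be the $n \times 2$ matrix whose $i$-th row is $(1, \ln x_{(i)})$, let $W = \mathrm{diag}(w_1, \ldots, w_n)$, and let $\beta = (\beta_0, \beta_1)'$, where $\beta_0 = -\alpha \ln \lambda$ and $\beta_1 = \alpha$. The solution of the WLSE is obtained as follows:
$$\hat{\beta} = \left(X' W X\right)^{-1} X' W y \tag{17}$$
where $\hat{\alpha} = \hat{\beta}_1$ and $\hat{\lambda} = \exp\left(-\hat{\beta}_0 / \hat{\beta}_1\right)$. A similar process can be applied to estimate the parameters of the log-logistic distribution. In conclusion, the considered WLS estimators are explicit functions of the sample observations and are, therefore, easy to compute, without the computational complexity of the MLE [1].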
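The closed-form weighted solution can be sketched as below. This is an illustrative sketch, not the authors' Matlab code: it assumes the Weibull cdf $1 - \exp[-(x/\lambda)^{\alpha}]$, so that the intercept equals $-\alpha \ln \lambda$, uses the weights $w_i = i/1.51$ and the plotting position $(i - 0.3)/(n + 0.4)$ from the text, and the function name `wlse_weibull` and the synthetic sample are hypothetical.

```python
import numpy as np


def wlse_weibull(x, c=1.51, a=0.3, b=0.4):
    """Weighted LS estimates (shape, scale) with rank-based weights w_i = i / c."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    F = (i - a) / (n + b)                            # plotting positions F_hat_i
    y = np.log(-np.log(1.0 - F))                     # dependent variable
    X = np.column_stack([np.ones(n), np.log(x)])     # rows (1, ln x_(i))
    w = i / c                                        # diagonal of W
    XtW = X.T * w                                    # X'W for a diagonal W
    beta0, beta1 = np.linalg.solve(XtW @ X, XtW @ y) # solves (X'WX) beta = X'Wy
    # Under the assumed cdf: beta1 = shape, beta0 = -shape * ln(scale).
    return beta1, np.exp(-beta0 / beta1)


# Synthetic sample (assumed shape 2, scale 1.5), for illustration only.
rng = np.random.default_rng(0)
sample = 1.5 * rng.weibull(2.0, size=300)
shape_hat, scale_hat = wlse_weibull(sample)
print("shape:", shape_hat, "scale:", scale_hat)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse in Equation (17). For the log-logistic case, the same function applies with `y = np.log(F / (1 - F))` and `c = 1.43`.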

Monte Carlo Simulation
Different estimation methods may result in different estimates. Thus, it is important to have objective criteria for choosing one method over the alternatives. A common approach to selecting the best method is Monte Carlo simulation with appropriate criteria: bias and mean squared error (MSE) [25]. In this section, the performance of the WLSE with the proposed weights is evaluated by means of a simulation study. The considered WLSE is compared with the MLE, LSE, and WLSE (Zyl and Schall) for the parameters of the log-logistic and Weibull distributions. In the simulation study, the number of replications is 5000. All computations for the simulation are performed using Matlab 10.1. Sample sizes are taken as n = 10, 20, 30, 50, 100, and 250, and the shape parameters are taken as α = 1, 2, 3, and 6, in line with previous studies concerning the log-logistic and Weibull distributions. Without loss of generality, the scale parameter is set equal to 1.
From the simulation results presented in Table 3 for the shape parameter of the log-logistic distribution, the following conclusions may be drawn. The MSE and bias values of the proposed WLSE decrease as the sample size increases. Thus, the simulation indicates that the proposed WLSE provides consistent estimates. According to the MSE criterion, the proposed WLSE shows better performance than the MLE, LSE, and WLSE (Zyl and Schall) for all considered shape parameters when the sample sizes are n = 10, 20, 30, and 50. Since the MLE is asymptotically the best, it provides the best performance for n = 100 and 250, as expected. The analysis shows that the proposed WLSE performs better than the LSE and the WLSE (Zyl and Schall) for estimation of the shape parameter of the log-logistic distribution in terms of MSE. In terms of the bias criterion, it is observed from Table 3 that the proposed WLSE is best for n = 10 and 20, and it is the best performer next to the LSE for n = 30 and 50. In addition, the proposed WLSE provides less bias than the WLSE (Zyl and Schall) for n = 100 and 250. As a result, the considered WLSE with the proposed weights can be a good alternative estimation method for the shape parameter of the log-logistic distribution. In particular, it is useful in dealing with small sample sizes.
If the simulation results for the scale parameter of the log-logistic distribution are evaluated, it can be concluded that the proposed WLSE is the best method in terms of MSE when the sample sizes are 10, 20, 50, and 100, for most of the shape parameter cases except shape parameters 3 and 6. Similar to the results for the shape parameter, the MLE naturally provides the smallest MSE for n = 250. It can be seen from Table 3 that the WLSE and WLSE (Zyl and Schall) perform very similarly and that they outperform the LSE in almost all cases. According to bias, the WLSE provides less bias than the LSE and WLSE (Zyl and Schall) in almost all cases, as seen in Table 4.
Based on the results presented in Table 5 for the shape parameter of the Weibull distribution, the following conclusions may be drawn. The proposed WLSE shows the best performance for all sample sizes except n = 250, which favors the MLE. The WLSE outperforms the LSE and WLSE (Zyl and Schall) in all cases. According to the bias criterion, while the LSE outperforms the others for n = 10 and 20, the proposed WLSE is the best performer next to the LSE. For other sample sizes, the proposed WLSE gives satisfactory results. The results of the simulation for the scale parameter of the Weibull distribution are given in Table 6. While, according to the MSE criterion, the proposed WLSE, LSE, and WLSE (Zyl and Schall) yield similar results, the proposed WLSE is comparable to the LSE in terms of bias. Moreover, the proposed WLSE is compared to two alternative WLSE methods (Hung [11], Lu et al. [12]) and also to PE [2] for the shape parameter of the Weibull distribution. It can be seen that the proposed WLSE demonstrates satisfactory results in terms of MSE. To sum up, the proposed WLSE for the shape parameter of the Weibull distribution, which characterizes the failure rate and has a significant impact on the accuracy of wind power estimation, provides better performance than the MLE, LSE, and three alternative WLSE methods in terms of MSE for most of the considered sample sizes (n < 100) and shape parameter cases. For the log-logistic distribution, it may be concluded that the proposed WLSE for the shape parameter is a good alternative to the MLE, LSE, and WLSE.
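The structure of such a comparison can be sketched as follows for the Weibull shape parameter, restricted to the LSE and the WLSE with weights $w_i = i/1.51$. This is a reduced sketch and not a reproduction of the reported tables: the replication count, sample size, seed, and the names `fit_shape` and `monte_carlo` are assumptions, and the MLE and the other WLSE variants are omitted.

```python
import numpy as np


def fit_shape(x, weights=None):
    """Slope of the (optionally weighted) LS fit of ln[-ln(1 - F_hat_i)] on ln x_(i)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    y = np.log(-np.log(1.0 - F))
    # np.polyfit multiplies residuals by w before squaring, so pass sqrt(weights).
    w = None if weights is None else np.sqrt(weights)
    slope, _ = np.polyfit(np.log(x), y, 1, w=w)
    return slope


def monte_carlo(n, shape, reps=2000, seed=1):
    """Return (bias, mse) arrays for the shape estimates; index 0 = LSE, 1 = WLSE."""
    rng = np.random.default_rng(seed)
    rank_weights = np.arange(1, n + 1) / 1.51        # w_i = i / 1.51
    est = np.empty((reps, 2))
    for r in range(reps):
        x = rng.weibull(shape, size=n)               # scale parameter fixed at 1
        est[r, 0] = fit_shape(x)
        est[r, 1] = fit_shape(x, weights=rank_weights)
    bias = est.mean(axis=0) - shape
    mse = ((est - shape) ** 2).mean(axis=0)
    return bias, mse


bias, mse = monte_carlo(n=20, shape=2.0)
print("bias (LSE, WLSE):", bias)
print("MSE  (LSE, WLSE):", mse)
```

Repeating the call over the grid of sample sizes and shape parameters listed above would produce tables with the same layout as Tables 3 to 6.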

Real-Life Application from Hydrology
In this study, a flood frequency analysis is performed for one station of the Aji River basin in Iran as an illustrative example. Data measured at the SofiChai station are used. The considered flood data are analyzed in detail in [26] with most statistical distributions, including the log-logistic and Weibull distributions. We also examine whether the data follow the Weibull and log-logistic distributions using statistical tests such as the Kolmogorov-Smirnov, Anderson-Darling, and Chi-squared tests. Based on the results of these tests and a q-q plot, we conclude that the Weibull and log-logistic distributions may be used as alternative models for the flood data.
Figure 3 shows the weights calculated for sample size n = 34 for the Weibull and log-logistic distributions in order to perform the WLSE. We now calculate the estimates of the scale and shape parameters of the log-logistic and Weibull distributions using the estimation methods considered in this study; see Table 7. Figure 4 provides a histogram and the fitted Weibull probability density functions (pdfs) for the flood data in order to evaluate the results of the MLE, LSE, and WLSE. It is observed that the Weibull pdfs capture most features of the histogram of the observed data. Additionally, the performance of the WLSE is evaluated based on the root mean square error (RMSE) between the area of the histogram and the total area under the Weibull pdf curve. The RMSE values are very close to each other for all methods: 0.0532, 0.0543, and 0.0538 are obtained for the MLE, LSE, and WLSE, respectively. This example shows that the RMSE of the WLSE is less than that of the LSE.
The log-logistic pdfs and the histogram are given in Figure 5. The RMSE values of the MLE, LSE, and the considered WLSE are 0.0601, 0.0674, and 0.0610, respectively. As in the Weibull case, the RMSE of the WLSE is less than that of the LSE. Thus, the simulation results are supported by the results of the application.
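The text does not spell out how the RMSE between the histogram and the fitted pdf is computed. One plausible reading, shown here only as an assumption, compares the density-normalized histogram heights with the fitted pdf evaluated at the bin centers. The data below are synthetic stand-ins, not the SofiChai flood record, and the names `weibull_pdf` and `histogram_rmse`, the bin count, and the parameter values are hypothetical.

```python
import numpy as np


def weibull_pdf(x, shape, scale):
    """Weibull density for the cdf 1 - exp[-(x / scale)^shape]."""
    t = x / scale
    return (shape / scale) * t ** (shape - 1.0) * np.exp(-(t ** shape))


def histogram_rmse(data, pdf, bins=8):
    """RMSE between density-normalized histogram heights and pdf at bin centers."""
    heights, edges = np.histogram(data, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return float(np.sqrt(np.mean((heights - pdf(centers)) ** 2)))


# Synthetic stand-in sample of size 34 (assumed shape 1.5, scale 50); NOT the flood data.
rng = np.random.default_rng(0)
data = 50.0 * rng.weibull(1.5, size=34)

rmse = histogram_rmse(data, lambda x: weibull_pdf(x, 1.5, 50.0))
print("RMSE:", rmse)
```

Under this reading, the pdf passed to `histogram_rmse` would be evaluated at each method's parameter estimates in turn, which yields one RMSE value per estimation method.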

Conclusions
The key to the WLSE is the calculation of the weights. In this study, we propose estimating the weights as a function of the rank of the data point in the sample, based on a dependent variable generated via simulation. The considered weighted least squares estimation, which is computationally easy, is then applied to the estimation of the parameters of the log-logistic and Weibull distributions. Considering the results of the Monte Carlo simulation and a real application, it is shown that the proposed WLSE for the shape parameters of the Weibull and log-logistic distributions performs better than the other considered methods for most of the considered parameter values and sample sizes.


Figure 3. The estimated variances corresponding to Weibull and log-logistic distributions.


Table 1. The considered linear models for $G(\cdot)$ for the Weibull distribution.


Table 2. The considered linear models for $G(\cdot)$ for the log-logistic distribution.

Table 3. Bias and MSE of the estimated shape parameters of the log-logistic distribution.

Table 4. Bias and MSE of the estimated scale parameters of the log-logistic distribution.

Table 5. Bias and MSE of the estimated shape parameters of the Weibull distribution.

Table 6. Bias and MSE of the estimated scale parameters of the Weibull distribution.

Table 7. Parameter estimates for log-logistic and Weibull distributions.
