Bias, Precision, and Accuracy of Skewness and Kurtosis Estimators for Frequently Used Continuous Distributions

Several measures of skewness and kurtosis were proposed by Hogg (1974) in order to reduce the bias of conventional estimators when the distribution is non-normal. Here we conducted a Monte Carlo simulation study to compare the performance of conventional and Hogg’s estimators, considering the most frequent continuous distributions used in health, education, and social sciences (gamma, lognormal and exponential distributions). In order to determine the bias, precision and accuracy of the skewness and kurtosis estimators for each distribution we calculated the relative bias, the coefficient of variation, and the scaled root mean square error. The effect of sample size on the estimators is also analyzed. In addition, a SAS program for calculating both conventional and Hogg’s estimators is presented. The results indicated that for the non-normal distributions investigated, the estimators of skewness and kurtosis which best reflect the shape of the distribution are Hogg’s estimators. It should also be noted that Hogg’s estimators are not as affected by sample size as are conventional estimators.


Introduction
In the health and social sciences, outcome variables usually show values of skewness and kurtosis that clearly deviate from the normal distribution [1,2]. Micceri [2] analyzed the distributional characteristics of over 400 large-sample achievement and psychometric measures and found several classes of deviation from the normal distribution in addition to skewness and kurtosis, leading him to conclude that normal distributions are uncommon with real data. This was further illustrated in the study by Blanca et al. [1], who analyzed 693 distributions corresponding to cognitive ability and other psychological variables derived from 130 different populations. They found that most distributions were non-normal. It should also be noted that variables with non-normal distributions are commonly encountered in several areas of psychological and social research, an example being the distributions for some dimensions of self-concept [3,4]. Further examples cited by Arnau et al. [5] include reaction times or response latency in cognitive studies [6], clinical assessment indices in drug abuse research [7], physical and verbal violence in couples [8], and labor income [9] and health care costs [10] in sociological studies.
In a recent systematic review, Bono et al. [11] found that the most widely used distributions in health, education, and social the sciences can be ranked in descending order as follows: gamma, negative binomial, multinomial, binomial, lognormal and exponential.
The detailed study of distribution shape requires the calculation of skewness and kurtosis. In the majority of studies these indices are estimated based on the third and fourth central moments of a distribution. However, their values are usually biased in the case of non-normal distributions, that is, there may be a discrepancy between the true (theoretical) values or accepted references of skewness and kurtosis measures, for a given distribution, and the observed (empirical) values. With the aim of reducing this bias, Hogg [12] proposed alternative estimators, which were subsequently used in the study by Micceri [2]. These estimators are referred to as elements leading to adaptive robust estimators of location [13]. Hogg et al. [14] and Redd and Stark [15] complemented the earlier study by Hogg [12], showing that these estimators are a good choice for a wide variety of distributions. The advantage of these estimators is that they provide more precise and accurate information about the data distribution, prior to choosing-on the basis of this information-the most suitable statistical analysis.
In the field of finance, Bonato [16] and Kim and White [17] proposed the use of other indices for estimating skewness and kurtosis, for example, those of Bowley [18], Groeneveld and Meeden [19] and Moors [20]. Recently, Aytaçoglu and Sazak [21] analyzed all these indices, as well as both conventional and Hogg's estimators, and made a series of recommendations for practitioners regarding the best estimator of skewness and kurtosis for certain distributions. Their conclusion was that the performance of each estimator depended upon the shape of the distribution and the sample size. For example, they found that conventional estimators performed quite well with large samples and for the Weibull and lognormal distributions, whereas the other estimators performed better with the Student distribution and were less affected by sample size. Bonato [16] and Kim and White [17] also pointed out that conventional estimators are influenced by outliers. For their part, Keselman et al. [13] examined empirical methods based on data trimming in the presence of outliers.
The results of all the aforementioned simulation studies provide information about the different tests which may be used to determine the shape of a distribution, and the applied researcher therefore has something of a guide when it comes to choosing estimators of skewness and kurtosis. However, having to choose between several estimators is impractical, especially when the majority of statistical packages only include the conventional estimators. In the present study we therefore focus solely on Hogg's estimators [12], as these are well-known, and we provide a macro that applied researchers can use to calculate them.
The purpose of this paper is to compare the behavior of Hogg's estimators of skewness and kurtosis with that of conventional estimators based on the third and fourth central moments of a distribution, with a view to their subsequent use in applied contexts.
To this end we conducted a Monte Carlo simulation study to estimate the skewness and kurtosis of the most frequent continuous distributions of the exponential family (gamma, lognormal and exponential), according to Bono et al. [11]. The indicators used to assess the bias, precision and accuracy of the different estimators of skewness and kurtosis were, respectively, the relative bias (RB), the coefficient of variation (CV) and the scaled root mean square error (SRMSE). Finally, we developed a SAS macro for calculating skewness and kurtosis with both conventional and Hogg's estimators. This macro will be highly useful for applied researchers.
In summary, this study had two objectives: (a) To assess the bias, precision and accuracy of conventional and Hogg's estimators on the basis of RB (interpreted in absolute values), CV and SRMSE, and (b) to provide applied researchers with a macro for calculating kurtosis and skewness with both conventional and Hogg's estimators. In order to address these objectives, we begin by reviewing the kurtosis and skewness estimators used in this study, detailing their theoretical values for the gamma, lognormal and exponential distributions. Extensive simulation work is then carried out to assess the estimators of skewness and kurtosis for each distribution and different sample sizes. Overall, the results indicated that for the non-normal distributions investigated, the estimators of skewness and kurtosis which best reflect the shape of the distribution are Hogg's estimators.
It should be noted that as this is a simulation study, the results apply solely to the distributions and sample sizes analyzed, and thus they cannot be compared with those of other simulation studies that analyze different conditions.

Review of Kurtosis and Skewness Estimators
The procedure for assessing the nature of a data distribution includes two tests: skewness and kurtosis. Skewness is the degree of asymmetry of a distribution, that is, how much it is skewed to the left or right. Kurtosis refers to the nature of distribution tails, that is, their length and weight. On a practical level it is more useful to know the nature of the tails first, and hence we begin by discussing kurtosis.
The traditional calculation of kurtosis is based on the fourth central moment, defined by the SAS program as: where n is the number of values for a variable, X i is the ith value of the variable, X is the sample average, and s is the sample standard deviation (SD). Equation (1) yields a value close to zero when the distribution is normal. Kr1 < 0 represents a shorter distribution with light and thinner tails, whereas Kr1 > 0 implies a longer distribution with heavy and fatter tails. The classical kurtosis coefficient is very sensitive to outlying values. Therefore, Hogg [12,22] developed two estimators of kurtosis, Q and Q 1 , which in this paper are referred to as Kr2 and Kr3. The Kr2 estimator is as follows: where U 0.05 is the mean of the upper 5% of the order statistics of the sample, L 0.05 is the mean of the lower 5% of the order statistics of the sample, and U 0.5 and L 0.5 are, respectively, the means of the upper and lower 50% of the order statistics. Thus, Kr2 is a ratio of two linear functions of the order statistics. According to Hogg [12] and Reed and Stark [15], values of Kr2 < 2 imply a light-tailed distribution (uniform), 2 ≤ Kr2 ≤ 2.6 a medium-tailed distribution (normal), 2.6 < Kr2 ≤ 3.2 a heavy-tailed distribution (double exponential) and Kr2 > 3.2 a very heavy-tailed distribution (like the Cauchy). Othman et al. [23] applied the following rule: a distribution is classified as normal-tailed if Kr2 < 3, as heavy-tailed if 3 ≤ Kr2 < 5, and as very heavy-tailed if Kr2 ≥ 5.
Hogg [12] also developed another estimator of kurtosis, a modification of Kr2, defined as: where U 0.2 and L 0.2 are, respectively, the means of the upper and lower 20% of the order statistics of the sample, and U 0.5 and L 0.5 are defined as in Equation (2). The reason for this change is that the numerator of Kr3 is an excellent linear estimate of the SD of a normal distribution [24], while the denominator is an excellent measure of the SD of a double exponential distribution. Both Kr2 and Kr3 are location-free statistics, and hence are uncorrelated with location statistics like the trimmed means [12,15].
As for skewness, the calculation of the conventional estimator is based on the third central moment, and is defined by the SAS program as: where n is the number of values for a variable, X i is the ith value of the variable, X is the sample average, and s is the sample SD. The normal distribution is symmetric, and thus the value of Equation (4) is zero. If Sk1 < 0 the distribution has a longer tail to the left than the right, in other words, it is left skewed. If Sk1 > 0 the distribution is skewed to the right. Hogg [12] proposed the estimator Q2 as a measure of skewness, labeled here as Sk2 and defined as: where U 0.05 and L 0.05 are the means of the largest and smallest 5% of the order statistics of the sample, and m 0.25 is the 25-trimmed mean, that is, the mean of the ordered observations trimmed by 25% at both the upper and lower ends. Keselman et al. [13] and Reed and Stark [15] referred to the 25-trimmed mean as T 0.25 . The calculation of m 0.25 or T 0.25 coincides with the mean of the middle (MID) 50% of the sample. Hence, the mean of the middle 50% is referred to by Hogg et al. [14] as M 0.5 and by Othman et al. [23] as MID.
According to Hogg et al. [14] and Othman et al. [23], if 0.5 ≤ Sk2 ≤ 2 the distribution can be considered symmetric, if Sk2 < 0.5 the distribution may be skewed to the left, and if Sk2 > 2 the distribution is skewed to the right. Reed and Stark [15], based on the work of Hertsgaard [25], classified distributions as symmetric (0.7 ≤ Sk2 ≤ 1.4), left skewed (Sk2 < 0.7) and right skewed (Sk2 > 1.4). Table 1 shows theoretical values of kurtosis and skewness for each estimator and distribution analyzed in the present study. The reference values of the conventional estimators are known. For the exponential distribution the theoretical values are Kr1 = 6 and Sk1 = 2. Regarding the lognormal distribution, the theoretical values of Kr1 and Sk1 are 5.9 and 1.75, respectively, with parameters ζ = 1 and σ = 0.5. For the gamma distribution we established the theoretical values of Kr1 and Sk1 according to the shape parameter α (0.5, 2 and 4), such that Kr1 = 6/α and Sk1 = 2/ √ α [26]. Finally, following the procedure described by Reed and Stark [15] with a sample size of N = 50 and 2500 iterations, the theoretical values of Hogg's estimators were obtained for N = 15,000 and 15,000 iterations in order to improve the procedure.

Method
Simulations were performed using SAS 9.4 [27], which features random number generators for known distributions. Thus, we constructed macros that enabled pseudorandom variables to be generated for the different distributions, and which could calculate kurtosis and skewness both with the conventional estimators that SAS uses by default (Kr1 and Sk1), and also with those of Hogg (Kr2, Kr3, and Sk2). The generated data were ordered from least to greatest, and we considered, in accordance with the corresponding equation, percentages above or below the order statistic of 5%, 20%, 25% and 50%.
In addition to the macros developed for the simulation study we constructed a macro for calculating, from a data file, the conventional estimators that SAS uses by default (Kr1 and Sk1), as well as those of Hogg (Kr2, Kr3, and Sk2). The first column of the data file is the subject identifier, while the second corresponds to the data. Before running the macro, the user must specify the sample size (N), and in the section "load a data file", the path and name of the data file (see Appendix A).

Study Variables
1. Sample sizes. The study considered sample sizes of N = 50, 100, 400, 1000 and 5000. 2. Distributions. In accordance with the systematic review of Bono et al. [11] we chose non-normal continuous distributions from the exponential family. Thus, data from the exponential distribution were generated using the function rand('exponential') in SAS. In order to generate data for the lognormal distribution we exponentialized the normal function with the parameters most commonly used in simulation studies, namely ζ = 1 and σ = 0.5 exp(rand('normal',1,0.5)), since if data are normally distributed, then X = exp(data) is lognormally distributed [26]. The SAS function that was used to generate data for the gamma distributions was rand('gamma', shape_parameter). In order to cover different shapes of the gamma distribution we chose shape parameters of α = 0.5, 2 and 4.
Each combination of sample size and distribution shape was replicated 10,000 times (5 × 5 × 10,000 = 250,000 simulations). For each replication the estimators of kurtosis (Kr1, Kr2 and Kr3) and skewness (Sk1 and Sk2) were computed in order to obtain the RB, CV and SRMSE of these estimators.

Evaluation Criteria
We analyzed which estimator best reflected the shape of the distribution, comparing the empirical values with the theoretical ones shown in Table 1, based on three criteria: bias, precision and accuracy [28].
Regarding the bias or deviation between the average empirical value and the theoretical value of kurtosis or skewness, we examined the RB because the amount of bias must be assessed in relative terms in order to be compared [29,30]. Thus, Equation (6) quantifies the percentage to which the estimated parameter values deviate from the true parameter values [29,31]: where β is the true value for the estimate of interest, andβ = S i=1β i /S (S is the number of simulations performed andβ i is the estimate of interest within each of the i = 1, . . . , S simulations).
In relation to precision, we calculated SD and CV. The SD is related to the variability of the estimated values, and little variation indicates that the estimator is precise [28]. In order to achieve greater comparability of precision among estimators, we also calculated the CV [28], which is the SD expressed as a percentage of the mean [32]: where SD β is the empirical SD of the estimate of interest across all simulations. Note that the calculation of CV does not require the true value to be known.
Finally, in order to evaluate accuracy, we calculated the root mean square error (RMSE). The RMSE incorporates both the bias and SD of the estimated parameters, and it transforms the mean square error (MSE) onto the same scale as the parameter [29,33,34]. The RMSE indicator represents the degree to which the estimated parameter values vary around the parameter value [30], and it is defined as: We calculated the scaled RMSE (SRMSE), dividing by the true value of the estimate of interest, in order to make comparisons between the results obtained with different estimators [28]: To sum up, we calculated the percentages of RB, CV and SRMSE of skewness and kurtosis across 10,000 simulated data sets from the gamma, lognormal and exponential distributions, in each case with respect to different sample sizes. A good criterion is choosing the estimators of skewness and kurtosis with the lowest percentages of RB, CV and SRMSE.

Results
This section presents the results of the simulations for the performance of Hogg's estimators compared with the conventional ones, depending on the shape of the distribution and the sample size. We first present results for the exponential distribution (Table 2), followed by those for the lognormal distribution (Table 3), and finally for the different shapes of the gamma distribution (Tables 4-6).

Exponential Distribution
For an exponential distribution, Hogg's estimators have a low and negative bias (see Table 2). Regarding kurtosis, the Kr3 estimator is the best, particularly given its small percentages of RB, CV and SRMSE compared with those of Kr2 and the conventional estimator Kr1, reaching values close to zero with N = 5000. The RB percentages of the Kr3 estimator are close to zero for all the sample sizes considered. The second-best estimator is Kr2, with lower values on the evaluation criteria than those of Kr1. Note that with large samples (N = 1000 and 5000) the percentages of RB are close to zero, in other words, Kr2 is less biased and similar to Kr3. Finally, the conventional estimator Kr1 yields very high values on the evaluation criteria, and it is biased negatively. In general, all the kurtosis estimators become less biased and achieve greater precision and accuracy as the sample size increases. This effect of sample size is strongest for the conventional estimator Kr1, and less marked for Kr2 and Kr3. For the estimator Kr1, although the RB percentage is close to zero with N = 5000, the RB, CV and SRMSE percentages are high in comparison with those of Hogg's estimators of kurtosis (Kr2 and Kr3). Regarding skewness, the Sk2 estimator is better than Sk1, since its percentages of RB, CV and SRMSE are all lower. The differences between Sk1 and Sk2 are, however, not as large as those between Kr1 and Kr2/Kr3. Note especially that Sk1 and Sk2 differ much less in their percentages of CV and SRMSE.

Lognormal Distribution
The results obtained with the lognormal distribution (Table 3) follow a similar pattern to those for the exponential distribution. The Kr1 estimator yields higher RB, CV and SRMSE values, and the best estimator is Kr3, followed by Kr2. Regarding skewness, Sk2 again shows lower values on the evaluation criteria. For both distributions (exponential and lognormal), the RB, CV and SRMSE percentages decrease for all the estimators as the sample size increases, with the effect being most marked for Kr1.

Gamma Distributions
As in the case of the exponential and lognormal distributions, the best estimators for the gamma distributions, in terms of RB, CV and SRMSE (Tables 4-6), are Hogg's estimators. Regarding kurtosis, the best estimator is Kr3, followed by Kr2, whereas for skewness the best estimator is Sk2. This is the case for all three shapes of the gamma distribution (α = 0.5, 2 and 4). As for the effect of sample size, this is much greater for Kr1.

Summary of the Performance of Hogg's and Conventional Estimators
To summarize, comparison of conventional estimators of kurtosis and skewness with those proposed by Hogg reveals, for the gamma, lognormal and exponential distributions, the following: (a) all the estimators have negative values of RB; in other words, they are biased negatively; (b) the Kr3 estimator yields percentages of RB close to zero for all sample sizes; (c) the Kr2 estimator yields percentages of RB close to zero with N = 400, 1000, and 5000; (d) the estimators Kr3 and Sk2 are the least biased, with low RB percentages; (e) the estimators Kr3 and Sk2 yield the greatest precision and accuracy in terms of lower CV and SRMSE percentages; (f) the estimators Kr2, Kr3 and Sk2 tend to yield RB percentages closer to zero as sample size increases; (g) with Kr3 and N = 5000 the RB, CV and SRMSE percentages tend towards zero; (h) the estimators Kr1 and Sk1 yield high percentages of RB, CV and SRMSE that quickly decrease as N increases; (i) sample size has less of an effect on the RB, CV and SRMSE percentages of the estimators Kr2 and Kr3 than on those of Kr1; (j) sample size has less of an effect on the RB percentage of the Sk2 estimator than on that of Sk1; and (k) the sample size effect is similar for the CV and SRMSE percentages of the estimators Sk1 and Sk2.

Discussion
The choice of which estimators best reflect the shape of the distribution depends on RB, CV and SRMSE. These are the main indicators that are generally used to evaluate the degree of bias, precision and accuracy of an estimator.
This study had two objectives. The first was to compare the bias, precision and accuracy of conventional and Hogg's estimators of skewness and kurtosis by calculating their RB, CV and SRMSE for the most frequent continuous distributions used in health, education, and social sciences (i.e., gamma, lognormal and exponential distributions). Second, we constructed a macro with SAS for calculating both conventional and Hogg's estimators. The first objective was addressed by performing a Monte Carlo simulation study.
Regarding kurtosis, the values estimated by Kr3, followed by those obtained with Kr2, show lower percentages of RB, CV and SRMSE than do the values estimated with the conventional estimator Kr1, regardless of sample size. Aytaçoglu and Sazak [21] found that with sample sizes of N = 300 and N = 1000, the Kr2 estimator was better than Kr1 for Student's distribution. For the lognormal distribution, by contrast, these authors concluded that Kr2 was worse than Kr1. However, their results are not comparable with our own, as they used different parameters when simulating a lognormal distribution. That is, the shape of the lognormal distribution that was simulated by these authors is different to that considered here. In addition, these authors studied bias and MSE, and to make the results totally comparable, these criteria would need to be scaled.
With respect to skewness, and according to RB values, Sk2 is less biased than Sk1. In terms of the CV and SRMSE values, Sk2 is a little more precise and accurate than Sk1. In contrast to these findings, Joanes and Gill [35] showed that Sk1 performs well and yields small MSE in samples from non-normal distributions. However, they used a different criterion to our own to evaluate accuracy (i.e., MSE vs. SRMSE in our study).
Regarding the impact of sample size, this was greater in the case of the Kr1 estimator, with the RB, CV and SRMSE values decreasing considerably as N increased. The observed influence of sample size is consistent with the results regarding bias and MSE that Aytaçoglu and Sazak [21] obtained with lognormal and Weibull distributions.
The simulations we performed confirm that the estimations obtained with the conventional measure of kurtosis deviate extremely from the theoretical values, especially with small samples. This is likely due to the presence of outliers in non-normal distributions. The calculation of conventional estimators uses all the data, whereas outliers are excluded, by means of trimming, when calculating Hogg's estimators. In this context, Kim and White [17] showed that the conventional measures are influenced by extreme observations because they are essentially based on sample averages. We also observed that changes in sample size greatly affect the estimation of Kr1. With the gamma distribution and Kr1 the effect of sample size varies depending on the shape of this distribution, it being most marked when the shape parameter is α = 4 for CV and SRMSE.
By contrast, Hogg's estimators of kurtosis are less influenced by sample size and give results closer to the theoretical values. Our results also show that the conventional skewness estimator Sk1 is less affected by sample size than is the conventional kurtosis estimator Kr1. As Bonato [16] points out, it is important that values of kurtosis and skewness are stable; that is, they do not vary too much depending on sample size and the presence of outliers. In this study we found that Hogg's kurtosis estimators are more stable than is the conventional estimator.
In conclusion, this Monte Carlo simulation study shows that Kr3 and Sk2 are the best estimators of kurtosis and skewness, since they yield less bias and are more precise and accurate. In other words, they are better according to all three evaluation criteria (RB, CV and SRMSE), irrespective of the type of distribution (exponential, lognormal or gamma).

Utility for Researchers
This study has shown that the conventional indices of skewness and kurtosis based on the third and fourth central moments of a distribution yield biased values in a wide range of distributions commonly used in health, education, and social sciences. Consequently, researchers should also compute other estimators which better reflect the shape of the distribution. The results from our Monte Carlo simulation study indicate that Hogg's estimators, specifically Kr3 and Sk2, are less biased and more precise and accurate than are conventional estimators. Although statistical packages include the conventional indices of skewness and kurtosis, they do not compute Hogg's estimators, thus restricting their use. For this reason, we developed a SAS macro (available in Appendix A) for calculating not only the conventional estimators, but also those proposed by Hogg. Our aim in doing so is to make accessible to a wide range of applied researchers and practitioners the computation of several estimators of skewness and kurtosis. Identifying the shape of the underlying distribution associated with data is a key aspect to consider when selecting a data analysis procedure. As Islam [36] points out, non-normality may lead to incorrect statistical inferences, since many estimation techniques rely on the underlying distributional assumption of normality of the data.

Limitations and Future Directions
The present study has certain limitations that should be highlighted. First, the results are limited to the conditions examined in this study, and it is difficult to compare them with other simulation studies since the sample sizes and distribution shapes differ from one study to another. In addition, other simulation studies have used different evaluation criteria, such as unscaled bias or RMSE, which also makes comparison difficult. Second, our study is focused on Hogg's estimators, although it can now be extended to other estimators of skewness and kurtosis in further research. Third, although the distributions studied are the most common non-normal distributions found in research, there may be studies whose data derive from other distributions. In this respect, it would be interesting, for example, to examine the performance of estimators with distributions that have negative values of kurtosis and skewness.
Finally, an interesting line of future research would be to review different trimming procedures in statistical analysis, and to carry out the corresponding simulation studies, manipulating sample size and the type of distribution. In fact, one kind of study that has been repeated over the years involves the analysis of the amount and type of data trimming that should be adopted in order to circumvent the biasing effects of outlying values on Type I error and/or power rates for classical tests [13,37]. For example, Othman et al. [23], building on the study by Babu et al. [38], suggested using all the sample data if Kr2 < 3; if 3 ≤ Kr2 < 5 they recommend trimming the top and bottom 10% of the sample points and using the middle 80%, while if Kr2 ≥ 5 the researcher should trim the top and bottom 20% of the sample points and use the middle 60%. Keselman et al. [13] recommend that applied researchers could adopt the minimal percentage (10% total trimming), except when data are extremely non-normal.    [2]; q2 = summary [3]; Q = q||q1||q2; varnames='Q1':'Q3'; create summary1 from Q [colname = varnames]; append from Q; close summary1; proc append base = seconddata; run; data seconddata; set seconddata; proc iml; use seconddata; read all var('Q1':'Q3')into seconddata; data seconddata1(rename = (Q1 = KR2 Q2 = KR3 Q3 = SK2)); set seconddata; run; proc print data = seconddata1; run; data firstdata; set firstdata; proc iml; use firstdata; read all var ('Qc1':'Qc2') into firstdata; data firstdata1(rename = (Qc1 = KR1 Qc2 = SK1)); set firstdata; run; proc print data = firstdata1; run; %mend kurtskew; %kurtskew;