The Properties of a Decile-Based Statistic to Measure Symmetry and Asymmetry

This paper studies a simple skewness measure for detecting symmetry and asymmetry in samples. The statistic can be computed from only three simple statistics: the first and ninth deciles and the median. The power of the statistic to detect symmetry and asymmetry is studied through extensive Monte Carlo simulations and compared with that of some alternative measures. The results show that the statistic generally performs well in the simulations.


Introduction
In scientific studies, researchers can summarize a given dataset using descriptive statistics. Descriptive statistics comprise three familiar groups of measures: measures of central tendency, of dispersion and of shape [1]. Measures of central tendency and dispersion, such as the mean, median, standard deviation and variance, describe the location and spread of the dataset [1][2][3][4][5]. Measures of shape, such as skewness and kurtosis, relate to the form of the distribution of the dataset [6][7][8]. These measures are used in many disciplines, with applications that include tests of normality and studies of the robustness of normal-theory procedures.
Skewness is often used to refer to symmetry. Nevertheless, symmetry is rarely defined explicitly, and it is assumed that everybody knows what it means. Definitions of symmetry vary with the discipline in which the term is used. In the literature, any statement about the symmetry of a structure has to be made with reference to some element of symmetry: a point, a line or an axis [9]. In statistical inference, the relevant point or axis is taken to be the center of a distribution.
There are several measures used to quantify the degree of skewness of a distribution. Let µ, m, M, σ, µ3, Q1 and Q3 denote the mean, median, mode, standard deviation, third central moment, and first and third quartiles, respectively. The statistics introduced for measuring skewness are Pearson's coefficient of skewness (SK_P), Pearson's second coefficient of skewness (SK_P2), Yule's coefficient of skewness (SK_Y), the standardized third central moment (γ1), Bowley's coefficient of skewness (SK_B) and Galip's three coefficients of skewness (SK_G1, SK_G2 and SK_G3). Although numerous other measures and practical extensions of the above coefficients were proposed afterward, the original measures are still used today, especially γ1 (or its variants), which is widely implemented in statistical software.
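For reference, four of the measures listed above have well-known textbook forms, written here with µ, m, M, σ, µ3, Q1 and Q3 as just defined. The labels SK_P, SK_P2 and SK_B follow those used in the power comparison later in the paper. These are the standard forms rather than expressions confirmed by this text, and no form is assumed here for Yule's or Galip's coefficients.

```latex
\begin{align*}
SK_{P}     &= \frac{\mu - M}{\sigma},        & SK_{P2} &= \frac{3(\mu - m)}{\sigma},\\[4pt]
\gamma_{1} &= \frac{\mu_{3}}{\sigma^{3}},    & SK_{B}  &= \frac{Q_{3} + Q_{1} - 2m}{Q_{3} - Q_{1}}.
\end{align*}
```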
When we face a dataset containing outliers, we need a measure that takes these outliers into account. Therefore, measures based on the extreme values (maximum and minimum), such as Galip's three coefficients of skewness; on the first and third quartiles (Q1 and Q3), such as Bowley's coefficient of skewness; or on the first and ninth deciles (D1 and D9) should probably be more effective than other methods. Previous studies indicated that Galip's three coefficients of skewness had the most power to detect symmetry and asymmetry, whereas Bowley's coefficient did not perform as well. However, there has been no in-depth study of a decile-based definition of skewness or of how it compares with the alternatives.
In this work, we first consider the definition of skewness based on deciles and then study its asymptotic properties, following the approach applied in [18][19][20][21][22][23]. Finally, the power of the considered statistic to detect symmetry and asymmetry is compared with the powers of other measures of skewness.

Decile-Based Skewness
Let X1, ..., Xn be a sample from a distribution F on the real line, and suppose that F is continuous, so that all observations are distinct with probability one. We may then arrange the observations in increasing order without ties, X(1) < ... < X(n). These variables are called the order statistics, where X(k) is the kth order statistic. For 0 < p < 1, the pth quantile of F is defined as x_p = F^(-1)(p), and the corresponding sample quantile is defined as X(k), where k = ⌈np⌉ is the ceiling of np (the smallest integer greater than or equal to np). Let D1 and D9 be the first and ninth sample deciles (the 0.1 and 0.9 sample quantiles), respectively. Our statistic for measuring skewness, denoted SK, is built from D1, D9 and the sample median. In the following, the asymptotic distribution of the proposed statistic is explored.
Lemma 1. Let U1, ..., Un be independent, identically distributed (iid) random variables from U(0, 1), and let U(1) < ... < U(n) be their order statistics. Let 0 < p1 < p2 < p3 < 1, and assume that k_i/n − p_i → 0 for i = 1, 2, 3 as k1, k2, k3 and n → ∞. Then, as n → ∞, the vector (U(k1), U(k2), U(k3)), suitably centered and scaled, is asymptotically trivariate normal with a covariance matrix determined by p1, p2 and p3.
Proof. The limiting distribution follows from an extension of the results given in [24] and Cramér's theorem [24], applied to a distributional representation of the uniform order statistics.
Corollary 1. Let X1, ..., Xn be iid random variables with density and distribution functions f and F, respectively. Additionally, assume that f(x) is continuous and positive in a neighborhood of the quantiles x_p1, x_p2 and x_p3 with p1 < p2 < p3. Then the sample quantiles (X(k1), X(k2), X(k3)), suitably centered and scaled, are asymptotically trivariate normal with a covariance matrix determined by p1, p2, p3 and the values of f at the three quantiles.
Proof. The proof is completed by applying the quantile transformation g(y1, y2, y3) to the uniform order statistics of Lemma 1. Note that the derivative of g enters the limiting covariance.
The asymptotic distribution of SK is provided in the following theorem. This is our main contribution, and it is needed for inference about the population skewness.
Theorem 1. Let X1, ..., Xn be iid random variables with density function f. Additionally, assume that f(x) is continuous and positive in a neighborhood of the quantiles x_0.1, x_0.5 and x_0.9. Then the proposed statistic, suitably centered and scaled, is asymptotically normal.
Proof. The proof follows from Cramér's theorem [24], taking g to be the function that maps the three sample quantiles to SK.
Corollary 2. Let X1, ..., Xn be iid random variables from U(0, 1); then the asymptotic distribution of the proposed statistic follows from Theorem 1 as a special case.
These results can be employed to build an asymptotic confidence interval and to test hypotheses.
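The definition and the limit theory above can be sketched numerically. The snippet assumes that the statistic takes the form SK = (D1 + D9 − 2m)/(D9 − D1), with m the sample median; this form is an assumption consistent with the three ingredients named in the text, not a formula confirmed by it. The function names and the small tolerance in the ceiling rule are also illustrative. The covariance p_i(1 − p_j)/(f(x_pi) f(x_pj)) for i ≤ j is the standard large-sample covariance of sample quantiles, combined here with the delta method (Cramér's theorem).

```python
import math


def sample_quantile(sorted_xs, p):
    """Sample p-quantile X_(k) with k = ceil(n * p), as defined in the text."""
    n = len(sorted_xs)
    # Small tolerance guards against floating-point error in n * p.
    k = math.ceil(n * p - 1e-9)
    return sorted_xs[k - 1]  # k is a 1-based rank


def decile_skewness(xs):
    """ASSUMED form of the statistic: (D1 + D9 - 2 * median) / (D9 - D1)."""
    s = sorted(xs)
    d1, med, d9 = (sample_quantile(s, p) for p in (0.1, 0.5, 0.9))
    return (d1 + d9 - 2.0 * med) / (d9 - d1)


def asymptotic_variance(x1, x5, x9, f1, f5, f9):
    """Delta-method variance of sqrt(n) * (SK - population value).

    x1, x5, x9: population 0.1, 0.5 and 0.9 quantiles.
    f1, f5, f9: density values at those quantiles.
    Valid only for the ASSUMED form of SK used in decile_skewness.
    """
    p = (0.1, 0.5, 0.9)
    f = (f1, f5, f9)
    # Limiting covariance of the three sample quantiles.
    sigma = [
        [min(p[i], p[j]) * (1.0 - max(p[i], p[j])) / (f[i] * f[j])
         for j in range(3)]
        for i in range(3)
    ]
    width = x9 - x1
    # Gradient of g(y1, y2, y3) = (y1 + y3 - 2 * y2) / (y3 - y1).
    grad = (
        2.0 * (x9 - x5) / width ** 2,
        -2.0 / width,
        2.0 * (x5 - x1) / width ** 2,
    )
    return sum(grad[i] * sigma[i][j] * grad[j]
               for i in range(3) for j in range(3))
```

Under these assumptions, `asymptotic_variance(0.1, 0.5, 0.9, 1.0, 1.0, 1.0)` evaluates the U(0, 1) case and returns approximately 1.25, and `decile_skewness(range(1, 11))` returns 0 for a perfectly symmetric sample.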

Asymptotic Confidence Interval
Now, T_n can be used as a pivotal quantity to build an asymptotic confidence interval for the population's skewness.
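A minimal sketch of such an interval follows. It assumes that the pivot has the usual form T_n = √n (SK − skewness)/√v, where v is the asymptotic variance of √n·SK; the function name and the choice to pass v in as a known number are illustrative assumptions, and in practice v would have to be estimated.

```python
from statistics import NormalDist


def asymptotic_ci(sk, avar, n, level=0.95):
    """Normal-theory interval sk +/- z * sqrt(avar / n).

    sk:    observed value of the skewness statistic
    avar:  ASSUMED known asymptotic variance of sqrt(n) * SK
    n:     sample size
    level: nominal coverage, e.g. 0.95
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    half_width = z * (avar / n) ** 0.5
    return sk - half_width, sk + half_width
```

For example, `asymptotic_ci(0.0, 1.25, 500)` gives an interval of half-width about 0.098 around zero.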

Hypothesis Testing
Hypothesis testing for skewness is a crucial issue in practical applications. For instance, the hypothesis Skewness = 0 corresponds to symmetry. In general, to test H0: Skewness = γ0, the test statistic, denoted T0, is based on the proposed statistic. Following the methodology of Theorem 1, it can be shown that, under the null hypothesis, T0 asymptotically has a standard normal distribution.
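The test can be sketched as a two-sided z-test. The snippet assumes T0 = √n (SK − γ0)/√v, with v the asymptotic variance of √n·SK under the null; this standardized form, the function name and the use of a known v are assumptions made for illustration.

```python
from statistics import NormalDist


def skewness_z_test(sk, gamma0, avar, n):
    """Return (T0, two-sided p-value) for H0: skewness = gamma0.

    avar is the ASSUMED known asymptotic variance of sqrt(n) * SK under H0.
    """
    t0 = (n ** 0.5) * (sk - gamma0) / (avar ** 0.5)
    # Two-sided p-value from the standard normal reference distribution.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t0)))
    return t0, p_value
```

With `sk = 0.1`, `gamma0 = 0`, `avar = 1.25` and `n = 500`, the statistic equals 2 and the p-value is about 0.046, so symmetry would be rejected at the 5% level.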

Asymptotic Properties of the Proposed Statistic
In this part, many datasets are drawn to analyze the performance of the proposed approach for different symmetric distributions and sample sizes. First, we checked that the given CI and test statistic are indeed the asymptotic CI and test statistic. For every parameter setting, the empirical coverage probability (the percentage of runs for which the given CI contains zero, the true skewness) was calculated from 10,000 repetitions using R 3.6.2 and SPSS 25. In addition, the value of the test statistic was computed for each repetition, normal Q-Q plots of the test statistic were produced, and the Shapiro-Wilk normality test was used to confirm its normality. The empirical coverage probabilities for the different parameter settings are given in Table 1. The results show that the empirical coverage probability of the proposed approach exceeds the nominal level (0.95), especially as the sample size grows. In other words, we can accept the given CI as an asymptotic CI for the population skewness. Figure 1 and Table 2 show the Q-Q plots against the standard normal distribution and the results of the Shapiro-Wilk normality test for the test statistic, respectively. The asymptotic properties are reasonably well satisfied in all situations (all p-values are greater than 5%). Our approach is therefore a good choice for building a CI and carrying out hypothesis tests for the skewness of a population.
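A coverage experiment of this kind can be sketched as follows for U(0, 1) samples. The statistic is assumed to have the form (D1 + D9 − 2·median)/(D9 − D1), and the value 1.25 passed as the asymptotic variance is the delta-method value for that assumed form under U(0, 1). The function names, the seed and the numbers of repetitions are illustrative, and the sketch uses Python rather than the R and SPSS software mentioned above.

```python
import math
import random
from statistics import NormalDist


def decile_skewness(xs):
    """ASSUMED form: (D1 + D9 - 2 * median) / (D9 - D1), ceiling-rule quantiles."""
    s = sorted(xs)
    n = len(s)
    d1, med, d9 = (s[math.ceil(n * p - 1e-9) - 1] for p in (0.1, 0.5, 0.9))
    return (d1 + d9 - 2.0 * med) / (d9 - d1)


def coverage_uniform(n, reps, avar=1.25, level=0.95, seed=0):
    """Fraction of simulated CIs that contain the true skewness (zero)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(avar / n)
    hits = 0
    for _ in range(reps):
        sk = decile_skewness([rng.random() for _ in range(n)])
        # The interval sk +/- half_width covers zero iff |sk| <= half_width.
        if abs(sk) <= half_width:
            hits += 1
    return hits / reps
```

Calling `coverage_uniform(200, 2000)` returns the empirical coverage for samples of size 200, which can then be compared with the nominal 0.95.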

Comparison with Alternative Measures
To check the performance of the considered statistic, its power to detect asymmetry is compared with that of the conventional measures of skewness by means of a Monte Carlo simulation. As in Section 3, numerous datasets were drawn, using R, to check the performance of the measures for different asymmetric distributions and sample sizes. For this purpose, we generated 10,000 samples of size n = 10, 20, 50 from a chi-square distribution with m degrees of freedom, χ2(m). We considered three cases: extremely skewed (m = 1), moderately skewed (m = 5) and slightly skewed (m = 40). The powers (at the 5% significance level) of the different measures to detect asymmetry are summarized in Table 3. As a preliminary result, based on the maximum power, the performances of SK, γ1, SK_G1, SK_G2 and SK_G3 are approximately similar, and these measures are more powerful than the other methods for all simulated datasets; they are therefore very promising. The performances of SK_P, SK_P2 and SK_Y are approximately similar and rank next, while SK_B has the worst performance in all situations. In general, the measures based on the extreme values (maximum and minimum), such as Galip's three coefficients of skewness, and those based on the first and ninth deciles (D1 and D9) are more effective than the other methods, because they perform better and are easy to calculate.
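The power study for the decile-based statistic can be sketched as below. Two assumptions are swapped in and should be read as such: the statistic is taken to be (D1 + D9 − 2·median)/(D9 − D1), and the critical value is calibrated by simulation under a standard normal (symmetric) null instead of using the asymptotic test. Function names, seeds and repetition counts are illustrative.

```python
import math
import random


def decile_skewness(xs):
    """ASSUMED form: (D1 + D9 - 2 * median) / (D9 - D1), ceiling-rule quantiles."""
    s = sorted(xs)
    n = len(s)
    d1, med, d9 = (s[math.ceil(n * p - 1e-9) - 1] for p in (0.1, 0.5, 0.9))
    return (d1 + d9 - 2.0 * med) / (d9 - d1)


def critical_value(n, reps, rng, alpha=0.05):
    """Empirical (1 - alpha) quantile of |SK| under a N(0, 1) null."""
    vals = sorted(
        abs(decile_skewness([rng.gauss(0.0, 1.0) for _ in range(n)]))
        for _ in range(reps)
    )
    return vals[math.ceil((1.0 - alpha) * reps - 1e-9) - 1]


def power_chi_square(m, n, reps, crit, rng):
    """Rejection rate of the two-sided test for chi-square(m) samples."""
    rejections = 0
    for _ in range(reps):
        # chi-square(m) is a gamma distribution with shape m / 2 and scale 2.
        xs = [rng.gammavariate(m / 2.0, 2.0) for _ in range(n)]
        if abs(decile_skewness(xs)) > crit:
            rejections += 1
    return rejections / reps
```

A run such as `crit = critical_value(50, 2000, random.Random(1))` followed by `power_chi_square(m, 50, 1000, crit, random.Random(2))` for m = 1, 5 and 40 gives estimated powers for the three skewness levels considered in the text at n = 50.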

Discussion
In this work, we first considered the definition of skewness based on deciles and then studied its asymptotic properties. The results showed that the empirical coverage probability of this measure exceeded the nominal level (0.95), especially as the sample size increased. The Q-Q plots against the standard normal distribution and the results of the Shapiro-Wilk normality test supported the theoretical asymptotic properties. Finally, the power of the considered statistic to detect symmetry and asymmetry was compared with the powers of other measures of skewness. The power study indicated that the decile-based measure and Galip's three coefficients of skewness performed approximately the same and were more powerful than the other methods for all simulated datasets; they are therefore promising for application in practice.

Conclusions
We presented a simple measure for detecting skewness in samples. The measure relies on a decile-based definition of skewness that has several advantages. The proposed coefficient of skewness can be computed from only three simple statistics: the first and ninth deciles and the median. The power of the proposed statistic to detect symmetry and asymmetry was studied through extensive Monte Carlo simulations, and the results show that it generally performs very well.
There are many definitions of symmetry and asymmetry. To investigate skewness in datasets that include outliers, we should use measures that account for the effects of outliers. Therefore, the measures based on the extreme values (maximum and minimum), such as Galip's three coefficients of skewness; those based on the first and third quartiles (Q1 and Q3), such as Bowley's coefficient of skewness; and those based on the first and ninth deciles (D1 and D9) are natural candidates. Other studies showed that Galip's coefficients of skewness are more powerful for detecting symmetry and asymmetry, but there had been no in-depth study of a decile-based definition of skewness or of how it compares with the alternatives. In this work, we first considered the definition of skewness based on deciles and then studied its asymptotic properties. Finally, the power of the considered statistic to detect symmetry and asymmetry was compared with the powers of other measures of skewness. For future work, we suggest considering definitions of skewness based on combinations of more deciles, not only the first and the ninth. We expect that such combinations will improve the detection of symmetry and asymmetry.