# Generalizing Normality: Different Estimation Methods for Skewed Information

## Abstract

## 1. Introduction

## 2. The Data

## 3. Statistical Inference Elements

### 3.1. Alpha-Skew-Normal (ASN) Distribution

### 3.2. Different Estimation Methods for the ASN Distribution

#### 3.2.1. Maximum Likelihood Estimation

#### 3.2.2. Ordinary and Weighted Least-Square Estimates

#### 3.2.3. Method of the Maximum Product of Spacings

#### 3.2.4. The Cramer–von Mises Minimum Distance Estimators

#### 3.2.5. The Anderson–Darling and Right-Tail Anderson–Darling Estimators

## 4. Numerical Analysis

- Given a set of parameters from the $\mathrm{ASN}(\mu ,\sigma ,\alpha )$ distribution, N samples of size n were generated;
- For each generated set, based on the estimation methods (MLE, LSQ, WLQ, MPS, CME, ADE, and RADE), estimates of the parameters ($\mu $, $\sigma $, and $\alpha $) were calculated;
- Then, with $\widehat{\mathit{\theta}}=\left(\widehat{\mu},\widehat{\sigma},\widehat{\alpha}\right)$ and $\mathit{\theta}=\left(\mu ,\sigma ,\alpha \right)$, the bias and mean squared error (MSE) of each component of $\widehat{\mathit{\theta}}$ were computed as $\frac{1}{N}{\sum}_{k=1}^{N}\left({\widehat{\theta}}_{j}^{\left(k\right)}-{\theta}_{j}\right)$ and $\frac{1}{N}{\sum}_{k=1}^{N}{\left({\widehat{\theta}}_{j}^{\left(k\right)}-{\theta}_{j}\right)}^{2}$, respectively, for $j=1,2,3$ (one index per parameter). Here, ${\widehat{\theta}}_{j}^{\left(k\right)}$ denotes the estimate of ${\theta}_{j}$ obtained from sample $k$, for $k=1,2,\cdots ,N$.
- The overall bias and the overall MSE were computed as $\frac{1}{N}{\sum}_{k=1}^{N}{\sum}_{j=1}^{3}\left({\widehat{\theta}}_{j}^{\left(k\right)}-{\theta}_{j}\right)$ and $\frac{1}{N}{\sum}_{k=1}^{N}{\sum}_{j=1}^{3}{\left({\widehat{\theta}}_{j}^{\left(k\right)}-{\theta}_{j}\right)}^{2}$, respectively.
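
The simulation steps above can be sketched in a few lines of Python. This is only an illustrative outline, not the code used in the study: the function names `bias_and_mse` and `run_simulation` are hypothetical, and, because the ASN sampler and the seven estimators are not reproduced here, the usage example substitutes a plain normal distribution with moment estimates of its two parameters as a stand-in. The overall bias and overall MSE are taken as sums over the parameters of the per-parameter quantities.

```python
import numpy as np


def bias_and_mse(estimates, theta):
    """Per-parameter and overall bias/MSE.

    estimates: array of shape (N, p); row k holds the estimate from sample k.
    theta:     length-p sequence with the true parameter values.
    """
    errors = np.asarray(estimates, dtype=float) - np.asarray(theta, dtype=float)
    bias = errors.mean(axis=0)          # (1/N) * sum_k (theta_hat_j^(k) - theta_j)
    mse = (errors ** 2).mean(axis=0)    # (1/N) * sum_k (theta_hat_j^(k) - theta_j)^2
    # Overall quantities: sum the per-parameter values over j.
    return bias, mse, bias.sum(), mse.sum()


def run_simulation(sampler, estimator, theta, n, N, seed=0):
    """Draw N samples of size n, estimate the parameters on each, summarize."""
    rng = np.random.default_rng(seed)
    estimates = np.array([estimator(sampler(rng, n)) for _ in range(N)])
    return bias_and_mse(estimates, theta)


# Stand-in example (NOT the ASN model): Normal(mu, sigma) data with
# the sample mean and sample standard deviation as estimators.
theta = (0.0, 1.0)


def normal_sampler(rng, n):
    return rng.normal(theta[0], theta[1], size=n)


def moment_estimator(x):
    return (x.mean(), x.std(ddof=1))


bias, mse, overall_bias, overall_mse = run_simulation(
    normal_sampler, moment_estimator, theta, n=50, N=1000
)
print("bias:", bias, "MSE:", mse)
print("overall bias:", overall_bias, "overall MSE:", overall_mse)
```

To mirror the study design, `normal_sampler` would be replaced by a generator for $\mathrm{ASN}(\mu ,\sigma ,\alpha )$ samples and `moment_estimator` by each estimation method in turn, with the loop repeated over the sample sizes $n$ of interest.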

## 5. Results

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

**Figure 1.** Visual summary of the role of probabilistic reasoning in knowledge discovery in databases as a cornerstone for the quantification of uncertainty. Statistical inference procedures enable us to draw conclusions based on a sample and generalize them to an entire population.

**Figure 2.** The PDF $f\left(t\right)$ of the ASN distribution, where $t$ is a random variable, assuming $\mu =0$ (location), $\sigma =1$ (scale), and different values for $\alpha $ (skewness).

**Figure 3.** Bias and MSE of the estimates of $\mu =0$, $\sigma =1$, and $\alpha =1$ for N = 10,000 simulated samples of size n using the following methods: MLE, MPS, ADE, RADE, LSE, WLSE, and CME. Based on Figure 2, by choosing the configuration of these parameters ($\mu =0.5$, $\sigma =0.5$, and $\alpha =3$), a bimodal PDF can be seen, which presents a larger peak to the left and a smaller peak to the right.

**Figure 4.** Bias and MSE of the estimates of $\mu =0$, $\sigma =1$, and $\alpha =6$ for N = 10,000 simulated samples of size n using the following methods: MLE, MPS, ADE, RADE, LSE, WLSE, and CME.

**Figure 5.** Empirical density function of the water flux in the 21 rivers/channels in the surroundings of Copiapó city. The solid gray shading represents the density (frequency) of the numerical records of the water flux, and the solid red line represents a smoothed fitted curve.

**Figure 6.** The empirical distribution of the log of the water flux and its frequency, which is represented by gray blocks. The overlaid curves show the fitted ASN distribution based on the MPS ($\mu =-1.93,\sigma =0.896,\alpha =-7.87$), which is represented by the blue dashed line, and on the RADE ($\mu =-1.87,\sigma =1.05,\alpha =-8.88$), which is represented by the red dot-dashed line.

**Figure 7.** The logarithm of the water flux dispersion records (y-coordinates) for each year (per panel) by month (x-coordinates).

**Figure 8.** ECDF-based test of the ASN distribution; the RADE fit returned an AIC of 274, and the MPS fit returned an AIC of 251.

Estimation Method | Abbreviation | Created by |
---|---|---|
Maximum Likelihood Estimation | MLE | Fisher [25] |
Ordinary Least-Square Estimate | LSQ | Swain et al. [5] |
Weighted Least-Square Estimate | WLQ | Swain et al. [5] |
Maximum Product of Spacings | MPS | Cheng & Amin [6] |
Cramer–von Mises Estimators | CME | Macdonald [26] |
Anderson–Darling Estimator | ADE | Boos [27] |
Right-Tail Anderson–Darling Estimator | RADE | Luceno [8] |

Month | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | NA’s |
---|---|---|---|---|---|---|---|
JAN | 0.02 | 0.06 | 0.31 | 0.5374 | 0.68 | 3.45 | 39 |
FEB | 0.01 | 0.065 | 0.2 | 0.5165 | 0.6875 | 3.15 | 40 |
MAR | 0.01 | 0.06 | 0.31 | 0.5449 | 0.85 | 3.24 | 37 |
APR | 0.03 | 0.08 | 0.27 | 0.4494 | 0.5275 | 2.25 | 36 |
MAY | 0.03 | 0.12 | 0.29 | 0.7859 | 0.55 | 19.47 | 47 |
JUN | 0.02 | 0.12 | 0.35 | 0.9106 | 0.62 | 19.01 | 51 |
JUL | 0.01 | 0.14 | 0.46 | 0.5636 | 0.64 | 2.58 | 53 |
AUG | 0.01 | 0.1125 | 0.33 | 0.4692 | 0.6175 | 2.23 | 50 |
SEP | 0.01 | 0.2025 | 0.45 | 0.5356 | 0.6775 | 2.66 | 52 |
OCT | 0.02 | 0.1 | 0.37 | 0.5229 | 0.77 | 2.46 | 49 |
NOV | 0.01 | 0.0775 | 0.365 | 0.5536 | 0.855 | 3.36 | 48 |
DEC | 0.01 | 0.055 | 0.265 | 0.5639 | 0.7975 | 5.04 | 46 |

Cum. Prob. | 1% | 10% | 50% | 99% | 99.99% |
---|---|---|---|---|---|
Flux | 0.0059 | 0.0174 | 0.3396 | 1.5068 | 16.281 |

