1. Introduction
Lehmann [1] proposed a class of asymmetric distributions. The cumulative distribution function (cdf) for this class is given by:
$$G(z; \alpha) = \{F(z)\}^{\alpha}, \quad z \in \mathbb{R}, \qquad (1)$$
where F is itself a cumulative distribution function and $\alpha \in \mathbb{Q}^{+}$, with $\mathbb{Q}$ the set of rational numbers. In the special case where $\alpha$ is an integer, the above cdf corresponds to the distribution of the maximum in a sample of size $\alpha$.
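The maximum interpretation can be checked numerically. The sketch below is only an illustration: it assumes F is the standard normal cdf and uses an integer exponent of 3 (both choices are ours, not the source's), and it compares a Monte Carlo estimate of the cdf of the sample maximum with the corresponding power of F.

```python
import random
from statistics import NormalDist


def empirical_cdf_of_max(z, k, n_rep=20000, seed=1):
    """Monte Carlo estimate of P(max(Z_1, ..., Z_k) <= z) for iid standard normals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        if max(rng.gauss(0.0, 1.0) for _ in range(k)) <= z:
            hits += 1
    return hits / n_rep


def lehmann_cdf(z, alpha):
    """Lehmann-type cdf F(z)**alpha, with F the standard normal cdf (assumed here)."""
    return NormalDist().cdf(z) ** alpha


# Illustrative values: evaluation point 0.5, sample size (integer exponent) 3.
z, k = 0.5, 3
print(empirical_cdf_of_max(z, k), lehmann_cdf(z, k))
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `n_rep` grows.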
Durrans [2] gives an interpretation for (1) in the more general case $\alpha \in \mathbb{R}^{+}$ based on fractional order statistics. Assume F is an absolutely continuous function and f denotes its respective probability density function (pdf), i.e., $f = dF$. The pdf related to (1) is:
$$g(z; \alpha) = \alpha f(z)\{F(z)\}^{\alpha - 1}, \quad z \in \mathbb{R}, \ \alpha \in \mathbb{R}^{+}. \qquad (2)$$
Henceforth, we refer to the distribution of a random variable with pdf as in (2) as the power distribution. The particular case where $F = \Phi$, the cdf of the standard normal model, was approached in [2]. In such a case, the respective pdf of the model reduces to:
$$g(z; \alpha) = \alpha \phi(z)\{\Phi(z)\}^{\alpha - 1}, \quad z \in \mathbb{R}, \ \alpha \in \mathbb{R}^{+}, \qquad (3)$$
where $\phi$ is the standard normal pdf. The authors used the term generalized Gaussian distribution to refer to the model in Equation (3). This model was also studied in more detail by [3]. Pewsey et al. [4] call Model (3) the power-normal (PN) model, denoting $Z \sim PN(\alpha)$, and show that its Fisher information matrix (FIM) for the location-scale extension is nonsingular for $\alpha = 1$ (i.e., the symmetric case).
The generalization of the normal distribution in (3) is also a particular case of the Beta-normal model discussed in [5].
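On the other hand, the random variable X follows a half-normal distribution with scale parameter $\sigma$ if its pdf is given by:
$$f(x; \sigma) = \frac{1}{\sigma}\sqrt{\frac{2}{\pi}} \exp\left\{-\frac{x^{2}}{2\sigma^{2}}\right\}$$
for $x \geq 0$. We denote $X \sim HN(\sigma)$. Cooray and Ananda [6] extended the half-normal (HN) model by introducing the generalized half-normal (GHN) model; that is, X is a random variable with the GHN distribution with scale parameter $\sigma$ and shape parameter $\alpha$ if its pdf is given by:
$$f(x; \sigma, \alpha) = \sqrt{\frac{2}{\pi}}\,\frac{\alpha}{x}\left(\frac{x}{\sigma}\right)^{\alpha} \exp\left\{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^{2\alpha}\right\}, \quad x > 0.$$
We use the notation $X \sim GHN(\sigma, \alpha)$. Observe that $GHN(\sigma, 1) \equiv HN(\sigma)$; that is, one obtains the half-normal model with scale parameter $\sigma$.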
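Some properties of the GHN distribution are:
$$F(x) = 2\Phi\left(\left(\frac{x}{\sigma}\right)^{\alpha}\right) - 1, \quad x \geq 0,$$
$$E(X^{k}) = \sqrt{\frac{2^{k/\alpha}}{\pi}}\,\Gamma\left(\frac{k + \alpha}{2\alpha}\right)\sigma^{k}, \quad \text{for } k = 1, 2, \ldots,$$
where $F$ is the cdf of X, $\Phi$ is the standard normal cdf and $\Gamma$ is the gamma function. The proofs of those properties are presented in [6]. Recent extensions of the HN model are considered in [7,8], among others.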
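The GHN quantities can be written down compactly in code. The sketch below assumes the Cooray and Ananda form of the GHN density with scale `sigma` and shape `alpha` (the parameter names are our assumption), together with the matching cdf and k-th moment, and it includes the half-normal density so that the reduction at shape one can be checked.

```python
import math


def hn_pdf(x, sigma):
    """Half-normal pdf with scale sigma, for x >= 0."""
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-0.5 * (x / sigma) ** 2)


def ghn_pdf(x, sigma, alpha):
    """GHN pdf in the Cooray-Ananda form (assumed parametrization), for x > 0."""
    t = (x / sigma) ** alpha
    return math.sqrt(2.0 / math.pi) * (alpha / x) * t * math.exp(-0.5 * t * t)


def ghn_cdf(x, sigma, alpha):
    """GHN cdf 2*Phi((x/sigma)**alpha) - 1, written with the error function."""
    return math.erf((x / sigma) ** alpha / math.sqrt(2.0))


def ghn_moment(k, sigma, alpha):
    """k-th raw moment of the GHN distribution."""
    return (math.sqrt(2.0 ** (k / alpha) / math.pi)
            * math.gamma((k + alpha) / (2.0 * alpha)) * sigma ** k)


# With alpha = 1 the GHN density coincides with the half-normal density.
print(ghn_pdf(0.7, 2.0, 1.0), hn_pdf(0.7, 2.0))
```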
The recent literature has seen growth in the theory and applications of continuous truncated models. Among others, we refer the reader to [9,10,11,12,13,14,15,16].
The main focus of this paper is to study the positive truncation of the model considered in (3), where the normalizing constant for the pdf (3) is to be determined, and the resulting model is an extension of the half-normal distribution. That is, we generate a more flexible extension of the half-normal distribution that we call the truncated positive power-normal (TPN) distribution, where the asymmetry parameter $\alpha$ is a shape parameter. Given its flexibility, the model is quite useful for fitting positive data related to survival analysis and reliability.
The paper is organized as follows. In Section 2, we present the TPN distribution. Some basic properties such as the quantile function, the risk function and some moments are considered, and the Shannon entropy is studied. In Section 3, we discuss some inferential aspects such as the log-likelihood function and its maximization, the corresponding Fisher information matrix (FIM) and the method of moments estimation. Section 4 deals with an extension of the TPN model and presents the results of a small-scale simulation study, which indicate good parameter recovery. Results of applying the proposed model to two real datasets are reported in Section 5. The main conclusion is that the TPN model can be a viable alternative for fitting positive data.
4. Simulation Study
In this section, we present a brief simulation study to assess the performance of the MLEs of the TPN model in finite samples. To simulate from the distribution, it is sufficient to simulate from the PN distribution, accepting only those values greater than c. The simulation algorithm is then:
Step 1. Simulate a value from the PN distribution (e.g., draw a uniform variate on (0, 1) and compute the corresponding PN quantile).
Step 2. If the simulated value is greater than c, accept it as a draw from the TPN model. Otherwise, go to the previous step.
The acceptance ratio is then the probability that a PN variate exceeds c. Hereafter, c is considered known, taking the values 0, 0.5 and 1.0. Likewise, three values were chosen for $\sigma$ and $\alpha$, and samples of four different sizes n were generated. For each combination of sample size and parameter values, 1000 samples were generated and the MLEs were computed.
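The acceptance-rejection scheme described above can be sketched as follows. This is a minimal illustration under our own assumptions: the PN proposal is a scaled PN variate generated by inverse transform, and the parameter names `sigma`, `alpha` and the helper names are hypothetical. Under those assumptions, the theoretical acceptance probability is one minus the PN cdf evaluated at c.

```python
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def simulate_tpn(n, sigma, alpha, c=0.0, seed=None):
    """Draw n values by proposing scaled PN variates and keeping those above c.

    Returns the accepted draws and the observed acceptance rate.
    """
    rng = random.Random(seed)
    draws, proposals = [], 0
    while len(draws) < n:
        v = rng.random() ** (1.0 / alpha)
        if not 0.0 < v < 1.0:
            continue  # inv_cdf needs an argument strictly inside (0, 1)
        proposals += 1
        x = sigma * _STD.inv_cdf(v)  # scaled PN variate (assumed parametrization)
        if x > c:
            draws.append(x)
    return draws, n / proposals


def acceptance_probability(sigma, alpha, c=0.0):
    """P(scaled PN variate > c) = 1 - Phi(c / sigma)**alpha."""
    return 1.0 - _STD.cdf(c / sigma) ** alpha


# Illustrative parameter values only.
sample, rate = simulate_tpn(5000, sigma=2.0, alpha=1.5, c=1.0, seed=42)
print(rate, acceptance_probability(2.0, 1.5, 1.0))
```

The observed rate should be close to the theoretical one. When the truncation point is large relative to the scale, the acceptance probability becomes small and direct inversion of the truncated cdf would be more efficient.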
Table 1 and Table 2 summarize the mean of the estimated parameters (mean), the mean of the estimated standard deviations (s.d.) and the root of the mean squared error (RMSE). Note that the smaller sample sizes present a moderate bias for both parameters, which decreases as n increases. Additionally, the s.d.'s are close to the RMSE, especially as n increases, suggesting that the standard deviations are well estimated even for small sample sizes.
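The three summaries reported per parameter can be computed from the replications as in the sketch below. The function name and the toy numbers are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import fmean


def summarize_replications(estimates, std_errors, true_value):
    """Mean estimate, mean estimated s.d., and RMSE over Monte Carlo replications."""
    mean_est = fmean(estimates)
    mean_sd = fmean(std_errors)
    rmse = math.sqrt(fmean([(e - true_value) ** 2 for e in estimates]))
    return {"mean": mean_est, "s.d.": mean_sd, "RMSE": rmse}


# Toy replications (made-up numbers) for a parameter whose true value is 2.0.
summary = summarize_replications(
    estimates=[1.9, 2.1, 2.0, 2.2],
    std_errors=[0.12, 0.11, 0.13, 0.12],
    true_value=2.0,
)
print(summary)
```

When the mean estimated s.d. is close to the RMSE, the estimated standard errors are tracking the actual sampling variability, which is the comparison made in the text.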
Examples
Figure 5 depicts the model fit for some simulated samples of a fixed size n under two different truncation points c.
5. Real Data Illustration
In this section, we present two applications to illustrate the performance of the TPN model compared with other common distributions in the literature, such as the Weibull, gamma, GHN, Birnbaum–Saunders (BS, [24,25]), $\beta$-Birnbaum–Saunders ($\beta$-BS, [26]), epsilon half-normal (EHN, [7]), power half-normal (PHN, [27]) and truncated positive normal (TN, [28]) models. Model comparison is carried out using the Akaike information criterion (AIC) [29].
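For reference, the AIC is twice the number of estimated parameters minus twice the maximized log-likelihood, and smaller values indicate a better trade-off between fit and complexity. The sketch below uses made-up log-likelihood values for two hypothetical models, purely to show the comparison.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * (maximized log-likelihood)."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical fits (illustrative numbers, not results from the paper).
fits = {
    "model_A": {"loglik": -100.0, "k": 2},
    "model_B": {"loglik": -98.5, "k": 3},
}
scores = {name: aic(f["loglik"], f["k"]) for name, f in fits.items()}
best = min(scores, key=scores.get)
print(scores, best)
```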
5.1. Australian Athletes
This dataset consists of several variables recorded on 202 Australian athletes and reported in [30]. Concretely, we analyze here measurements of the body mass index (BMI). Table 3 presents basic descriptive statistics for the dataset. We use the notation $\sqrt{b_1}$ and $b_2$ to represent the sample asymmetry and kurtosis coefficients, respectively.
Using results from Section 3.1, moment estimators of the parameters were computed and used as initial estimates for the maximum likelihood (ML) approach. In this case, we fixed two values of c for the TPN model, namely $c = 0$ and a second value close to the sample minimum.
Table 4 reports the parameter estimates obtained by maximum likelihood using the bbmle package in [22]. The standard errors of the MLEs are calculated using the information matrix of each model. For each model, we report the estimated log-likelihood function and the corresponding AIC. It can be noted that the AIC scores indicate a better fit of the TPN model. On the other hand, the results for the two values of c are similar. Therefore, we chose the standard model with $c = 0$. In Figure 6, the estimated densities of the models using the ML estimates are shown with the data histogram. This also indicates a good fit for the TPN model. In addition, Figure 7 shows the q-q plots for the TPN model and the other considered models. Note that TPN is a more appropriate model than Weibull, gamma, GHN and TN for this dataset because the sample quantiles are closer to the respective theoretical quantiles. Except for the TPN distribution, all the other models present serious difficulties in accommodating the right tail of the data. Finally, the estimated skewness and kurtosis coefficients for the TPN model, evaluated at the MLEs, are 0.694 and 3.731. The 95% confidence intervals (CI) for those coefficients estimated via bootstrap (based on 10,000 bootstrap samples) are given by (0.167; 1.420) and (2.411; 6.841), respectively. Note that the sample versions of both coefficients are contained in the estimated CIs.
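The source does not state which bootstrap variant was used. As a simplified stand-in, the sketch below applies a nonparametric percentile bootstrap to the sample skewness and kurtosis coefficients, rather than to the model-based coefficients evaluated at refitted MLEs. The data are synthetic and the function names are ours.

```python
import math
import random
from statistics import fmean


def skewness_kurtosis(data):
    """Moment-based sample skewness and kurtosis coefficients."""
    m = fmean(data)
    m2 = fmean([(x - m) ** 2 for x in data])
    m3 = fmean([(x - m) ** 3 for x in data])
    m4 = fmean([(x - m) ** 4 for x in data])
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def percentile_interval(values, level=0.95):
    """Percentile interval from a list of bootstrap replicates."""
    ordered = sorted(values)
    tail = (1.0 - level) / 2.0
    lo = ordered[math.floor(tail * (len(ordered) - 1))]
    hi = ordered[math.ceil((1.0 - tail) * (len(ordered) - 1))]
    return lo, hi


def bootstrap_skew_kurt_ci(data, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CIs for sample skewness and kurtosis."""
    rng = random.Random(seed)
    skews, kurts = [], []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))  # sample with replacement
        s, k = skewness_kurtosis(resample)
        skews.append(s)
        kurts.append(k)
    return percentile_interval(skews, level), percentile_interval(kurts, level)


# Synthetic positive data (not the athletes data), for illustration only.
gen = random.Random(123)
toy = [gen.lognormvariate(3.0, 0.15) for _ in range(202)]
print(bootstrap_skew_kurt_ci(toy, n_boot=500))
```

A model-based version would refit the model to each resample and evaluate the theoretical skewness and kurtosis at the refitted estimates. That requires the moment expressions of the fitted model, which are outside this sketch.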
5.2. Breaking Stress of Carbon Fibers
This dataset is considered in [31] and corresponds to breaking stress of carbon fiber (BSFC) measurements in GPa. Cordeiro and Lemonte [26] already analyzed these data, comparing the BS and $\beta$-BS models. We additionally compared those models with the EHN and PHN distributions. Table 5 presents basic descriptive statistics for the dataset. Note that for this dataset, the sample minimum is close to zero. Therefore, in this case, it seems reasonable to consider $c = 0$.
We also computed the moment estimators, which were used as initial estimates for the maximum likelihood approach. Table 6 shows the MLEs. It can be noted that the AIC indicates a better fit of the TPN model. In Figure 8, the models fitted by ML are shown together with the histogram of the data. Finally, from the q-q plots in Figure 9, we see that the TPN model fits the data better than the other models considered.