Slash Truncation Positive Normal Distribution and Its Estimation Based on the EM Algorithm

In this paper, we present an extension of the truncated positive normal (TPN) distribution to model positive data with high kurtosis. The new model is defined as the quotient of two independent random variables: a TPN-distributed variable (numerator) and a power of a standard uniform variable (denominator). The resulting model has greater kurtosis than the TPN distribution. We study some properties of the distribution, such as moments, asymmetry, and kurtosis. Parameter estimation is based on the method of moments and on maximum likelihood, the latter using the expectation-maximization (EM) algorithm. We perform simulation studies to assess parameter recovery and illustrate the model with a real data application related to body weight. The computational implementation of this work is included in the tpn package of the R software.


Introduction
The modeling of non-negative data has grown considerably, since many datasets have this characteristic. Distributions with support on the positive real line are widely used in the engineering and reliability fields for failure-time (also known as lifetime) data. The half-normal (HN) distribution is a very well-known model for non-negative data, discussed extensively in the literature. For instance, Rafiqullah et al. [1] used the HN model to analyze survival data related to breast cancer in Hispanic black and non-Hispanic black women. Bosch-Badia et al. [2] studied the applicability of the HN distribution to risk analysis traditionally performed using risk matrices. Tsizhmovska et al. [3] analyzed sentence-length data, with the HN among the distributions considered.
Olmos et al. [4] generated an extension of the HN distribution that captures atypical data, although with little flexibility, called the slashed half-normal (SHN) distribution. Cooray and Ananda [5] generalized the HN distribution, obtaining a new flexible model, which they denominated the generalized half-normal (GHN) distribution and which includes the HN model as a particular case. Despite the flexibility offered, a major difficulty remains, related to its limitations in accommodating atypical data. To overcome this obstacle, Olmos et al. [6] proposed an extension of the GHN model, named the slashed generalized half-normal (SGHN) distribution. The main aim of the authors was to generate a model with higher kurtosis that allows better modeling of positive data in the presence of outliers. Other authors have worked on similar ideas, e.g., Iriarte et al. [7], Reyes et al. [8], Olmos et al. [9], Segovia et al. [10], and Astorga et al. [11].
Gómez et al. [12] truncated the normal distribution, conditioning it to positive values, i.e., if X has a normal distribution, the authors studied X | X > 0 (see Johnson et al. [13]), creating a distribution that they named the truncated positive normal (TPN) distribution, with pdf
f(x; σ, λ) = φ(x/σ − λ) / (σ Φ(λ)), x > 0,
where φ and Φ denote the pdf and cdf of the standard normal model, σ > 0 is a scale parameter, and λ ∈ R is a shape parameter.
On the other hand, the slash distribution is defined stochastically as the quotient of two independent rv, say Z and U, as follows:
X = Z / U^(1/q),
where Z ∼ N(0, 1) and U ∼ U(0, 1) are independent and q > 0 is a shape parameter. Olmos et al. [6] used this idea to propose an extension of the generalized half-normal model of Cooray and Ananda [5], called the slashed generalized half-normal (SGHN) distribution, with parameters σ, α, q > 0; its density function involves the cdf G(·; a) of the gamma distribution with shape parameter a and rate parameter one. We denote Z ∼ SGHN(σ, α, q). The objective of this paper is to propose an extension of the model proposed by Gómez et al. [12] using the "slash" procedure, utilizing a TPN(σ, λ) rv in the numerator. Thus, the new model, which we call the slash truncated positive normal (STPN), becomes a direct competitor of the SGHN, since it has heavier tails and, moreover, allows the fitting of atypical data.
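To make the construction concrete, the following Python sketch (illustrative only; the paper's own computational implementation is the R tpn package) simulates the slash mechanism with a TPN numerator. The parameter values are arbitrary choices for illustration; dividing by U^(1/q) inflates the right tail of the TPN sample.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(42)
sigma, lam, q, n = 1.0, 1.0, 5.0, 20000

# Z ~ TPN(sigma, lambda): a N(sigma*lambda, sigma^2) law truncated to (0, inf)
z = truncnorm.rvs(-lam, np.inf, loc=sigma * lam, scale=sigma,
                  size=n, random_state=rng)

# Slash construction: Y = Z / U^(1/q), with U ~ Uniform(0, 1) independent of Z
u = rng.uniform(size=n)
y = z / u ** (1.0 / q)

# Dividing by U^(1/q) inflates the right tail, so Y is more dispersed than Z
print(z.var(), y.var())
```

The empirical variance of the slashed sample visibly exceeds that of the TPN sample, which is the heavy-tail behavior the STPN model is designed to provide.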
The paper is organized as follows. Section 2 presents the pdf of the STPN distribution and some properties such as moments, the hazard function, and the kurtosis coefficient. Section 3 studies inference for the proposed model. In particular, we discuss the moments estimators and the expectation-maximization (EM) [14] algorithm to find the maximum likelihood estimator. In addition, we obtain the observed Fisher information matrix using Louis' method [15]. Section 4 shows a simulation study to assess parameter recovery. Section 5 presents a real data application, where the STPN is compared with other proposals in the literature. Finally, Section 6 presents the conclusions of the manuscript.

The Slash Truncation Positive Normal Model
In this section, we describe the stochastic representation of the STPN model, its pdf, and some basic properties of the model.
By construction, the following models are particular cases of the STPN distribution: the TPN(σ, λ) model (as q → +∞) and the SHN(σ, q) model (for λ = 0); the HN(σ) model arises when both conditions hold. Figure 1 summarizes the relationships among the STPN and its particular cases.

Density Function
Proposition 1. Let Y ∼ STPN(σ, λ, q). Then, the pdf of Y is given by:
f(y; σ, λ, q) = q / (σ Φ(λ)) ∫_0^1 w^q φ(yw/σ − λ) dw, y > 0,
where σ > 0 is a scale parameter, λ ∈ R is a shape parameter, and q > 0 is a parameter related to the kurtosis of the distribution.
Proof. Using the representation in (4) and computing the Jacobian of the transformation Y = Z/U^(1/q) and W = U^(1/q), we obtain the joint density:
f_{Y,W}(y, w) = q w^q / (σ Φ(λ)) φ(yw/σ − λ), y > 0, 0 < w < 1.
Marginalizing with respect to variable W, we obtain the density function corresponding to the rv Y stated above. An alternative expression is obtained by substituting u = yw/σ − λ, which yields:
f(y; σ, λ, q) = q σ^q / (Φ(λ) y^(q+1)) ∫_{−λ}^{y/σ−λ} (u + λ)^q φ(u) du.
With t = u²/2 in the last expression, the integral can be written in terms of incomplete gamma-type functions.

Some Properties
In this section, we study some basic properties of the STPN distribution.

Proposition 2.
Let Y ∼ STPN(σ, λ, q). Then, the cdf of Y is given by:
F_Y(y) = q ∫_0^1 w^(q−1) [Φ(yw/σ − λ) − Φ(−λ)] / Φ(λ) dw, y > 0.
Proof. It is immediate from the definition, writing F_Y(y) = P(Z ≤ yW) = E_W[F_Z(yW)], where F_Z is the cdf of the TPN(σ, λ) model and W ∼ Beta(q, 1).
Proof. The marginal distribution of Y can be computed as:
f_Y(y) = ∫_0^1 f_{Y|W}(y | w) f_W(w) dw,
with Y | W = w ∼ TPN(σ/w, λ) and W ∼ Beta(q, 1). With the transformation w = t^(1/q), we obtain Equation (5).
Remark 1. Proposition 4 implies that for q → +∞, the pdf of the STPN distribution converges to the pdf of the TPN model. Proposition 5 shows that the STPN distribution can also be seen as a scale mixture of the TPN model. This property is very important for obtaining random values from this model and for the application of an EM-type algorithm to estimate the parameters of the model.

Moments
The following proposition provides the moments of the STPN distribution.
Proposition 6. Let Y ∼ STPN(σ, λ, q). Then, for r < q, the r-th moment of Y is given by:
E(Y^r) = q / (q − r) E(Z^r), where Z ∼ TPN(σ, λ).
In particular, its first four moments are determined by taking r = 1, 2, 3, 4 (provided q > 4). Proof. It is immediate from the stochastic representation Y = Z/W, since E(W^(−r)) = q/(q − r) for r < q.
Remark 2. Proposition 6 shows that the moments of the STPN distribution depend essentially on the moments of the TPN distribution. Equations (6) and (8) show the effect of parameter q on the model; a lower value of q produces greater variance and kurtosis. Table 1 shows some values of the kurtosis coefficient of the STPN distribution for different values of λ and q.  Figure 3 shows the mean, standard deviation, asymmetry coefficient, and kurtosis coefficient for the STPN(σ = 1, λ, q) in terms of λ and q.
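The effect of q described in Remark 2 can be checked numerically from the moment relation E(Y^r) = q/(q − r) E(Z^r). The following Python sketch (helper names are ours; the TPN moments are obtained from scipy's truncated normal) computes the kurtosis coefficient for several values of q, which requires q > 4.

```python
import numpy as np
from scipy.stats import truncnorm

def stpn_raw_moment(r, sigma, lam, q):
    # E(Y^r) = q/(q - r) * E(Z^r) with Z ~ TPN(sigma, lam); requires q > r
    tpn = truncnorm(-lam, np.inf, loc=sigma * lam, scale=sigma)
    return q / (q - r) * tpn.moment(r)

def stpn_kurtosis(sigma, lam, q):
    # Standardized fourth central moment; defined only for q > 4
    m1, m2, m3, m4 = (stpn_raw_moment(r, sigma, lam, q) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu4 / var ** 2

# Smaller q yields heavier tails and hence a larger kurtosis coefficient
for qv in (4.5, 6.0, 10.0, 50.0):
    print(qv, stpn_kurtosis(1.0, 1.0, qv))
```

As q grows, the kurtosis decreases toward the TPN value, consistent with the convergence noted in Remark 1.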

Inference
In this section, we discuss a classical approach to inference for the STPN distribution. In particular, we discuss the moments estimators and maximum likelihood (ML) estimation based on the EM algorithm.

Moments Estimators
The moments estimators result from equating the first three sample moments to their theoretical counterparts given in Proposition 6. Solving the first equation for σ and substituting into the remaining two, we obtain a system of nonlinear equations in λ and q. These equations can be solved using different software. For instance, in R [17], we can use the nleqslv function to obtain the moments estimators λ_M and q_M. The moments estimator σ_M is obtained by substitution in Equation (9).
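A Python analogue of this procedure (the paper uses the R nleqslv function; all names below are illustrative) exploits the fact that the moments are proportional to σ^r, so the ratios of moments are free of σ and can be solved for (λ, q) first.

```python
import numpy as np
from scipy.stats import truncnorm
from scipy.optimize import fsolve

def stpn_moment(r, sigma, lam, q):
    # E(Y^r) = q/(q - r) * E(Z^r) with Z ~ TPN(sigma, lam), valid for q > r
    tpn = truncnorm(-lam, np.inf, loc=sigma * lam, scale=sigma)
    return q / (q - r) * tpn.moment(r)

def moments_estimators(m1, m2, m3, start=(1.0, 5.0)):
    # The ratios m2/m1^2 and m3/m1^3 depend only on (lambda, q);
    # solve for them first, then recover sigma from the first moment.
    def eqs(par):
        lam, q = par
        if q <= 3.0 + 1e-6:          # crude guard: third moment needs q > 3
            return [1e6, 1e6]
        mu1 = stpn_moment(1, 1.0, lam, q)
        mu2 = stpn_moment(2, 1.0, lam, q)
        mu3 = stpn_moment(3, 1.0, lam, q)
        return [mu2 / mu1 ** 2 - m2 / m1 ** 2,
                mu3 / mu1 ** 3 - m3 / m1 ** 3]
    lam, q = fsolve(eqs, start)
    sigma = m1 / stpn_moment(1, 1.0, lam, q)
    return sigma, lam, q

# Sanity check: exact theoretical moments should recover the parameters
m = [stpn_moment(r, 2.0, 1.0, 5.0) for r in (1, 2, 3)]
print(moments_estimators(*m, start=(0.8, 4.5)))
```

In practice m1, m2, m3 would be the sample moments of the data; the guard on q reflects the existence condition q > 3 for the third moment.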

Maximum Likelihood Estimation
Given y_1, . . . , y_n, a random sample from the STPN(σ, λ, q) distribution, the log-likelihood function for θ = (σ, λ, q) is ℓ(θ) = Σ_{i=1}^n log f(y_i; σ, λ, q). Differentiating with respect to the components of θ yields the ML equations, which involve integrals with no closed form; defining auxiliary functions indexed by j > 0, these equations can be written in a more compact form. The ML estimators can be obtained directly using numerical procedures. However, to increase the robustness of the procedure for obtaining those estimators, we also discuss an EM-type algorithm for estimation in the model.
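Because the ML equations have no closed form, the log-likelihood can also be maximized directly. The sketch below (Python; the paper's implementation relies on the EM algorithm in the R tpn package) approximates the integral in the pdf with Gauss-Legendre quadrature and optimizes over log σ and log q to respect the positivity constraints. Names such as fit_stpn_ml are ours.

```python
import numpy as np
from scipy.stats import norm, truncnorm
from scipy.optimize import minimize

# Gauss-Legendre nodes mapped to (0, 1) for the integral in the STPN pdf
_x, _w = np.polynomial.legendre.leggauss(80)
W_NODES, W_WEIGHTS = 0.5 * (_x + 1.0), 0.5 * _w

def stpn_logpdf(y, sigma, lam, q):
    # log f(y) = log q - log sigma - log Phi(lam)
    #            + log int_0^1 w^q phi(y*w/sigma - lam) dw
    kern = W_NODES ** q * norm.pdf(np.outer(y, W_NODES) / sigma - lam)
    return (np.log(q) - np.log(sigma) - norm.logcdf(lam)
            + np.log(kern @ W_WEIGHTS))

def fit_stpn_ml(y, start=(1.0, 0.0, 3.0)):
    # Optimize over (log sigma, lambda, log q) to enforce sigma, q > 0
    def nll(par):
        return -np.sum(stpn_logpdf(y, np.exp(par[0]), par[1], np.exp(par[2])))
    res = minimize(nll, [np.log(start[0]), start[1], np.log(start[2])],
                   method="Nelder-Mead")
    return np.exp(res.x[0]), res.x[1], np.exp(res.x[2]), -res.fun

# Demo on synthetic data: Z ~ TPN(2, 1) divided by U^(1/5) gives STPN(2, 1, 5)
rng = np.random.default_rng(7)
z = truncnorm.rvs(-1.0, np.inf, loc=2.0, scale=2.0, size=1000, random_state=rng)
y_sim = z / rng.uniform(size=1000) ** (1.0 / 5.0)
print(fit_stpn_ml(y_sim, start=(1.5, 0.5, 4.0)))
```

Direct optimization works here, but, as noted above, an EM-type algorithm can make the procedure more robust to poor starting values.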

EM Algorithm
The EM algorithm is a well-known tool for ML estimation in the presence of nonobserved (latent) data. For this particular problem, the algorithm takes advantage of the stochastic representation of the STPN model in Equation (4). Let W = U^(1/q). In this context, the STPN distribution can be written using the following hierarchical representation:
Y_i | W_i = w_i ∼ TPN(σ/w_i, λ), W_i ∼ Beta(q, 1), i = 1, . . . , n.
In our context, y = (y_1, . . . , y_n) and w = (w_1, . . . , w_n) represent the observed and nonobserved data, respectively. The complete data are given by y_c = (y, w). We also denote ℓ_c(θ | y_c) as the complete log-likelihood function, which, up to a constant, is given by:
ℓ_c(θ | y_c) = n log q + q Σ_{i=1}^n log w_i − n log σ − n log Φ(λ) − (1/2) Σ_{i=1}^n (y_i w_i/σ − λ)².
Note that Q(θ | θ^(k)) = E(ℓ_c(θ | y_c) | y, θ = θ^(k)) depends on the conditional expectations of W_i, W_i², and log W_i given y_i; these do not have a closed form and therefore need to be computed numerically. In short, the k-th step of the EM algorithm is detailed as follows: • E-step: For θ^(k) = (σ^(k), λ^(k), q^(k)), the value of the vector of parameters at the k-th step, compute E(W_i | y_i, θ^(k)), E(W_i² | y_i, θ^(k)), and E(log W_i | y_i, θ^(k)), for i = 1, . . . , n; • CM-Step I: Given λ^(k) and the weights from the E-step, update σ as the positive root of a quadratic equation; • CM-Step II: Given σ^(k+1) and the weights, update λ by solving a one-dimensional equation numerically; • CM-Step III: Given E(log W_1 | y_1, θ^(k)), . . . , E(log W_n | y_n, θ^(k)), update q as q^(k+1) = −n / Σ_{i=1}^n E(log W_i | y_i, θ^(k)). The E-, CM-I, CM-II, and CM-III steps are repeated until an ad hoc criterion is satisfied.
For instance, we considered ℓ(θ^(k+1)) − ℓ(θ^(k)) < ε, for a fixed ε > 0. In other words, the algorithm stops when the increase in the observed log-likelihood between successive steps is lower than a predetermined value.
The initial values for the algorithm can be obtained, for instance, using the moments estimators σ_M, λ_M, and q_M.
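A compact Python sketch of the E- and CM-steps is given below. The E-step expectations are computed by quadrature, the σ update is the positive root of a quadratic, the λ update solves a one-dimensional equation, and the q update is in closed form. These explicit updates are derived from the complete log-likelihood above; the sketch is an illustration, not a transcription of the tpn package code.

```python
import numpy as np
from scipy.stats import norm, truncnorm
from scipy.optimize import brentq

# Quadrature grid on (0, 1) for the E-step expectations (no closed form)
_x, _w = np.polynomial.legendre.leggauss(200)
GRID, GW = 0.5 * (_x + 1.0), 0.5 * _w

def em_stpn(y, sigma, lam, q, n_iter=8):
    n = len(y)
    loglik = []
    for _ in range(n_iter):
        # E-step: posterior of W_i | y_i is proportional to
        # w^q * phi(y_i * w / sigma - lam) on (0, 1)
        kern = GRID ** q * norm.pdf(np.outer(y, GRID) / sigma - lam)
        denom = kern @ GW
        e_w = (kern * GRID) @ GW / denom
        e_w2 = (kern * GRID ** 2) @ GW / denom
        e_logw = (kern * np.log(GRID)) @ GW / denom
        # observed log-likelihood at the current parameters (monitoring only)
        loglik.append(np.sum(np.log(q) - np.log(sigma)
                             - norm.logcdf(lam) + np.log(denom)))
        # CM-step I: sigma is the positive root of n*s^2 + lam*s1*s - s2 = 0
        s1, s2 = np.sum(y * e_w), np.sum(y ** 2 * e_w2)
        sigma = (-lam * s1 + np.sqrt(lam ** 2 * s1 ** 2 + 4.0 * n * s2)) / (2.0 * n)
        # CM-step II: lambda solves s1/sigma - n*lam - n*phi(lam)/Phi(lam) = 0
        lam = brentq(lambda l: s1 / sigma - n * l
                     - n * norm.pdf(l) / norm.cdf(l), -10.0, 10.0)
        # CM-step III: closed-form update of q from the Beta(q, 1) part
        q = -n / np.sum(e_logw)
    return sigma, lam, q, loglik

# Demo on synthetic data drawn from STPN(2, 1, 5)
rng = np.random.default_rng(3)
z = truncnorm.rvs(-1.0, np.inf, loc=2.0, scale=2.0, size=500, random_state=rng)
y_sim = z / rng.uniform(size=500) ** (1.0 / 5.0)
print(em_stpn(y_sim, sigma=1.5, lam=0.5, q=3.0)[:3])
```

Since each CM-step increases the expected complete log-likelihood, the monitored observed log-likelihood is nondecreasing across iterations, which is a useful check on the implementation.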

Observed Fisher Information Matrix
The variance of the estimators can be estimated based on the observed Fisher information matrix, say I(θ) = −∂²ℓ(θ)/∂θ∂θ⊤. In particular, we have that:
I(θ)^(1/2) (θ̂ − θ) →_D N_3(0_3, I_3),
where N_3(0_3, I_3) denotes the standard trivariate normal distribution. The computation of I(θ) is not trivial, because it involves the differentiation of functions that depend on integrals.
Taking advantage of the complete log-likelihood function, I(θ) can also be approximated by Louis' method [15] as follows:
I(θ) ≈ E(−∂²ℓ_c(θ | y_c)/∂θ∂θ⊤ | y) − Var(∂ℓ_c(θ | y_c)/∂θ | y).
The details of the components of I(θ) are provided in Appendix A.
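As a numerical cross-check of the resulting standard errors (this is not Louis' method itself, which uses the complete-data derivatives of Appendix A), the observed information matrix can also be approximated by a finite-difference Hessian of the observed log-likelihood. The following Python sketch uses hypothetical helper names and simulated data.

```python
import numpy as np
from scipy.stats import norm, truncnorm

_x, _w = np.polynomial.legendre.leggauss(120)
GRID, GW = 0.5 * (_x + 1.0), 0.5 * _w

def stpn_loglik(theta, y):
    sigma, lam, q = theta
    kern = GRID ** q * norm.pdf(np.outer(y, GRID) / sigma - lam)
    return np.sum(np.log(q) - np.log(sigma) - norm.logcdf(lam)
                  + np.log(kern @ GW))

def observed_info(theta, y, h=1e-3):
    # I(theta) = -Hessian of the log-likelihood, by central finite differences
    theta = np.asarray(theta, dtype=float)
    p = len(theta)
    hess = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei, ej = np.eye(p)[i] * h, np.eye(p)[j] * h
            hess[i, j] = (stpn_loglik(theta + ei + ej, y)
                          - stpn_loglik(theta + ei - ej, y)
                          - stpn_loglik(theta - ei + ej, y)
                          + stpn_loglik(theta - ei - ej, y)) / (4.0 * h * h)
    return -hess

# Demo: information matrix evaluated on synthetic STPN(2, 1, 5) data
rng = np.random.default_rng(5)
z = truncnorm.rvs(-1.0, np.inf, loc=2.0, scale=2.0, size=400, random_state=rng)
y_sim = z / rng.uniform(size=400) ** (1.0 / 5.0)
info = observed_info([2.0, 1.0, 5.0], y_sim)
print(np.sqrt(np.diag(np.linalg.inv(info))))  # approximate standard errors
```

In applications the matrix would be evaluated at the ML estimate rather than at the true parameter values used in this demo.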

Computational Aspects
The EM algorithm and Louis' method to obtain the ML estimators and their standard errors for the STPN distribution are included in the tpn package [18] of R [17]. The following function can be used to obtain these results: est.stpn(y, sigma0 = NULL, lambda0 = NULL, q0 = NULL, prec = 0.001, max.iter = 1000), where y is the response variable; sigma0, lambda0, and q0 are the initial values for the algorithm (not defined by default); prec is the precision for the parameters; and max.iter is the maximum number of iterations of the algorithm. The tpn package also includes the functions dstpn, pstpn, and rstpn, which compute the pdf and cdf and generate random numbers for the STPN distribution, respectively.

Simulation
In this section, we study the performance of the ML estimators obtained with the EM algorithm for the STPN distribution under different scenarios. We considered two values for σ: 2 and 10; three values for λ: −1, 1, and 3; two values for q: 1.5 and 3; and four sample sizes: 50, 100, 200, and 500. For each combination of σ, λ, q, and n (48 combinations in total), we drew 1000 replicates, used the tpn package to estimate the parameters based on the EM algorithm, and estimated the standard deviations based on Louis' method for the observed Fisher information matrix. Table 2 summarizes the mean of the estimated bias for the 1000 replicates (bias), the mean of the standard errors (SE), the estimated root mean-squared error (RMSE), and the estimated coverage probability based on the asymptotic distribution of the ML estimator using a 95% confidence level (CP). Note that the bias and RMSE terms are reduced when the sample size is increased, suggesting that the estimators are consistent even in finite samples. The SE and RMSE terms become closer as the sample size increases, suggesting that the standard errors are also consistently estimated. Finally, the CP terms converge to the nominal value when the sample size is increased, suggesting that the asymptotic distribution of the ML estimators also works well in finite samples.
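A miniature version of this experiment can be reproduced as follows. To keep the sketch fast, it tracks a simple moment-type estimator of σ with λ and q held fixed at their true values, which is a simplification of the study in Table 2, not a replication of it; the bias and RMSE shrink as n grows, matching the pattern described above.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2024)
sigma, lam, q = 2.0, 1.0, 5.0
# E(Y) = q/(q - 1) * E(Z) with Z ~ TPN(sigma, lam)
mean_y = q / (q - 1.0) * truncnorm(-lam, np.inf,
                                   loc=sigma * lam, scale=sigma).mean()

def replicate(n, reps=300):
    est = np.empty(reps)
    for r in range(reps):
        z = truncnorm.rvs(-lam, np.inf, loc=sigma * lam, scale=sigma,
                          size=n, random_state=rng)
        y = z / rng.uniform(size=n) ** (1.0 / q)
        est[r] = sigma * y.mean() / mean_y  # moment-type estimator of sigma
    bias = est.mean() - sigma
    rmse = np.sqrt(np.mean((est - sigma) ** 2))
    return bias, rmse

for n in (50, 100, 200, 500):
    print(n, replicate(n))
```

The RMSE decreases at roughly the 1/√n rate, consistent with the behavior reported for the full EM-based study.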

Application
In this section, we present a real data application in order to illustrate the performance of the STPN model in comparison with other proposals in the literature. For this, a comparison was conducted with the TPN distribution and the model proposed by Gómez et al. [19], a generalization of the TPN model denominated the generalized TPN (GTPN), whose density has an additional shape parameter α, with x > 0, σ, α > 0, and λ ∈ R, and which includes the TPN model as a particular case. A real dataset of body fat was considered, which includes weight and various body circumferences (see http://lib.stat.cmu.edu/datasets/bodyfat (accessed on 8 October 2021)); for illustration purposes, the weight variable (measured in pounds (lbs)) was chosen for the application. The basic statistics in Table 3 show high kurtosis for the variable, suggesting the use of a distribution with heavy tails such as the STPN. Table 4 shows the estimated parameters for the three models considered. Based on the AIC [20] and BIC [21], the STPN model provides a better fit. In addition, Figure 4 shows the histogram of the data and the estimated pdf for all the models, where the better performance of the STPN model can be observed. In order to check the better fit of the STPN model in comparison with the rest of the models, we also computed the quantile residuals (QRs). If the model is appropriate for the data, the QRs should behave as a sample from the standard normal distribution. This assumption can be validated with traditional normality tests such as the Anderson-Darling (AD), Cramér-von Mises (CVM), and Shapiro-Wilk (SW) tests. Figure 5 suggests that the STPN model provides a better fit for this dataset.
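The quantile-residual check can be sketched as follows. Since the body-fat dataset is not bundled here, the example uses data simulated from the fitted-model family; under the correct model the QRs are exactly standard normal, so their mean and standard deviation should be near 0 and 1. The cdf is evaluated through the scale-mixture representation, and the helper names are ours.

```python
import numpy as np
from scipy.stats import norm, truncnorm

_x, _w = np.polynomial.legendre.leggauss(200)
GRID, GW = 0.5 * (_x + 1.0), 0.5 * _w

def stpn_cdf(y, sigma, lam, q):
    # F(y) = q * int_0^1 w^(q-1) F_Z(y w) dw, with F_Z the TPN(sigma, lam) cdf
    f_z = (norm.cdf(np.outer(y, GRID) / sigma - lam)
           - norm.cdf(-lam)) / norm.cdf(lam)
    return (q * GRID ** (q - 1.0) * f_z) @ GW

# Quantile residuals on data simulated from STPN(2, 1, 5) under the true model
rng = np.random.default_rng(11)
sigma, lam, q, n = 2.0, 1.0, 5.0, 500
z = truncnorm.rvs(-lam, np.inf, loc=sigma * lam, scale=sigma,
                  size=n, random_state=rng)
y = z / rng.uniform(size=n) ** (1.0 / q)
u = np.clip(stpn_cdf(y, sigma, lam, q), 1e-12, 1.0 - 1e-12)
qr = norm.ppf(u)
print(qr.mean(), qr.std())  # should be near 0 and 1
```

In a real application, the residuals would be computed at the ML estimates and then passed to the AD, CVM, or SW normality tests mentioned above.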

Conclusions
This study presented a new distribution with positive support, denominated the slash truncated positive normal (STPN) distribution. It serves as a more general model than the TPN, increasing the kurtosis in order to improve the modeling of positive datasets with high kurtosis. The basic properties of the model were analyzed, and a simulation study was conducted implementing the EM algorithm. Finally, an application with real data was performed, showing that the new model performs better than competing models.
Data Availability Statement: The data used in Section 5 are duly referenced.

Conflicts of Interest:
The authors declare no conflict of interest.