New Regression Models Based on the Unit-Sinh-Normal Distribution: Properties, Inference, and Applications

Abstract: In this paper, two new distributions were introduced to model unimodal and/or bimodal data. The first distribution, which was obtained by applying a simple transformation to a unit-Birnbaum–Saunders random variable, is useful for modeling data with positive support, while the second is appropriate for fitting data on the (0, 1) interval. Extensions to regression models were also studied in this work, and statistical inference was performed from a classical perspective by using the maximum likelihood method. A small simulation study is presented to evaluate the behavior of the maximum likelihood estimates of the parameters. Finally, two applications to real data sets are reported to illustrate the developed methodology.


Introduction
The Birnbaum–Saunders (BS) distribution, introduced by Birnbaum and Saunders [1], has been used principally for modeling the lifetime of certain structures under dynamic load. The probability density function (pdf) of the BS distribution is given by

f_T(t) = φ(a_t) (t + β)/(2α√β t^(3/2)), t > 0, (1)

where φ(·) is the pdf of the standard normal distribution and a_t = (1/α)(√(t/β) − √(β/t)), with α > 0 a shape parameter and β > 0 a scale parameter. We use the notation T ∼ BS(α, β). The BS model has been extended to a large number of families of distributions. Castillo et al. [2], for example, introduced the epsilon Birnbaum–Saunders family of distributions based on the epsilon-skew-symmetric distribution, while Vilca-Labra and Leiva-Sánchez [3] proposed a new fatigue model from the skew-elliptical family of distributions. This proposal, called the doubly generalized Birnbaum–Saunders distribution, has among its main properties that the elliptical component makes the kurtosis flexible and the skewness component makes the asymmetry flexible. Martínez-Flórez et al. [4] introduced the asymmetric alpha-power extension of the BS model. A generalization referred to as the proportional hazard Birnbaum–Saunders distribution was studied by Moreno-Arenas et al. [5], which includes a new parameter that provides more flexibility in terms of skewness and kurtosis. The BS model has also been used in the study of linear regression models, as in Rieck and Nedelman [6], where it was supposed that Y_i = log(T_i) with T_i ∼ BS(α, β) for i = 1, 2, ..., n and that the errors in the linear model have a sinh-normal (SHN) distribution with parameter vector (α, 0, 2). Santos and Cribari-Neto [7] numerically evaluated the finite-sample performance of the likelihood ratio, score, and Wald tests in the log-Birnbaum–Saunders regression model and introduced a RESET-like misspecification test for the model proposed by Rieck and Nedelman [6].
Furthermore, Balakrishnan and Zhu [8] discussed the maximum likelihood estimation of the model parameters under a log-linear link function for the BS lifetime regression model with equal and unequal shape parameters.
The pdf of the SHN model is given by

f_Y(y) = (2/(ασ)) cosh((y − γ)/σ) φ((2/α) sinh((y − γ)/σ)), y ∈ ℝ, (2)

where α > 0 is a shape parameter, γ ∈ ℝ is a location parameter, and σ > 0 is a scale parameter. A random variable Y following the model in (2) is denoted by Y ∼ SHN(α, γ, σ).
The SHN model was extended by Barros et al. [9] by considering a Student t distribution for the errors. This proposed Student t log-BS regression model allows attenuating the influence of the outlying observations. Other extensions of the SHN model were considered by Leiva et al. [10] and Santana et al. [11].
Generalizations of the BS distribution to model data with support in the interval (0, 1) have also been considered by several authors. Mazucheli et al. [12] presented a type of BS distribution with support in the interval (0, 1), which became a new alternative to the beta and Kumaraswamy distributions. This new proposal, called the unit-Birnbaum–Saunders (UBS) model, arises as X = e^(−T) with T ∼ BS(α, β) and has the pdf given by

f_X(x) = (1/(2αβx)) [(β/(−log x))^(1/2) + (β/(−log x))^(3/2)] φ((1/α)(√(−log x/β) − √(β/(−log x)))), (3)

where x ∈ (0, 1), α > 0 is a shape parameter, and β > 0 is a scale parameter.
To explain response variables between zero and one, such as proportions or rates, alternative statistical models to the beta regression model were studied by Martínez-Flórez et al. [13]. The beta regression model is useful to study relations between variables where the response corresponds to rates, proportions, or indexes. Among the several studies related to the issue, we have Ospina et al. [14], Simas et al. [15], Rocha and Simas [16], and Cribari-Neto and Souza [17], among others. Recent applications of the beta regression model can be found in Ghosh [18], who developed the robust minimum density power divergence estimator and a class of robust Wald-type tests for the beta regression model. For the applications, the author considered data on health measurements of several athletes collected at the Australian Institute of Sport (AIS data) and data on anxiety, depression, and stress in non-clinical women in Australia (stress-anxiety data). On the other hand, Kim et al. [19] proposed control charts of mean and variance by using a copula Markov statistical process control (SPC) and a conditional distribution with diverse copula functions. The authors used beta regression to explain the behavior of the average run lengths of the control charts of conditional variance with Major League Baseball (MLB) batting average (BA) and earned run average (ERA) data from the 1998 to 2016 seasons.

The main objective of this work is to introduce new families of distributions capable of modeling bimodal data with positive support or on the unit interval. The extension to the case of regression models is also studied.
The rest of this paper is organized as follows: Section 2 introduces the non-negative sinh-normal distribution, and its main statistical properties are studied in detail. In Section 3, the log-sinh-normal regression model is introduced, and its main properties are discussed. Section 4 presents the unit-sinh-normal distribution, and its respective extension to the case of regression models is studied. In Section 5, a small Monte Carlo simulation study is presented. Finally, in Section 6, two real data applications are reported and compared with several rival models.

Non-Negative Sinh-Normal Distribution
In this section, a new non-negative distribution is introduced, which is obtained by extension of the UBS model. Let X be a random variable following a UBS distribution. If Y = −log(X), then the distribution of Y has positive support and is referred to as a non-negative sinh-normal (SHN) distribution. The pdf of the non-negative SHN model is given by

f_Y(y) = (1/(αy)) cosh((log y − log β)/2) φ((2/α) sinh((log y − log β)/2)), y > 0, (4)

where φ(·) is the pdf of the standard normal distribution. The distribution in (4) can also be called the log-unit-Birnbaum–Saunders (LUBS) distribution. A more general form of the non-negative SHN model is given by the pdf

f_Y(y) = (2/(ασy)) cosh((log y − γ)/σ) φ((2/α) sinh((log y − γ)/σ)), y > 0, (5)

where α > 0, γ ∈ ℝ, and σ > 0 are shape, location, and scale parameters, respectively. This model is denoted by LSHN(α, γ, σ), and we refer to it as the log-sinh-normal model; the pdf in (4) corresponds to the case σ = 2 with γ = log β. The density function in Equation (5) integrates to one, and the proof of this can be seen in Appendix A. Figure 1 displays some forms of the pdf of the LSHN distribution for selected values of α, γ, and σ. One can see in Figure 1a that the LSHN density is unimodal for α ≤ 2, whereas for α > 2, the LSHN density can be bimodal (see Figure 1b). This is a useful result, since it provides a distribution for positive bimodal data.
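The unimodal/bimodal behavior can be verified numerically. The sketch below evaluates the density of Equation (5) via the change of variables from the SHN model and counts local maxima on a grid; the function names and the particular values (including σ = 0.25, under which the α > 2 case is visibly bimodal) are illustrative assumptions.

```python
# Crude numerical check of uni/bimodality of the LSHN density in Eq. (5).
# lshn_pdf, count_modes, and all parameter values are illustrative assumptions.
import numpy as np
from scipy.stats import norm

def lshn_pdf(y, alpha, gamma, sigma):
    """LSHN(alpha, gamma, sigma) density for y > 0 (change of variables from SHN)."""
    z = (np.log(y) - gamma) / sigma
    return (2.0 / (alpha * sigma * y)) * np.cosh(z) * norm.pdf((2.0 / alpha) * np.sinh(z))

def count_modes(alpha, sigma, gamma=0.0):
    """Count strict local maxima of the density on a fine grid."""
    y = np.linspace(0.05, 3.0, 20000)
    f = lshn_pdf(y, alpha, gamma, sigma)
    is_peak = (f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])
    return int(is_peak.sum())
```

With σ = 0.25, `count_modes(1.0, 0.25)` finds a single mode while `count_modes(4.0, 0.25)` finds two, in line with the unimodal/bimodal behavior described above.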

Distribution Function, Survival Function, and Hazard Function of the LSHN Model
The cumulative distribution function (cdf) of the LSHN model is given by

F_Y(y) = F_SHN(log y) = Φ((2/α) sinh((log y − γ)/σ)), y > 0, (6)

where F_SHN(·) is the cdf of the SHN distribution and Φ(·) is the standard normal cdf. It follows from (5) and (6) that the survival and hazard functions are given, respectively, by S_Y(y) = S_SHN(log y) and r_Y(y) = r_SHN(log y)/y, where S_SHN(·) and r_SHN(·) are the survival and hazard functions of the SHN model. The graphs in Figure 2 show the form of the hazard function for some selected values of the parameters. The plots reveal that the LSHN hazard function increases up to a certain value and then decreases to zero.

Moments of the LSHN Model
It can be shown that the r-th moment of a random variable Y following a LSHN(α, γ, σ) distribution is given by E(Y^r) = M_Z(r), where M_Z(·) is the moment-generating function (mgf) of a random variable Z with the SHN(α, γ, σ) distribution. Following some results found by Rieck [20], we have that

M_Z(r) = e^(rγ) [K_a(α⁻²) + K_b(α⁻²)]/(2K_(1/2)(α⁻²)),

where a = (rσ + 1)/2, b = (rσ − 1)/2, and K_λ(·) is the modified Bessel function of the third kind, defined by

K_λ(w) = (1/2) ∫₀^∞ x^(λ−1) exp(−(w/2)(x + 1/x)) dx. (7)

For the special case of σ = 2 (the LUBS model), one can prove that

E(Y) = e^γ (α² + 2)/2 and Var(Y) = e^(2γ) α² (5α² + 4)/4.
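The closed-form expressions for the σ = 2 case can be checked by simulation: Y = e^Z with Z ∼ SHN(α, γ, 2) is a Birnbaum–Saunders variable with scale e^γ, which can be drawn via the standard stochastic representation T = β(αW/2 + √((αW/2)² + 1))² with W ∼ N(0, 1). The parameter values and sample size below are illustrative assumptions.

```python
# Monte Carlo check of E(Y) = e^gamma (alpha^2 + 2)/2 and
# Var(Y) = e^{2 gamma} alpha^2 (5 alpha^2 + 4)/4 for the sigma = 2 (LUBS) case.
# Parameter values and the sample size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2023)
alpha, gamma = 0.5, 0.3
beta = np.exp(gamma)

# Standard stochastic representation of a Birnbaum-Saunders variate.
w = rng.standard_normal(2_000_000)
y = beta * (alpha * w / 2 + np.sqrt((alpha * w / 2) ** 2 + 1)) ** 2

mean_theory = np.exp(gamma) * (alpha**2 + 2) / 2
var_theory = np.exp(2 * gamma) * alpha**2 * (5 * alpha**2 + 4) / 4
```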
From the above results, it can be concluded that the LSHN distribution can be obtained by applying the transformation Y = e Z to a random variable Z ∼ SHN(α, γ, σ).

Cumulant-Generating Function and Mode
Let Y = −log(X) with X ∼ UBS(α, β); then, the random variable Y has an LSHN distribution. It follows that E(Y^r) = E((−log X)^r). Letting r = −t, for t < 0, and following Mazucheli et al. [21], the mgf of Y is obtained. The cumulant-generating function (cgf) is then given by K_Y(t) = log M_Y(t), whose j-th derivative at zero, K_j(Y), is the j-th cumulant of the random variable Y. The modes of the LSHN distribution can be obtained by maximizing the logarithm of the pdf. Taking the derivative of the logarithm of the pdf of the LSHN model and setting the resulting derivative equal to zero, it is obtained that the mode (or modes) of the pdf of the LSHN distribution is (are) the solution(s) of the non-linear equation

tanh((log y − γ)/σ) − (2/α²) sinh(2(log y − γ)/σ) = σ.

Solving this non-linear equation, the mode(s) of the LSHN distribution is (are) found.
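In practice, a mode can equivalently be located by numerically maximizing the log-density; a minimal sketch follows, in which the function names, the search interval, and the parameter values are our assumptions.

```python
# Locate a mode of the LSHN density by minimizing the negative log-density.
# lshn_logpdf, lshn_mode, and the search interval are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def lshn_logpdf(y, alpha, gamma, sigma):
    """Log-density of LSHN(alpha, gamma, sigma) at y > 0."""
    z = (np.log(y) - gamma) / sigma
    return (np.log(2.0) - np.log(alpha * sigma * y) + np.log(np.cosh(z))
            + norm.logpdf((2.0 / alpha) * np.sinh(z)))

def lshn_mode(alpha, gamma, sigma, lo=1e-3, hi=10.0):
    """One mode of LSHN(alpha, gamma, sigma), found on the interval (lo, hi)."""
    res = minimize_scalar(lambda y: -lshn_logpdf(y, alpha, gamma, sigma),
                          bounds=(lo, hi), method="bounded")
    return res.x
```

For small α the distribution concentrates near e^γ, so, for example, `lshn_mode(0.1, 0.0, 1.0)` is close to 1.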

The LSHN Regression Model
Regression models are a statistical technique widely used in many areas of knowledge to explain the behavior of a response variable, say Y, as a function of other variables called explanatory variables, say X_1, ..., X_p, and a vector of unknown parameters called regression coefficients, denoted by θ. In this section, the LSHN linear regression model is introduced by considering a random sample of variables Y_i, such that

Y_i ∼ LSHN(α, x_i⊤θ, σ), (10)

for i = 1, ..., n, with x_i = (X_i1, ..., X_ip)⊤ and θ = (θ_1, ..., θ_p)⊤. In this case, we suppose the functional relationship

log(Y_i) = x_i⊤θ + ε_i, (11)

where the random variables ε_i ∼ SHN(α, 0, σ) for i = 1, ..., n.
The functional relationship in Equation (11) is justified by Theorem 1 below.

Theorem 1. Let X ∼ UBS(α, β) and c > 0. Then, Y = X^(1/c) ∼ UBS(α, β/c).

Proof. Consider X ∼ UBS(α, β), and let Y = X^(1/c); then, X = Y^c and dX/dY = cY^(c−1); thus, f_Y(y) = f_X(y^c) c y^(c−1), which corresponds to the pdf of the UBS(α, β/c) distribution.

To construct the model, we considered a random sample X_1, X_2, ..., X_n, such that X_i ∼ UBS(α_i, β_i) for i = 1, 2, ..., n, with the parameters linked to the explanatory variables Z_1, Z_2, ..., Z_p. It can be seen from the previous result that the regression model given in (10) generalizes the model obtained from Theorem 1.

Maximum Likelihood Estimation in the LSHN Regression Model
To obtain the estimates of the parameters in the LSHN regression model, we considered the maximum likelihood method. Thus, given a random sample of size n, say Y = (Y_1, ..., Y_n)⊤, the log-likelihood function for the parameter vector ϕ = (θ⊤, α, σ)⊤ can be written as

ℓ(ϕ; Y) = n log(2/(ασ)) − (n/2) log(2π) − Σᵢ log y_i + Σᵢ log cosh(z_i) − (2/α²) Σᵢ sinh²(z_i), (12)

where z_i = (log y_i − x_i⊤θ)/σ. After taking partial derivatives of the log-likelihood function (12) with respect to the parameters of interest and setting them equal to zero, we obtain the score equations. The maximum likelihood estimators for θ_1, ..., θ_p, α, and σ are the solutions to the equations U(θ_j) = 0 (j = 1, ..., p), U(α) = 0, and U(σ) = 0, which require a numerical method, such as Newton–Raphson or quasi-Newton.
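The sketch below illustrates this estimation scheme on simulated data, maximizing the log-likelihood with a quasi-Newton routine rather than solving the score equations by hand; the data-generating values, starting values, and bounds are illustrative assumptions.

```python
# Quasi-Newton ML estimation for the LSHN regression model on simulated data.
# All data-generating settings, starting values, and bounds are assumptions.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
n = 1000
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
theta_true, alpha_true, sigma_true = np.array([0.75, 0.25]), 1.5, 1.0

# log Y_i = x_i' theta + eps_i with eps_i ~ SHN(alpha, 0, sigma):
eps = sigma_true * np.arcsinh(alpha_true * rng.standard_normal(n) / 2)
logy = X @ theta_true + eps

def negloglik(phi):
    theta, alpha, sigma = phi[:2], phi[2], phi[3]
    z = (logy - X @ theta) / sigma
    # numerically stable log(cosh(z))
    logcosh = np.abs(z) + np.log1p(np.exp(-2.0 * np.abs(z))) - np.log(2.0)
    return -np.sum(np.log(2.0 / (alpha * sigma)) + logcosh
                   + norm.logpdf((2.0 / alpha) * np.sinh(z)))

start = np.concatenate([np.linalg.lstsq(X, logy, rcond=None)[0], [1.0, 1.0]])
fit = minimize(negloglik, start, method="L-BFGS-B",
               bounds=[(None, None), (None, None), (0.05, None), (0.1, None)])
```

Starting the optimizer from the least-squares fit of log Y on X is a simple way to keep the quasi-Newton iterations in a well-behaved region.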

Observed and Expected Information Matrix
The elements of the observed information matrix J(ϕ) for the parameter vector ϕ = (θ⊤, α, σ)⊤, which are denoted by j_(ϕ_j ϕ_k) with ϕ_j, ϕ_k ∈ {θ_1, ..., θ_p, α, σ}, can be obtained by calculating the second partial derivatives of the log-likelihood function (12), i.e., j_(ϕ_j ϕ_k) = −∂²ℓ(ϕ; Y)/∂ϕ_j∂ϕ_k. The previous results are similar to those obtained by Rieck and Nedelman [6]. The elements of the expected information matrix, I(ϕ), defined as n⁻¹ times the expected values of the elements of the observed information matrix, are denoted by i_θθ, i_αθ_j, i_αα, i_σθ, i_σα, and i_σσ, and are obtained following Rieck and Nedelman [6]. One can show that det(I(ϕ)) ≠ 0; that is, the information matrix is non-singular, which guarantees the existence of the asymptotic covariance matrix of the maximum likelihood estimators, given by Var(φ̂) = n⁻¹I⁻¹(ϕ). The existence of I⁻¹(ϕ) also guarantees that the vector of maximum likelihood estimators has the asymptotic distribution

√n(φ̂ − ϕ) →_D N_(p+2)(0, I⁻¹(ϕ));

that is, the maximum likelihood estimators of the model parameters are consistent and asymptotically follow a normal distribution with covariance matrix given by the inverse of the Fisher information matrix. The approximation N_(p+2)(ϕ, n⁻¹I⁻¹(ϕ)) can be used to construct confidence intervals for the parameters ϕ_j, which are given by φ̂_j ± z_(1−α/2) × se(φ̂_j), where se(φ̂_j) corresponds to the square root of the j-th diagonal element of the matrix n⁻¹I⁻¹(ϕ) and z_(1−α/2) denotes the 100(1 − α/2)-th quantile of the standard normal distribution.
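Numerically, the observed information can be approximated by finite differences of the log-likelihood at the MLE, which directly yields Wald-type intervals of the form above. The one-parameter sketch below (location parameter of an SHN sample with α and σ treated as known; all names and values are illustrative assumptions) shows the idea.

```python
# Wald-type interval from the observed information, computed by a central
# second difference of the negative log-likelihood at the MLE.
# Example: location parameter gamma of an SHN(alpha, gamma, sigma) sample
# with alpha and sigma treated as known; all values are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(11)
alpha, sigma, gamma_true = 1.0, 1.0, 2.0
y = gamma_true + sigma * np.arcsinh(alpha * rng.standard_normal(5000) / 2)

def nll(gamma):  # negative log-likelihood, additive constants dropped
    z = (y - gamma) / sigma
    return -np.sum(np.log(np.cosh(z)) - 0.5 * ((2.0 / alpha) * np.sinh(z)) ** 2)

gamma_hat = minimize_scalar(nll, bounds=(0.0, 4.0), method="bounded").x

h = 1e-4  # observed information via a central second difference
info = (nll(gamma_hat + h) - 2.0 * nll(gamma_hat) + nll(gamma_hat - h)) / h**2
se = 1.0 / np.sqrt(info)
ci = (gamma_hat - 1.96 * se, gamma_hat + 1.96 * se)
```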

Unit-Sinh-Normal Distribution
Now, we introduce the SHN model with support on the interval (0, 1), which is called the unit-sinh-normal model and is denoted by Y ∼ USHN(α, γ, σ). The pdf is given by

f_Y(y) = (2/(ασ)) [cosh((log(−log(1 − y)) − γ)/σ)/((1 − y)(−log(1 − y)))] φ((2/α) sinh((log(−log(1 − y)) − γ)/σ)), (16)

where y ∈ (0, 1), α > 0 is a shape parameter, γ ∈ ℝ is a location parameter, and σ > 0 is a scale parameter. It can be seen from the arguments of the sinh and cosh functions that the density function in (16) is defined through the complementary log-log transformation, which is widely used in generalized linear models. Although the density (16) could be defined from the log-log link function, we used the complementary log-log link function. Note that, if y ∈ (0, 1), then (1 − y) ∈ (0, 1), and the simple transformation Z = 1 − Y leads to the model with the log-log link function. Figure 3 displays some plots of the pdf of the USHN distribution for some selected values of the parameters. The plots reveal that the USHN density is unimodal for α ≤ 2 (see Figure 3a) and bimodal for α > 2 (see Figure 3b). One of the advantages of the USHN distribution is that it can be used for modeling data sets of proportions and rates with bimodal behaviors.

Distribution Function, Survival Function, and Hazard Function of the USHN Model
It is easy to see that the corresponding cdf of the random variable Y ∼ USHN(α, γ, σ) is given by

F_USHN(y) = F_SHN(log(−log(1 − y))), y ∈ (0, 1), (17)

where F_SHN(·) is the cdf of the SHN(α, γ, σ) distribution. The survival function S_USHN(t) and hazard function r_USHN(t) are given by

S_USHN(t) = S_SHN(log(−log(1 − t)))

and

r_USHN(t) = r_SHN(log(−log(1 − t)))/((1 − t)(−log(1 − t))),

respectively, where S_SHN(·) and r_SHN(·) are the survival function and hazard function of the SHN model, respectively. From (17), it is concluded that log(−log(1 − Y)) ∼ SHN(α, γ, σ). Figure 4 shows the behavior of the hazard function of a USHN random variable for some selected values of the parameters. The graphs reveal that the hazard function increases up to a certain value and then decreases to zero. One can see that a random variable Y following a USHN(α, γ, σ) distribution can be generated by using the expression

Y = 1 − exp{−exp(γ + σ arcsinh(αΦ⁻¹(U)/2))},

where U ∼ U(0, 1) denotes a uniform random variable on the (0, 1) interval and Φ⁻¹(·) refers to the inverse of the cdf of the standard normal distribution.
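This inverse-transform generator can be sketched directly, assuming the complementary log-log construction log(−log(1 − Y)) ∼ SHN(α, γ, σ); the function name and seed below are our own.

```python
# Inverse-transform sampler for USHN(alpha, gamma, sigma), assuming the
# complementary log-log construction log(-log(1 - Y)) ~ SHN(alpha, gamma, sigma).
import numpy as np
from scipy.stats import norm

def rushn(n, alpha, gamma, sigma, seed=None):
    """Draw n USHN(alpha, gamma, sigma) variates on (0, 1)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    # SHN quantile function: gamma + sigma * arcsinh(alpha * Phi^{-1}(u) / 2)
    z = gamma + sigma * np.arcsinh(alpha * norm.ppf(u) / 2.0)
    return 1.0 - np.exp(-np.exp(z))
```

Transforming a sample back via (2/α) sinh((log(−log(1 − Y)) − γ)/σ) should look standard normal, which provides a quick self-check of the generator.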

Moments of the USHN Model
The r-th moment of a random variable Y following a USHN(α, γ, σ) distribution is given by E(Y^r) = E[(1 − e^(−X))^r], where X ∼ LSHN(α, γ, σ). Using the Taylor expansion for e^(−jX) and the r-th moment of the LSHN(α, γ, σ) distribution, it follows that

E(Y^r) = Σ_(j=0)^r (r choose j)(−1)^j Σ_(l=0)^∞ ((−j)^l/l!) e^(lγ) [K_(a_1)(α⁻²) + K_(b_1)(α⁻²)]/(2K_(1/2)(α⁻²)),

where a_1 = (lσ + 1)/2 and b_1 = (lσ − 1)/2, with K_a(·) the modified Bessel function of the third kind defined in (7).

Cumulant-Generating Function and Mode
The mgf of the USHN model can be obtained by writing M_Y(t) = E(e^(tY)) in terms of M_Z(r), where Z ∼ UBS(α, β) and r = −t, with M_Z(r) being the mgf of the UBS distribution. The cgf then follows as K_Y(t) = log M_Y(t), where K_Z(r) is the cgf of the UBS distribution. To find the mode of the USHN distribution, we reasoned in the same way as for the LSHN model: differentiating the logarithm of the pdf of the USHN distribution, substituting the auxiliary quantities ξ_1* and ξ_2*, and setting the result equal to zero, we obtain a non-linear equation. Solving this non-linear equation, the mode(s) of the USHN distribution is (are) found.

Asymptotic Distribution
Let Y ∼ USHN(α, γ, σ). One can prove that a suitably standardized transformation of Y converges in distribution to a normal random variable as α tends to zero; specifically,

(2/(ασ))[log(−log(1 − Y)) − γ] →_D N(0, 1) as α → 0.

It follows from the result above that, for small α, log(−log(1 − Y)) is approximately N(γ, α²σ²/4) distributed.

The LUSHN Regression Model
Now, we introduce the LUSHN linear regression model. We considered a set of p explanatory variables, denoted by x_i = (x_i1, ..., x_ip)⊤, and a p-dimensional vector of unknown parameters θ = (θ_1, ..., θ_p)⊤, such that, for i = 1, ..., n, the functional relationship

log(−log(1 − Y_i)) = x_i⊤θ + ε_i (18)

holds, where ε_i ∼ SHN(α, 0, σ). From (18), letting Z_i = log(−log(1 − Y_i)), we have that E(Z_i) = x_i⊤θ, and the fitted values are obtained as Ẑ_i = x_i⊤θ̂. To obtain the estimates of the model parameters, we considered the maximum likelihood method as in the LSHN regression model. Thus, given a random sample of size n, say Y = (Y_1, ..., Y_n)⊤, the log-likelihood function for the parameter vector ρ = (θ⊤, α, σ)⊤ follows directly from the density in (16). The score function and the observed information matrix of the LUSHN regression model have the same form as the respective expressions of the LSHN regression model after substituting log(Y_i) by log(−log(1 − Y_i)), for i = 1, ..., n. The MLEs for θ, α, and σ are the solutions to the equations U(θ_j) = 0 (j = 1, ..., p), U(α) = 0, and U(σ) = 0, which require a numerical method such as Newton–Raphson or quasi-Newton.

Simulation Study
To analyze the behavior of the estimators of the parameters in the LSHN regression model, we carried out a small Monte Carlo simulation study, generating the response variables with the algorithm described in this paper. In this simulation study, we analyzed the behavior of the estimators of the parameters of the model with linear predictor θ_0 + θ_1 X_i and errors ε_i ∼ SHN(α, 0, σ). The values of the explanatory variable X were taken from a uniform random variable on the (0, 1) interval; that is, X_i ∼ U(0, 1). Without loss of generality, we took the value of the scale parameter equal to σ = 1.0; however, the following results can be obtained for any value of the scale parameter via the simple transformation ε_i = σν_i with ν_i ∼ SHN(α, 0, 1). The values of the shape parameter were taken as α = 0.50, 0.75, 1.25, 1.75, 2.25, 2.75 to take into account different configurations of the form of the pdf of the random variable ε_i. On the other hand, since the coefficients θ_i, i = 0, 1, in the model (18) can take any real value without restriction, we took the particular values θ_0 = 0.75 and θ_1 = 0.25. To analyze some statistical measures of the maximum likelihood estimator (MLE), we considered small, moderate, and large sample sizes, n = 10, 25, 50, 75, 100, 200, 500, and 5000 iterations were performed for each scenario. The studied characteristics were the relative bias (RB), the root of the mean squared error (RMSE), and the ratio between the standard deviation (SD) of the estimates and the average estimated SD (RSD). Finally, we examined the coverage probability (CP) of the 95% confidence interval based on the asymptotic normality of the ML estimators. Tables 1 and 2 present the results of the simulation study. It can be observed that the RB and RMSE of the MLEs tend to decrease when the sample size increases, which supports the consistency of the MLE.
It is also observed that, for small sample sizes, important biases are obtained in the estimates of α and θ_1. Another interesting aspect to take into account is that, for values of the α parameter less than one, the bias of the estimate of σ is quite considerable for small sample sizes; however, this bias is negligible for values of α above one.
Regarding the coverage rates of the confidence intervals (CP), the simulation results showed that these were higher than 95% for the parameter α for all of the considered sample sizes. For the scale parameter σ, the CPs were low for small sample sizes (less than 50) and tended to increase to around 90% as the sample size increased. It was also observed that the CPs for the coefficients θ_0 and θ_1 were close to 95% for moderate and large sample sizes (greater than 75).

Table 1. Empirical relative bias (RB), root of the mean squared error (RMSE), ratio between the standard deviation of the estimate and the average standard deviation (RSD), and coverage probability (CP) of the 95% confidence interval for the MLEs of α and σ in the LUBS model.

Table 2. Empirical relative bias (RB), root of the mean squared error (RMSE), ratio between the standard deviation of the estimate and the average standard deviation (RSD), and coverage probability (CP) of the 95% confidence interval for the MLEs of θ_0 and θ_1 in the LUBS model.
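A stripped-down version of one such experiment can be sketched as follows; σ is held fixed at its true value to keep the sketch short, and the replication count, sample size, and all names are our assumptions, much smaller than the settings reported above.

```python
# Miniature Monte Carlo experiment for the LSHN regression model:
# B replications of simulate -> fit -> record, then relative bias and RMSE.
# sigma is held fixed at its true value; all settings are assumptions.
import numpy as np
from scipy.optimize import minimize

def simulate_fit(B=100, n=100, theta=(0.75, 0.25), alpha=0.75, sigma=1.0, seed=123):
    rng = np.random.default_rng(seed)
    truth = np.array([theta[0], theta[1], alpha])
    est = np.empty((B, 3))
    for b in range(B):
        x = rng.uniform(size=n)
        X = np.column_stack([np.ones(n), x])
        eps = sigma * np.arcsinh(alpha * rng.standard_normal(n) / 2)
        logy = X @ np.asarray(theta) + eps

        def nll(p):  # additive constants dropped; sigma known
            z = (logy - X @ p[:2]) / sigma
            logcosh = np.abs(z) + np.log1p(np.exp(-2.0 * np.abs(z)))
            return -np.sum(-np.log(p[2]) + logcosh
                           - 0.5 * ((2.0 / p[2]) * np.sinh(z)) ** 2)

        est[b] = minimize(nll, x0=[0.5, 0.0, 1.0], method="L-BFGS-B",
                          bounds=[(None, None), (None, None), (1e-3, None)]).x
    rb = (est.mean(axis=0) - truth) / truth
    rmse = np.sqrt(((est - truth) ** 2).mean(axis=0))
    return rb, rmse
```

Even at these small settings, the relative biases of the regression coefficients are close to zero, consistent with the pattern reported in the tables.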

Applications
To illustrate the potential of the proposed distributions, we considered two real data sets taken from the literature. The first data set is an example of positive data on rolling contact fatigue of hardened steel. The second data set corresponds to body fat measurements of athletes of the Australian Institute of Sport (AIS), which is an example of observations on the (0, 1) interval.

Fatigue Data
This data set consists of the failure times (T) in rolling contact fatigue of ten hardened steel specimens tested at each of four values of contact stress, X. The data were obtained using a four-ball rolling contact test rig at the Princeton Laboratories of Mobil Research and Development Co. This data set was analyzed by Chan et al. [22] by considering the regression model

log(T_i) = θ_1 + θ_2 log(x_i) + ε_i.

We considered that the positive response variable T_i follows the distribution T_i ∼ LSHN(α, θ_1 + θ_2 log(x_i), σ). For this data set, we fit the log-BS (LBS) model, the log-skewed BS (LSBS) model of Lemonte [23], and the proposed LSHN distribution. The MLEs of the parameters of the fitted models are given in Table 3. To compare the fitted models, we used the AIC and BIC criteria, given by AIC = −2ℓ(·) + 2p and BIC = −2ℓ(·) + p log(n), where ℓ(·) is the maximized log-likelihood, p is the number of parameters of the model in question, and n is the sample size. The best model is the one with the smallest AIC or BIC. According to the AIC and BIC criteria in the table, the asymmetric LSBS and LSHN models fit better than the LBS model; that is, the data present a larger degree of asymmetry than allowed by the BS model. We can also conclude that the regression model with the LSHN error distribution provides a better fit than the regression model with the LSBS error distribution. The significance of the variable log(x_i) for the response variable T_i can be tested through the Wald statistic θ̂_2/se(θ̂_2), which gives the value −12.618/1.371 = −9.203501 with a p-value < 0.0001, so the logarithm of the contact stress affects the failure time of the hardened steel.
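For reference, the comparison criteria and the Wald statistic used here are straightforward to compute; the helper below (function names are ours) reproduces the reported Wald value from the quoted estimate and standard error.

```python
# AIC/BIC model-comparison criteria and the Wald z statistic, as used above.
import math

def aic(loglik, p):
    """Akaike information criterion: -2 log-likelihood + 2p."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayesian information criterion: -2 log-likelihood + p log(n)."""
    return -2.0 * loglik + p * math.log(n)

def wald_z(estimate, se):
    """Wald statistic for H0: parameter = 0."""
    return estimate / se

# Reproduces the value reported for theta_2 in the fatigue-data analysis.
z = wald_z(-12.618, 1.371)
```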
We recall that, if Y_i ∼ LSHN(α, x_i⊤θ, σ), then

Z_i = (2/α) sinh((log(Y_i) − x_i⊤θ)/σ) ∼ N(0, 1),

for i = 1, ..., n. Figure 5b plots the simulated envelope of the random variable Z_i. The plot reveals that the LSHN regression model presents a good fit to the fatigue data. The plot in Figure 5a depicts the envelope for the log-BS regression model.

Body Fat Data
We considered the data set included in the library sn of the R Development Core Team [24], available for download at http://azzalini.stat.unipd.it/SN/index.html (accessed on 20 March 2021). We considered only the data of the 37 rowing athletes in the AIS data set. We were interested in the prediction of the body fat percentage (Bfat) of each athlete in terms of their lean body mass (lbm). For the analysis, we considered the random variable Y_i, the body fat percentage of the i-th athlete for i = 1, ..., 37, rescaled to the (0, 1) interval. We also fit the beta regression model with the logit link and the natural logarithm link to model the dispersion parameter. The MLEs of the parameters and their corresponding standard errors (in parentheses) were obtained for both models. Figure 6 plots the envelope of the variable

Z_i = (2/α̂) sinh((log(−log(Y_i)) − θ̂_1 − θ̂_2 x_i)/σ̂) ∼ N(0, 1),

in which we can see that the LUSHN regression model presents a good fit to the body fat data.

Conclusions
In this paper, two new families of bimodal distributions were introduced. The new families were generated by applying transformations to the unit-Birnbaum–Saunders distribution and are very useful alternatives for modeling data limited to the interval (0, 1) or with positive support, due to their flexibility in fitting data with a high degree of asymmetry and/or kurtosis. The main statistical properties of the families were studied in detail, and the problem of parameter estimation was addressed by using the maximum likelihood method. The observed and expected information matrices for the families were also derived. A small Monte Carlo simulation study was carried out, showing that the maximum likelihood estimators have good asymptotic properties for moderate and large sample sizes. Extensions to regression models were also presented based on the new families of distributions. Furthermore, we showed that such families of distributions can provide better fits to real data sets, especially when explanatory variables are considered to explain the response variable in a regression model.
Appendix A

Proposition A1. The density function in Equation (5) integrates to one.