Abstract
In this paper, we propose a linear regression model in which the error term follows a log-gamma-normal (LGN) distribution. The LGN assumption provides the flexibility to accommodate both left and right skewness, as well as kurtosis greater or smaller than that of the normal model. A regression model for censored asymmetric data is also considered (censored LGN model). Parameter estimation is carried out by maximum likelihood, and a small simulation study is conducted to evaluate parameter recovery. The main conclusion is that the approach performs satisfactorily for moderate and large sample sizes. Two applications of the proposed model to real datasets are provided for illustration.
Keywords: log-gamma-normal distribution; linear regression models; asymmetric data; censored data; maximum likelihood estimators
MSC: 60E05
1. Introduction
Regression models are among the statistical techniques most frequently used in data analysis in any area of knowledge, especially when there is interest in studying the relationship between a dependent variable (response) and two or more independent (explanatory) variables. The regression model with a normally distributed response is perhaps the best known in the literature and one of the most widely used; however, the assumption of normality may not be adequate for the dataset under analysis, since the data may present degrees of skewness or kurtosis outside the range covered by the normal model. Consequently, inferences made from the fitted model may not be statistically valid, and erroneous conclusions may be reached. One solution to the failure of the normality assumption is the use of transformations, although it is well known that this makes the results difficult to interpret, since the data are no longer in the original measurement scale. As an alternative, many authors have introduced new families of distributions that are capable of capturing degrees of skewness and kurtosis greater than those that the normal distribution can capture.
One of the most important works in the context of data with a high degree of asymmetry is Azzalini [1], which is known in the statistical literature as the skew-normal (SN) model. The main characteristic of the SN model is its ability to fit degrees of asymmetry (on the left and right) greater than those of the normal model; however, it is not the best model in terms of capturing high degrees of kurtosis. Relating to the latter, the power-normal (PN) model introduced by Durrans [2] has the particularity of fitting data with a higher degree of kurtosis than the normal and SN model but with less range of asymmetry. The SN and PN models have been studied extensively by many authors, and different extensions of this model have been considered. In Gupta and Gupta [3] for example, the authors showed the existing practical problems when the asymmetry parameter of the SN model is estimated, and they proposed an alternative model named the PN model. The authors also investigated the closeness between the proposed model and the SN model. In Pewsey et al. [4], the authors presented the general results of the likelihood-based inference for the family of power distributions, with particular emphasis on the case of the PN model, complementing the work of Gupta and Gupta [3]. In Martínez-Flórez et al. [5], the authors introduced a new model that generalized both the SN model by Azzalini [1] and the PN model by Durrans [2]. The new model, which is called power-skew-normal (PSN), has the particularity of fitting data with degrees of asymmetry greater than those of the SN model, and is also capable of capturing degrees of kurtosis greater than those of the PN model. Furthermore, the authors showed that the information matrix of the new model is non-singular, which permits carrying out hypothesis tests on the asymmetry parameters based on likelihood-ratio statistics. On the other hand, Martínez-Flórez et al. [6] generalized the log-normal (LN) model from the SN and PN models. 
In addition, these new proposals contain the LN model as a particular case, and they are more flexible regarding skewness and kurtosis to fit positive data.
Alternatives for fitting asymmetric data with a high degree of kurtosis were reported by other authors such as Tovar-Falón et al. [7], who introduced a new model that generalizes the skew-t model of Azzalini and Capitanio [8] and power-t of Zhao and Kim [9]. Here, the inference was carried out from a classical perspective using the maximum likelihood method. This new model also has, as particular cases, the PSN, SN, PN, Student-t and normal models. In Tung et al. [10], the authors considered a mixture class of log-F distributions to characterize asymmetric distributions by integrating it into a pH acceleration model. The authors studied the impact of the new model in the presence of misspecification of particle size distribution.
Models for asymmetric data with high degrees of skewness and kurtosis, and presenting more than one mode and censored data, were also considered. More details about these topics can be found in Martínez-Flórez et al. [11] and Martínez-Flórez et al. [12], respectively. In addition, all the aforementioned models are easily extensible to the situations of regression models, including cases in which the data show censoring in some value; see Sahu et al. [13], Martínez-Flórez et al. [14].
In Amini et al. [15], the authors introduced a new family of distributions useful for modelling asymmetric data. This new family of continuous distributions is generated by a distribution F and two positive real parameters $\gamma$ and $\delta$, which control the skewness and tail weight of the distribution. The probability density function (PDF) of this family is given by
$$g(x;\gamma,\delta)=\frac{\gamma^{\delta}}{\Gamma(\delta)}\,[-\log F(x)]^{\delta-1}\,[F(x)]^{\gamma-1}\,f(x), \qquad (1)$$
where $\Gamma(\cdot)$ is the complete gamma function, $F$ is the cumulative distribution function (CDF) of X, and $f$ is the associated PDF. In this work, the authors studied the main properties of the distribution and addressed the estimation process of the unknown parameters of the model using the likelihood approach.
From the generator, the authors studied some particular members of the family, among which are the exponential, Weibull, power, Pareto, extreme value and Gumbel distributions. If $\delta=1$, $F=\Phi$ and $f=\phi$ in model (1), where $\Phi$ and $\phi$ are the CDF and PDF of the standard normal distribution, respectively, the model in (1) reduces to the PN model by Durrans [2]. Hence, the model in (1) is an extension of the PN model. In Cordeiro et al. [16], the authors studied in detail the properties of the log-gamma-generated family of distributions introduced by Amini et al. [15] and presented some applications of this family. Other particular cases of the model introduced by Amini et al. [15] correspond to the generalized gamma and log-gamma distributions, which have been extensively studied by many authors; see Prentice [17], Lawless [18], Young and Bakir [19], Ortega et al. [20, 21] among others.
The main goal of this article is to study the regression model under the assumption that the errors follow a log-gamma-normal (LGN) distribution, which is obtained by taking $F=\Phi$ and $f=\phi$ in model (1). We also consider the regression model for censored data, and we conduct parameter estimation using the maximum likelihood approach, relying on its large sample properties.
Although there are many works in the literature related to the generalized log-gamma distribution, our proposal is based on the family of distributions presented by Amini et al. [15], which is known in the literature as log-gamma-generated. For the case of this family, we focus on the case in which the generating function is the normal distribution and the distribution called log-gamma-normal is obtained, and this distribution does not correspond to the distributions previously mentioned. In our proposal, we change the assumption that the errors in the multiple linear regression model follow a normal distribution to that of errors with a log-gamma-normal distribution. It is also important to note that the generalized log-gamma and log-gamma distributions are also particular cases of the family introduced by Amini et al. [15] (assuming a gamma distribution instead of the standard normal distribution), but in our proposal we do not consider these cases.
In addition to carrying out the estimation process of the parameters in the model, we present two applications using real datasets. The first dataset was previously analyzed by Zhang and Davidian [22], and the second dataset is related to a study on the abundance of beryllium scaled to the Sun’s abundance. For the particular case of these datasets, the model fits well, and therefore we can conclude that, apart from the existence of statistical literature for the analysis of asymmetric data, our proposal is a viable alternative that competes with existing models. The main contribution of this model is that the trend of the dataset under examination is better explained using a model with log-gamma type errors instead of one with asymmetric errors using another distribution.
The article is organized as follows. In Section 2, we define the family of LGN distributions and discuss some of its properties. In Section 3, the LGN regression model is defined, and its properties studied. The inference is implemented using the maximum likelihood approach. The censored LGN model for dealing with censored data by maximum likelihood estimation is discussed in Section 4. The results of a small-scale simulation study reveal the good performance of the estimation approach in Section 5. In Section 6, two real data applications are considered, revealing that the datasets in question are better fitted by LGN model than PN and models.
2. Log-Gamma-Normal Distribution
In this section, we define the LGN model, which is obtained from the family given in (1) by taking the CDF of the standard normal distribution, and we study some basic properties.
Definition 1.
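The random variable X is said to have a LGN distribution if X has PDF given by
$$f_X(x;\gamma,\delta)=\frac{\gamma^{\delta}}{\Gamma(\delta)}\,[-\log\Phi(x)]^{\delta-1}\,[\Phi(x)]^{\gamma-1}\,\phi(x),\quad x\in\mathbb{R}, \qquad (2)$$
where $\gamma,\delta>0$, $\Gamma(\cdot)$ is the gamma function, and the functions $\phi$ and $\Phi$ are the PDF and CDF of the standard normal distribution, respectively.
A random variable with LGN distribution is denoted by $X\sim\mathrm{LGN}(\gamma,\delta)$. One can note that the function (2) is a proper PDF since $f_X(x;\gamma,\delta)\geq 0$ for all $x\in\mathbb{R}$ and $\gamma,\delta>0$. Moreover, letting $w=-\log\Phi(x)$, it follows that
$$\int_{-\infty}^{\infty}f_X(x;\gamma,\delta)\,dx=\frac{\gamma^{\delta}}{\Gamma(\delta)}\int_{0}^{\infty}w^{\delta-1}e^{-\gamma w}\,dw=1.$$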
Figure 1 depicts some shapes of the LGN distribution for selected values of the parameters $\gamma$ and $\delta$. It can be seen that both parameters affect the skewness and kurtosis of the model, and hence, the LGN distribution is more flexible for fitting data that may be skewed as well as having thinner or thicker tails than the normal, SN and PN distributions.
Figure 1.
PDF of the LGN distribution: (a) and (dotted line), 0.60 (dashed line), 1.0 (dotted–dashed line), 3.0 (long dashed line) and 6.0 (solid line). (b) and (dotted line), 0.60 (dashed line), 1.0 (dotted–dashed line), 2.0 (long dashed line) and 5.0 (solid line).
The LGN distribution reduces to some specific distributions for particular values of the parameters $\gamma$ and $\delta$; some of them are available in the literature and have been widely studied.
Proposition 1.
Let . Then:
- (i) if , then ;
- (ii) if , then ;
- (iii) if and , then .
Proof.
Demonstration of (i)–(iii) is immediate from the definition of LGN distribution. □
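The density and its special cases can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that the LGN density has the log-gamma-generated form $\frac{\gamma^{\delta}}{\Gamma(\delta)}[-\log\Phi(x)]^{\delta-1}[\Phi(x)]^{\gamma-1}\phi(x)$, and the function names `lgn_logpdf` and `lgn_pdf` are hypothetical.

```python
import numpy as np
from scipy import integrate, special, stats


def lgn_logpdf(x, gamma, delta):
    """Log-density of the assumed LGN(gamma, delta) form (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    log_cdf = stats.norm.logcdf(x)        # log Phi(x), always <= 0
    w = np.maximum(-log_cdf, 1e-300)      # w = -log Phi(x), kept strictly positive
    return (delta * np.log(gamma) - special.gammaln(delta)
            + (delta - 1.0) * np.log(w)
            + (gamma - 1.0) * log_cdf
            + stats.norm.logpdf(x))


def lgn_pdf(x, gamma, delta):
    return np.exp(lgn_logpdf(x, gamma, delta))


# Numerical check that the density integrates to one for one parameter choice.
total, _ = integrate.quad(lgn_pdf, -np.inf, np.inf, args=(2.0, 1.5))

# With delta = 1 the assumed form is the power-normal density
# gamma * Phi(x)^(gamma - 1) * phi(x); with gamma = delta = 1 it is N(0, 1).
grid = np.linspace(-3.0, 3.0, 13)
pn_density = 2.0 * stats.norm.cdf(grid) ** (2.0 - 1.0) * stats.norm.pdf(grid)
```

Working on the log scale avoids underflow of $\Phi(x)$ in the left tail, which matters once the density is used inside a likelihood.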
2.1. Moments
Measures of skewness and kurtosis can be given from the moments of the LGN distribution. The following proposition gives an expression of the rth moment of the random variable which does not have a closed form.
Proposition 2.
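Let $X\sim\mathrm{LGN}(\gamma,\delta)$; then
$$E(X^{r})=E\left[\left(\Phi^{-1}\left(e^{-W}\right)\right)^{r}\right], \qquad (3)$$
where $\Phi^{-1}$ is the inverse of the CDF $\Phi$ and the random variable W follows a gamma distribution with parameters δ and γ.
Proof.
We have by definition that
$$E(X^{r})=\int_{-\infty}^{\infty}x^{r}\,\frac{\gamma^{\delta}}{\Gamma(\delta)}\,[-\log\Phi(x)]^{\delta-1}\,[\Phi(x)]^{\gamma-1}\,\phi(x)\,dx.$$
Letting $w=-\log\Phi(x)$, then $x=\Phi^{-1}(e^{-w})$, it follows that
$$E(X^{r})=\int_{0}^{\infty}\left[\Phi^{-1}\left(e^{-w}\right)\right]^{r}\frac{\gamma^{\delta}}{\Gamma(\delta)}\,w^{\delta-1}e^{-\gamma w}\,dw,$$
which is the expected value of the function $\left[\Phi^{-1}(e^{-W})\right]^{r}$, where W follows a $\mathrm{Gamma}(\delta,\gamma)$ distribution. □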
Based on moments (3), one can obtain the skewness and the kurtosis coefficients of the LGN model using the following expressions
and
respectively, where for . The skewness and kurtosis coefficients for values of and ranging between 0.1 and 200 were calculated using numerical integration with an integrate function of R Development Core Team [23] for LGN model. It was found that and . The given intervals contain the corresponding intervals of skewness and kurtosis coefficients of the SN model, which are and , respectively, and the PN model, which are and , respectively. More details can be found in Pewsey et al. [4]. The previous results illustrate the fact that the LGN model contains models with greater (and smaller) asymmetry degree than both the SN and PN models.
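The numerical computation of the skewness and kurtosis coefficients can be sketched as follows. The paper uses the integrate function of R; this illustration uses SciPy's `quad` instead, assumes the log-gamma-generated form of the LGN density with the standard normal as generator, and uses the usual moment-based definitions of the two coefficients. The helper names are hypothetical.

```python
import numpy as np
from scipy import integrate, special, stats


def lgn_pdf(x, gamma, delta):
    """Assumed LGN(gamma, delta) density, evaluated on the log scale."""
    log_cdf = stats.norm.logcdf(x)
    w = np.maximum(-log_cdf, 1e-300)
    return np.exp(delta * np.log(gamma) - special.gammaln(delta)
                  + (delta - 1.0) * np.log(w)
                  + (gamma - 1.0) * log_cdf
                  + stats.norm.logpdf(x))


def lgn_raw_moment(r, gamma, delta):
    """r-th raw moment E(X^r) by numerical integration over the real line."""
    value, _ = integrate.quad(lambda x: x**r * lgn_pdf(x, gamma, delta),
                              -np.inf, np.inf)
    return value


def lgn_skewness_kurtosis(gamma, delta):
    """Moment-based skewness and (non-excess) kurtosis coefficients."""
    m1, m2, m3, m4 = (lgn_raw_moment(r, gamma, delta) for r in (1, 2, 3, 4))
    var = m2 - m1**2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1**3) / var**1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4) / var**2
    return skew, kurt


# Sanity check: gamma = delta = 1 is the standard normal (skewness 0, kurtosis 3).
skew_normal_case, kurt_normal_case = lgn_skewness_kurtosis(1.0, 1.0)
```

Evaluating `lgn_skewness_kurtosis` over a grid of parameter values would give an approximation of the attainable ranges discussed above.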
2.2. Distribution Function
In this section, we present the explicit formula for the CDF of LGN distribution.
Proposition 3.
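Let $X\sim\mathrm{LGN}(\gamma,\delta)$; then
$$F_X(x;\gamma,\delta)=\frac{\Gamma\left(\delta,\,-\gamma\log\Phi(x)\right)}{\Gamma(\delta)}, \qquad (4)$$
where $\Gamma(\cdot)$ is the gamma function and $\Gamma(a,z)=\int_{z}^{\infty}t^{a-1}e^{-t}\,dt$ is the upper incomplete gamma function.
Proof.
The CDF of the LGN distribution is obtained as follows:
$$F_X(x;\gamma,\delta)=\int_{-\infty}^{x}\frac{\gamma^{\delta}}{\Gamma(\delta)}\,[-\log\Phi(u)]^{\delta-1}\,[\Phi(u)]^{\gamma-1}\,\phi(u)\,du=\frac{\gamma^{\delta}}{\Gamma(\delta)}\int_{-\log\Phi(x)}^{\infty}w^{\delta-1}e^{-\gamma w}\,dw=\frac{1}{\Gamma(\delta)}\int_{-\gamma\log\Phi(x)}^{\infty}t^{\delta-1}e^{-t}\,dt.$$
□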
It can be shown that (see Cordeiro et al. [16]) for the density function given in (2), the quantile function is given by:
where is the inverse function of .
The inversion method can be used to generate a random variable with LGN distribution. Thus, let and U be a random variable with uniform distribution, namely . Then, the random variable X with distribution can be obtained by letting
where and are the inverses of the CDF of the normal and gamma distributions, respectively. The survival and hazard functions for the LGN distribution can be obtained from (2) and (4), and they are given by
and
respectively.
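The CDF, quantile function and inversion sampler can be sketched with SciPy's regularized incomplete gamma functions. This is an illustration under the assumption that the LGN CDF equals $\Gamma(\delta,-\gamma\log\Phi(x))/\Gamma(\delta)$; the names `lgn_cdf`, `lgn_ppf` and `lgn_rvs` are hypothetical, and the sampler inverts the upper incomplete gamma ratio directly rather than reproducing the exact expression of the paper.

```python
import numpy as np
from scipy import special, stats


def lgn_cdf(x, gamma, delta):
    """Assumed CDF: regularized upper incomplete gamma at -gamma * log Phi(x)."""
    w = -gamma * stats.norm.logcdf(x)
    return special.gammaincc(delta, w)


def lgn_ppf(u, gamma, delta):
    """Quantile function obtained by inverting lgn_cdf."""
    w = special.gammainccinv(delta, u) / gamma   # solves gammaincc(delta, gamma * w) = u
    return stats.norm.ppf(np.exp(-w))


def lgn_rvs(n, gamma, delta, rng):
    """Inversion method: push uniform draws through the quantile function."""
    return lgn_ppf(rng.uniform(size=n), gamma, delta)


def lgn_sf(x, gamma, delta):
    """Survival function 1 - F(x), via the regularized lower incomplete gamma."""
    return special.gammainc(delta, -gamma * stats.norm.logcdf(x))


rng = np.random.default_rng(12345)
sample = lgn_rvs(20000, 2.0, 1.5, rng)
probs = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
round_trip = lgn_cdf(lgn_ppf(probs, 2.0, 1.5), 2.0, 1.5)
```

The hazard function then follows as the ratio of the density to `lgn_sf`.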
2.3. Location-Scale Extension
Let . The location-scale extension of the random variable X is defined using the transformation , where and . The corresponding PDF of Y is given by
where is a location parameter and is a scale parameter. The random variable Y that has a distribution with density function given in Equation (5) is denoted as .
The previous representation of location scale can be extended to the case where response variable depends on regressor variables, say , through the relationship ; where is an unknown vector of regression coefficients and is a vector of known regressors correlated with the response vector.
The rth moment of a variable can be obtained using the formula
where .
Proof.
Let , then, for and it has
In the second line, the binomial theorem is used. □
3. Log-Gamma-Normal Regression Model
Regression models have been a statistical technique widely used in many areas of knowledge to explain the behavior of a response variable, say Y, as a function of other variables called regressors, say , and a vector of unknown parameters called regression coefficients denoted by . Specifically, for a random sample of n individuals indexed by , we have
where is a random variable (random error) with certain PDF, the most common being the normal distribution assumption, i.e., . Given the multiple departures from the normality assumption and the actual behavior of the random variable , this assumption has been replaced in numerous instances by other more realistic ones, usually looking for distributions to fit data with higher or lower skewness and/or kurtosis than that allowed by the normal distribution. Notable inferential mistakes are made (invalid results) when we work under the normal assumption and this assumption is not true. In some cases, a simple transformation helps to solve this problem, but this strategy typically has problems of interpretability of the results or the coefficients of the model.
Now, we replace the normal assumption by the LGN assumption for the random error term , so we suppose that and this leads to for . The case yields the ordinary normal regression model. Using the least squares method, we obtain the estimators , which are biased for the regression coefficients; however, applying the correction , where the last term represents the estimated expected value of the random variable , yields unbiased estimators of the parameters.
Estimation Using Maximum Likelihood Method
We initially define some quantities: is a matrix where rows correspond to observations for the ith individual for p independent variables; is a vector corresponding to responses for the ith individual; and is an unknown vector of regression coefficients. Thus, given a random sample of size n, say , where for ; the log-likelihood function for the vector can be written as follows:
After taking the first partial derivatives of the log-likelihood function (7) regarding the parameters of interest and setting them equal to zero, we obtain the following score equations:
where , and with and ; and for ; is the identity matrix of order n, and is the digamma function.
The elements of the observed information matrix for the parameter are easily computed by taking second partial derivatives, obtaining:
where and with ; , , and with , , for , is the trigamma function and with for .
The Fisher information matrix can be obtained numerically, calculating times the expected value of the observed information matrix. When , we obtain the case of the normal distribution for the random variable . Using numerical approximation, the determinant of the Fisher information matrix is , where denotes the determinant function of a matrix and denotes the mean in the sample of the variable . Thus, the determinant of the information matrix is different from zero, and the information matrix is non-singular, which ensures the conditions needed for the asymptotic normality of the maximum likelihood estimator of . Here, the covariance matrix of is the inverse of the Fisher information matrix, i.e., .
This approximation can be used to construct confidence intervals for , which are given by , where corresponds to the rth diagonal element of the matrix and denotes quantile of the standard normal distribution.
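The maximum likelihood fit can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes the location-scale LGN density $\frac{1}{\sigma}\frac{\gamma^{\delta}}{\Gamma(\delta)}[-\log\Phi(z)]^{\delta-1}[\Phi(z)]^{\gamma-1}\phi(z)$ with $z=(y-\mathbf{x}^{\top}\boldsymbol\beta)/\sigma$, the symbols $\boldsymbol\beta$ and $\sigma$ and all function names are hypothetical, and the data are simulated. SciPy's Nelder-Mead method is used, which is also the default method of R's optim.

```python
import numpy as np
from scipy import optimize, special, stats


def lgn_reg_negloglik(theta, X, y):
    """Negative log-likelihood; theta = (beta, log sigma, log gamma, log delta)."""
    p = X.shape[1]
    beta = theta[:p]
    sigma, gamma, delta = np.exp(theta[p:])   # log scale keeps the three positive
    z = (y - X @ beta) / sigma                # standardized residuals
    log_cdf = stats.norm.logcdf(z)
    w = np.maximum(-log_cdf, 1e-300)          # guard against log(0) in the right tail
    loglik = (delta * np.log(gamma) - special.gammaln(delta) - np.log(sigma)
              + (delta - 1.0) * np.log(w) + (gamma - 1.0) * log_cdf
              + stats.norm.logpdf(z))
    return -np.sum(loglik)


# Simulated data: one covariate, LGN errors drawn through W ~ Gamma(delta, rate gamma).
rng = np.random.default_rng(2021)
n, true_gamma, true_delta = 300, 2.0, 1.5
Z = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), Z])
W = rng.gamma(shape=true_delta, scale=1.0 / true_gamma, size=n)
errors = stats.norm.ppf(np.exp(-W))
y = X @ np.array([1.0, 1.5]) + 0.5 * errors

# Least-squares start, with gamma = delta = 1 (the normal model).
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_sd = np.std(y - X @ beta_ls)
start = np.concatenate([beta_ls, [np.log(resid_sd), 0.0, 0.0]])

fit = optimize.minimize(lgn_reg_negloglik, start, args=(X, y),
                        method="Nelder-Mead", options={"maxiter": 5000})
beta_hat = fit.x[:2]
sigma_hat, gamma_hat, delta_hat = np.exp(fit.x[2:])
```

Standard errors for the confidence intervals described above would additionally require the inverse of a numerically evaluated Hessian at the optimum, which this sketch omits.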
4. Censored LGN Model
Models for censored data are common in economic research, medicine, biology, and survival analysis. Usually, this type of data is analyzed using the Tobit model, also known as the censored normal (CN) model (see Tobin [24]). In some cases, the tails of the distribution of the random errors are heavier or lighter than the tails of the normal distribution, so that the Tobit model does not estimate the probability in the censored part well, and this leads to poor estimates. In these cases, another distribution must be assumed for the errors; in particular, for asymmetric errors, one can work with the power-normal Tobit model (PNT) (see Martínez-Flórez et al. [25]), the censored SN model, or any other model that fits the degree of asymmetry and the kurtosis of the errors in the model. We now extend the LGN regression model to censored data, which we will call the censored LGN regression model (CLGN).
Censored LGN Variable
Consider a random variable and let be a random sample of size n of . Let T be a value of censorship for the variable. The CLGN random variable Y is defined as
for . We use the notation . Consequently, the probability mass at the value T is , where . For , the distribution of the variable Y is . Although the threshold T in the formulation above is not necessarily zero, it can be transformed to zero by taking . Hence, there is no loss of generality in taking .
When we have regressor variables, say , through the relationship , where is an unknown vector of regression coefficients, and is a vector of known regressors correlated with the response vector, we have a CLGN regression model defined by the random variable , with , i.e.,
For a sample of size n, , where for ; the log-likelihood function for the vector is given by
where and denote the sum in the censored part and uncensored part, respectively; and .
Special cases of model (12) occur when , which yields the Tobit model (see Tobin [24]), and when , which yields the Tobit PN model (see Martínez-Flórez et al. [25]). Parameter estimation can be performed by the maximum likelihood method, i.e., by maximizing the function , which is solved using iterative numerical methods and leads to the maximum likelihood estimator (MLE) of the model parameters.
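The structure of the censored log-likelihood can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: observations at or below the threshold T are treated as left-censored and contribute the log of the LGN CDF at the standardized threshold, uncensored observations contribute the location-scale LGN log-density, and the density and CDF are taken in the log-gamma-generated form with the standard normal as generator. The function name and the toy data are hypothetical.

```python
import numpy as np
from scipy import special, stats


def clgn_negloglik(theta, X, y, T=0.0):
    """Negative log-likelihood; theta = (beta, log sigma, log gamma, log delta)."""
    p = X.shape[1]
    beta = theta[:p]
    sigma, gamma, delta = np.exp(theta[p:])
    censored = y <= T
    z = (np.where(censored, T, y) - X @ beta) / sigma
    log_cdf = stats.norm.logcdf(z)
    w = np.maximum(-log_cdf, 1e-300)
    # Censored part: log of the probability mass at T.
    mass = special.gammaincc(delta, gamma * w)
    log_mass = np.log(np.maximum(mass, 1e-300))
    # Uncensored part: log-density of the location-scale LGN model.
    log_dens = (delta * np.log(gamma) - special.gammaln(delta) - np.log(sigma)
                + (delta - 1.0) * np.log(w) + (gamma - 1.0) * log_cdf
                + stats.norm.logpdf(z))
    return -np.sum(np.where(censored, log_mass, log_dens))


# Toy data: two of the five responses are censored at T = 0.
X_toy = np.column_stack([np.ones(5), [0.1, 0.4, 0.5, 0.8, 0.9]])
y_toy = np.array([0.0, 0.3, 0.0, 1.2, 0.7])
theta_toy = np.array([0.2, 0.5, np.log(0.8), 0.0, 0.0])   # gamma = delta = 1

# With gamma = delta = 1 the function should coincide with the Tobit likelihood.
z_toy = (y_toy - X_toy @ theta_toy[:2]) / 0.8
tobit_nll = -np.sum(np.where(y_toy <= 0.0,
                             stats.norm.logcdf(z_toy),
                             stats.norm.logpdf(z_toy) - np.log(0.8)))
```

Passing `clgn_negloglik` to a numerical optimizer, with starting values from a Tobit fit, would mirror the estimation strategy described above.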
5. Simulation Study
To study the performance of the MLE of parameter vector , we conducted a Monte Carlo simulation study with small and moderate samples. In the study, we generated 5000 samples of sizes 50, 100, 200 and 500, and we considered the LGN model. The following parameter values were taken: , 1.50; and we took .
We considered a linear model with a single covariate Z whose values were generated according to a uniform distribution . We also took errors . To evaluate the performance of the point estimators, we considered the bias , the relative bias (RB), defined as the absolute value of the bias divided by the true parameter value, and the root mean squared error , i.e., the square root of the mean over all samples of the squared estimation error, which equals the squared bias plus the variance. Maximum likelihood parameter estimates were computed using the optim function of the R statistical software (R Development Core Team [23]).
Table 1 and Table 2 present the results of the simulation study. It can be seen from the tables that the RMSEs of the MLEs of , , , and decrease as the sample size increases, which is expected since the estimators are consistent. The relative bias of the MLEs also decreases as the sample size increases. The MLE of is unstable because this parameter is affected by the asymmetry parameter; however, it becomes more stable as the sample size becomes larger. It can also be seen that when the parameter increases, the bias of the MLEs of , and is larger. The main conclusion is that the MLEs can be used with reasonable confidence when the sample size is greater than 100.
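The three performance measures can be computed from the Monte Carlo replicates as in the following sketch. The function name `summarize_estimates` and the four example estimates are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def summarize_estimates(estimates, true_value):
    """Bias, relative bias and RMSE of replicated estimates of one parameter."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    relative_bias = abs(bias) / abs(true_value)
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    return bias, relative_bias, rmse


# Hypothetical replicates of an estimator whose true value is 1.5.
replicates = [1.4, 1.6, 1.5, 1.7]
bias, relative_bias, rmse = summarize_estimates(replicates, 1.5)

# The squared RMSE decomposes into squared bias plus (population) variance.
decomposition = bias**2 + np.var(replicates)
```

In a full study, `replicates` would hold the 5000 estimates obtained for a given sample size and parameter configuration.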
Table 1.
Performance evaluation for the MLE of , , , and under LGN model for and .
Table 2.
Performance evaluation for the MLE of , , , and under LGN model for and .
6. Real Data Applications
6.1. Application 1
We consider a dataset of longitudinal cholesterol levels collected as part of the Framingham heart study. The file includes information for randomly selected individuals, reported in Zhang and Davidian [22]. The considered variables were the cholesterol level , the age of the individual at baseline and the gender indicator () ( female, male). For this application, we take only the observations in the second period of time of the measurement . Table 3 presents summary statistics, including measures of skewness and kurtosis, for the cholesterol data. Clearly, the values of the skewness and kurtosis for the cholesterol data justify using an asymmetric model such as the PN, SN or LGN model.
Table 3.
Summary statistics for cholesterol levels for 176 subjects of the Framingham cholesterol study.
A model with errors following a normal distribution was fitted, and it was found that the Shapiro–Wilk normality test gives a value of the test statistic with p-value = , so the normality of the errors is rejected. We fitted linear regression models by assuming errors following an asymmetric distribution, namely SN, PN and LGN distributions. For estimating parameters in the considered models, we use the optim function available in R Development Core Team [23].
Table 4 presents the MLEs of the parameters of the fitted models. As initial values, we took the estimates obtained from the normal model using the lm function of R (R Development Core Team [23]). For and (and in some cases ), we took the estimates obtained under the SN, PN and LGN location-scale models fitted to the Y variable. From the table, the age at baseline variable () is not significant, and the cholesterol level depends solely on gender in the normal, PN and SN models. For the LGN model, it follows that the cholesterol level depends on both the gender and the age of the individual at baseline.
Table 4.
Estimates and standard error (SE) for normal, PN, SN and LGN linear regression models fitted to cholesterol data.
The considered linear model was
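To compare the normal, PN and LGN models, which are nested models, we used the Akaike information criterion (AIC) of Akaike [26], the corrected Akaike information criterion (AICc), and the Bayesian information criterion (BIC) of Schwarz [27], which are written as
$$\mathrm{AIC}=-2\hat{\ell}+2k,\qquad \mathrm{AICc}=\mathrm{AIC}+\frac{2k(k+1)}{n-k-1},\qquad \mathrm{BIC}=-2\hat{\ell}+k\log n,$$
where $\hat{\ell}$ is the maximized log-likelihood, n is the sample size, and k is the number of unknown parameters in the considered model. The best model is the one with the smallest AIC, AICc or BIC.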
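A small helper for these criteria might look as follows. It uses the standard definitions of AIC, AICc and BIC in terms of the maximized log-likelihood, the number of parameters k and the sample size n; the function name and the numbers in the usage line are hypothetical and do not come from the fitted models.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, AICc, BIC) for maximized log-likelihood `loglik`."""
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = -2.0 * loglik + k * math.log(n)
    return aic, aicc, bic


# Hypothetical fit: log-likelihood -100 with 3 parameters and 50 observations.
aic, aicc, bic = information_criteria(-100.0, k=3, n=50)
```

Because BIC penalizes each parameter by log n rather than 2, it favors smaller models than AIC whenever n exceeds about 7.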
Using the Normal, PN, SN and LGN distributions, the scaled residuals are evaluated and presented in Figure 2 and Figure 3.
Figure 2.
Histogram for scaled residuals for (a) Normal model, (b) PN model, (c) SN model, and (d) LGN model fitted to the cholesterol data.
Figure 3.
QQplots for (a) Normal model, (b) PN model, (c) SN model, and (d) LGN model fitted to the cholesterol data.
The normality assumption for errors can be tested by the hypothesis
using the likelihood-ratio (LR) statistics, , which for the dataset under study, leads to so that p-value , with strong indication against the null hypothesis.
Similarly, the assumption of PN distribution for the errors can be tested by the hypothesis
using the LR statistics, , which leads to so that p-value , with strong indication against the null hypothesis.
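The mechanics of these likelihood-ratio tests can be sketched as follows, assuming the usual large-sample chi-square reference distribution with degrees of freedom equal to the number of restricted parameters. The function name and the log-likelihood values are hypothetical and are not the values obtained for the cholesterol data.

```python
from scipy import stats


def likelihood_ratio_test(loglik_null, loglik_alt, df):
    """LR statistic 2 * (l_alt - l_null) and its chi-square p-value."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    p_value = stats.chi2.sf(statistic, df)
    return statistic, p_value


# Hypothetical example: one restricted parameter, log-likelihood gain of 5.
lr_stat, lr_p = likelihood_ratio_test(-105.0, -100.0, df=1)
```

The chi-square approximation relies on a non-singular information matrix under the null hypothesis, which is the property established numerically for the LGN model in Section 3.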
Table 5 presents the AIC, AICc and BIC criteria for the normal, PN, SN and LGN models. According to these criteria, the model that best fits the dataset is the SN model, since it has the lowest values of AIC, AICc and BIC, followed by the LGN model. However, recall that the SN model has a singular information matrix when the asymmetry parameter is zero, and therefore, hypothesis tests about the model parameters using likelihood-ratio statistics are not supported by large sample theory; for example, when testing the significance of the asymmetry parameter in the SN model. This constitutes a disadvantage relative to the LGN model, whose information matrix was shown to be non-singular. In addition, as mentioned in Section 2, the LGN model has wider ranges of asymmetry and kurtosis than the SN model, so in practice it may be preferable in certain situations.
Table 5.
AIC, AICc, and BIC for normal, PN, SN and LGN linear models.
This discussion illustrates that the final selection of a model is often simply a matter of choice. The LGN model can be considered appropriate if we want to use a model with which we can carry out hypothesis tests about the parameters, especially those associated with skewness and kurtosis in the model. In any case, the final choice must be duly justified.
For non-nested models, we used a generalized LR statistic test studied by Vuong [28]. This test was derived to compare competing models that are strictly non-nested. Since and are two non-nested models, and two densities corresponding to these non-nested models, the LR statistics to compare both models is given by
which does not follow a chi-square distribution. To overcome this problem, Vuong [28] proposed an alternative approach based on the Kullback–Leibler information criterion [29]. Based on the distance between each model and the true process generating the data, namely the model , he arrived at the statistic
where
For strictly non-nested models, the statistic (13) converges in distribution to a standard normal distribution under the null hypothesis of equivalence of the models. Thus, the null hypothesis is not rejected if . On the other hand, we reject at significance level p the null hypothesis of equivalence of the models in favor of model being better (or worse) than model if (or ).
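The computation of the generalized LR statistic can be sketched as follows, assuming the standard form of Vuong's statistic for strictly non-nested models: the sum of pointwise log-density ratios divided by $\sqrt{n}$ times their sample standard deviation. The function name and the numeric inputs are hypothetical; in practice the inputs would be the log-densities of the two fitted models evaluated at each observation.

```python
import numpy as np
from scipy import stats


def vuong_statistic(logdens_f, logdens_g):
    """Vuong statistic and two-sided p-value from pointwise log-densities."""
    d = np.asarray(logdens_f, dtype=float) - np.asarray(logdens_g, dtype=float)
    n = d.size
    omega = np.sqrt(np.mean(d**2) - np.mean(d) ** 2)   # std. dev. of the log ratios
    statistic = d.sum() / (np.sqrt(n) * omega)
    p_value = 2.0 * stats.norm.sf(abs(statistic))
    return statistic, p_value


# Hypothetical pointwise log-densities for two competing models.
log_f = np.array([0.2, 0.1, 0.3, 0.0])
log_g = np.zeros(4)
vuong_t, vuong_p = vuong_statistic(log_f, log_g)
```

A large positive value favors the first model and a large negative value favors the second, while values near zero indicate that the data do not discriminate between them.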
We now use Vuong's test to compare the LGN versus SN and the PN versus SN models fitted to the data, since each pair consists of two non-nested models. Let be the LGN model and the SN model. The generalized LR test statistic value is . For PN versus SN, the generalized LR test statistic value is . Therefore, the LGN and PN models are significantly superior to the SN model, according to the generalized LR statistic. Thus, the LGN model is preferred over the normal, PN and SN models.
6.2. Application 2
For the second application, we consider a dataset consisting of measurements for 68 solar-type stars. These data were previously described and analyzed by Santos et al. [30] and Tovar-Falón et al. [31]. The dataset is available in the astrodatR library of the R Development Core Team [23] package under the name Stellar Abundances. In this application, we consider the response variable: , which represents the log of the abundance of beryllium scaled to Sun’s abundance, i.e., the Sun has . The explanatory variable is , which represents the effective stellar surface temperature (in Kelvin).
In astronomy, objects such as stars, galaxies or X-ray sources, among others, are observed in some new wave bands. Some of these objects can go unnoticed due to limited sensitivities, leading to upper limits in the measurement of their luminosity (see Feigelson [32]). For the dataset, 14 observations (19.35%) were censored at 0.0, i.e., 12 beryllium measurements were not detected.
We fitted the censored normal (CN) or Tobit model using the censReg function of R (R Development Core Team [23]). Likewise, we also fitted the censored power-normal (CPN) and censored LGN (CLGN) models. Table 6 shows the MLEs of the fitted models. The initial values for the parameters were taken from the CN model fit returned by censReg. The outputs show that the explanatory variable X is significant in the considered models.
Table 6.
Estimates (standard error) for CN, CPN, and CLGN linear models.
Table 7 contains the AIC and AICc values for the fitted models, where it is observed that the CLGN model presents the best fit. Figure 4a–c shows the histogram, the CDF and the QQ-plot of the scaled residuals of the uncensored part under the CLGN model. Here one can see the good fit of the CLGN model.
Table 7.
AIC and AICc for CN, CPN and CLGN linear models.
Figure 4.
(a) histogram for scaled residuals CLGN model, (b) CDF of the scaled residuals CLGN model (c) qqplot for scaled residuals CLGN regression model.
We compare the Normal and PN models against the LGN model, so for hypothesis testing
and
we have and , both with p-value , so both null hypotheses are rejected and therefore the LGN model performs better than the Normal and PN models.
7. Conclusions
In this paper, we have proposed the asymmetric LGN distribution to give flexibility to the term of error in linear regression models. The LGN is based on the log-gamma-generated families of distributions of Amini et al. [15]. This new model presents greater ranges of asymmetry and kurtosis, and it extends the PN family of distribution; therefore, it has more flexibility in terms of asymmetry and kurtosis. The ordinary Tobit model Tobin [24] and the Tobit power-normal model Martínez-Flórez et al. [25] are special cases from an extension of the studied model LGN to the case of censored data. The maximum likelihood method was implemented, and the Fisher information matrix was derived, and it was shown numerically to be non-singular, which guarantees valid large sample results for the likelihood-ratio statistics. Two illustrations of real data reveal that the proposed model can be a useful alternative to existing models such as normal, power-normal, Tobit normal and Tobit power-normal. In addition, under certain considerations such as the non-singularity of the information matrix of the model and larger ranges of asymmetry and kurtosis, it may be a better alternative to the skew-normal distribution.
Author Contributions
Conceptualization, R.T.-F. and G.M.-F.; Methodology, R.T.-F., G.M.-F. and H.B.; Data curation, G.M.-F.; Formal analysis, R.T.-F., G.M.-F. and H.B.; Investigation, R.T.-F., G.M.-F. and H.B.; Resources, R.T.-F. and G.M.-F.; Software, R.T.-F. and G.M.-F.; Supervision, H.B.; Validation, G.M.-F. and R.T.-F.; Visualization, R.T.-F. and G.M.-F.;Writing—original draft, R.T.-F., G.M.-F. and H.B.; Writing—review and editing, R.T.-F., G.M.-F. and H.B. All authors have read and agreed to the published version of the manuscript.
Funding
Resolución de Problemas de Situaciones Reales Usando Análisis Estadístico a través del Modelamiento Multidimensional de Tasas y Proporciones; Esquemas de Monitoreamiento para Datos Asimétricos no Normales y una Estrategia Didáctica para el Desarrollo del Pensamiento Lógico-Matemático. Universidad de Córdoba, Colombia, Code FCB-05-19.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Details about data available are given in Section 6.
Acknowledgments
G. Martínez-Flórez and R. Tovar-Falón acknowledge the support given by Universidad de Córdoba, Montería, Colombia.
Conflicts of Interest
The authors declare no conflict of interest.
References
1. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
2. Durrans, S.R. Distributions of fractional order statistics in hydrology. Water Resour. Res. 1992, 28, 1649–1655.
3. Gupta, R.D.; Gupta, R.C. Analyzing skewed data by power normal model. Test 2008, 17, 197–210.
4. Pewsey, A.; Gómez, H.W.; Bolfarine, H. Likelihood-based inference for power distributions. Test 2012, 21, 775–789.
5. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Skew-normal alpha-power model. Statistics 2014, 48, 1414–1428.
6. Martínez-Flórez, G.; Vergara-Cardozo, S.; González, L.M. The family of log-skew-normal alpha-power distributions using precipitation data. Rev. Colomb. Estad. 2013, 36, 43–57.
7. Tovar-Falón, R.; Bolfarine, H.; Martínez-Flórez, G. The Asymmetric Alpha-Power Skew-t Distribution. Symmetry 2020, 12, 82.
8. Azzalini, A.; Capitanio, A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew-t distribution. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2003, 65, 367–389.
9. Zhao, J.; Kim, H.M. Power-t distributions. Commun. Stat. Appl. Methods 2016, 23, 321–334.
10. Tung, H.P.; Tseng, S.T.; Hsu, N.J.; Hou, Y.T. A generalized pH acceleration model of nano-sol products and the effects of model misspecification on shelf-life prediction. IISE Trans. 2022, 54, 496–504.
11. Martínez-Flórez, G.; Tovar-Falón, R.; Jiménez-Narváez, M. Likelihood-Based Inference for the Asymmetric Beta-Skew Alpha-Power Distribution. Symmetry 2020, 12, 613.
12. Martínez-Flórez, G.; Tovar-Falón, R.; Martínez-Guerra, M. The Censored Beta-Skew Alpha-Power Distribution. Symmetry 2021, 13, 1114.
13. Sahu, S.K.; Dey, D.K.; Branco, M.D. A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 2003, 31, 129–150.
14. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Asymmetric regression models with limited responses with an application to antibody response to vaccine. Biom. J. 2013, 55, 156–172.
15. Amini, M.; MirMostafaee, S.M.T.K.; Ahmadi, J. Log-gamma-generated families of distributions. Statistics 2014, 48, 913–926.
16. Cordeiro, G.M.; Bourguignon, M.; Ortega, E.M.M.; Ramires, T.G. General mathematical properties, regression and applications of the log-gamma-generated family. Commun. Stat.—Theory Methods 2018, 47, 1050–1070.
17. Prentice, R.L. A log-gamma model and its maximum likelihood estimation. Biometrika 1974, 61, 539–544.
18. Lawless, J.F. Inference in the generalized gamma and log gamma distributions. Technometrics 1980, 22, 409–419.
19. Young, D.H.; Bakir, S.T. Bias correction for a generalized log-gamma regression model. Technometrics 1987, 29, 183–191.
20. Ortega, E.M.M.; Bolfarine, H.; Paula, G.A. Influence diagnostics in generalized log-gamma regression models. Comput. Stat. Data Anal. 2003, 42, 165–186.
21. Ortega, E.M.M.; Cancho, V.G.; Paula, G.A. Generalized log-gamma regression models with cure fraction. Lifetime Data Anal. 2009, 15, 79.
22. Zhang, D.; Davidian, M. Linear mixed models with flexible distributions of random effects for longitudinal data. Biometrics 2001, 57, 795–802.
23. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: http://www.R-project.org (accessed on 31 July 2021).
24. Tobin, J. Estimation of relationships for limited dependent variables. Econometrica 1958, 26, 24–36.
25. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. The alpha-power tobit model. Commun. Stat.—Theory Methods 2013, 42, 633–643.
26. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–722.
27. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
28. Vuong, Q.H. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica 1989, 57, 307–333.
29. Kleiber, C.; Zeileis, A. Applied Econometrics with R, 1st ed.; Springer: New York, NY, USA, 2008.
30. Santos, N.; López, R.G.; Israelian, G.; Mayor, M.; Rebolo, R.; García-Gil, A.; De Taoro, M.P.; Randich, S. Beryllium abundances in stars hosting giant planets. Astron. Astrophys. 2002, 386, 1028–1038.
31. Tovar-Falón, R.; Bolfarine, H.; Martínez-Flórez, G. The Asymmetric Power-Student-t Model for Censored and Truncated Data. An. Acad. Bras. Cienc. 2021, 93, e20190920.
32. Feigelson, E.D. astrodatR: Astronomical Data. R Package v. 0.1. Available online: https://cran.r-project.org/web/packages/astrodatR/ (accessed on 31 July 2021).
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).