Comparison between Highly Complex Location Models and GAMLSS

This paper presents a discussion regarding regression models, especially those belonging to the location class. Our main motivation is that, with simple distributions having simple interpretations, in some cases, one gets better results than the ones obtained with overly complex distributions. For instance, with the reverse Gumbel (RG) distribution, it is possible to explain response variables by making use of the generalized additive models for location, scale, and shape (GAMLSS) framework, which allows the fitting of several parameters (characteristics) of the probabilistic distributions, like mean, mode, variance, and others. Three real data applications are used to compare several location models against the RG under the GAMLSS framework. The intention is to show that the use of a simple distribution (e.g., RG) based on a more sophisticated regression structure may be preferable than using a more complex location model.


Introduction
With the increasing use of new data analysis techniques, mainly artificial intelligence, machine learning, neural networks, and big data, regression analysis has become, perhaps, the most important tool among the various statistical (learning) methods of optimization, and of decision-making management. Evidently, the greater the complexity of the databases, the greater the complexity in the proper treatment of these data. The number of papers with increasingly complex techniques is naturally emerging because of the need to extract more accurate information from the data.
This manuscript is more of a work belonging to this class of papers, although we think it is less complex compared to its alternatives, which are mainly presented as (log linear) location models. Usually, the location parameters are associated to other important parameters like mean, percentiles, standard deviation, skewness, and kurtosis, in which these characteristics are implicitly modeled. There are papers that perform a good work obtaining the solutions. For instance: the three parameter log-xgamma Weibull regression model [1], the four parameter Topp Leone generated Burr XII [2], log-odd log-logistic Marshal Olkin generalized half-normal [3] and log-beta Burr XII [4] regression models, and the five parameter log-Hjorth Weibull regression model [5]. We note that many of these complex distributions suffer from the interpretation of the parameters and their estimations needed whenever predictions are demanded.
In the sequel, instead of developing and considering highly complex location models to deal with complex data, a different approach will be used, considering a more sophisticated class of regression models based on the reverse Gumbel (RG) distribution, a simple distribution with simple parameters and predictions interpretations. The chosen tool for the presented analyses is the generalized additive models for location, scale, and shape (GAMLSS) [6] framework, since they allow that any and all of distribution parameters to be explicitly modeled.
Hence, the aim of this paper is to compare if a GAMLSS model based on a very simple distribution (RG) is able to outperform several highly complex location models. In this sense, Section 2 presents a description of the location models, the GAMLSS framework and some statistical inference concepts. In Section 3 we present three real data applications (voltage data, class-H insulation, and heart transplant) comparing some recently developed location models against the RG distribution under the GAMLSS framework. Finally, Section 4 ends the paper with some concluding remarks.

Location Models
Location regression models are useful to relate a dependent (response) variable to one or more explanatory variables. Suppose a response Y, with location parameter µ(v), which depends on the explanatory variable vector v. For this case, a class of regression models for location is characterized by where Z follows a specific distribution that does not depend on v.
For instance, let us consider that Y follows a reverse Gumbel distribution (RG), i.e., Y∼RG(µ, σ), also known as the type I extreme value distribution, given by where −∞ < µ < +∞ is the mode, and σ > 0 is the scale parameter, E(Y) = µ + 0.57722σ and the median is µ + 0.36611σ [7]. The RG distribution is appropriate for moderately positive skew data. Considering that Z follows a standard RG distribution, i.e., µ = 0, in Equation (1), then Y will follow a RG distribution with model parameters θ = (µ(v), σ). Note that, by modeling only µ, we are actually explicitly modeling the mode of the response and also implicitly modeling both the average and median of Y.

GAMLSS Framework
An alternative approach, when other measures are affected by explanatory variables, e.g., variance, skewness, and excess of kurtosis, is to explicitly model the parameters related to these measures. In this sense, the GAMLSS framework [6] occupies a prominent position among the beyond the mean (or location) regression models [8], generalizing both generalized linear [9] and generalized additive [10] models. GAMLSS are semi-parametric regression models in which any distribution may be defined to describe the response Y, and different regression structures may be considered to explain any or all of its parameters, using linear and/or nonlinear functions.
Let Y ∼ D(θ), where D is the distribution of the response variable, and θ is its parameter vector. Then, a GAMLSS can be written as where g k (·) denote appropriate link functions for the kth parameter, which is usually determined by the range of the parameter considered [11], X k is a known n × (m k + 1) model matrix, m k denotes the number of explanatory variables related to the kth parameter, β k = β 0k , β 1k , . . . , β m k k is a parameter vector of length (m k + 1), and s jk (.) are smoothing functions (in this paper, it will be considered as a P-spline [12,13]). When J k ∑ j=1 s jk (x jk ) = 0, model (2) reduces to a fully parametric GAMLSS version [6] (pGAMLSS, for short).
Since any distribution may be used in GAMLSS, usually there is no need to transform the data in study, resulting in clearer interpretations. A wide list of distributions in GAMLSS may be found in Reference [7]. For instance, if Y∼RG(µ, σ), then a GAMLSS model based on the RG distribution is given by Here, the considered link functions for µ and σ were the identity and logarithm due to their range, respectively. Moreover, we can actually rewrite a location model in terms of the GAMLSS framework. Let us consider Y∼RG(µ, σ) again, and then Equation (1) can be rewritten as It is noteworthy that, depending on the parameterization of the response variable distribution [7], µ is not necessarily a location parameter. Nonetheless, the model presented in Equation (2) can be applied more generally to any type of parameter from a population distribution [6].

Estimation and Model Selection
The maximum likelihood estimates for a GAMLSS model can be performed in the gamlss package [14] (and its add-ons) in R software [15]. The algorithms used are the RS and CG procedures described by References [6,11,14] and are available in the documentation of the package.
In order to deal with censored observations (events that will occur in the future) within the GAMLSS framework, the methodology is identical to the one used in classical models, i.e, we must add the probability that this information will occur in the future 1 − F(y i ; θ k ) into the likelihood , where F(·) denotes the cumulative density function. Then, the loglikelihood is given by l(θ k ) = ∑ i∈F log f (y i ; θ k ) + ∑ i∈C log 1 − F(y i ; θ k ). Computationally, we can use the gamlss.cens [16] package to obtain the model estimates in the presence of censored observations.
As the explanatory variables can be included in any of the regression structures of all parameters, there are some procedures to select the additive terms. In this paper, we are using the so-called Strategy A [11,17], a stepwise-based method applied to select the terms for each model parameters based on the Akaike information criterion (AIC) [18]. This approach can be achieved using the stepGAICAll.A() function in the gamlss package.
After selecting the additive terms, we verify the model assumptions by conducting a residual analysis. The worm plots (WP) [19] are a useful tool based on the normalized quantile residuals [20], that graphically show if the fitted model presents an adequately fit to the data. With this plot we can compare the differences between the empirical and model residual mean, variance, skewness, and kurtosis, respectively, within the range in the QQ plot. More information regarding WP may be found in Reference [11].

Results
In this section, we will consider three classical data sets that were used as motivational examples to develop new (log-)location models in the past few years. These models will be compared to the GAMLSS framework based on the two-parameter Reverse Gumbel distribution [7]. All comparisons are made using both AIC [18] and Bayesian information criterion (BIC) [21]. We also provide, in each application, the effective degrees of freedom for all fitted models, i.e., the sum of the degrees of freedom of linear terms with the smoothing parts (when they are considered in the fitting process).

Application 1: Voltage Data
This data set was reported by Lawless [22], who conducted an experiment considering accelerated voltage life test to study specimens of solid epoxy electrical-insulation. Basically, the experiment consists in determining the failure times for epoxy insulation specimens (in min), considering three levels of voltage (x i ): 52.5, 55.0, and 57.5 kV. The total of times observed were n = 60, where six observations were classified as censored observations. These data have already been modeled by the following (log-)location models: • Four-parameter exponentiated logistic geometric type I(ELGI) distribution; fourparameter exponentiated logistic geometric type II (ElGII) distribution [30].
Note that, as mentioned in Section 2.2, no transformation on the response variable is necessary while using the GAMLSS framework. However, in this application, as considered in the above papers, we will model the logarithm of the failure times, i.e, the response variable considered in this example is y: log-time in minutes. Further, x i will be considered as continuous (as in the previous applications), since the goal here is not to check if there is a significant difference between the levels of voltage x i but to understand how x impacts in the failure times. Figure 1 displays the densities of the response variable (log-time in minutes) for each voltage level. The idea here is to check whether it is necessary to fit a regression structure (consider x i ) for the scale parameter σ of the RG distribution on the GAMLSS framework. As we can see, there is clearly a difference between the dispersion of the three different levels and thus σ may be modeled as a function of x i . Moreover, we may note that the mode for x i = 57.5 and x i = 55.0 seem to be quite similar, but different from the mode presented by x i = 52.5, indicating a non linearity effect between µ and the voltage levels.
Based on the Strategy A variable selection method [11,17], the final fitted GAMLSS model, to represent Y is given by and log σ i = 5.83 + s(x i ).
Note that, for both regression structures, a P-spline [12,13] was considered due to the nonlinear relationship between x i and both parameters. The smoothing parameters λ for µ and σ are 3.23 and 2.85, respectively. In order to show the advantage of the fitted GAMLSS model (3), Table 1 presents all AIC, BIC, and effective degrees of freedom values for all models considered to fit such data. In addition of the semiparametric GAMLSS model presented in (3), we also provide the results of the fully parametric GAMLSS (pGAMLSS) based on the RG distribution, which regression structures are given by µ i = 13.157 − 0.129 x i and log σ i = 6.073 − 0.113 x i . The idea here is to show how much reduction in AIC and BIC is caused by the addition of a smoother in the GAMLSS framework (please note that this addition may occur based on practical reasons, i.e., when a nonlinear effect is observed between an explanatory variable and a given parameter). Further, we shall highlight that the maximum likelihood estimates (MLEs), as well as AIC and BIC values presented in Reference [23], seem slightly off for the LWMOW model and the results presented in Table 1 differ from their original paper. The same occurs with the AIC and BIC values for the ELGII model available in Reference [30].  Table 1 illustrates that the GAMLSS model, based on the RG distribution considering smoothing functions, outperformed all other previous fitted models, i.e., a more flexible class of regression model (GAMLSS) is able to capture more information provided by the data, granting good fit even when a very simple distribution (RG) is considered. Nonetheless, even the parametric GAMLSS version, i.e., the pGAMLSS based on the RG distribution, presents a better fit than all other (log-)location models considered, according to the BIC measure (170.5). Figure 2 displays the fitted survival functions based on the RG distribution and its residuals analysis through the WP. These plots indicate that the proposed model provides a reasonable fit to these data.

Application 2: Class-H
We are now considering the data set about failure of motorettes with a new Class-H insulation. These data were introduced by Nelson [31], where the response variable y is the logarithm of the failure time (in hours). In order to investigate the effects of the temperatures in the failure times, four temperatures were considered in this experiment, 190, 220, 240, and 260 • C.
As in previous applications to these data, we will consider the temperature as a continuous variable, i.e., we are not only interested to test the difference between the levels of temperature. Once again, in order to compare previous works with the GAMLSS framework, the RG distribution will be considered. The previous (log-)location models considered to model these data are: • Four-parameter log-Lomax Weibull (LLW) distribution [32]; • Five-parameter log-beta transmuted Weibull (LBTW) distribution [33]; • Five-parameter log-beta exponentiated Weibull (LBEW) distribution [34]; • Four-parameter log-beta-Weibull (LBW) distribution [35]. Figure 3 displays the densities for each temperature level. With this plot, we have a visual of information indicating that both parameters, µ and σ, may be modeled by the explanatory variable. We may also note a possible nonlinearity of the temperature effect in mode µ parameter, since the mode for temperature 190 • C is quite lower than the other levels.
Through the Strategy A variable selection method [11,17], the final fitted GAMLSS model based on the RG distribution is given by where the fitted smoothing parameter λ for µ is 17.47. Note that, although temperature was considered to model both regression structures, the smoothing function was only necessary to model the mode µ. Table 2 shows the values of AIC, BIC, and effective degrees of freedom values for all fitted models to the Class-H data. As in the previous application, we also provide the results of the pGAMLSS framework (i.e., only considering linear effects on both parameters) based on the RG distribution, which regression structures are given by µ i = 15.314 − 0.033 temperature i and log σ i = −3.444 + 0.008 temperature i . Once again, we can conclude that, by using a simpler distribution, but considering a flexible regression structure (as GAMLSS), we may have better goodness-of-fit measures.  For a visual check of the goodness-of-fit, Figure 4 provides the fitted and empirical survival functions, as well the residuals WP from the fitted GAMLSS model based on the RG distribution, where it seems that the model is adequately fitted to the data.

Application 3: Heart
In this last application, we are considering the data provided by Kalbfleish and Prentice [36], where a study regarding the longevity of patients waiting for a heart transplant was conducted. During the study, some patients (27%) died before an appropriate heart could be found, so, by considering the response variable the time to receive the transplant, these events were considered as censored information.
The goal here is to study the effects of some explanatory variables on the time until transplant. The variables taken into account are y: log-time in days since acceptance into the transplantation program to transplant and to death; δ i : failure indicator (0: censored, 1: observed); x i1 : age at acceptance (in years); x i2 : previous surgery (0: no, 1: yes); and x i3 : transplant (0: no, 1: yes). Figure 5 shows the relationship between the response and all explanatory variables. We may note that the mode of y changes for each level of X 2 and X 3 , and, as the age at acceptance increases, the mode of y decreases, indicating that all three variables might be used to fit the mode µ parameter of the RG distribution. We may also note that the dispersion is influenced by X 1 and X 2 , indicating that they are probably good predictors to fit the scale parameter σ. Using the Strategy A variable selection method [11,17], the final fitted GAMLSS model based on the RG distribution is given by No smoothing functions were applied onto the age at acceptance in both parameters, i.e., in fact, the final selected GAMLSS model to explain the behavior of the response variable according to the available explanatory variables is the fully parametric version, pGAMLSS. As stated in the first application in Section 3.1, the smoothing functions may be considered when there is a nonlinear effect of a explanatory variable in a given parameter (which is not observed in this case).
We will compare model (4) with the following (log-)location models already proposed in the literature to deal with these data: Three-parameter log-odd log-logistic Weibull (LOLLW) distribution [43]. Table 3 presents the AIC, BIC, and effective degrees of freedom values. Even though the LEOF-GHN model presents the smallest AIC, the pGAMLSS based on the RG distribution returns an AIC of only 1.5 units greater. Moreover, the RG model based on the fully parametric GAMLSS framework produces the best BIC value; thus, by the parsimony principle and also considering the model with the simplest interpretability, the GAMLSS alternative would be preferable. In order to check the model assumptions, the WP of the fitted pGAMLSS model is presented in Figure 6, showing that, in fact, the model provides a reasonable fit. Since there is a continuous covariate in this problem, we do not present the estimated and empirical survival functions. Table 3. AIC, BIC, and effective degree of freedom (df) from the fitted models for the heart data.

Discussion
Although there is a reasonable number of new regression models being developed in the last few years (e.g., the ones previously fitted to the three applications considered in this paper), usually they present a highly complex structure that may suffer from the interpretation of the parameters. This is a critical drawback since the interpretability of such characteristics is still the major advantage of regression models compared to other methods.
The key point within the discussion in application sections in papers that develop new (log-)location models is usually based on goodness-of-fit measures, such as AIC and BIC. Focusing on this specifically point, let us suppose a response variable Y that follows a Gaussian distribution, and an explanatory variable X which directly affects both the mean µ and standard deviation σ of Y. To fit such behaviors, should we build a location model or a heteroscedastic model (GAMLSS in other words) or propose a new location model? The natural choice here seems to be the GAMLSS (distributional regression) approach.
Further, we may be interested in discussion on why more complex models might present better statistics, like AIC and BIC, when compared to some of their special and/or limiting cases. Looking at the properties of these models, we usually note the association between their location parameter and other important characteristics, such as mean, percentiles, standard deviation, skewness, and kurtosis. This means that, in the modeling stage of the location parameter, we are implicitly modeling these characteristics, as well. In the GAMLSS structure, we can explicitly model any and all parameters directly, i.e., different regression structures can be considered to explain all the parameters of the response variable distribution. Thus, apart from producing better goodness-of-fit measures, we can still identify which characteristics affect each of the parameters.
Finally, we present a review of regression models, based on fitting any and all parameters using linear and/or nonlinear structures, and consequently modeling more accurately the data behavior through the GAMLSS framework. The use of simpler models, with interpretable parameters, based on very sophisticated regression structures, presented better results than the ones obtained through highly complex location models. Following the parsimony principle and/or the interpretability of the parameters, we may conclude-at least from a practical point of view-that, by using the GAMLSS framework, the development and proposal of new models with a high number of parameters is, in some cases, avoidable.