A New Quantile Regression Model and Its Diagnostic Analytics for a Weibull Distributed Response with Applications

: Standard regression models focus on the mean response based on covariates. Quantile regression describes the quantile for a response conditioned to values of covariates. The relevance of quantile regression is even greater when the response follows an asymmetrical distribution. This relevance is because the mean is not a good centrality measure to resume asymmetrically distributed data. In such a scenario, the median is a better measure of the central tendency. Quantile regression, which includes median modeling, is a better alternative to describe asymmetrically distributed data. The Weibull distribution is asymmetrical, has positive support, and has been extensively studied. In this work, we propose a new approach to quantile regression based on the Weibull distribution parameterized by its quantiles. We estimate the model parameters using the maximum likelihood method, discuss their asymptotic properties, and develop hypothesis tests. Two types of residuals are presented to evaluate the model ﬁtting to data. We conduct Monte Carlo simulations to assess the performance of the maximum likelihood estimators and residuals. Local inﬂuence techniques are also derived to analyze the impact of perturbations on the estimated parameters, allowing us to detect potentially inﬂuential observations. We apply the obtained results to a real-world data set to show how helpful this type of quantile regression model is.


Bibliographical Review
In the context of usual regression, it is common to model the relationship between a response variable and covariates by employing the mean response conditioned to such covariates. In this usual modeling, the normal distribution is often considered. However, there are many real-world phenomena in which the data follow an asymmetrical distribution. In this case, the relation between the response and covariates utilizing the mean is not suitable since it is strongly affected by asymmetry and atypical observations. Another limitation of the usual regression approach is when we are interested in studying other parameters in addition to the mean; see [1,2].
Observations that follow an asymmetrical behavior can come from different models, with the Weibull distribution frequently being considered. This distribution is skewed, has positive support, and possesses two parameters which modify its shape and scale; for a detailed description of its main properties and associated inference, see [3] (pp. 629-666) and [4]. Estimation and testing methodologies based on several data configurations and situations may be found in [5]. In its origins, the Weibull distribution was used to study the breaking strength of materials; see [6]. Simple parsimonious Weibull models were derived in [7] and applied to fatigue life data of longitudinal elements by considering functional equations, proportional hazards techniques, and subsequent likelihood analysis. Other applications include different areas in problemas related to health sciences, lumber industry, microscopic degradation, migratory systems, quality control, rainfall and flood, reliability, and wind speed. Details of these and other applications of the Weibull distribution, as well as data sets described with this distribution, can be found in chapter 7 of [8] (pp. 275-310) and references therein.
Unlike the mean, which can be challenging to interpret when the distribution of the response variable is asymmetrical, the median remains highly informative in that case. Thus, under this scenario, the modeling of the median response based on values of covariates is more appropriate. The first idea of median regression was presented in [9]. However, quantile regression models have the median regression as a particular case (50th percentile) and can describe other locations (non-central) of the distribution. In [10], the authors introduced quantile regression models, and from then, different versions and applications of these models have been developed; see [11][12][13]. Therefore, to describe the relationship between a response variable that follows an asymmetrical distribution and the covariates, quantile regression is a better alternative to the usual regression.
The standard approach in parametric quantile regression considers a functional equation that relates the response (say Y), a parametric component (say x β, which corresponds also to the modeled quantile of Y), and an error component (say ε) with its associated assumptions; see [11] (p. 29). The traditional procedure for estimating the model parameters in this approach does not make a distributional assumption for the error component. However, if we add this assumption, it is natural to incorporate it in the response variable rather than in the error component. In addition, the maximum likelihood method is often chosen to estimate parameters because of the good properties of the obtained estimators; see [14] (pp. 94-125). Based on these two previous considerations, a similar approach to generalized linear models (GLM) can be used for quantile regression; see [15][16][17]. In GLM, the mean is modeled, which is besides one of the parameters of the assumed distribution. In our approach, the modeled quantile is a parameter of the distribution as well. When using a parametric distribution, we can develop statistical analysis employing the likelihood function to perform estimation, hypothesis tests, and local influence analysis.
Diagnostic analytics plays a relevant role in statistical modeling, including global, local influence methods and goodness of fit. Goodness-of-fit techniques for a determined model permit us to evaluate the adequacy of the model to the data; see [18]. The pseudo-R 2 proposed by [19]-from now denoted as R 2 M -and randomized quantile (RQ) and generalized Cox-Snell (GCS) residuals are helpful tools for evaluating goodness of fit; see [20,21]. Local influence assesses the effect of small perturbations in the data and/or model assumptions on parameter estimates; see [22]. Different scenarios of perturbation are considered to detect potentially influential cases. Local influence techniques have been developed for different non-Gaussian and asymmetrical models; see, for example, refs. [16,17,[23][24][25]. As a motivation to develop our work, next, we show the inadequacy of the usual mean regression when analyzing real-world data with an asymmetrical distribution.

Limitations of the Usual Regression Model
The usual regression model can be formulated as where Y i and x i are the response variable and the vector that contains the values of covariates X (with the first component equal to one), respectively, for the ith observation, and β is a vector of the unknown regression coefficients to be estimated.
When the data follow a skew distribution, the mean model is not appropriate. To demonstrate this fact, consider a data set with n = 41 observations regarding the time (in hours) to electrical breakdown of an insulating fluid (response variable Y) and the test voltage in kV (covariate X). This data set is taken from [26] and is available in the R software by the package survival; see [27,28]. The characteristics of the insulating fluid defined in various standards can be broadly classified into chemical, electrical, and physical features. For example, the electrical characteristics (breakdown voltages) of the insulating fluid are affected by elements such as water content and electrostatic charges, but also possibly affected by trace components in this fluid.
A descriptive summary of the times to electrical breakdown is presented in Table 1, including the median, mean, standard deviation (SD), coefficients of variation (CV), skewness (CS), and kurtosis (CK), besides minimum (y (1) ) and maximum (y (n) ) values. Figure 1a presents a histogram for Y, and Figure 1b shows the corresponding adjusted and usual boxplots. An adjusted boxplot is used when the data present an asymmetrical distribution; see details in [29]. In this case, the adjusted boxplot gives a better description to detect atypical cases.  (d) Figure 1. Histogram (a) and boxplots (b) for the data of times to electrical breakdown with the full data set, and histogram (c) and boxplots (d) for the data set without cases #2 and #3.
From Table 1, note that the median is noticeably smaller than the mean, whereas Figure 1a allows us to observe that the empirical distribution of the times to electrical breakdown is unimodal and positively skewed. Therefore, the assumption of an asymmetric distribution for the response variable seems to be adequate. This asymmetry is also evidenced by the values of the CS, which is positive. Furthermore, in Figure 1b, we highlight two atypical cases (#2 and #3), which can correspond to potentially influential cases. The possible potential influence of these and other cases is analyzed by using local influence in Section 6.2. In Figure 1c, we observe the empirical distribution of Y without cases #2 and #3, whereas the boxplots associated are displayed in Figure 1d. Note that the asymmetrical behavior of the data is kept. However, now the adjusted boxplot does not present atypical cases despite the usual boxplots identify some of them.
Next, the model stated in (1) is adjusted to this data set employing the ordinary least squares method. Then, we obtain the predictive model The fit of the model is evaluated by the usual standardized Pearson residual, presented in the theoretical quantile versus empirical quantile (QQ) plot with envelopes in Figure 2. Note that the points follow an irregular behavior around the straight line, and many observations are outside the bands. Hence, it is not clear that the usual regression is appropriate for modeling this data set due to the asymmetry of the response distribution. For these data (full set and set without cases #2 and #3), we may assume an asymmetrically distributed response. In addition, in this case, the modeling of the conditional median is a better alternative for describing the relation of the response with the covariates (as we show below) because the median is a robust measure in the presence of atypical observations. However, the median is a quantile and, in consequence, a description of the full range of the response based on covariates can be performed by using quantile regression.

Objective and Outline
The main objective of this work is to propose a new quantile regression model based on a parameterization of the Weibull distribution, following the approach of [16]; see [30,31] for similar but not identical models. Our approach intends to be an alternative to the existing quantile models in the literature. Some characteristics of the proposed Weibull quantile regression are as follows: (i) flexibility for modeling different types of data, since the Weibull distribution, as mentioned, has been successfully applied in several areas; and (ii) easy computational implementation, since the Weibull distribution has a simple closed-form inverse cumulative distribution function, which facilitates its utilization when modeling data by a parametric quantile regression with distributional assumption for the response.
The maximum likelihood method is used for model parameter estimation. Our study includes the evaluation of the adequacy of the models to the data by Akaike (AIC), Bayesian (BIC), and corrected Akaike (CAIC) information criteria; see [1] for details. In addition, R 2 M as well as RQ and GCS residuals are considered in this evaluation. We identify potentially influential observations under different scenarios of perturbation employing local influence techniques; see [22]. Moreover, an application to real-world data is discussed to illustrate the proposed methodology and show how helpful this type of quantile regression model is in practice.
The rest of this paper proceeds as follows. The new parametric quantile regression model based on the Weibull distribution is formulated in Section 2. In contrast, in Section 3, we describe the parameter estimation method, associated inference, and the related RQ and GCS residuals to evaluate the fit of the model to the data. In Section 4, two Monte Carlo simulation studies are conducted to evaluate the statistical performance of the maximum likelihood estimators and the empirical distribution of residuals. In Section 5, we propose techniques to study potentially influential cases by using local influence and four perturbation schemes. In Section 6, an illustration of the proposed Weibull quantile regression models is carried out for the same real-world data set presented in Section 1. Finally, in Section 7, we present some concluding remarks.

A Reparameterized Weibull Distribution
The probability density function of a random variable Y that follows a Weibull distribution with shape and scale parameters k > 0 and λ > 0, respectively, is given by It is possible to prove that, if q ∈ (0, 1) is a fixed number, the qth quantile of Y corresponds to Q = λ (− log(1 − q)) 1/k , from which we obtain For more details about properties of the Weibull distribution, see [3] (pp. 629-666) and [8]. Replacing the formula stated in (3) for λ in the expression given in (2), we have a new parameterization of the Weibull distribution based on its quantiles, which is denoted by Wei(Q, k), and its probability density and cumulative distribution functions are formulated, respectively, as and Figure 3 shows the behavior of the reparameterized Weibull probability density function defined in (4) under different values of the parameters. Note that, as Q decreases, the kurtosis of the distribution increases; see Figure 3d-i. Thus, when Q increases, the tails are heavier. Moreover, observe in Figure 3a-c that, when k takes values less or equal to one, the distribution mode is zero, while if it takes values greater than one, this mode is positive.

The Weibull Quantile Regression Model
Let Y 1 , . . . , Y n be independent random variables with Y i ∼ Wei(Q i , k), for i ∈ {1, . . . , n}. Suppose that the quantile parameter Q i can be modeled by where β = (β 0 , β 1 , . . . , β p−1 ) , for p < n, is a vector of unknown regression parameters and represents the values of p covariates. Note that the link function h is invertible, at least twice differentiable, and has positive support. The last condition of h guarantees that the quantile is modeled for a positive expression. Link functions that may be considered are, for example, h(u) = log k (u) and h(u) = a √ u, with a ≥ 2 and k being a positive integer number.
Note that the reparametrization of the Weibull distribution by quantiles is necessary to formulate the Weibull quantile regression defined in (6), which allows us to model any quantile value of the distribution. Furthermore, this reparameterization makes it possible to incorporate directly the regression structure given in (6) into the corresponding likelihood function. Note that this structure is different from the traditional quantile regression model with an error component; see [11] (p. 29). Doing that, as mentioned, the distributional assumption is directly related to the response variable, permitting statistical tools based on the associated likelihood function to be obtained in a similar form to GLM.

Parameter Estimation
Let y = (y 1 , . . . , y n ) be an observation of (Y 1 , . . . , The log-likelihood function of the model given in (6) for θ = (β , k) based on y can be written as where i (Q i , k) stated in (7) is formulated as Therefore, the score vector has as components˙ β j , for j ∈ {0, 1, . . . , p − 1}, and˙ k , expressed as˙ where The elements of the associated Hessian matrix are written as where To estimate the vector θ of parameters with the maximum likelihood method, we often solve the equation˙ θ = 0 p+1 , where 0 p+1 is the p + 1 null vector. However, no closed-form expressions for the maximum likelihood estimates can be obtained, and therefore numeric procedures must be used to calculate the estimate of θ. For example, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method or other quasi-Newton algorithms may be considered; see [32]. Different algorithms are implemented in the R software, including the BFGS approach for constrained and unconstrained maximization; see [27].

Inference and Hypothesis Testing
Under some regularity conditions [14] (pp. 118-119), it is possible to establish that θ∼ N p+1 (θ, (I(θ)) −1 ), (10) where I(θ) is the expected Fisher information matrix, which may be computed by We can obtain approximate confidence intervals using the results provided in (10), whereas for approximating the information matrix defined in (11), we may employ the observed Fisher information matrix stated as whose elements established in (12) may be calculated from (9), evaluated at θ = θ.
Note that if we want to test the hypothesis H 0 : θ = θ 0 versus the alternative hypothesis H 1 : θ = θ 0 , where, as mentioned, θ = (β , k) , then we can use the Wald and likelihood ratio tests. The Wald [33] and likelihood ratio statistics based on the observed Fisher information matrix [34] are, respectively, given by When n → ∞, both statistics converge to a random variable that follows a χ 2 distribution with r degrees of freedom, χ 2 r in short, where r is the number of parameters under H 0 , which is rejected, at a nominal level of significance α, if the statistic computed according to (13) or (14) is greater than χ 2 r,1−α , which denotes the 100(1 − α)th χ 2 r quantile.

Residuals
To evaluate the model adequacy-that is, to assess the fit of our model to a data set-we can employ the RQ and GCS residuals. For our reparameterized Weibull model, these residuals are given, respectively, by where Φ is the standard normal cumulative distribution function; F is given by (5); Q i and k are the maximum likelihood estimates of Q i and k, respectively; and S = 1 − F is the corresponding survival function. The RQ residual is approximately standard normal distributed, whereas the GCS residual follows a standard exponential asymptotic distribution when the model is correctly specified, whatever its specification is.

Scenario 1: Maximum Likelihood Estimation
We employ the R software and its maxBFGS function, which implements the BFGS algorithm with constraints for maximization and requires initial values for estimating β = (β 0 , β 1 ) and k. We utilize the least square estimator of β assuming a usual linear regression and the maximum likelihood estimate of k based on the observations y 1 , . . . , y n without considering covariates. The maximum likelihood estimates are presented in Tables 2-7 wherein the empirical mean, bias, variance, root mean squared error (RMSE), CS, and CK are all reported. A look at the results in Tables 2-7 allows us to conclude that, in general, as the sample size increases, the bias, variance, and RMSE of the estimators decrease, as expected. Moreover, β 0 , β 1 , and k seem all to be consistent and asymptotically normal distributed. Our study was conducted on a Dell Inspirion 5748 personal computer with an Intel core i7-4510U CPU, 2.00 GHz × 4, and 8 GB of RAM.

Scenario 2: Empirical Distribution of the Residuals
Now, we report a second Monte Carlo simulation study to evaluate the performance of the GCS and RQ residuals defined in (15). Tables 8 and 9 present the empirical mean, SD, CS, and CK for β 0 = 0.5 and β 1 = 1.0, whose values are expected to be 0, 1, 0, and 3, respectively, for r RQ , and 1, 1, 2, and 9, respectively, for r GCS . From Tables 8 and 9, we observe that, in general, the considered residuals conform well with the reference distributions. The same conclusions are obtained for the other values of β 0 and β 1 .      Table 8. Summary statistics of the GCS residuals (β 0 = 0.5; β 1 = 1.0).

Perturbation Matrix and Potentially Influential Cases
Local influence techniques examine the effect of small perturbations in the model data and/or assumptions regarding the estimated parameters. Let (θ) be the log-likelihood function for the parameter θ of the model defined by (6), which is named the non-perturbed model. Consider a vector of R n , ω namely, called the vector of perturbation, and we define (θ; ω) as the log-likelihood function of the perturbed model and θ ω as the maximum likelihood estimate of θ obtained from (θ; ω). Further, let ω 0 ∈ R n be a non-perturbation vector such that (θ) = (θ; ω 0 ). The likelihood displacement function (LD) defined as is used to detect the impact of ω. We study the local behavior of the surface plot (ω , LD(ω)) around ω 0 . The direction in which the LD locally changes most rapidly is evaluated; that is, the maximum curvature of the surface. For LD(ω) given in (17), the maximum curvature is established as where B defined in (18) is given by B = −∆ ¨ θ θ ∆ and d is a unit-length direction vector; see [22]. The expression¨ θ θ is the Hessian matrix of (θ) evaluated at θ and ∆ is a (p + 1) × n perturbation matrix also evaluated at θ = θ and ω = ω 0 . Hence, the elements of ∆ are stated as Then, d max is a unit-length eigenvector associated with the maximum absolute eigenvalue of B. The plot of d max versus the index i may be considered to detect whether case i is potentially influential on θ. The direction d = e i , where e i is an n × 1 vector of zeros, with one at the ith position, is another relevant direction to analyze. For such a direction, the normal curvature is C i (θ) = 2|b ii |, where b ii is the ith element of the diagonal of the matrix B. If then the case i is potentially influential; see [35]. Next, we describe the matrix ∆ for different perturbation schemes with its elements defined in generic terms in (19).

Perturbation on the Response
Now, consider an additive perturbation on the response i by making y i (ω i ) = y i + ω i s Y , where ω i ∈ R and s Y is a scale factor that can be the sample SD of Y, for i ∈ {1, . . . , n}. Then, the perturbed log-likelihood function corresponds to (θ; The column vectors of ∆ may be expressed as ω=(0,...,0) , i ∈ {1, . . . , n},

Perturbation in the Continuous Covariate
Consider an additive perturbation on a particular continuous covariate, x t namely, with t ∈ {1, . . . , p − 1}, by making x ti (ω i ) = x ti + ω i s X t , where s X t is, again, a scale factor, which can be taken as the sample SD of X t , and ω i ∈ R, for i ∈ {1, . . . , n}. Therefore, the perturbed log-likelihood function is given by (θ; . . , n}. Hence, the perturbation matrix takes the form given by ω=(0,...,0) , with a i being the derivative of a i defined in (8). Here, ∆ k = (ζ 1 , . . . , ζ n ), with ζ i = s X t β t a i m i .

Perturbation of the Parameter k
In this case, the perturbation scheme consists of changing k by making k i = k/ω i , with ω i > 0. Then, the perturbed log-likelihood is (θ; The column vectors of ∆ can be expressed as ω=(1,...,1) , i ∈ {1, . . . , n}, where ξ i = −k m i and η i = −k d i .

The Adjusted Weibull Quantile Regression
To illustrate the use of the Weibull quantile regression formulated in this paper, we assume Y i ∼ Wei(Q i , k) and that our goal is to model the median. Consider two link functions (logarithm and square root) for a systematic component of the regression model, which are respectively stated as for i ∈ {1, . . . , 41}, where β = (β 0 , β 1 ) is the vector of regression coefficients, and We implement the function quant.weibull.reg() in the R software, which allows us to fit Weibull quantile regression models to a data set, computing information criteria and residuals. To select the best model amongst a set of options, the AIC, BIC, and CAIC can be used. These information criteria assume the existence of an unknown "true model". The AIC chooses the model whose divergence in relation to the "true model" is the minimum within the competing models and may be computed by where ( θ) is the log-likelihood function evaluated at θ = θ and m is the number of parameters of the proposed model, in our case m = p + 1. When the number of parameters is large, the AIC can have a deficient behavior. For this reason, a correction to the AIC is proposed as where n is the sample size. The BIC is another information criterion for model selection based on maximizing the probability of choosing the true model and corresponds to In all these criteria, the best model, among a set of candidates, has the smallest value. Another measure to be employed to choose among competing models is R 2 M , which works similarly to the usual R 2 measure in mean regression and is defined as where (θ) and ( θ) are the maximized log-likelihood for the regression model without any covariate and with all covariates, respectively.
Values of the AIC, BIC, and CAIC defined in (22), (23), and (24), R 2 M stated in (25) and the corresponding log-likelihood functions are reported in Table 10. We conclude that the model with the logarithm link function (L1) should be used to describe the median. Table 10. Values of AIC, BIC, CAIC, and log-likelihood function for Weibull median-regression models with the data of time to electrical breakdown of an insulating fluid. Now, we compare model L1 with the proposed model in [10]. This comparison is not obvious since the construction of both models is different. Then, we compare both models in terms of R 2 M defined in (25) and the pseudo-R 2 proposed for the Koenker-Bassett model [11] given by

Model
where V 1 (q) is the sum of weighted distances for the full qth quantile regression model and V 0 (q) is the sum of weighted distance for the model that includes only an intercept; that is, with no covariates. For our data, using (25) and (26), we obtain R 2 M = 0.71 and R 2 KB = 0.03, allowing us to conclude that our model is a better option for describing this data set.
Another relevant comparison is to consider a GLM-type model based on the Weibull distribution and reparameterized by its mean. We fit this model to our data taking the logarithmic link function. The value of the mean squared prediction error for the Weibull mean regression is 425,939.8, and for the median regression it is 231,845.7, meaning that, in terms of prediction error, our adjusted quantile model outperforms a GLM-type model based on the Weibull distribution. Table 11 reports the maximum likelihood estimates for the model parameters, their approximated standard errors (SEs), and p-values based on the Wald test (described in Section 2). Thus, the predictive model is given by log( Q) = 20.97 − 0.56x, for x > 0. We evaluate the distributional assumption of the model by using the RQ and GCS residuals; that is, r RQ i and r GCS i , respectively. The QQ plots with envelopes for these residuals are presented in Figure 4, where all points are inside the bands. Therefore, r RQ i (a) and r GCS i (b) follow approximately standard normal and standard exponential distributions, respectively. This result allows us to validate that the response variable follows a Weibull distribution.
To compare the Weibull quantile regression model with other direct competing models, we adjust the Birnbaum-Saunders quantile regression model [2,16] with logarithm link function, which considers a response variable with an asymmetric distribution. For the Birnbaum-Saunders model with our data, the CAIC, BIC, and R 2 M values are 334.96, 339.45 and 0.67, respectively. Note that the CAIC and BIC values are greater than the corresponding values for model L1; see Table 10. Furthermore, the value of R 2 M is less than for model L1. The CAIC and BIC values are 531.31 and 535.80 for the normal model given in (1). Additionally, the residual r RQ i has been computed for the Birnbaum-Saunders model, and the QQ plot with envelopes is shown in Figure 4c. We observe that, compared with the QQ plot in Figure 4a, the behavior is less homogeneous around a straight line, and there are points outside of the bands. Therefore, by considering the CAIC, BIC, and QQ plots of residuals, we conclude that the Weibull quantile regression outperforms the Birnbaum-Saunders quantile regression as well.

Local Influence Analysis
Next, we analyze potentially influential cases by their local influence for the Weibull quantile regression with link L1, considering the four perturbation schemes as described in Section 5. In Figure 5, we show the index plots of C i (θ) defined in (20) for each of them. Note that five cases are indicated as potentially influential, namely cases #1, #2, #3, #32, and #33. Observe that the local influence technique detects some atypical cases identified previously. From Figure 5d, note that small values for covariate X influence the estimates. We study the impact on the model inference considering the three cases most repeated in the index plots of Figure 5, which are cases #1, #3, and #33 . The sets of cases {#1}, {#3},  {#33}, {#1, #3}, {#1, #33}, {#3, #33}, and {#1, #3, #33} are removed and the model parameters are re-estimated. To determine the variation in the estimates of model parameters and in the associated SEs, we use the value of the relative changes (RCs) for each component of the parameter vector θ; that is, where θ j(i) and SE( θ j ) (i) denote the maximum likelihood estimates of θ j and the estimated SE of the associated estimator, respectively, obtained after removing case i, for j ∈ {1, 2, 3} and i ∈ {1, . . . , 41}, with θ 1 = β 0 , θ 2 = β 1 , and θ 3 = k. Table 12 reports the values of RCs for the data of time to electrical breakdown of an insulating fluid and the Weibull quantile regression. Note that the largest values of RCs are obtained when we remove cases #1 and #33, with the highest change occurring in the parameter k. The RCs of all parameters show a change close to 20%. However, we do not find any inferential change. Therefore, our study of local influence measures derived in this paper allows us to detect potentially influential cases, but these do not affect the model inference. Thus, the analysis of local influence presented with the data of the time to electrical breakdown of an insulating fluid permits us to conclude that the Weibull quantile regression model is nonsensitive to the atypical cases detected and exhibits an excellent performance to model this data set.

Coefficients across Quantiles
Quantile regression gives us a full description of how the covariates can affect the different values of the response variable. To show this, we consider the model given by log(Q) = β 0 + β 1 x. If the covariate X increases from x 0 to x 1 = x 0 + 1, then the value of modeled quantile changes from Q 0 = exp(β 0 + β 1 x 0 ) to Q 1 = exp(β 0 + β 1 (x 0 + 1)), and then we have (Q 1 − Q 0 )/Q 0 = exp(β 1 ) − 1. Therefore, the coefficient β 1 is related to the percentage of change in the considered quantile when the covariate increases in one unit; see [36]. To illustrate this, we fit the Weibull regression model formulated in (21) considering the quantiles q ∈ {0.1, 0.25, 0.5, 0.75, 0.9}. In addition, we use a procedure to find the optimal value of q, q opt namely, that is, the value of q that maximizes the loglikelihood function. We consider the profile log-likelihood method based on a grid of values of q ∈ {0.01, 0.02, . . . , 0.99}. Then, we estimate the Weibull regression parameters and compute the corresponding log-likelihood function. This procedure has been used in other contexts for Weibull models by [8] (pp. 426-433), where it is called a non-failing algorithm. The results are presented in Table 13. We observe that the covariate has the largest impact on higher levels of the response variable.

Concluding Remarks
This paper has proposed novel quantile regression models for a response variable that follows an asymmetrical behavior based on a new parameterization of the Weibull distribution. We have estimated the new model parameters by using the maximum likelihood method and discussed hypothesis testing based on the Wald and likelihood ratio statistics. In addition, we have used the randomized quantile and generalized Cox-Snell residuals to evaluate the fit of the model. Monte Carlo simulation studies have found that (i) the maximum likelihood estimators are empirically consistent and asymptotically normal distributed (Tables 2-7), and (ii) the randomized quantile and generalized Cox-Snell residuals follow a standard normal distribution and exponential distribution with a parameter equal to one, respectively (Tables 8 and 9). Furthermore, we have derived local influence techniques to analyze the impact of a perturbation on the estimation of model parameters considering four schemes. We have applied the proposed model to a data set related to the time to electrical breakdown of an insulating fluid. The experimental results of this data analysis have shown the excellent performance of the proposed model to the data, making it a better choice than the usual normal regression model and other asymmetrical quantile regression models proposed in the literature. Some observations have been detected as potentially influential cases for our local diagnostic analysis ( Figure 5) but without inferential change. Furthermore, we have studied the impact of the covariates on the quantiles of the response.
This work has evidenced that the new proposed model is helpful for independent data and a response variable with positive support. This new quantile regression model can also be suitable for small samples. However, we remark some limitations of our models and the proposed methodology. For example, diverse phenomena frequently provide other types of data to those analyzed in this study, such as censored, functional, spatial, and temporal data, as well as structures of measurement errors, and partial least squares, all of which are suitable to be studied to increase the predictive capability in the modeling; see [37][38][39][40][41]. Then, it is necessary to formulate new models based on our approach to study these phenomena in such types of data and modeling structures. These structures are not an easy aspect to be explored, especially with spatially correlated data, because new multivariate distributions based on asymmetrical models need to be proposed and parameterized in terms of quantiles; see [38]. Furthermore, our proposal allows likelihood methods to be used, and thus this proposal can be applied to different distributions for modeling data, but adaptations of the corresponding methodology for each of these distributions must be performed.
An idea to enhance the empirical analysis of our proposal involves the following steps. First, consider the covariate with the highest simple correlation coefficient. Second, estimate the slope and intercept parameters in h(Q). Third, taking a quantile as the median, create a data set on y and x stating a table with the observed values of y, the fitted values of y, and the residuals as their difference. Fourth, plot the observed and fitted values against the x values to allow the assessment of the model. In addition, least squares-fitted values can be displayed in the same graphical plot. A one-at-a-time cross-validation separates one observation for prediction from the remaining data, which adds a simple aspect about prediction that is also valuable. Other aspects related to k-fold cross validation are also appealing. Additionally, the relationship between a quantile and the covariates by means of a link function must be evaluated in each case, since it may not be correctly specified, implying extra analyses to achieve a better modeling. Moreover, measures such as the Cook distance and generalized leverage are essential diagnostic aspects of all statistical modeling, and they must be further studied for the newly proposed model. Weibull-type distributions with an extreme value index are widely used in many areas such as environmental sciences, hydrology, and meteorology; see [42]. Our proposed methodology can be adapted to this type of distributions. These and other aspects are part of our ongoing research.