Inverted Weibull Regression Models and Their Applications

: In this paper, we propose the classical and Bayesian regression models for use in conjunction with the inverted Weibull (IW) distribution; there are the inverted Weibull Regression model (IW-Reg) and inverted Weibull Bayesian regression model (IW-BReg). In the proposed models, we suggest the logarithm and identity link functions, while in the Bayesian approach, we use a gamma prior and two loss functions, namely zero-one and modiﬁed general entropy (MGE) loss functions. To deal with the outliers in the proposed models, we apply Huber and Tukey’s bisquare (biweight) functions. In addition, we use the iteratively reweighted least squares (IRLS) algorithm to estimate Bayesian regression coefﬁcients. Further, we compare IW-Reg and IW-BReg using some performance criteria, such as Akaike’s information criterion (AIC), deviance (D), and mean squared error (MSE). Finally, we apply the some real datasets collected from Saudi Arabia with the corresponding explanatory variables to the theoretical ﬁndings. The Bayesian approach shows better performance compare to the classical approach in terms of the considered performance criteria.


Introduction
McCullagh and Nelder [1] published a book on the generalized linear models (GLMs) that led to their widespread use and appreciation. They extended the scoring method to maximum likelihood estimation (MLE) in exponential families. Nelder and Pregibon [2] described methods of jointly estimating the parameters of both the link and variance functions. The iteratively reweighted least squares (IRLS) algorithm is amenable to some statistics and measures that are common to all the GLMs. Nelder and Wedderburn [3] used the Newton-Raphson process for regression coefficients estimates. They reported that the Newton-Raphson process with expected second derivatives is equivalent to the Fisher's scoring technique. Additionally, De Jong and Heller [4] reported that the Newton-Raphson iteration equation leads to a sequence that often rapidly converges. These include the D statistic, along with some specific residuals and influence measures. Yuan and Bentler [5] reported that the convergence properties of the Fisher-scoring algorithm are affected by many factors. One of them is multicollinearity among the observed variables. If the sample or model implied covariance matrix is close to being singular, the Fisher-scoring algorithm may have difficulty reaching a set of converged solutions. Liao [6] introduced a systematic way of interpreting commonly used probability models: logit, probit, and the other GLMs.
The inverse Weibull (IW) model that was derived by Keller and Kamath [7] based on physical considerations on some mechanical components' failures subject to degradation phenomena, assuming that the strength of a component decreases with time with a power law. Calabria and Pulcini [8] proposed the IW distribution as a suitable model to describe mechanical degradation phenomena. They investigated a statistical property of the maximum likelihood estimator of the IW reliability. Jiang et al. [9] derived the Weibull (W) and IW mixture models with a common shape parameter for a system's components. They also used an example to illustrate that the proposed mixture model can be used to approximate Stats 2021, 4 the reliability behaviors of the consecutive-k-out-of-n systems. Mahmoud et al. [10] considered the order statistics arising from IW distribution. They also derived an exact expression for the single moments of order statistics. Pasari and Dikshit [11] investigated the suitability of W distribution in the probabilistic assessment of earthquake hazards. The performance is also compared with two other popular models from the same W family, namely the two-parameter W model and the IW model. Jazi et al. [12] proposed a discrete IW distribution, a discrete version of the continuous IW variable of fitting discrete-time reliability and survival data sets. Kundu and Howlader [13] considered the Bayesian inference and prediction problems of the IW distribution based on Type-II censored data. Qingtian et al. [14] discussed the definition of environmental factors and restricting conditions of constant failure mechanism Based on generally accepted basic hypotheses. They estimated the IW distribution environment factor using the MLE and Bayes estimation methods. Musleh and Helu [15] considered two types of inference procedures: the classical (MLE, Approximate MLE and the least square method (LSE)) and the Bayesians (the squared error loss function (SQR), Linex loss function (LIN), General entropy loss function (GE), the Precautionary loss function (PRE)) to estimate the unknown parameters of the IW distribution when data under consideration are progressively type-II censoring. Akgul et al. [16] used IW distribution for modeling the seasonal wind speed using the modified maximum likelihood (MML) estimators of the parameters. The MML estimators' efficiencies are compared with the well-known maximum likelihood (ML) and the least-squares (LS) estimators via the Monte-Carlo simulation study. Nassar and Abo-Kasem [17] discussed the estimation problem of the unknown parameters of the IW distribution based on adaptive type-II progressively hybrid censored data. They used classical and Bayesian estimation methods to estimate the unknown parameters.
The Bayesian approach to modelling provides an alternative to the standard GLMs. The posterior mode estimation is an alternative to full posterior analysis or posterior mean estimation, which avoids numerical integrations or simulation methods. It has been proposed by many authors; see Reference [18,19]. Dey et al. [20] described how to conceptualize, perform, and critique the traditional GLMs from a Bayesian perspective and how to use modern computational methods to summarize inferences using simulation. Olsson [21] given an overview of the GLMs and has presented practical examples. The exponential family of distributions are discussed along with the maximum likelihood estimation and ways of assessing the fit of the model. Dobson and Barnett [22] presented a theoretical background of the GLMs. For the Bayesian estimation in this context, a useful asymmetric loss function, known as the LINEX loss function, was introduced by Varian (1975) and has been widely used by several authors. A suitable alternative to the modified LINEX loss is the general entropy (GE) loss function proposed by [23]. This loss function is a generalization of the entropy loss function used by several authors [23][24][25]. One highly used one is the zero-one loss function (for more details, see Reference [26]).
In order to reduce the influence of outliers on the estimate, some robust measures were proposed in the literature. The common robust estimation method can be divided into several categories: M, MM, median, L1, Msplit, R, S, least-trimmed squares, and sign-constraint robust least squares estimation. Among these, Huber's M estimation has become one of the main robust estimation methods by virtue of its simple calculation and convenience to implement [27]. The key aspect is the involvement of a loss function that is applied to data errors that was selected to less rapidly increase than the square loss function that is used in least-squares or maximum-likelihood procedures. There exist several well-known families of loss functions, such as Huber, Hampel, and Tukey's biweight (or bisquare) that can be used for the computation of M estimators [28]. The IW distribution is flexible distribution can be used as a competing to gamma and Weibull distributions to describe more widely real life data, failure characteristics, such as infant mortality, useful life and wear-out periods, applications in medicine and ecology, determining the cost-effectiveness, maintenance periods of reliability centered maintenance activities, and biological research (for more details, see Reference [7,9,13,[29][30][31][32][33][34][35]). This paper is structured as follows: In Section 2, we present an overview of the GLMs and propose the IW-Reg model under various link functions for estimating the model parameters. In Section 3, we estimate the IW-Bayesian regression (IW-BReg) model under a gamma prior, various link and two loss functions. In Section 4, we apply the theoretical results of both of IW-Reg and IW-BReg models to some real datasets collected from Saudi Arabia. Next, we investigate the performance of the proposed models in terms of some criteria, such as Akaike's information criterion (AIC), mean squared error (MSE), and deviance (D). In addition, we propose Huber and Tukey's bisquare (biweight) functions to improve IW-BReg models . In addition, we adopt the iteratively reweighted least squares (IRLS) algorithm to estimate the Bayesian regression coefficients. Finally, Section 5 draws a succinct conclusion to the findings.

Classical Approach
Nelder and Wedderburn [36] introduced the class of the GLMs, defined according to the assumption that y 1 , y 2 , ...y n are observations of the response variable, with the density function of y i as follows: where ψ(·), c(·) are known functions, with θ i being the canonical parameter. A link function, g(.), relating to the regression coefficients, is given by where g(µ i ) = θ i , β = (β 1 , ..., β p ) is a vector of p unknown regression parameters, is a vector of explanatory variables, and η i is a linear predictor of the vectors x i and β. Here, the g(.) is a link function, which is a monotonic differentiable invertible function. The model given by (1) and (2) is called the GLM. The GLM class includes, as special cases, linear regression and analysis of variance models, logit and probit models for quantal responses, log-linear models, and multinomial response models for counts; for more details, see Reference [36]. Consider that the probability density function of the IW distribution as follows [37] f (y; α, γ) = αγy −(γ+1) e −αy −γ ; y > 0, α, γ > 0; here, α > 0 and γ > 0 are the shape and scale parameters, respectively. The mean value of the response variable is given by The cumulative function of IW distribution is given by Let y i be a random sample from IW, and α i = µ i Γ 1− 1 γ γ , the log-likelihood function based on y i , is given by The regression coefficients are estimated using the Fisher's scoring technique [3,36]. In order to develop the GLMs for our models, the IW-Reg are similar to GLMs, except that the distribution of the response variable is not a member of the exponential family [38]. We also suggest some convenient link functions of g(.), in view of (2), as in the following lemmas.
Lemma 1 (The IW-Reg model with a log link function). Let the response variable Y have an IW distribution, i = 1, 2, ..., n, and let the link function of the form be Thus, the estimated coefficientsβ = (β 0 ,β 1 , . . . ,β p ) using Fisher's scoring technique at the s th iteration is given bŷ where X is a covariates matrix,β j is an initial vector, W = diag(w 1 , w 2 , . . . , w n ), and Z = (z 1 , z 2 , . . . , z n ), and The procedure in (7) can be repeated until |β (s) −β (s−1) | ≤ ε. The IW-Reg model in this case is given byμ Proof. Suppose that, in y i ∼ f (y i ; α, γ), as in (3), the parameter γ is assumed to be known. The log-likelihood function based on y i , i = 1, 2, ..., n is given as in (5). The link function connecting the µ i with the linear model x i β, in this case, is given as in (6). U r for the log-likelihood is written from one observation as From (5) and (6), we have which can be written in the matrix notation as Taking the second derivatives of l(β), we have where W β is the diagonal matrix of weights, and w i is as it is in (8). Then, From (13) and (14), we have Finally, the estimated coefficientsβ is given bŷ as given in (7).
To derive the MLS of β, the IRLS is used. Under certain regularity conditions on the likelihood function, the MLEβ (s) are asymptotically normal, unbiased, and efficient, with covariance matrix equal to the inverse of Fisher's information matrix [39]. Thus,β has asymptotically normal distribution, where (X WX) −1 is the inverse of Fisher's information matrix.
Lemma 2 (The IW-Reg model with identity link function). Let the response variable Y have an IW distribution, i = 1, 2, ..., n, and let the link function of the form be Thus, the estimated coefficientsβ = (β 0 ,β 1 , . . . ,β p ) using Fisher's scoring technique at the s th iteration is given bŷ where X is a covariates matrix,β j is an initial vector, W = diag(w 1 , w 2 , . . . , w n ), and Z = (z 1 , z 2 , . . . , z n ), and The procedure in (17) can be repeated until |β (s) −β (s−1) | ≤ ε. The IW-Reg model in this case is given byμ Proof. Suppose that, in y i ∼ f (y i ; α, γ), as in (3), the parameter γ is assumed to be known. The log-likelihood function based on y i , i = 1, 2, ..., n is given as in (5). The link function connecting the µ i with the linear model x i β, in this case, is given in (16). From (5), (11), and (16), we have which can be written in the matrix notation as in (13). Taking the second derivatives of l(β), we have and I β is given as in (14), where W β is the diagonal matrix of weights, and w i is as it is in (18). Using (15), we haveβ (s) as in (17).
Lemma 3 (Convergence estimates in the Fisher's scoring process). Let the response variable Y have an IW distribution, and let the link function of the form be g(µ i ) = η i = x i β i = 1,2,...,n, the estimated coefficientsβ = (β 0 ,β 1 , . . . ,β p ) using Fisher's scoring technique at the s th iteration is given byβ where X is a covariate matrix, W is the diagonal matrix of weights, and Z is a vector of the response variable.
Proof. Suppose that, in y i ∼ f (y i ; α, γ), as in (3), the parameter γ is assumed to be known. The log-likelihood function based on y i , i = 1, 2, ..., n is given as in (5). Furthermore, suppose that the link function g(µ i ) connecting the µ i with the linear model x i β is given as in (2). The Fisher's scoring process to obtain the MLEs estimatesβ is given by computing the iterations:β where U β (s−1) is the score vector for the log-likelihood (5), and I β (s−1) is the Fisher's information matrix. Taking the expectation of the Equation (21), we have since the estimatesβ (s−1) are the MLEs, and Using the Chebyshev inequality, for every ω > 0, we find Now, by the Jensen inequality, Let ω = ε and, using (25), we then obtain since E β (s) −β (s−1) = 0. On the other hand, by choosing ω = ε n into the Equation (26), this becomes p β (s) −β (s−1) ≤ 0 = 1 as n → ∞.

Bayesian Approach
Diaconis and Ylvisker [40] introduced a conjugate prior distribution for the exponential family, which, as in (1), can be shown as where k 1 is a normalization constant, and m, µ 0 are natural parameters. The θ i values are connected to the regression coefficients by the link function η i = x i β as The posterior distribution of θ i is given by Das and Dey [41] suggested a Jacobian transformation and rewrote (29) with the term η i , as where k 2 is a normalization constant, and They used a zero-one loss function to attain the posterior mode of (30) asη i = h(y i ); hence, the estimated coefficientŝ whereβ * is the least square estimates, andη = (η 1 ,η 2 , ...,η n ) [40,41]. Under regularity conditions, the estimatorβ * has a asymptotically normal distributionβ ≡ N β, F −1 , where F −1 is the inverse of Bayesian Fisher's information (BIF). Note that the BFI is given by where π(η|y) is the posterior pdf of η [39,42,43].
In order to develop the Bayesian approach, we propose a modified loss function of the general entropy (MGE) loss function to be appropriate for the Bayesian estimates. The MGE loss function is introduced in the following lemma.

Lemma 4 (The MGE loss function).
Consider that the posterior distribution of the η i is π(η i |y i ), η i is an estimate of η i , and y i , i = 1, 2, ..., n are independent observations. A suitable alternative loss function to the GE loss is the MGE loss function, given as Thus, the posterior Bayes estimates of η i is given by solving the equation Proof. Consider the MGE loss function is given as in (32). The posterior expectation of the loss function with respect to the posterior π(η i |y i ) is given as The value ofη i that minimizes the posterior expectation of the MGE loss function is obtained by solving the equation as given in (33).
In order to develop a Bayesian approach, we suggest inverted Weibull Bayesian generalized linear models (IW-BReg) that are similar to the approach in Section 3, except that the distribution of the response variable is not a member of the exponential family. We use the general form of the posterior in (30), and since g(·) is a monotonic differentiable function, then we attain the posterior Bayes estimates. Moreover, we use the a log and identity link functions with different loss functions. The IW-BReg estimates correspond to the link functions using different loss functions, as in the following lemmas.
Lemma 5 (The IW-BReg model based on zero-one loss function). Let the response variable Y have an IW distribution and let the link function of the form be as in (2). Consider that α has a gamma prior G(ν, λ) with the following density function: Thus, the posterior mode of η i by using a zero-one loss function can be derived by solving the following equation: where g * (η i ) = µ i , η i is defined as in (6) and (16). The estimated coefficientsβ * are given as in (31). The IW-BReg model in this case is given bŷ Proof. Suppose that y i ∼ f (y i ; α, γ) is as it is in (3), the parameter γ is assumed to be known, and the density function of y i is given by Consider a gamma prior for α i , which can be written as in (34). The posterior distribution of α i is given by Using Jacobian transformation from α i to η i , we have Taking the derivative of the log posterior, we have hence, we get the equation as in (35), and the posterior mode of η i is given by solving it.
Lemma 6 (The IW-BReg model based on MGE loss function). Let the response variable Y have an IW distribution, and let the link function of the form be as given in (2). Consider that α i has a gamma prior with a density function as given in (34). Thus, the posterior Bayes estimates of η i , by using an MGE loss function, can be derived by solving the equation where g * (η i ) = µ i , η i is defined as in (6) and (16). The estimated coefficientsβ * are given as in (31). The IW-BReg model in this case is given bŷ Proof. Suppose that y i ∼ f (y i ; α, γ) is as it is in (3). The parameter γ is assumed to be known, and the density function of y i is given as in (37). Consider the gamma prior for α i , which can be written as given in (34). Using the posterior distribution of η i that is given in (38), we have hence, Using Lemma (4), we have Thus, the posterior Bayes estimates of η i by using the MGE loss function can be derived by solving the Equation (39).

Data Analysis
In this section, we show the usefulness and performance of the IW-Reg and IW-BReg models by applying the theoretical findings in Sections 2 and 3 to some real datasets. For simplicity, we use the following notations for the proposed models used throughout the applications.

Model Description
Model Dataset in this application was collected from the meteorology station at King Khalid International Airport, Saudi Arabia during (2014)(2015)(2016)(2017)(2018). This data contains 54 observations (monthly data), in which the response variable Y be the minimum of dry bulb temperatures in Celsius. The explanatory variables are; X 1 ; mean of relative humidity, X 2 ; mean of vapor pressure (mm), X 3 ; mean of sky cover oktes, X 4 ; maximum of station-level pressure (mm).
In order to aid in distributional assessment of the response variable Y, the empirical cumulative distribution function (ECDF) plot was proposed. Kolmogorov-Smirnov goodness of fit test (K-S) was calculated based on IW, Gaussian, and gamma distributions. The IW-Reg and IW-BReg models based on log and identity link, and loss functions were fitted using the proved Lemmas in Sections 2 and 3. Bayes coefficients were obtained using a gamma prior G(ν, λ) with some known values of the hyperparameters ν and λ. In addition, Huber's function was suggested to avoid such distortions due to an outlier in X 2 ; see Appendix A. In this case, under regularity conditions, estimatorβ * has asymptotically normal distributionβ * ≡ N β * , (X WX) −1 [39,42]. The performance of all these models were compared. Modeling performance is measured in terms of some criteria, such as AIC, D, D/df, and MSE [4]. We also used Thiel's inequality coefficient TIC = to compare the prediction accuracy of the selected models [44,45]. The backward-selection method was used in the IW-BReg model to select the best fit in view of the covariates.
To check the adequacy for the selected models, we consider Pearson residuals [36,40]. R software was used to carry out calculations. In order to compare with known distributions, the glm() function in "stats" was used to fit the GLMs [46]. Functions qqPlot(), ecdf(), boxplot, and ks.test() in R package "stats" were used for the assessment distributions [47]. To solves n roots of n nonlinear equations in Section 3, the function multiroot() in R package "rootSolve" was used [48]. The fitting results and the relative errors (RE) of the selected model, and other numerical results are shown in Tables 1-3.
Based on the results obtained from K-S test, the p-value = 0.315 for the test indicates that the IW distribution fits the response variable in the given data quite well. Figure 1 provides the ECDF plot, and it is clear that the IW distribution fits these data well.
To compare between the Bayesian fitting results, we observe that the results based on MGE loss function are better than zero-one loss function. Table 1 shows that the IW-BReg models based on MGE loss function (Model V and VI) are good in terms of MSE, AIC, and D statistics. Table 1 also shows that the D/d f of IW-Reg and IW-BReg models (I, II, III, IV, V, and VI) are less than 1, indicating that the fitting degree is very good. If the model is correct, the Pearson residuals and Pearson statistics Q = ∑ 54 i=1 r 2 i have an approximately normal distribution with mean 0 and chi-square distribution χ 2 n−k , respectively. For the IW-BReg model based on identity link and MGE loss function (Model V), the Pearson statistics is Q = 4.0396, the p-value for Anderson-Darling is 0.0001, and the Cox Stuart test is 1, so the Pearson residuals are not normal but randomly scattered around zero at the level of significant α = 0.05. For the IW-BReg model based on log link and MGE loss function (Model VI), the Pearson statistics is Q = 0.4460, the p-value for Anderson-Darling is 0.06379, and the Cox Stuart test is 1, so the Pearson residuals are normal and randomly scattered around zero.  Based on this analysis, we conclude that the Model VI is more appropriate for fitting these data, leading to the following equation For the backward selection method results, Table 2, we can conclude that the predictive model is given as follows: We also can see that, this model has AIC = 364.3539 and a low MSE = 2.3463, and there was also a significant relationship among variables when using level of significance α = 0.05. For the residuals, the Pearson statistics is Q = 0.4520, p-value for Anderson-Darling is 0.0443 and for the Cox Stuart test is 1.
Because of the presence of an outlier, we can conclude that the Model VI based on Huber's function is the best for our data, and it is given as follows: From Table 2, we can see, this model has AIC = 363.2006 and a low MSE = 2.3451, and there was also a significant relationship among variables when using the level of significance α = 0.05. For the residuals, the Pearson statistics is Q = 0.4532, the p-value for Anderson-Darling is 0.052, and the Cox Stuart test is 1. Hence, the Pearson residuals are normal randomly scattered around zero; see Figure 2. The fitting results for this model during the year 2014 are shown in Table 3. We can also see that the fitting accuracy is good because the TIC value is closer to 0 than 1.  The dataset in this application was taken again from the meteorology station at King Khalid International Airport, Saudi Arabia, in 2016. This data contains 91 observations, during 7 June and 5 September, (summer season), in which the response variable Y be the mean wind speed (km/h). The explanatory variables are; X 1 ; maximum's wind direction, X 2 ; maximum of station-level pressure (mm), X 3 ; mean of sea-level pressure (mm), X 4 ; mean of dry bulb temperatures of air (Celsius), X 5 ; mean of wet bulb temperatures (Celsius), X 6 ; mean of relative humidity, X 7 ; mean of vapor pressure (mm), X 8 ; mean of sky cover oktes, X 9 ; maximum of station-level pressure (mm), X 10 ; maximum of sea-level pressure (mm), X 11 ; maximum of dry bulb temperatures (Celsius), X 12 ; maximum of the wet bulb temperatures (Celsius), X 13 ; maximum of relative humidity, X 14 ; minimum of station-level pressure (mm), X 15 ; minimum of sea-level pressure (mm), X 16 ; minimum of dry bulb temperatures, X 17 ; minimum of the wet bulb temperatures, X 18 ; minimum of relative humidity, X 19 ; time of maximum daily wind (HH:MM).
Proceeding similarly, as in Application 1 to aid in the distributional assessment. In this dataset, we identify the outliers, different plots as the quantile-quantile (Q-Q) plot, ECDF, and box plot were proposed. Again, Lemmas in Sections 2 and 3 were applied to these data to fit the IW-Reg based on log and identity link functions were used. Besides being an alternative analysis, the IW-BReg models were obtained using a log, identity link, and a gamma prior with known hyperparameters ν and λ parameters. We also compare the performance of all these models. In addition, biweight function was suggested to avoid such distortions due to outliers; see Appendix A. In this case, under regularity conditions, estimatorβ * has asymptotically normal distributionβ * ≡ N β * , (X WX) −1 [39,42]. The modeling performance was measured in terms of some criteria, such as AIC, D, D/df, and MSE [4]. We also used Theil's Inequality coefficient (TIC) to measure the prediction accuracy of the selected models [44,45]. To compare the residual for all models, we consider Pearson residuals to check the adequacy of the regression model fitted to the data [36,40].
Furthermore, to detect the influential cases, we use the Cook's distance measure using in the case of Bayesian analysis [40,49]. The backward selection method was used in the IW-Reg model to remove the input variable; see Table 4. R software was used to carry out the calculations. In order to compare with known distributions, the function glm in "stats" is used to fit the GLMs. The functions qqPlot, ecdf, boxplot, and ks.test in the R package "stats" are used for the assessment distributions [47]. To solves n roots of n nonlinear equations in Section 3, the function multiroot() in R package "rootSolve" was used [48]. The fitting, predictive results of these models and the other numerical results are shown on the Tables 4-8.
Based on the results obtained from K-S test, the p-value = 0.139 for the test indicates that the IW distribution fits the response variable in the given data quite well. Figure 3 provides the Q-Q plot and ECDF, and it is clear that the IW distribution fits these data well. Figure 4 provides box plot corresponding to the mean wind speed variable Y, and this chart mapped one outlier (leverage point) that exceeds the values of Q 3 .  From Table 4, we can observe that the variables X 2 , X 5 , X 8 , and X 16 are significant for the model, so there is a significant relationship among variables. In these models, β (s) is stabilizes when the Fisher's scoring procedure is converged at s = 6 and s = 15, respectively, because of |β (s) −β (s−1) | < ε. To compare the Bayesian fitting results we observe that the results based on MGE loss function (Model V and VI) better than zero-one loss function (Model III and IV); see Table 5. Table 5 also shows that the D/d f of the models I, II, III, IV, V, and VI are less than 1, indicating that the fitting degree is very good.  Based on this analysis, we also conclude that the Model VI is more appropriate for fitting these data, leading to the following equation For the residuals, the Pearson statistics is Q = 0.1032, p-value for Anderson-Darling is 0.0496, and for the Cox Stuart test is 1; see Table 5. This residuals have a large positive residual at the observation 91. However, for the model, this case is non-influential according to C 91 = 0.1065 < F (0.05,1,n−k) where F (0.05,1,n−k) corresponding to upper α-percentile from the F distribution [50]. Because of the presence of an outlier, we can conclude that the Model VI based on biweight function is the best for our data, and it is given as follows: From Table 6, we can see that this model has AIC = 446.515 and a low MSE = 3.046, and there was also a significant relationship among variables when using the level of significance α = 0.05. For the residuals, the Pearson statistics is Q = 0.1049, the p-value for Anderson-Darling is 0.0612, and the Cox Stuart test is 1. Hence, the Pearson residuals are normal randomly scattered around zero at the level of significant α = 0.05; see Table 7 and  Table 8. We can also see that the prediction accuracy is good because the TIC value is closer to 0 than 1.   Table 6.

Conclusions
In this paper, the regression models IW-Reg and IW-BReg for modeling Saudi datasets are considered. Zero-one and MGE loss functions were used to attain the Bayesian estimates based on a log and identity functions. In the classical approach, parameter estimation is done by the Fisher's scoring technique, and closed-form expressions are provided for the score function, and for Fisher's information matrix and its inverse. In the Bayesian approach, parameter estimation is performed using a gamma prior distribution, Jacobian transformation, and least-squares estimates. The IW-Reg and IW-BReg models were compared to find which model predicted better. To deal with outlier problems, IW-BReg based on Huber's and biweight functions, and the adopted algorithm based on IRLS to find the estimates, were proposed. For distributional assessment, Q-Q, ECDF, box plots, and the K-S test were applied. Some criteria, namely AIC, D, D/df, and MSE, were also computed for all regression models.
According to the results of the Application (1), the IW-BReg model based on Huber's and MGE loss with a log link function, performed the best in terms of the AIC, D, D/df, and MSE statistics, so it is recommended for these data. In contrast, the IW-Reg model showed poor results compared with those of the other models. Results indicated that the IW-BReg model based on Huber's and MGE loss is highly capable of improving regression models' performance to a greater extent in predicting the minimum of dry bulb temperatures (Celsius) in Saudi Arabia. It is found the following regressors are significant for the model: Explanatory variables are: X 1 , the mean of relative humidity, X 2 , the mean of vapor pressure (mm); and X 4 , the maximum of station-level pressure (mm). Application (2), the IW-BReg model based on biweight function and MGE loss with a log link function, performed the best in terms of the AIC, D, D/df, and MSE statistics, so it is recommended for these data. In contrast, IW-Reg and IW-BReg based on zero-one loss function showed poor results than those of the other models. Finally, the results in this application indicated that the IW-BReg model based on biweight function and MGE loss with a log link function is highly capable of improving regression models' performance to a greater extent in predicting the mean wind speed (km/h) in Saudi Arabia. It is found the following regressors are significant for the model: Explanatory variables are: X 2 , the mean of station-level pressure (mm); X 5 , the mean of wet-bulb temperatures (Celsius); X 8 , the mean of sky cover oktes; and X 16 , the minimum of dry bulb temperatures. From these discussions, we conclude that IW-BReg model based on log link and MGE loss has good performance for the response variables in the considered applications.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this article was taken from the National Center for Meteorology in Saudi Arabia on the link https://ncm.gov.sa/Ar/About/Branches/Pages/default. aspx, accessed on 7 December 2020.

Acknowledgments:
The authors would like to thank the editor and referees for their helpful comments, which improved the presentation of the paper. In addition, the authors would like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University for its funding this Research Group (RG-1435-056).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Robust IW-BReg Models
M-estimation is considered to be the most common method of robust regression. It was proposed by [51,52] in the presence of outliers, and it is more efficient than ordinary least squares (OLS) [53,54]. The Huber's function takes the following form: where k is the tuning constant, r is the residual corresponding to the observation in OLS, and ρ(·) is the objective function that satisfies certain properties. Often, ρ(·) can be formed by using a linear combination of the residuals. Defining function ∂ ∂r ρ(r) and the corresponding weight function in this case is as follows: ψ(r) r = w(r) = 1, |r| ≤ k, k |r| , |r| > k.
Another M-estimation function is the Tukey bisquare (biweight) function. This is based on Tukey's function, taking the form of that in Reference [28] ρ(r) = k 2 where k is the tuning constant, and r is the residual corresponding to the observation in OLS. Defining function ∂ ∂r ρ(r) = ψ(r) and the corresponding weight function in this case is given as follows: To make the IW-BReg models are robust, we suggest Huber's and biweight functions for these models. There are also many other versions of the M-estimation function that could be used here. Let the response variable Y have an IW distribution, and let the link function of the form be as given in (2). Consider that α has a gamma prior with density function is given as in (34). Using the Jacobian transformation from α i to η i and using the link function, we have the posterior distribution of η i is given as in (38). Thus, the estimated coefficientŝ β * = (β * 0 ,β * 1 , ...,β * p ) are given aŝ β * (q) = X W * β * (q−1) X −1 X W * β * (q−1) η, q = 1, 2, 3, ..., whereη i = h(y) andη = (η 1 ,η 2 , ...,η n ) are the posterior Bayes estimates of η i using the zero-one or MGE loss functions, and W * = diag(w * 1 , w * 2 , . . . , w * n ), w i are the selected weights depending on M-estimation functions. In this case, coefficients are estimated using an adopted IRLS algorithm [27,[55][56][57] as follows: i.
ii. The initial residuals r * (q) ) are based on the link function that is given as in (2), and calculate an initial scale estimate s * (q) = 1.4826(median|r * (q) i |).
iii. An initial standardized residuals u * (q) i = r * (q) i s * (q) are calculated and used to calculate initial estimates for the weight function. Preliminary weights are w * (q) i = w(u * (q) i ).
Using weights from Steps i-iii and Step iv to find estimators in (A5).