4.1. Application 1: (The Minimum Temperatures)
The dataset in this application was collected from the meteorology station at King Khalid International Airport, Saudi Arabia, during 2014–2018. It contains 54 monthly observations, in which the response variable Y is the minimum dry bulb temperature (Celsius). The explanatory variables are the mean relative humidity, the mean vapor pressure (mm), the mean sky cover (oktas), and the maximum station-level pressure (mm).
To aid the distributional assessment of the response variable Y, the empirical cumulative distribution function (ECDF) plot was used. The Kolmogorov–Smirnov (K–S) goodness-of-fit test was computed for the IW, Gaussian, and gamma distributions. The IW-Reg and IW-BReg models, based on the log and identity links and on the loss functions, were fitted using the lemmas proved in Section 2 and Section 3. Bayes coefficients were obtained using a gamma prior with known values of its hyperparameters. In addition, Huber’s function was used to limit the distortion caused by an outlier in the data; see Appendix A. In this case, under regularity conditions, the resulting estimator is asymptotically normally distributed [39,42]. The performance of all these models was compared. Modeling performance was measured by criteria such as AIC, D, D/df, and MSE [4]. We also used Theil’s inequality coefficient (TIC) to compare the prediction accuracy of the selected models [44,45]. The backward-selection method was used in the IW-BReg model to select the best-fitting subset of covariates.
To check the adequacy of the selected models, we consider Pearson residuals [36,40]. R software was used to carry out the calculations. For comparison with known distributions, the glm() function in the R package “stats” was used to fit the GLMs [46]. The functions ecdf() and ks.test() in the R package “stats”, together with qqPlot() and boxplot(), were used to assess the distributions [47]. To solve for the n roots of the n nonlinear equations in Section 3, the function multiroot() in the R package “rootSolve” was used [48]. The fitting results, the relative errors (RE) of the selected model, and the other numerical results are shown in Table 1, Table 2 and Table 3.
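As an illustration of the root-finding step, the following Python sketch solves n nonlinear equations in n unknowns by Newton-Raphson iteration with a finite-difference Jacobian, which is the general approach multiroot() is documented to use. The helper names (solve_linear, newton_system) and the two-equation toy system are hypothetical; they stand in for the estimating equations of Section 3 and are not the authors' code.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def newton_system(f, x0, tol=1e-10, max_iter=100, h=1e-7):
    """Find x with f(x) = 0, where f maps a list of n floats to n floats."""
    x = list(x0)
    n = len(x)
    for _ in range(max_iter):
        fx = f(x)
        if max(abs(v) for v in fx) < tol:
            return x
        # Forward-difference approximation of the Jacobian, column by column.
        jac = [[0.0] * n for _ in range(n)]
        for j in range(n):
            xh = x[:]
            xh[j] += h
            fh = f(xh)
            for i in range(n):
                jac[i][j] = (fh[i] - fx[i]) / h
        step = solve_linear(jac, [-v for v in fx])
        x = [xi + si for xi, si in zip(x, step)]
    raise RuntimeError("Newton iteration did not converge")


def toy_system(x):
    # Hypothetical example: a circle of radius 2 intersected with the line x0 = x1.
    return [x[0] ** 2 + x[1] ** 2 - 4.0, x[0] - x[1]]


root = newton_system(toy_system, [1.0, 2.0])
print(root)  # both components approach sqrt(2)
```

As with any Newton-type method, convergence depends on the starting values, so in practice several starting points may need to be tried.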
Based on the K–S test, the p-value of 0.315 gives no evidence against the IW distribution for the response variable, so the IW fit to these data is judged adequate. Figure 1 shows the ECDF plot, which likewise suggests that the IW distribution fits these data well.
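The K–S comparison can be sketched in a few lines of Python. The block below computes the K–S distance between the ECDF of a sample and an inverse Weibull CDF. The parametrization F(y) = exp(-(y/scale)^(-shape)) for y > 0 is an assumption made here for illustration and may differ from the one used in Section 2; the function names and the parameter values are hypothetical, and the p-value, which ks.test() reports in R, is not computed.

```python
import math


def iw_cdf(y, shape, scale):
    """Inverse Weibull CDF under the assumed form exp(-(y/scale)^(-shape))."""
    if y <= 0:
        return 0.0
    return math.exp(-((y / scale) ** (-shape)))


def ks_statistic(sample, cdf):
    """Largest vertical gap between the ECDF of `sample` and `cdf`."""
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        f = cdf(y)
        # The ECDF jumps from (i - 1)/n to i/n at the i-th order statistic.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Illustrative check: points placed at the exact IW quantiles of (i - 0.5)/n
# sit half a step from the ECDF, so the distance is 0.5/n.
shape, scale, n = 2.0, 15.0, 50
sample = [scale * (-math.log((i - 0.5) / n)) ** (-1.0 / shape) for i in range(1, n + 1)]
d = ks_statistic(sample, lambda y: iw_cdf(y, shape, scale))
print(round(d, 6))  # 0.01
```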
Comparing the Bayesian fits, we observe that the results based on the MGE loss function are better than those based on the zero-one loss function.
Table 1 shows that the IW-BReg models based on the MGE loss function (Models V and VI) perform well in terms of the MSE, AIC, and D statistics.
Table 1 also shows that the
of the IW-Reg and IW-BReg models (I, II, III, IV, V, and VI) are less than 1, indicating a very good fit. If the model is correct, the Pearson residuals
and the Pearson statistic
have, approximately, a normal distribution with mean 0 and a chi-square distribution
, respectively. For the IW-BReg model based on the identity link and the MGE loss function (Model V), the Pearson statistic is
, the
p-value for the Anderson–Darling test is 0.0001, and that for the Cox–Stuart test is 1, so the Pearson residuals are not normal but are randomly scattered around zero at the significance level
. For the IW-BReg model based on the log link and the MGE loss function (Model VI), the Pearson statistic is
, the
p-value for the Anderson–Darling test is 0.06379, and that for the Cox–Stuart test is 1, so the Pearson residuals are consistent with normality and are randomly scattered around zero.
Based on this analysis, we conclude that Model VI is more appropriate for fitting these data, leading to the following equation:
From the backward-selection results in Table 2, we conclude that the predictive model is given as follows:
We can also see that this model has AIC = 364.3539 and a low MSE = 2.3463, and that the relationship among the variables is significant at the significance level
. For the residuals, the Pearson statistic is
,
the p-value for the Anderson–Darling test is 0.0443, and that for the Cox–Stuart test is 1.
Because of the presence of an outlier, we conclude that Model VI based on Huber’s function is the best for our data; it is given as follows:
From Table 2, we can see that this model has AIC = 363.2006 and a low MSE = 2.3451, and that the relationship among the variables is significant at the significance level
. For the residuals, the Pearson statistic is
, the
p-value for the Anderson–Darling test is 0.052, and that for the Cox–Stuart test is 1. Hence, the Pearson residuals are consistent with normality and are randomly scattered around zero; see Figure 2. The fitting results for this model during the year 2014 are shown in Table 3. We can also see that the fitting accuracy is good because the TIC value is closer to 0 than to 1.
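For readers who want to reproduce the accuracy measure, the sketch below computes Theil's inequality coefficient in its common bounded form, TIC = RMSE(y, ŷ) / (sqrt(mean(y²)) + sqrt(mean(ŷ²))), which lies between 0 (perfect prediction) and 1. That this is the exact variant used in [44,45] is an assumption, and the function name and the numbers in the usage lines are made up for illustration.

```python
import math


def theil_inequality_coefficient(observed, predicted):
    """Theil's inequality coefficient in the form bounded by 0 and 1."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)
    root_mean_sq_obs = math.sqrt(sum(y * y for y in observed) / n)
    root_mean_sq_pred = math.sqrt(sum(p * p for p in predicted) / n)
    return rmse / (root_mean_sq_obs + root_mean_sq_pred)


# Made-up values, not taken from Table 3.
observed = [9.0, 11.5, 15.0, 20.5]
predicted = [9.4, 11.0, 15.8, 19.9]
tic = theil_inequality_coefficient(observed, predicted)
print(tic)  # a value near 0 indicates close agreement
```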
4.2. Application 2: (Wind Speed Data)
The dataset in this application was again taken from the meteorology station at King Khalid International Airport, Saudi Arabia, in 2016. It contains 91 observations, from 7 June to 5 September (summer season), in which the response variable Y is the mean wind speed (km/h). The explanatory variables are: direction of the maximum wind; maximum of station-level pressure (mm); mean of sea-level pressure (mm); mean of dry bulb air temperature (Celsius); mean of wet bulb temperature (Celsius); mean of relative humidity; mean of vapor pressure (mm); mean of sky cover (oktas); maximum of station-level pressure (mm); maximum of sea-level pressure (mm); maximum of dry bulb temperature (Celsius); maximum of wet bulb temperature (Celsius); maximum of relative humidity; minimum of station-level pressure (mm); minimum of sea-level pressure (mm); minimum of dry bulb temperature; minimum of wet bulb temperature; minimum of relative humidity; and time of the maximum daily wind (HH:MM).
We proceed as in Application 1 for the distributional assessment. To identify outliers in this dataset, several plots were used: the quantile-quantile (Q-Q) plot, the ECDF, and the box plot. Again, the lemmas in Section 2 and Section 3 were applied to these data to fit the IW-Reg models based on the log and identity link functions. As an alternative analysis, the IW-BReg models were obtained using the log and identity links and a gamma prior with known hyperparameters. We also compare the performance of all these models. In addition, the biweight function was used to limit the distortion caused by outliers; see Appendix A. In this case, under regularity conditions, the resulting estimator is asymptotically normally distributed [39,42]. The modeling performance was measured by criteria such as AIC, D, D/df, and MSE [4]. We also used Theil’s inequality coefficient (TIC) to measure the prediction accuracy of the selected models [44,45]. To compare the residuals of all models and to check the adequacy of the regression model fitted to the data, we consider Pearson residuals [36,40].
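To make the two robustifying functions concrete, the sketch below implements the textbook Huber and Tukey biweight psi-functions, together with the weights psi(r)/r that they imply in an iteratively reweighted fit. The tuning constants 1.345 and 4.685 are the conventional choices giving roughly 95% efficiency under normal errors; whether Appendix A uses these constants, or exactly these forms, is an assumption, and the function names are hypothetical.

```python
def huber_psi(r, c=1.345):
    """Huber psi: identity for |r| <= c, clipped to +/- c beyond that."""
    if abs(r) <= c:
        return r
    return c if r > 0 else -c


def biweight_psi(r, c=4.685):
    """Tukey biweight psi: redescends smoothly to zero for |r| >= c."""
    if abs(r) >= c:
        return 0.0
    u = r / c
    return r * (1.0 - u * u) ** 2


def weight(psi, r):
    """Weight psi(r)/r used in iteratively reweighted fitting (1 at r = 0)."""
    return 1.0 if r == 0 else psi(r) / r


# Standardized residuals, made up for illustration.
for r in (0.5, 2.0, 6.0):
    print(r, round(weight(huber_psi, r), 4), round(weight(biweight_psi, r), 4))
```

The practical difference is visible at the largest residual: Huber's function keeps a small positive weight for every observation, whereas the biweight assigns weight zero once the standardized residual exceeds the tuning constant, so a gross outlier is removed from the fit entirely.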
Furthermore, to detect influential cases, we use Cook’s distance, computed using the formula
and
in the case of Bayesian analysis [
40,
49]. The backward-selection method was used in the IW-Reg model to remove input variables; see Table 4. R software was used to carry out the calculations. For comparison with known distributions, the glm() function in the R package “stats” was used to fit the GLMs. The functions ecdf() and ks.test() in the R package “stats”, together with qqPlot() and boxplot(), were used to assess the distributions [47]. To solve for the n roots of the n nonlinear equations in Section 3, the function multiroot() in the R package “rootSolve” was used [48]. The fitting and predictive results of these models, and the other numerical results, are shown in Table 4, Table 5, Table 6, Table 7 and Table 8.
Based on the K–S test, the p-value of 0.139 gives no evidence against the IW distribution for the response variable, so the IW fit to these data is judged adequate. Figure 3 shows the Q-Q plot and the ECDF, which likewise suggest that the IW distribution fits these data well.
Figure 4 shows the box plot of the mean wind speed variable
Y, which flags one outlier (leverage point) that exceeds the value of
.
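A box plot conventionally flags points beyond the fences Q1 - 1.5 IQR and Q3 + 1.5 IQR. The sketch below applies that rule; that this is the threshold behind Figure 4 is an assumption. The quartiles use linear interpolation between order statistics, whereas R's boxplot() works from Tukey's hinges, so the fences can differ slightly on some samples. The function names and the wind-speed values are made up for illustration.

```python
import math


def quantile_linear(sorted_values, p):
    """Quantile by linear interpolation between adjacent order statistics."""
    pos = p * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def boxplot_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) under the k * IQR rule."""
    ys = sorted(values)
    q1 = quantile_linear(ys, 0.25)
    q3 = quantile_linear(ys, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return lower, upper, [y for y in ys if y < lower or y > upper]


# Made-up mean wind speeds (km/h), not the airport data.
speeds = [8.0, 9.5, 10.0, 11.0, 12.0, 12.5, 13.0, 14.0, 30.0]
lower, upper, outliers = boxplot_outliers(speeds)
print(lower, upper, outliers)  # 5.5 17.5 [30.0]
```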
From
Table 4, we can observe that the variables
,
,
, and
are significant for the model, so there is a significant relationship among variables. In these models,
stabilizes when the Fisher scoring procedure converges at
and
, respectively, because of
. Comparing the Bayesian fits, we observe that the results based on the MGE loss function (Models V and VI) are better than those based on the zero-one loss function (Models III and IV); see
Table 5.
Table 5 also shows that the
of Models I, II, III, IV, V, and VI are less than 1, indicating a very good fit.
Based on this analysis, we also conclude that Model VI is more appropriate for fitting these data, leading to the following equation:
For the residuals, the Pearson statistic is
,
the p-value for the Anderson–Darling test is 0.0496, and that for the Cox–Stuart test is 1; see Table 5. The residuals include a large positive residual at observation 91. However, this case is not influential for the model according to
where
corresponds to the upper
-percentile of the F distribution [
50].
Because of the presence of an outlier, we conclude that Model VI based on the biweight function is the best for our data; it is given as follows:
From Table 6, we can see that this model has AIC = 446.515 and a low MSE = 3.046, and that the relationship among the variables is significant at the significance level
. For the residuals, the Pearson statistic is
, the
p-value for the Anderson–Darling test is 0.0612, and that for the Cox–Stuart test is 1. Hence, the Pearson residuals are consistent with normality and are randomly scattered around zero at the significance level
; see Table 7 and Figure 5. The figure shows no large positive residual. The fitting and predicted results for this model during 2016 and 2017 are shown in Table 8. We can also see that the prediction accuracy is good because the TIC value is closer to 0 than to 1.