Modified Chi-Squared Goodness-of-Fit Tests for Continuous Right-Skewed Response Generalized Linear Models

Vilijandas Bagdonavičius; Rūta Levulienė

doi:10.3390/math13162659


Institute of Applied Mathematics, Vilnius University, 03225 Vilnius, Lithuania

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2025, 13(16), 2659; https://doi.org/10.3390/math13162659

This article belongs to the Special Issue Computational Statistics and Data Analysis, 2nd Edition


Abstract

Generalized linear models are applied for data analysis in various areas. One of the most important steps in fitting the model is to check the goodness-of-fit; however, there is a lack of such tests. Modified chi-squared goodness-of-fit tests for generalized linear models were constructed. Models with continuous right-skewed, possibly censored responses were considered. Explicit formulas of test statistics are provided in the case of gamma and inverse Gaussian models. The test power was investigated by simulation. The article presents real data examples to illustrate the application of tests.

Keywords:

generalized linear models; gamma regression; inverse Gaussian regression; chi-squared test; goodness-of-fit; censoring

MSC:

62G10; 62J12; 62N01

1. Introduction

Generalized linear models (GLMs) [1] are among the most commonly used regression models in practice. The most frequently applied continuous GLMs are the normal (Gaussian), gamma, and inverse Gaussian models. The gamma and inverse Gaussian regression models are used to model right-skewed response variables, for example, in modeling the lifetime distribution in reliability theory [2,3], claims prediction and premium computation in insurance [4,5,6], healthcare cost analysis [7,8], and estimation of outcomes in psychology [9].

The Gaussian GLM coincides with the most applied normal regression model, and the theory of normal regression can be found in an enormous number of articles and books on theoretical or applied statistics.

The fitting of a regression model consists of a set of steps, and one of the key steps is to check the goodness-of-fit (GOF). However, there is a lack of such tests, especially for continuous GLMs such as gamma and inverse Gaussian regression. Only a few articles consider formal tests.

In many textbooks, the chi-squared approximation of the Pearson and deviance statistics is recommended for testing the fit of gamma and inverse Gaussian regression models. This can lead to erroneous conclusions because the approximation is accurate only if the shape parameter is large, as is clearly demonstrated in [10,11], for example. In [11], the authors propose approximations of the quantiles of the Pearson and deviance statistics for the gamma regression model. Unfortunately, these approximations are given only in the case of a known shape parameter ν. The case of unknown ν is not investigated, so these results cannot be used for goodness-of-fit testing. In [10], GOF tests for gamma and inverse Gaussian models are proposed by applying modifications of the Cramér–von Mises and Anderson–Darling statistics. These statistics are computed using transformations of the responses via parametric estimates of their cumulative distribution functions (c.d.f.) and the inverse of the c.d.f. of the standard normal distribution. The theory is not developed rigorously: the asymptotic distributions of the test statistics are not found, and approximations of the distributions of the test statistics for finite sample sizes are not given. These tests cannot be applied if the data are censored.

The score test for inverse Gaussian regression against an inverse Gaussian mixture was constructed in [12]. The authors considered complete and censored data, and critical values were obtained using the bootstrap. The disadvantage is that this test is not an omnibus test: it is recommended only when a single specific alternative (a mixture) is suspected.

The current paper is a natural continuation of our paper [13]. In [13], modified chi-squared tests were constructed for parametric accelerated failure time (AFT) models (see also [14]). To obtain these tests, some asymptotic results for general parametric regression models (the AFT models being particular cases) were first rigorously derived. In particular, the asymptotic properties of the random vector of differences between the numbers of observed and “expected” failures in the intervals of a data partition (the partition is obtained using a uniquely defined rule) were derived. The general results were then applied to the following AFT models: exponential and shape-scale (Weibull, log-normal, and log-logistic).

In the current article, we apply the general theorems of our paper [13] to obtain modified chi-squared goodness-of-fit tests for continuous right-skewed possibly censored GLM models.

The inverse Gaussian regression model is a GLM but not an AFT model; thus, new tests are needed. The gamma regression model is both a GLM and an AFT model; however, the article [13] on GOF tests for AFT models did not consider it. Thus, we investigate it here.

Tests for gamma and inverse Gaussian regression models are investigated in detail. The Gaussian GLM is transformed by an exponential transformation of the response into the log-normal AFT model, which is considered in [13]. We do not give the formulas for this model because GOF tests for normal regression are well known and have been investigated in many papers.

Some authors consider diagnostic plots based on residuals for the gamma and inverse Gaussian models, but these are not formal GOF tests, so they cannot be compared with the proposed tests because their significance level and power cannot be investigated. However, diagnostic plots are useful at the initial stage of analysis and, in conjunction with formal GOF tests, provide a broader view of the data. The authors of [15] proposed two new methods for the detection of influential observations in inverse Gaussian regression and also presented a review of existing methods. In [16], adjusted deviance residuals for the gamma regression model were proposed and used for influence diagnostics. Partial residuals for inverse Gaussian regression were constructed in [17] for graphical model diagnostics.

The structure of the article is as follows: firstly (see Section 2), continuous GLMs are discussed; furthermore, in Section 3, the methodology of the modified chi-squared test is provided, the approach of choosing grouping intervals is explained, and the limit distribution of the test statistic is obtained. The results of the simulation study and the application for real data are presented in Section 4 and Section 5, respectively.

2. Gamma and Inverse Gaussian Regression Models

Let us consider the parametrization of the gamma distribution, denoted by

Γ (ν, μ)

,

ν > 0

, and

μ > 0

, with the following probability density function (p.d.f.):

f (t, ν, μ) = \frac{ν^{ν}}{μ^{ν} Γ (ν)} t^{ν - 1} exp {- (ν / μ) t}, t > 0,

(1)

where

ν

is the shape parameter.

If T is a random variable with distribution

Γ (ν, μ)

, then the mean and the variance are

E (T) = μ, Var (T) = μ^{2} / ν,

and the cumulative distribution function (c.d.f.) is

F (t, ν, μ) = F_{χ_{2 ν}^{2}} (2 ν t / μ) = \frac{1}{Γ (ν)} γ (ν, \frac{ν t}{μ}),

where

F_{χ_{2 ν}^{2}}

is the c.d.f. of the chi-squared distribution with

2 ν

degrees of freedom;

γ (s, x) = \int_{0}^{x} u^{s - 1} e^{- u} d u,

i.e., the lower incomplete gamma function.
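The density (1) and the c.d.f. above can be evaluated with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function names and the power-series evaluation of the lower incomplete gamma function are illustrative assumptions, not part of the article, and the series is only suitable for moderate arguments.

```python
import math

def lower_incomplete_gamma(s, x, terms=500):
    """gamma(s, x) = int_0^x u^(s-1) e^(-u) du, evaluated by its power series."""
    if x <= 0.0:
        return 0.0
    # gamma(s, x) = x^s e^(-x) * sum_{n>=0} x^n / (s (s+1) ... (s+n))
    term = 1.0 / s
    total = term
    for n in range(1, terms):
        term *= x / (s + n)
        total += term
        if term < 1e-16 * total:
            break
    return math.exp(s * math.log(x) - x) * total

def gamma_pdf(t, nu, mu):
    """Density (1) of the gamma distribution with shape nu and mean mu."""
    log_f = (nu * math.log(nu / mu) - math.lgamma(nu)
             + (nu - 1.0) * math.log(t) - nu * t / mu)
    return math.exp(log_f)

def gamma_cdf(t, nu, mu):
    """C.d.f. F(t, nu, mu) = gamma(nu, nu t / mu) / Gamma(nu)."""
    return lower_incomplete_gamma(nu, nu * t / mu) / math.gamma(nu)
```

For ν = 1 the model reduces to the exponential distribution with mean μ, which gives a simple sanity check of both functions.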

Let us consider the parametrization of the inverse Gaussian distribution (also known as Wald distribution), denoted by

I G (ν, μ)

,

ν > 0

,

μ > 0

, with the following p.d.f.:

f (t, ν, μ) = \sqrt{\frac{ν}{2 π t^{3}}} exp \{- \frac{ν {(t - μ)}^{2}}{2 μ^{2} t}\}, t > 0,

(2)

where

ν

is the shape parameter. If T is a random variable with distribution

I G (ν, μ)

, then the mean and the variance are

E (T) = μ, Var (T) = μ^{3} / ν,

and the c.d.f. is

F (t, ν, μ) = Φ (\sqrt{\frac{ν}{t}} (\frac{t}{μ} - 1)) + e^{\frac{2 ν}{μ}} Φ (- \sqrt{\frac{ν}{t}} (\frac{t}{μ} + 1)), t > 0,

where

Φ

is the c.d.f. of the standard normal distribution.
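The inverse Gaussian density (2) and its c.d.f. can be sketched in the same way. The code below is an illustrative Python sketch using only the standard library; the function names are assumptions, and the standard normal c.d.f. is expressed through the complementary error function.

```python
import math

def std_normal_cdf(x):
    """Standard normal c.d.f. Phi(x) via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def ig_pdf(t, nu, mu):
    """Density (2) of the inverse Gaussian distribution IG(nu, mu)."""
    return math.sqrt(nu / (2.0 * math.pi * t ** 3)) * math.exp(
        -nu * (t - mu) ** 2 / (2.0 * mu ** 2 * t))

def ig_cdf(t, nu, mu):
    """C.d.f. of IG(nu, mu) written with the standard normal c.d.f."""
    a = math.sqrt(nu / t) * (t / mu - 1.0)
    b = -math.sqrt(nu / t) * (t / mu + 1.0)
    return std_normal_cdf(a) + math.exp(2.0 * nu / mu) * std_normal_cdf(b)
```

A numerical integral of `ig_pdf` over (0, t] should agree with `ig_cdf(t, ...)`, which is a convenient consistency check. For very large ν/μ the factor exp(2ν/μ) can overflow, so a production implementation would work on the log scale.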

The gamma and the IG distributions belong to the exponential family with a p.d.f. of the following form:

f (t, θ, ϕ) = exp \{\frac{θ t - b (θ)}{a (ϕ)} + c (t, ϕ)\}, t > 0 .

(3)

For the gamma distribution,

θ = - \frac{1}{μ}, ϕ = 1 / ν, b (θ) = - ln (- θ), a (ϕ) = ϕ,

c (t, ϕ) = ϕ^{- 1} ln (t / ϕ) - ln t - ln Γ (ϕ^{- 1}),

and for the IG distribution,

θ = - \frac{1}{2 μ^{2}}, ϕ = 1 / ν, b (θ) = - \sqrt{- 2 θ}, a (ϕ) = ϕ, c (t, ϕ) = \frac{1}{2} ln \frac{ϕ^{- 1}}{2 π t^{3}} - \frac{ϕ^{- 1}}{2 t} .
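As a quick check of the gamma case, the quantities listed above can be substituted into (3). The following worked step is a sketch that uses only the definitions already given (with b(θ) = −ln(−θ) = ln μ and ϕ⁻¹ = ν) and recovers the density (1).

```latex
\begin{aligned}
\frac{θ t - b(θ)}{a(ϕ)} &= ν \left(-\frac{t}{μ} - \ln μ\right) = -\frac{ν t}{μ} - ν \ln μ, \\
c(t, ϕ) &= ν \ln(ν t) - \ln t - \ln Γ(ν) = ν \ln ν + (ν - 1) \ln t - \ln Γ(ν), \\
\exp\left\{\frac{θ t - b(θ)}{a(ϕ)} + c(t, ϕ)\right\} &= \frac{ν^{ν}}{μ^{ν} Γ(ν)} \, t^{ν - 1} \exp\{-(ν / μ) \, t\}.
\end{aligned}
```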

Gamma regression model: The distribution of response T given the shape parameter

ν

and a vector of covariates

z = {(1, z_{1}, \dots, z_{m})}^{T}

is

Γ (ν, μ (z))

and the link function is logarithmic:

log (μ (z)) = β^{T} z = β_{0} + β_{1} z_{1} + \dots + β_{m} z_{m} ⟺ μ (z) = e^{β^{T} z} .

Thus, the p.d.f., c.d.f., mean, and variance given the vector of covariates are as follows:

f (t, ν, β) = \frac{ν^{ν}}{e^{ν β^{T} z} Γ (ν)} t^{ν - 1} exp {- (ν / e^{β^{T} z}) t}, t > 0,

(4)

F (t, ν, β) = F_{χ_{2 ν}^{2}} (2 ν t / e^{β^{T} z}) = \frac{1}{Γ (ν)} γ (ν, ν t / e^{β^{T} z}),

where

γ

is the lower incomplete gamma function and

E (T | z) = μ (z) = e^{β^{T} z}, Var (T | z) = μ^{2} (z) / ν = e^{2 β^{T} z} / ν .

Sometimes the canonical (inverse) link function is used:

\frac{1}{μ (z)} = β^{T} z = β_{0} + β_{1} z_{1} + \dots + β_{m} z_{m} .

Inverse Gaussian regression model: The distribution of the response T given the shape parameter

ν

and a vector of covariates

z = {(1, z_{1}, \dots, z_{m})}^{T}

is

I G (ν, μ (z))

and the link function is logarithmic:

log (μ (z)) = β^{T} z = β_{0} + β_{1} z_{1} + \dots + β_{m} z_{m} .

Thus, the p.d.f., c.d.f., mean, and variance given the vector of covariates are as follows:

f (t, ν, β) = \sqrt{\frac{ν}{2 π t^{3}}} exp \{- \frac{ν {(t - e^{β^{T} z})}^{2}}{2 t e^{2 β^{T} z}}\}, t > 0,

(5)

F (t, ν, β) = Φ (\sqrt{\frac{ν}{t}} (\frac{t}{e^{β^{T} z}} - 1)) + exp \{\frac{2 ν}{e^{β^{T} z}}\} Φ (- \sqrt{\frac{ν}{t}} (\frac{t}{e^{β^{T} z}} + 1)), t > 0,

E (T | z) = μ (z) = e^{β^{T} z}, Var (T | z) = μ^{3} (z) / ν = e^{3 β^{T} z} / ν .

Sometimes, the canonical (inverse squared) link function is used:

\frac{1}{μ^{2} (z)} = β^{T} z = β_{0} + β_{1} z_{1} + \dots + β_{m} z_{m} .

The gamma regression model is also an AFT model, and the IG model is not an AFT model.

3. Chi-Squared GOF Tests for Gamma and Inverse Gaussian Regression

3.1. Parameter Estimation

Let us consider the possibility of right-censored regression data:

(X_{1}, δ_{1}, z_{1}), \dots, (X_{n}, δ_{n}, z_{n}),

where

X_{i} = Y_{i} \land C_{i},

δ_{i} = 1_{{Y_{i} \leq C_{i}}},

Y_{i}

are responses, and

C_{i}

are censoring times.

Denote

λ (x, θ) = f (x, θ) / (1 - F (x, θ)) = f (x, θ) / S (x, θ), Λ (x, θ) = - ln S (x, θ)

as the hazard and the cumulative hazard functions, respectively, depending on a finite-dimensional parameter

θ

. In the case of gamma and inverse Gaussian regression models,

θ = {(μ, ν)}^{T}

.

The parametric log-likelihood function is the following:

ℓ (θ) = \sum_{i = 1}^{n} (δ_{i} ln λ_{i} (X_{i}, θ) - Λ_{i} (X_{i}, θ)),

where

ln λ_{i}

and

\frac{\partial}{\partial θ} ln λ_{i} (t, θ)

are presented in Section 3.2 and Section 3.3.
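Because ln λ = ln f − ln S and Λ = −ln S, each term of the log-likelihood equals δ ln f + (1 − δ) ln S. The sketch below, in Python with the standard library only, evaluates the censored log-likelihood in that form; the function names and the exponential example (the gamma model with ν = 1 and log link) are illustrative assumptions.

```python
import math

def censored_loglik(data, log_pdf, log_surv):
    """Right-censored log-likelihood.

    data: iterable of (x, delta, z) with observed time x, indicator delta
    (1 = response observed, 0 = censored) and covariate vector z.
    log_pdf, log_surv: callables (x, z) -> float for ln f and ln S.
    """
    total = 0.0
    for x, delta, z in data:
        # delta * ln(lambda) - Lambda = delta * ln f + (1 - delta) * ln S
        total += delta * log_pdf(x, z) + (1 - delta) * log_surv(x, z)
    return total

def make_exponential(beta):
    """Hypothetical example model: exponential responses with mean exp(beta^T z)."""
    def mean(z):
        return math.exp(sum(b * zj for b, zj in zip(beta, z)))
    def log_pdf(x, z):
        return -math.log(mean(z)) - x / mean(z)
    def log_surv(x, z):
        return -x / mean(z)
    return log_pdf, log_surv

# One observed response and one censored response.
data = [(1.0, 1, (1.0, 0.0)), (2.0, 0, (1.0, 1.0))]
log_pdf, log_surv = make_exponential(beta=(0.0, 0.0))
value = censored_loglik(data, log_pdf, log_surv)
```

In practice the function would be maximized over θ with a numerical optimizer, with `log_pdf` and `log_surv` built from the gamma or inverse Gaussian model.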

3.2. Gamma Regression

In the case of gamma regression, the following results were obtained:

ln λ_{i} (X_{i}, θ) = ν ln ν + (ν - 1) ln X_{i} - \frac{ν}{μ_{i}} X_{i} - ν ln μ_{i} - ln Γ (ν) - ln (1 - F_{i} (X_{i}, ν, μ_{i})),

μ_{i} = μ_{i} (z_{i}, β) = \begin{cases} e^{β^{T} z_{i}} & \text{if the link is logarithmic}, \\ 1 / β^{T} z_{i} & \text{if the link is inverse}, \end{cases}

(6)

\frac{\partial}{\partial ν} ln λ_{i} (X_{i}, θ) = 1 + ln ν + ln X_{i} - X_{i} / μ_{i} - ln μ_{i} - ψ (ν) + \frac{\partial F_{i} (X_{i}, ν, μ_{i}) / \partial ν}{1 - F_{i} (X_{i}, ν, μ_{i})},

(7)

where

F_{i}

is the c.d.f. of the ith response, and

ψ (ν)

is the digamma function.

Note that the derivative of the probability density function of the ith response with respect to

ν

is

\frac{\partial f_{i} (t, ν, β)}{\partial ν} = \frac{ν^{ν}}{μ_{i}^{ν} Γ (ν)} t^{ν - 1} exp {- ν t / μ_{i}} (1 + ln ν - ln μ_{i} - ψ (ν) + ln t - t / μ_{i}), t > 0 .

(8)

This implies that the derivative of the c.d.f.

F_{i}

with respect to

ν

is the integral of the derivative of the p.d.f.:

\frac{\partial F_{i} (X_{i}, ν, μ_{i})}{\partial ν} = \frac{ν^{ν}}{μ_{i}^{ν} Γ (ν)} \{(1 + ln ν - ln μ_{i} - ψ (ν)) \int_{0}^{X_{i}} u^{ν - 1} exp {- ν u / μ_{i}} d u +

\int_{0}^{X_{i}} (ln u - u / μ_{i}) u^{ν - 1} exp {- ν u / μ_{i}} d u\} = \frac{1}{Γ (ν)} γ (ν, ν X_{i} / μ_{i}) (1 + ln ν - ln μ_{i} - ψ (ν)) +

+ \frac{1}{Γ (ν)} (γ_{1}^{'} (ν, ν X_{i} / μ_{i}) + ln (μ_{i} / ν) γ (ν, ν X_{i} / μ_{i})) - \frac{1}{ν Γ (ν)} γ (ν + 1, ν X_{i} / μ_{i}),

where

γ_{1}^{'}

is the derivative of the lower incomplete gamma function with respect to the first argument. In R (version 4.4.3), it can be obtained using the function pgamma.deriv.unscaled of the package VGAM; the function gammainc of the package pracma computes the lower incomplete gamma function, and the functions gamma and digamma of the package base return values of the gamma and digamma functions, respectively.

\frac{\partial}{\partial β_{i}} ln λ_{i} (X_{i}, θ) = \frac{ν}{μ_{i}} (\frac{X_{i}}{μ_{i}} - 1) \frac{\partial μ_{i}}{\partial β_{i}} + \frac{\partial F_{i} (X_{i}, ν, μ_{i}) / \partial β_{i}}{1 - F_{i} (X_{i}, ν, μ_{i})},

(9)

\frac{\partial F_{i} (X_{i}, ν, μ_{i})}{\partial β_{i}} = (\frac{1}{μ_{i} Γ (ν)} γ (ν + 1, ν X_{i} / μ_{i}) - \frac{ν}{μ_{i}} \frac{1}{Γ (ν)} γ (ν, ν X_{i} / μ_{i})) \frac{\partial μ_{i}}{\partial β_{i}},

where

μ_{i}

is defined by (6).

3.3. Inverse Gaussian Regression

In the case of inverse Gaussian regression, the following was obtained:

ln λ_{i} (X_{i}, θ) = 0.5 (ln ν - ln (2 π X_{i}^{3})) - \frac{ν {(X_{i} - μ_{i})}^{2}}{2 μ_{i}^{2} X_{i}} -

- ln \{1 - Φ (\sqrt{\frac{ν}{X_{i}}} (\frac{X_{i}}{μ_{i}} - 1)) - e^{2 ν / μ_{i}} Φ (- \sqrt{\frac{ν}{X_{i}}} (\frac{X_{i}}{μ_{i}} + 1))\},

μ_{i} = μ_{i} (z_{i}, β) = \begin{cases} e^{β^{T} z_{i}} & \text{if the link is logarithmic}, \\ 1 / \sqrt{β^{T} z_{i}} & \text{if the link is inverse squared}. \end{cases}

(10)

\frac{\partial}{\partial ν} ln λ_{i} (X_{i}, θ) = \frac{1}{2 ν} - \frac{{(X_{i} - μ_{i})}^{2}}{2 μ_{i}^{2} X_{i}} + {(1 - Φ (a_{i}) - e^{2 ν / μ_{i}} Φ (b_{i}))}^{- 1} \times

\{\frac{a_{i} φ (a_{i})}{2 ν} + e^{2 ν / μ_{i}} (\frac{2 Φ (b_{i})}{μ_{i}} + \frac{b_{i} φ (b_{i})}{2 ν})\};

(11)

\frac{\partial}{\partial β_{i}} ln λ_{i} (X_{i}, θ) = \frac{1}{μ_{i}^{2}} (\frac{ν (X_{i} - μ_{i})}{μ_{i}} - {(1 - Φ (a_{i}) - e^{2 ν / μ_{i}} Φ (b_{i}))}^{- 1} \times

\{φ (a_{i}) \sqrt{ν X_{i}} + e^{2 ν / μ_{i}} (2 ν Φ (b_{i}) - φ (b_{i}) \sqrt{ν X_{i}})\}) \frac{\partial μ_{i}}{\partial β_{i}},

(12)

where

μ_{i}

is defined by (10) and

a_{i} = \sqrt{\frac{ν}{X_{i}}} (\frac{X_{i}}{μ_{i}} - 1), b_{i} = - \sqrt{\frac{ν}{X_{i}}} (\frac{X_{i}}{μ_{i}} + 1) .
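Formulas (11) and (12) are easy to mistype, so it can help to compare them with finite differences of ln λ. The following Python sketch (standard library only) implements the log-hazard of the inverse Gaussian model together with its derivatives with respect to ν and μ; the function names are illustrative assumptions, and the derivative with respect to a regression coefficient follows by multiplying the μ-derivative by ∂μ/∂β.

```python
import math

def phi(x):
    """Standard normal p.d.f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal c.d.f."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def _parts(x, nu, mu):
    """Return a, b, exp(2 nu / mu) and the survival function S(x)."""
    a = math.sqrt(nu / x) * (x / mu - 1.0)
    b = -math.sqrt(nu / x) * (x / mu + 1.0)
    e = math.exp(2.0 * nu / mu)
    return a, b, e, 1.0 - Phi(a) - e * Phi(b)

def ig_log_hazard(x, nu, mu):
    """ln lambda(x) = ln f(x) - ln S(x) for the IG(nu, mu) distribution."""
    _, _, _, surv = _parts(x, nu, mu)
    log_f = (0.5 * (math.log(nu) - math.log(2.0 * math.pi * x ** 3))
             - nu * (x - mu) ** 2 / (2.0 * mu ** 2 * x))
    return log_f - math.log(surv)

def d_nu_ig_log_hazard(x, nu, mu):
    """Derivative of ln lambda with respect to nu, as in (11)."""
    a, b, e, surv = _parts(x, nu, mu)
    bracket = (a * phi(a) / (2.0 * nu)
               + e * (2.0 * Phi(b) / mu + b * phi(b) / (2.0 * nu)))
    return 1.0 / (2.0 * nu) - (x - mu) ** 2 / (2.0 * mu ** 2 * x) + bracket / surv

def d_mu_ig_log_hazard(x, nu, mu):
    """Derivative of ln lambda with respect to mu, i.e., (12) without d mu / d beta."""
    a, b, e, surv = _parts(x, nu, mu)
    root = math.sqrt(nu * x)
    bracket = phi(a) * root + e * (2.0 * nu * Phi(b) - phi(b) * root)
    return (nu * (x - mu) / mu - bracket / surv) / mu ** 2
```

A central difference such as `(ig_log_hazard(x, nu + h, mu) - ig_log_hazard(x, nu - h, mu)) / (2 * h)` with a small `h` should agree with `d_nu_ig_log_hazard(x, nu, mu)` to several digits.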

3.4. Grouping Intervals

Let

\hat{θ}

be the ML estimator of

θ .

Set

N_{i} (t) = 1_{{X_{i} \leq t, δ_{i} = 1}}, N (t) = \sum_{i = 1}^{n} N_{i} (t),

where

N (t)

is the number of responses in the interval

[0, t]

. The mean of

N (t)

is

E N (t) = E \sum_{i = 1}^{n} Λ_{i} (t \land X_{i}, θ) .

So,

\sum_{i = 1}^{n} Λ_{i} (t \land X_{i}, \hat{θ})

may be interpreted as the expected number of responses in the interval

[0, t]

when the parametric model is true.

If the parametric model is true, then the difference

N (t) - \sum_{i = 1}^{n} Λ_{i} (t \land X_{i}, \hat{θ})

should take smaller values than in the case when the model is false.

Denote

X_{(1)} \leq \dots \leq X_{(n)}

as the ordered

X_{1}, \dots, X_{n}

. Set

Λ_{(i)} (t, θ) = Λ (t, z^{((i))}, θ)

; here,

z^{((i))}

is the vector of covariates corresponding to

X_{(i)}

in the sample. Define

E_{k} = \sum_{i = 1}^{n} Λ_{i} (X_{i}, \hat{θ}) = \sum_{i = 1}^{n} Λ_{(i)} (X_{(i)}, \hat{θ}), E_{j} = \frac{j}{k} E_{k}, j = 1, \dots, k,

(13)

where

E_{k}

is the estimator (under the true model) of the expected number of responses in the interval

[0, X_{(n)}]

. If the model is true, then this value should not be far from n.

Divide the interval [0, X_{(n)}] into k smaller intervals I_{j} = (a_{j - 1}, a_{j}], j = 1, \dots, k, with a_{0} = 0 and a_{k} = X_{(n)} (do not confuse a_{j} with a_{i} from Section 3.3), so that each interval has the same expected number of responses. More precisely, the point a_{j} is defined in the following way:

g (a_{j}) = E_{j}, g (a) = \sum_{l = 1}^{n} Λ_{(l)} (a \land X_{(l)}) .

The function

g (a)

is strictly increasing:

g (0) = 0

and

g (X_{(n)}) = E_{k}

.

Set

X_{(0)} = 0

. Let us use notation

\sum_{l = 1}^{0} c_{l} = 0

. Define

b_{i} = g (X_{(i)}) = \sum_{l = 1}^{n} Λ_{(l)} (X_{(i)} \land X_{(l)}, \hat{θ}) = \sum_{l = i + 1}^{n} Λ_{(l)} (X_{(i)}, \hat{θ}) + \sum_{l = 1}^{i} Λ_{(l)} (X_{(l)}, \hat{θ}) .

Note that

b_{0} = 0

,

b_{n} = E_{k}

, and

E_{j} \in (0, E_{k}]

,

j = 1, \dots, k

. Hence, there exists

i_{j}

such that

E_{j} \in (b_{i_{j} - 1}, b_{i_{j}}]

, which implies that

a_{j} \in (X_{(i_{j} - 1)}, X_{(i_{j})}]

. So, at first,

i_{j}

is found. Then,

a_{j}

is obtained, which is the unique root of the function

h_{j} (a) = g (a) - E_{j}

in the interval

(X_{(i_{j} - 1)}, X_{(i_{j})}]

, and is easily found by the bisection method because

h_{j} (X_{(i_{j} - 1)}) < 0

,

h_{j} (X_{(i_{j})}) > 0

, and

h_{j} (a)

is strictly increasing. Note that in the interval

(X_{(i_{j} - 1)}, X_{(i_{j})}]

the function

g (a)

may be written as follows:

g (a) = \sum_{l = 1}^{n} Λ_{(l)} (a \land X_{(l)}, \hat{θ}) = \sum_{l = i_{j}}^{n} Λ_{(l)} (a, \hat{θ}) + \sum_{l = 1}^{i_{j} - 1} Λ_{(l)} (X_{(l)}, \hat{θ}) .
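The construction of the cut points can be sketched in code as follows. This is an illustrative Python sketch (standard library only) of the bisection search described above; the function name, the representation of the fitted cumulative hazards as callables, and the tolerance are assumptions, and the O(n²) evaluation of g is chosen for clarity rather than speed.

```python
def grouping_cutpoints(x, cum_hazards, k, tol=1e-10):
    """Return ([a_0, ..., a_k], E_k) for the equal-expected-count partition.

    x: observed times X_i; cum_hazards[i]: callable t -> Lambda_i(t, theta_hat),
    assumed increasing in t with Lambda_i(0) = 0; k: number of intervals.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]             # X_(1) <= ... <= X_(n)
    ls = [cum_hazards[i] for i in order]   # matching Lambda_(i)

    def g(a):
        return sum(lam(min(a, xi)) for lam, xi in zip(ls, xs))

    e_k = g(xs[-1])
    cuts = [0.0]
    for j in range(1, k):
        e_j = j * e_k / k
        # i_j: first order statistic with g(X_(i)) >= E_j
        idx = next(i for i in range(n) if g(xs[i]) >= e_j)
        lo = xs[idx - 1] if idx > 0 else 0.0
        hi = xs[idx]
        while hi - lo > tol:               # bisection for the root of g(a) - E_j
            mid = 0.5 * (lo + hi)
            if g(mid) < e_j:
                lo = mid
            else:
                hi = mid
        cuts.append(0.5 * (lo + hi))
    cuts.append(xs[-1])
    return cuts, e_k

# Purely illustrative input: Lambda_i(t) = t (unit exponential) for every i.
times = [3.0, 1.0, 4.0, 2.0]
hazards = [lambda t: t] * 4
cuts, E_k = grouping_cutpoints(times, hazards, k=2)
```

In this toy input g(a) = 1 + 3a on (1, 2], so the single interior cut point solves 1 + 3a = 5.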

3.5. Test Statistic

The numbers of observed and expected responses in the interval

I_{j} = (a_{j - 1}, a_{j}]

are as follows:

U_{j} = \sum_{i : X_{i} \in I_{j}} δ_{i}, e = E_{k} / k, j = 1, \dots, k .

The chi-squared test is based on the random vector

Z = {(Z_{1}, \dots, Z_{k})}^{T}, Z_{j} = \frac{1}{\sqrt{n}} (U_{j} - e),

i.e., on the differences between the observed numbers of responses and the numbers expected under the GLM in the intervals

I_{j}

.

Set

{\hat{A}}_{j} = U_{j} / n, {\hat{C}}_{j} = \frac{1}{n} \sum_{i : X_{i} \in I_{j}} δ_{i} \frac{\partial}{\partial θ} ln λ_{i} (X_{i}, \hat{θ}), \hat{C} = ({\hat{C}}_{1}, \dots, {\hat{C}}_{k}) .

If s is the dimension of θ, then {\hat{C}}_{j} are s \times 1 vectors and \hat{C} is an s \times k matrix. Denote by

\hat{A}

the diagonal matrix with diagonal elements

{\hat{A}}_{j}

.

The limit distribution of the random vector Z is found by applying the results of Theorems 3.1 and 3.2 of our article [13] (these theorems are also provided in [18] and Appendix A of this article). Note that these theorems can be applied not only for AFT models but also in the case of GLM models, because we can choose various forms of parametric hazard functions

λ_{i} (u, θ)

for different i (see Appendix A).

The proof of the above-mentioned theorems in [13] was obtained by the following steps. At first, the asymptotic properties of the stochastic process,

H_{n} (t) = \frac{1}{\sqrt{n}} (N (t) - \sum_{i = 1}^{n} \int_{0}^{t} λ_{i} (u, \hat{θ}) Y_{i} (u) d u),

were investigated ([13], Lemma 3.1; Appendix A, Lemma A1) by applying the central limit theorem (CLT) for martingales under well-known assumptions (see [19]) on the asymptotic properties (consistency and asymptotic normality) of the ML estimator

\hat{θ}

and the assumptions of the CLT. Lemma 3.1 yields the limit distribution of the random vector Z (see [13], Theorem 3.1; Appendix A, Theorem A1): the distribution of Z is approximated by the normal distribution

N_{k} (0, V) .

Theorem 3.2 (see Appendix A, Theorem A2) implies that the covariance matrix V is consistently estimated by the matrix:

\hat{V} = \hat{A} - {\hat{C}}^{T} {\hat{i}}^{- 1} \hat{C},

where

\hat{i}

is an

s \times s

matrix (see [18]) that can be written in the following form:

{\hat{i}}_{l l^{'}} = \frac{1}{n} \sum_{i = 1}^{n} δ_{i} \frac{\partial}{\partial θ_{l}} ln λ_{i} (X_{i}, \hat{θ}) {(\frac{\partial}{\partial θ_{l^{'}}} ln λ_{i} (X_{i}, \hat{θ}))}^{T},

where derivatives of

ln λ_{i}

are provided in (7) and (9) for the gamma regression, and in (11) and (12) for inverse Gaussian regression.

The chi-squared test for the hypothesis

H_{0}

is based on the following statistic:

Y^{2} = Z^{T} {\hat{V}}^{-} Z,

(14)

where

{\hat{V}}^{-}

is the generalized inverse of the matrix

\hat{V}

. The hypothesis is rejected with an approximate significance level of

α

if

Y^{2} > χ_{α}^{2} (r)

, where r is the rank of the matrix V.

Note that in the case of the gamma regression, V is a full rank matrix (

r = k

); thus,

{\hat{V}}^{-} = {\hat{V}}^{- 1} .

In the case of inverse Gaussian regression model,

\operatorname{rank} (V) = k - 1 .
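Once the counts U_j, the value E_k, and the matrices Ĉ and î are available, the statistic (14) is a quadratic form. The following Python sketch (standard library only) covers the full-rank case, where the ordinary inverse of V̂ can be used; for a rank-deficient V̂ a generalized inverse would be needed instead. All function names are illustrative assumptions, and the p-value uses a power series for the chi-squared c.d.f. that is adequate only for moderate values of the statistic.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for q in range(c, n + 1):
                a[r][q] -= f * a[c][q]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][q] * x[q] for q in range(r + 1, n))) / a[r][r]
    return x

def chi2_statistic(u, e_k, c_hat, i_hat, n):
    """Y^2 = Z^T V_hat^{-1} Z with V_hat = A_hat - C_hat^T i_hat^{-1} C_hat.

    u: counts U_1..U_k; c_hat: s x k matrix (list of s rows); i_hat: s x s matrix.
    """
    k, s = len(u), len(c_hat)
    e = e_k / k
    z = [(uj - e) / math.sqrt(n) for uj in u]
    cols = [[c_hat[l][j] for l in range(s)] for j in range(k)]  # vectors C_j
    w = [solve(i_hat, c) for c in cols]                         # i_hat^{-1} C_j
    v = [[(u[j] / n if j == m else 0.0)
          - sum(cols[j][l] * w[m][l] for l in range(s))
          for m in range(k)] for j in range(k)]
    return sum(zj * xj for zj, xj in zip(z, solve(v, z)))

def chi2_sf(y, r, terms=500):
    """Upper tail probability P(chi^2_r > y) via a series for the lower tail."""
    if y <= 0.0:
        return 1.0
    s, x = 0.5 * r, 0.5 * y
    term = 1.0 / s
    total = term
    for m in range(1, terms):
        term *= x / (s + m)
        total += term
        if term < 1e-16 * total:
            break
    return 1.0 - math.exp(s * math.log(x) - x - math.lgamma(s)) * total
```

With these helpers, the hypothesis would be rejected at level α when `chi2_sf(chi2_statistic(...), r) < alpha`, where r is the rank of V.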

4. Simulation Study

The data are simulated by taking two covariates:

z_{1}

—dichotomous (0—for half of the observations and 1—for the remaining observations) and

z_{2} \sim U (20, 30) .
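The covariate design above can be reproduced with a few lines of code. The sketch below, in Python with the standard library only, generates uncensored gamma regression data with a log link; the function name and the seed are assumptions, and the parameter values are the ones quoted in the caption of the gamma regression table under the hypothesis (β₀ = −7, β₁ = 4, β₂ = 0.3, ν = 0.45).

```python
import math
import random

def simulate_gamma_regression(n, beta, nu, seed=0):
    """Simulate (t, z1, z2) with T | z ~ Gamma(shape nu, mean exp(beta^T z))."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        z1 = 0 if i < n // 2 else 1            # dichotomous, half zeros and half ones
        z2 = rng.uniform(20.0, 30.0)           # z2 ~ U(20, 30)
        mu = math.exp(beta[0] + beta[1] * z1 + beta[2] * z2)
        # random.gammavariate(alpha, beta) has mean alpha * beta,
        # so scale mu / nu gives mean mu and variance mu^2 / nu.
        t = rng.gammavariate(nu, mu / nu)
        data.append((t, z1, z2))
    return data

sample = simulate_gamma_regression(200, beta=(-7.0, 4.0, 0.3), nu=0.45)
```

Censoring is not included in this sketch; it could be added by drawing censoring times C_i and recording X_i = min(T_i, C_i) together with the indicator δ_i.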

Different sample sizes n are considered. The Rice rule (see [20]) is used to determine the number of grouping intervals (see Table 1):

k = [2 \sqrt[3]{n}] .

(15)

Table 1. The number of grouping intervals using the Rice rule.
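The rule (15) is simple to compute, but a floating-point cube root can round down at perfect cubes. The Python sketch below avoids this by using integer arithmetic; it assumes that the brackets in (15) denote the integer part, which is consistent with the values k = 8 for n = 76 and k = 9 for n = 100 used in the real data examples. The function name is illustrative.

```python
def rice_k(n):
    """Integer part of 2 * n**(1/3): the largest k with k**3 <= 8 * n."""
    k = 0
    while (k + 1) ** 3 <= 8 * n:
        k += 1
    return k

ks = {n: rice_k(n) for n in (50, 64, 76, 100)}
```

The comparison k³ ≤ 8n is equivalent to k ≤ 2n^{1/3}, so no rounding error can occur.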

In the assumptions of the limit distribution of the test statistic, it is supposed that k is fixed and the limit distribution is obtained to be chi-squared with k or

k - 1

degrees of freedom. Is the approximation accurate if

k = [2 \sqrt[3]{n}]

? Note that if n is fixed, then

k = [2 \sqrt[3]{n}]

is also fixed. We know that if the size of the sample

n^{*} > k

is sufficiently large and

k = [2 \sqrt[3]{n}]

, then the chi-squared approximation is accurate. But, taking into account that n is much larger than k (see Table 1), the approximation should also be good for sample size n. Simulations confirm this.

4.1. Simulation Under Hypotheses

The estimated significance levels are obtained using 5000 iterations. Tests with significance levels

α = 0.05

and

α = 0.1

are applied. Table 2 and Table 3 present the results for inverse Gaussian and gamma regression, respectively. Grouping intervals are computed using the Rice rule (15); moreover, different numbers of grouping intervals are considered to see how the convergence speed depends on the number of grouping intervals. The simulation results under the hypothesis demonstrate that the estimated significance levels approach the nominal level as the number of observations increases.

Table 2. Estimates of the significance level

α

under the hypothesis, inverse Gaussian regression with log link,

β_{0} = - 5,

β_{1} = 2,

β_{2} = 0.1,

ν = 3

.

Table 3. Estimates of the significance level

α

under the hypothesis, gamma regression with log link,

β_{0} = - 7,

β_{1} = 4,

β_{2} = 0.3,

ν = 0.45

.

4.2. Simulation Under Alternatives

The data are simulated under various alternatives and values of parameters. For each of the sample sizes considered, we simulate 1000 replications and compute values of the test power. The significance level is 0.05.

In the case of inverse Gaussian regression, the test power under the following alternatives is investigated (see Table 4): gamma regression, log-normal, log-logistic, and Weibull AFT models. For gamma regression, the following alternatives are considered: inverse Gaussian and normal regression, log-normal, and log-logistic and Weibull AFT models, i.e., gamma regression models with shape and scale depending on covariates.

Table 4. Definitions of alternative models.

The results in the case of gamma regression are presented in Table 5. It has become evident that the test power under the IG regression alternative is large even for small sample sizes. The smallest test power values are in the case of the Weibull AFT model alternative, which is reasonable because gamma and Weibull models are very similar for some sets of parameters.

Table 5. Gamma regression. Powers against various alternatives. n: number of observations; k: optimal number of grouping intervals.

The results in the case of IG regression are presented in Table 6. It turned out that the test power under all considered alternatives is large even for small sample sizes. The smallest test power values are obtained when the alternative is the log-logistic AFT model.

Table 6. Inverse Gaussian regression. Powers against various alternatives. n: number of observations; k: optimal number of grouping intervals.

Moreover, the simulation study suggested that in the case of the gamma and inverse Gaussian regression, the Rice rule (15) provides optimal grouping intervals (

k_{\mathrm{opt}} = k_{\mathrm{Rice}}

) for sample sizes

n \geq 60

, and for smaller samples the number of grouping intervals is

k_{\mathrm{opt}} = k_{\mathrm{Rice}} - 1 .

5. Real Data Examples

Example 1: Failure times (see Table 7) of 76 electrical insulating fluids tested at voltages, ranging from 26 to 38 kV ([21]), are considered.

Table 7. Failure times

T_{i}

for 76 electrical insulating fluids tested at voltages

v_{i}

.

The diagnostic methods (see [2]) suggest that the Weibull AFT–power rule model, i.e., the Weibull AFT model with the covariate

\log (v_{i}),

should be used. The results of applying the modified chi-squared test are presented in Table 8. The analysis demonstrated that the Weibull AFT–power rule and gamma regression models are not rejected; however, AIC and BIC are smaller in the case of the gamma regression model. The inverse Gaussian regression model is strongly rejected.

Table 8. Modified chi-squared test,

k = 8

. Electrical insulating fluids data.

Example 2: Hospital cost data (the dataset hospcosts from R package robmixglm) consist of a sample of 100 patients hospitalized at the Centre Hospitalier Universitaire Vaudois in Lausanne during 1999 for “medical back problems”. The response is the cost of stay, and the covariates are as follows: length of stay (in days; the logarithmic transformation was applied), admission type (0: planned; 1: emergency), insurance type (0: regular; 1: private), age (in years), sex (0: female; 1: male) and discharge destination (1: home; 0: another health institution). Data were analyzed in [8] considering the gamma regression and [22] in the Weibull model context.

The results of applying the modified chi-squared test are presented in Table 9. It is clear that the Weibull AFT–power rule and gamma regression models are not rejected. However, AIC is smaller in the case of the Weibull AFT–power rule model. The inverse Gaussian regression model is strongly rejected.

Table 9. Modified chi-squared test,

k = 9

. Hospital cost data.

Example 3: Table 10 presents the results of an experiment designed to compare the performances of high-speed turbine engine bearings made out of five different compounds (see [2]). Data were fitted using a three-parameter Weibull distribution. The experiment tested 10 bearings of each type, and the times to fatigue failure were measured in units of millions of cycles.

Table 10. Failure times of bearing specimens.

The results using the modified chi-squared test are presented in Table 11. The gamma, Weibull AFT, and inverse Gaussian regression models are rejected. The results do not contradict the results in [2].

Table 11. Chi-squared test,

k = 6

. Bearing specimens data.

6. Conclusions

The modified chi-squared goodness-of-fit tests were constructed for gamma and inverse Gaussian regression models with possibly censored data. The methodology for grouping intervals was proposed, and practical recommendations based on the simulation results were presented. The results indicated that the test power under various considered alternatives is large even for small sample sizes. Moreover, in the case of the gamma and inverse Gaussian regression the Rice rule (15) provides optimal grouping intervals (

k_{\mathrm{opt}} = k_{\mathrm{Rice}}

) for sample sizes

n \geq 60

, and for smaller samples, the number of grouping intervals is

k_{\mathrm{opt}} = k_{\mathrm{Rice}} - 1 .

The application of the tests was shown using real data. The proposed tests are important in the data modeling process. They are robust to the model structure because in the case of misspecification of the model, the “expected” number of responses will be far from the observed number of responses, and the test statistic will take large values; therefore, the hypothesis will be rejected. Thus, another model structure could be taken into consideration. The article fills the gap of formal omnibus tests for gamma and inverse Gaussian regression.

Author Contributions

Conceptualization, V.B. and R.L.; methodology, V.B. and R.L.; investigation, V.B. and R.L.; writing—original draft preparation, V.B. and R.L.; writing—review and editing, V.B. and R.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

In [13], the following results were obtained.

Condition A (consistency and asymptotic normality of the ML estimator \hat{θ}):

\hat{θ} \overset{P}{\to} θ_{0}, \frac{1}{\sqrt{n}} \dot{ℓ} (θ_{0}) \overset{d}{\to} N_{m} (0, i (θ_{0})), - \frac{1}{n} \ddot{ℓ} (θ_{0}) \overset{P}{\to} i (θ_{0});

\sqrt{n} (\hat{θ} - θ_{0}) = i^{- 1} (θ_{0}) \frac{1}{\sqrt{n}} \dot{ℓ} (θ_{0}) + o_{P} (1),

where

\dot{ℓ} (θ) = \sum_{i = 1}^{n} \int_{0}^{τ} \frac{\partial}{\partial θ} ln λ_{i} (u, θ) {d N_{i} (u) - Y_{i} (u) λ_{i} (u, θ) d u} .

Set

S^{(0)} (t, θ) = \sum_{i = 1}^{n} Y_{i} (t) λ_{i} (t, θ), S^{(1)} (t, θ) = \sum_{i = 1}^{n} Y_{i} (t) \frac{\partial ln λ_{i} (t, θ)}{\partial θ} λ_{i} (t, θ),

S^{(2)} (t, θ) = \sum_{i = 1}^{n} Y_{i} (t) \frac{\partial^{2} ln λ_{i} (t, θ)}{\partial θ^{2}} λ_{i} (t, θ) .

Condition B: There exist a neighborhood

Θ_{0}

of

θ_{0}

and functions, continuous and bounded on

Θ_{0} \times [0, τ]

,

s^{(0)} (t, θ), s^{(1)} (t, θ) = \frac{\partial s^{(0)} (t, θ)}{\partial θ}, s^{(2)} (t, θ) = \frac{\partial^{2} s^{(0)} (t, θ)}{\partial θ^{2}},

such that for

j = 0, 1, 2

,

\sup_{t \in [0, τ], θ \in Θ_{0}} \| \frac{1}{n} S^{(j)} (t, θ) - s^{(j)} (t, θ) \| \overset{P}{\to} 0 \text{ as } n \to \infty .

Condition B implies that uniformly for

t \in [0, τ]

,

\frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{t} λ_{i} (u, θ_{0}) Y_{i} (u) d u \overset{P}{\to} A (t)

\frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{t} {\dot{λ}}_{i} (u, θ_{0}) Y_{i} (u) d u \overset{P}{\to} C (t),

where A and C are finite.

Lemma A1.

Under Conditions A and B, the following convergence holds:

H_{n} \overset{d}{\to} V \text{ on } D [0, τ],

where

D [0, τ]

is the space of càdlàg functions with the Skorokhod metric; V is a zero-mean Gaussian martingale, such that for all

0 \leq s \leq t

,

\operatorname{cov} (V (s), V (t)) = A (s) - C^{T} (s) i^{- 1} (θ_{0}) C (t) .

Theorem A1.

Under Conditions A and B,

Z \overset{d}{\to} Y \sim N_{k} (0, V) \text{ as } n \to \infty,

where

V = A - C^{T} i^{- 1} (θ_{0}) C .

Theorem A2.

Under Conditions A and B, the following estimators of

A_{j}

,

C_{j}

,

i (θ_{0})

, and V are consistent:

{\hat{A}}_{j} = U_{j} / n, {\hat{C}}_{j} = \frac{1}{n} \sum_{i = 1}^{n} \int_{I_{j}} \frac{\partial}{\partial θ} ln λ_{i} (u, \hat{θ}) d N_{i} (u),

and

\hat{i} = \frac{1}{n} \sum_{i = 1}^{n} \int_{0}^{τ} \frac{\partial ln λ_{i} (u, \hat{θ})}{\partial θ} {(\frac{\partial ln λ_{i} (u, \hat{θ})}{\partial θ})}^{T} d N_{i} (u),

\hat{V} = \hat{A} - {\hat{C}}^{T} {\hat{i}}^{- 1} \hat{C} .

References

1. McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989.
2. Lawless, J.F. Statistical Models and Methods for Lifetime Data; John Wiley & Sons: Hoboken, NJ, USA, 2003.
3. Meeker, W.Q.; Escobar, L.A. Statistical Methods for Reliability Data; John Wiley & Sons: Hoboken, NJ, USA, 1998; ISBN 978-1-118-62597-2.
4. De Jong, P.; Heller, G.Z. Generalized Linear Models for Insurance Data; Cambridge University Press: Cambridge, UK, 2008.
5. Haberman, S.; Renshaw, A.E. Generalized linear models and actuarial science. J. R. Stat. Soc. Ser. D (The Stat.) 1996, 45, 407–436.
6. Baione, F.; Biancalana, D. An individual risk model for premium calculation based on quantile: A comparison between generalized linear models and quantile regression. N. Am. Actuar. J. 2019, 23, 573–590.
7. Blough, D.K.; Ramsey, S.D. Using generalized linear models to assess medical care costs. Health Serv. Outcomes Res. Methodol. 2000, 1, 185–202.
8. Cantoni, E.; Ronchetti, E. A robust approach for skewed and heavy-tailed outcomes in the analysis of health care expenditures. J. Health Econ. 2006, 25, 198–213.
9. Ng, V.K.; Cribbie, R.A. Using the gamma generalized linear model for modeling continuous, skewed and heteroscedastic outcomes in psychology. Curr. Psychol. 2017, 36, 225–235.
10. Klar, B.; Meintanis, S.G. Specification tests for the response distribution in generalized linear models. Comput. Stat. 2012, 27, 251–267.
11. Shayib, M.A.; Young, D.H. Modified goodness of fit tests in gamma regression. J. Stat. Comput. Simul. 1989, 33, 125–133.
12. Desmond, A.F.; Yang, Z. Asymptotically refined score and GOF tests for inverse Gaussian models. J. Stat. Comput. Simul. 2016, 86, 3243–3269.
13. Bagdonavičius, V.B.; Levuliene, R.J.; Nikulin, M.S. Chi-squared goodness-of-fit tests for parametric accelerated failure time models. Commun. Stat.-Theory Methods 2013, 42, 2768–2785.
14. Bagdonavičius, V.; Nikulin, M.S. Accelerated Life Models: Modeling and Statistical Analysis; Chapman & Hall/CRC: New York, NY, USA, 2002.
15. Amin, M.; Ullah, M.A.; Qasim, M. Diagnostic techniques for the inverse Gaussian regression model. Commun. Stat.-Theory Methods 2022, 51, 2552–2564.
16. Amin, M.; Amanullah, M.; Cordeiro, G.M. Influence diagnostics in the gamma regression model with adjusted deviance residuals. Commun. Stat.-Simul. Comput. 2017, 46, 6959–6973.
17. Imran, M.; Akbar, A. Diagnostics via partial residual plots in inverse Gaussian regression. J. Chemom. 2020, 34, e3203.
18. Bagdonavičius, V.; Kruopis, J.; Nikulin, M.S. Non-parametric Tests for Censored Data; John Wiley & Sons: Hoboken, NJ, USA, 2013.
19. Andersen, P.K.; Borgan, O.; Gill, R.D.; Keiding, N. Statistical Models Based on Counting Processes; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1993.
20. De La Rubia, J.M. Rice university rule to determine the number of bins. Open J. Stat. 2024, 14, 119–149.
21. Nelson, W. Hazard plotting methods for analysis of life data with different failure modes. J. Qual. Technol. 1970, 2, 126–149.
22. Marazzi, A.; Yohai, V.J. Adaptively truncated maximum likelihood regression with asymmetric errors. J. Stat. Plan. Inference 2004, 122, 271–291.

Table 1. The number of grouping intervals using the Rice rule.

n	30	50	60	70	80	100	150	200
k	6	7	7	8	8	9	10	11
n	300	400	500	600	800	1000	1500	2000
k	13	14	15	16	18	20	22	25
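The entries of Table 1 can be reproduced with a short sketch. It assumes the rule is applied as $k = \lfloor 2 n^{1/3} \rfloor$; rounding down is an assumption made here because it matches every tabulated pair, and the function name `rice_bins` is hypothetical.

```python
def rice_bins(n):
    """Largest integer k with k**3 <= 8*n, i.e. floor(2 * n ** (1/3))."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    target = 8 * n
    k = round(target ** (1.0 / 3.0))  # floating-point first guess
    # Correct the guess with exact integer arithmetic, so that perfect
    # cubes such as 8 * 1000 are not pushed down by rounding error.
    while k ** 3 > target:
        k -= 1
    while (k + 1) ** 3 <= target:
        k += 1
    return k


sizes = [30, 50, 60, 70, 80, 100, 150, 200]
print([rice_bins(n) for n in sizes])
```

The integer correction step is a design choice: evaluating `2 * n ** (1 / 3)` directly in floating point can fall just below an integer when $8 n$ is a perfect cube.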

Table 2. Estimates of the significance level $α$ under the hypothesis: inverse Gaussian regression with log link, $β_{0} = - 5$, $β_{1} = 2$, $β_{2} = 0.1$, $ν = 3$.

n
200	500	1000	1500	2000	$+ \infty$
$k_{R i c e} = 11$	$k_{R i c e} = 15$	$k_{R i c e} = 20$	$k_{R i c e} = 22$	$k_{R i c e} = 25$
0.1150	0.0860	0.0670	0.0678	0.0584	0.05
0.1638	0.1417	0.1204	0.1180	0.1098	0.10
	$k = 11$	$k = 11$	$k = 11$	$k = 11$
	0.0735	0.0638	0.0620	0.0520	0.05
	0.1290	0.1139	0.1138	0.1060	0.10
		$k = 15$	$k = 15$	$k = 15$
		0.0684	0.0636	0.0600	0.05
		0.1280	0.1166	0.1190	0.10
			$k = 20$	$k = 20$
			0.0656	0.0670	0.05
			0.1294	0.1180	0.10

Table 3. Estimates of the significance level $α$ under the hypothesis: gamma regression with log link, $β_{0} = - 7$, $β_{1} = 4$, $β_{2} = 0.3$, $ν = 0.45$.

n
200	500	1000	1500	2000	$+ \infty$
$k_{R i c e} = 11$	$k_{R i c e} = 15$	$k_{R i c e} = 20$	$k_{R i c e} = 22$	$k_{R i c e} = 25$
0.0910	0.0722	0.0692	0.0672	0.0720	0.05
0.1510	0.1362	0.1304	0.1264	0.1302	0.10
	$k = 11$	$k = 11$	$k = 11$	$k = 11$
	0.0788	0.0634	0.0556	0.0562	0.05
	0.1418	0.1206	0.1098	0.1064	0.10
		$k = 15$	$k = 15$	$k = 15$
		0.0726	0.0602	0.0610	0.05
		0.1278	0.1176	0.1128	0.10
			$k = 20$	$k = 20$
			0.0736	0.0620	0.05
			0.1264	0.1198	0.10

Table 4. Definitions of alternative models.

Model	CDF
gamma regression with log link	(4)
IG regression with log link	(5)
Weibull AFT–log-linear	$1 - \exp \{- {(t / e^{β^{T} z})}^{ν}\}$
log-logistic AFT–log-linear	$1 - {(1 + {(t / e^{β^{T} z})}^{ν})}^{- 1}$
log-normal AFT–log-linear	$Φ ((\ln t - β^{T} z) / σ); σ = 1 / ν$
gamma regression model with shape and scale depending on covariates	(4) with $ν = ν_{0} + ν_{1} z_{1} + \dots + ν_{j} z_{j}$
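For readers who want to evaluate the three AFT alternatives numerically, the following is a minimal sketch of their CDFs using the standard Weibull, log-logistic, and log-normal forms with scale $e^{β^{T} z}$ and $σ = 1 / ν$. The function names and the list-based representation of $β$ and $z$ are hypothetical choices made for this illustration.

```python
import math


def linear_predictor(beta, z):
    """Return beta^T z for equal-length sequences beta and z."""
    return sum(b * x for b, x in zip(beta, z))


def weibull_aft_cdf(t, beta, z, nu):
    """Weibull AFT: 1 - exp(-(t / exp(beta^T z))^nu)."""
    scale = math.exp(linear_predictor(beta, z))
    return 1.0 - math.exp(-((t / scale) ** nu))


def loglogistic_aft_cdf(t, beta, z, nu):
    """Log-logistic AFT: 1 - (1 + (t / exp(beta^T z))^nu)^(-1)."""
    scale = math.exp(linear_predictor(beta, z))
    return 1.0 - 1.0 / (1.0 + (t / scale) ** nu)


def lognormal_aft_cdf(t, beta, z, nu):
    """Log-normal AFT: Phi((ln t - beta^T z) / sigma) with sigma = 1 / nu."""
    sigma = 1.0 / nu
    u = (math.log(t) - linear_predictor(beta, z)) / sigma
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


# Illustrative covariate vector (an assumption, not taken from the article).
beta, z = [1.0, 1.0, 0.01], [1.0, 0.5, 20.0]
t = math.exp(linear_predictor(beta, z))
print(weibull_aft_cdf(t, beta, z, 3.0), loglogistic_aft_cdf(t, beta, z, 3.0))
```

At $t = e^{β^{T} z}$ the log-logistic and log-normal CDFs both equal 0.5, while the Weibull CDF equals $1 - e^{- 1}$, which gives a quick sanity check on any implementation.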

Table 5. Gamma regression. Powers against various alternatives. n: number of observations; k: optimal number of grouping intervals.

Alternative	n; k
Alternative	30; 5	50; 6	60; 7	80; 8	100; 9	150; 10	200; 11	250; 12	300; 13
$β_{0} = 1, β_{1} = 1, β_{2} = 0.01, ν = 3$
IG with log link	0.706	0.902	0.943	0.987	1	1	1	1	1
Weibull AFT–log-linear	0.282	0.317	0.332	0.346	0.360	0.427	0.493	0.585	0.674
log-logistic AFT–log-linear	0.542	0.569	0.571	0.646	0.682	0.742	0.837	0.897	0.923
log-normal AFT–log-linear	0.361	0.377	0.418	0.439	0.459	0.481	0.487	0.518	0.581
$β_{0} = 1, β_{1} = 1, β_{2} = 0.01, ν = 2$
IG with log link	0.813	0.951	0.973	0.995	1	1	1	1	1
Weibull AFT–log-linear	0.241	0.257	0.267	0.272	0.272	0.283	0.300	0.344	0.372
log-logistic AFT–log-linear	0.583	0.651	0.705	0.780	0.838	0.923	0.974	0.986	0.994
log-normal AFT–log-linear	0.415	0.444	0.489	0.513	0.578	0.653	0.709	0.768	0.823
$β_{0} = 1, β_{1} = 1, β_{2} = 0.01$ , gamma with shape $ν$
$ν = 1 + 2 z_{1}$	0.535	0.533	0.569	0.606	0.680	0.781	0.873	0.923	0.968
$ν = 0.7 + 2 z_{1}$	0.545	0.630	0.651	0.703	0.772	0.910	0.954	0.987	0.994

Table 6. Inverse Gaussian regression. Powers against various alternatives. n: number of observations; k: optimal number of grouping intervals.

Alternative	n; k
Alternative	30; 5	50; 6	60; 7	80; 8	100; 9	150; 10	200; 11	250; 12
$β_{0} = 1, β_{1} = 1, β_{2} = 0.01, ν = 2$
gamma with log link	0.708	0.816	0.848	0.909	0.946	0.982	0.999	1
Weibull AFT–log-linear	0.739	0.819	0.896	0.922	0.970	0.995	0.999	1
log-logistic AFT–log-linear	0.433	0.544	0.549	0.594	0.664	0.746	0.842	0.891
log-normal AFT–log-linear	0.503	0.616	0.677	0.717	0.805	0.887	0.950	0.978
$β_{0} = 1, β_{1} = 1, β_{2} = 0.01, ν = 1.5$
gamma with log link	0.761	0.895	0.903	0.959	0.987	0.996	0.999	1
Weibull AFT–log-linear	0.795	0.878	0.919	0.968	0.993	0.999	0.993	1
log-logistic AFT–log-linear	0.575	0.616	0.665	0.736	0.774	0.849	0.920	0.966
log-normal AFT–log-linear	0.475	0.504	0.534	0.617	0.681	0.802	0.872	0.939

Table 7. Failure times $T_{i}$ for 76 electrical insulating fluids tested at voltages $v_{i}$.

$v_{i}$ (kV)	Frequency	Failure Times $T_{i}$
26	3	5.79 159.52 2323.70
28	5	68.85 108.29 110.29 426.07 1067.60
30	11	7.74 17.05 20.46 21.02 22.66 43.40
		47.30 139.07 144.12 175.88 194.90
32	15	0.27 0.40 0.69 0.79 2.75 3.91 9.88 13.95
		15.93 27.80 53.24 82.85 89.29 100.58 215.10
34	19	0.19 0.78 0.96 1.31 2.78 3.16 4.15 4.67 4.85 6.50
		7.35 8.01 8.27 12.06 31.75 32.52 33.91 36.71 72.89
36	15	0.35 0.59 0.96 0.99 1.69 1.97 2.07
		2.58 2.71 2.90 3.67 3.99 5.35 13.77 25.50
38	8	0.09 0.39 0.47 0.73 0.74 1.13 1.40 2.38

Table 8. Modified chi-squared test, $k = 8$. Electrical insulating fluids data.

Model	$Y^{2}$	p-Value	AIC	BIC
gamma regression; log link	9.335	0.3149	604.9	611.9
Weibull AFT–power rule model	6.704	0.4604	607.6	614.6
IG regression; log link	88.4	<0.0001	651.3	658.3

Table 9. Modified chi-squared test, $k = 9$. Hospital cost data.

Model	$Y^{2}$	p-Value	AIC	BIC
gamma regression; log link	14.57	0.1035	1817.9	1838.8
Weibull AFT	8.82	0.3574	1817.0	1837.8
IG regression with log link	43.42	<0.0001	1866.3	1887.1

Table 10. Failure times of bearing specimens.

Compound	Failures
I	3.03 5.53 5.60 9.30 9.92 12.51 12.95 15.21 16.04 16.84
II	3.19 4.26 4.47 4.53 4.67 4.69 5.78 6.79 9.37 12.75
III	3.46 5.22 5.69 6.54 9.16 9.40 10.19 10.71 12.58 13.41
IV	5.88 6.74 6.90 6.98 7.21 8.14 8.59 9.80 12.28 25.46
V	6.43 9.97 10.39 13.55 14.45 14.72 16.81 18.39 20.84 21.51

Table 11. Chi-squared test, $k = 6$. Bearing specimens data.

Model	$Y^{2}$	p-Value
gamma regression; log link	13.38	0.0373
Weibull AFT	13.20	0.0216
IG regression with log link	17.37	0.0080
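As a rough check on how a p-value follows from $Y^{2}$, the sketch below evaluates the upper tail of a chi-squared distribution using the closed form available for an even number of degrees of freedom. The degrees of freedom are supplied by the caller; taking 6 for the gamma and IG rows is an assumption made here, not something the table states, although it returns values close to the tabulated p-values. The function name `chi2_sf_even` is hypothetical.

```python
import math


def chi2_sf_even(x, df):
    """P(X > x) for X ~ chi-squared with an even number df of degrees of freedom.

    Uses the identity P(X > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x < 0:
        raise ValueError("x must be non-negative")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term          # add (x/2)^j / j!
        term *= half / (j + 1)  # advance to the next term of the series
    return math.exp(-half) * total


# Y^2 values from the gamma and IG rows, with df = 6 assumed.
print(round(chi2_sf_even(13.38, 6), 4), round(chi2_sf_even(17.37, 6), 4))
```

For odd degrees of freedom this closed form does not apply, and a general chi-squared survival function from a statistics library would be needed instead.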

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
