Abstract
The exponential dispersion model (EDM) generated by the Landau distribution, denoted EDM-EVF (exponential variance function), belongs to the Tweedie scale with power infinity. Its density function has no explicit form and, until now, it has not been used for statistical modeling. Of all EDMs belonging to the Tweedie scale, only two are steep and supported on the whole real line: the normal EDM, with constant variance function, and the EDM-EVF. All other absolutely continuous steep EDMs in the Tweedie scale are supported on the positive real line. This paper aims to complete the overall picture of generalized linear model (GLM) applications on the Tweedie scale by including the EDM-EVF. It introduces all the GLM ingredients needed for the analysis, including the respective link function and the total and scaled deviances. We study the analysis of deviance, derive the asymptotic properties of the maximum likelihood estimators (MLEs) of the covariate parameters, and obtain the asymptotic distribution of the deviance using the saddlepoint approximation. We provide numerical studies, which include an estimation algorithm, simulation studies, and applications to three real datasets, and demonstrate that the GLM based on the EDM-EVF outperforms the linear model based on the normal EDM. An R package accompanies all of these.
Keywords: exponential dispersion model; generalized linear model; exponential variance function; small-dispersion asymptotics; saddlepoint approximation; analysis of deviance
MSC: 62J12; 62F03; 62F12
1. Introduction
The (reproductive) TBE class (known as the Tweedie class, cf., [1]) is composed of all exponential dispersion models (EDMs) with variance functions (VFs) of the form

V_p(m) = φ m^p, m ∈ Ω_p,

where m is the mean, Ω_p is the mean parameter space, φ > 0 is the dispersion parameter, and p is the power parameter (cf., [2,3] and the references cited therein).
Let F_p be an EDM belonging to the TBE class with power parameter p. Also, let C_p and Ω_p denote, respectively, the convex support and mean parameter space of F_p. Among the TBE class, the subclasses containing all absolutely continuous (with respect to the Lebesgue measure) models comprise the following cases (cf., [2]):
- When p < 0, F_p is generated by a stable distribution with a stable index in (1, 2), supported on the whole real line, with Ω_p = (0, ∞), i.e., a proper subset of int C_p = ℝ (the interior of C_p), for all p < 0.
- When p = 0, F_0 is the normal EDM with constant VF and Ω_0 = int C_0 = ℝ.
- When p = 2, F_2 is the gamma EDM with V(m) = m² and Ω_2 = int C_2 = (0, ∞).
- When p > 2, F_p is generated by a positive stable distribution with a stable index in (0, 1), supported on the positive real line, with Ω_p = int C_p = (0, ∞), for all p > 2.
- When p = ∞, F_∞ is the EDM generated by the Landau distribution, supported on ℝ with Ω_∞ = int C_∞ = ℝ. It is absolutely continuous with respect to the Lebesgue measure on ℝ and is the limit of EDMs having power VFs (see [2,3] for further details).
Two important aspects related to the above TBE models should be remarked at this point:
- Complexity of the density function. Except for the normal (p = 0), gamma (p = 2), and inverse Gaussian (p = 3) EDMs, no other TBE model possesses an explicit density in terms of algebraic functions. All such densities can only be expressed in integral form or as power series; hence, their evaluation becomes rather complicated. To resolve the problem, several studies have directly employed the saddlepoint approximation for density evaluation on the TBE scale, as discussed in [4,5,6,7,8]. The saddlepoint approximation does so by substituting the part of the density that lacks a closed-form representation with a simple analytic expression. Additionally, the saddlepoint approximation can be utilized instead of traditional likelihood methods to derive maximum likelihood estimates (MLEs) (cf., [6,9]). Dunn created and maintains the tweedie R package [10], while [11] contributed to and maintains the statmod R package. In this frame, the function tweedie.profile in the tweedie R package practically enables the fit of TBE models. These packages can be extended to include the EDM-EVF as well.
- Steepness. The model F_p is called steep if Ω_p = int C_p. Steepness is an essential property in two respects: (1) First, it is related to the existence of the MLE of m. Indeed, if F_p is steep and Y_1, …, Y_n are n i.i.d. random variables drawn from F_p, then the MLE of m, denoted by m̂, exists with probability one and is given by the sample average Ȳ (cf., [12], Theorem 9.29). (2) Second, steepness is a necessary condition for applying generalized linear model (GLM) methodology to EDMs (cf., [2,6,13]). Consequently, out of the absolutely continuous models described above, only those with p = 0, p = 2, p > 2, and p = ∞ are steep, as their mean parameter space equals the interior of their convex support (i.e., Ω_p = (0, ∞) for p ≥ 2 and Ω_p = ℝ for p ∈ {0, ∞}). For any p < 0, F_p is not steep, as its mean parameter space Ω_p = (0, ∞) is a proper subset of the interior of its convex support, int C_p = ℝ.
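To make the role of the saddlepoint approximation concrete, the sketch below (Python, not the authors' R code; the gamma example and all function names are ours) compares the exact density of the p = 2 (gamma) member, which does have a closed form, with the generic EDM saddlepoint surrogate f(y; m, φ) ≈ (2πφV(y))^(−1/2) exp{−d(y, m)/(2φ)}. The agreement sharpens as the dispersion φ shrinks, which is exactly why the surrogate is useful for the members whose normalizer is intractable.

```python
import math

def gamma_exact(y, m, phi):
    # Reproductive gamma EDM with mean m and dispersion phi:
    # shape 1/phi, scale m*phi (so the variance is phi * m^2)
    a = 1.0 / phi
    return y ** (a - 1.0) * math.exp(-y / (m * phi)) / (math.gamma(a) * (m * phi) ** a)

def gamma_saddle(y, m, phi):
    # Saddlepoint surrogate: unit deviance of the gamma EDM (V(m) = m^2) is
    # d(y, m) = 2 * [ (y - m)/m - log(y/m) ]
    d = 2.0 * ((y - m) / m - math.log(y / m))
    V = y * y  # unit VF evaluated at the observation
    return math.exp(-d / (2.0 * phi)) / math.sqrt(2.0 * math.pi * phi * V)

# Relative error shrinks with the dispersion (Stirling-type accuracy)
for phi in (0.5, 0.1, 0.01):
    exact, approx = gamma_exact(1.3, 1.0, phi), gamma_saddle(1.3, 1.0, phi)
    print(phi, abs(approx / exact - 1.0))
```

For the gamma family the saddlepoint form coincides with the exact density up to replacing the gamma function by its Stirling approximation, so the relative error is roughly φ/12, independent of y.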
GLM applications for p ∈ {0, 2} are straightforward and have been analyzed in various references (cf., [13] and the references cited therein). GLM applications for p > 2 are discussed and presented by [6], who also maintains an R package (see [10]) for these EDMs. Consequently, we are left with the absolutely continuous models supported on the whole real line (p < 0, p = 0, and p = ∞). As already noted, the models for p < 0 are not steep—a fact which precludes them from being candidates for GLM analysis. This is quite unfortunate, as this subclass comprises an infinite set of absolutely continuous EDMs (with respect to the Lebesgue measure) supported on the whole real line. Thus, the only remaining steep EDMs supported on the whole real line are the normal (p = 0) and the EDM generated by the Landau distribution (p = ∞), both of which are suitable for GLM applications. The normal EDM constitutes the classical linear regression model, whereas the EDM-EVF requires further analysis by GLM methodology, an analysis that establishes the core of this paper. Such an analysis complements the results of [6] and accomplishes a complete analysis of all absolutely continuous TBE models.
The paper is organized as follows. Section 2 presents some preliminaries on natural exponential families (NEFs) and additive and reproductive EDMs. Section 3 introduces the EDM-EVF—the EDM generated by the Landau distribution—and the GLM ingredients needed for its analysis; mainly, we present its link function and its total and scaled deviances. In Section 4, we study its analysis of deviance, derive the asymptotic properties of the MLEs of the covariate parameters β, and obtain the asymptotic distribution of the deviance using the saddlepoint approximation. Section 5 includes the estimation algorithm, a brief description of our R package, and simulation studies. In Section 6, we provide the analysis of real data, including applications to three real datasets; it is demonstrated there that the GLM using the EDM-EVF performs better than the linear model based on the normal distribution. Some concluding remarks are presented in Section 7. Proofs of the statements (propositions, corollaries, and theorems) in this paper are relegated to Appendix A.
2. Preliminaries: NEFs, Mean Value Representation, and Additive and Reproductive EDMs
NEFs. The preliminaries in the sequel hold for any positive Radon measure on ℝ. Without loss of generality, we confine our introduction to the case in which h is a positive Radon measure that is absolutely continuous with respect to the Lebesgue measure on the real line. The Laplace transform of h and its effective domain are defined, respectively, by

L(θ) = ∫ exp(θy) h(y) dy and D = {θ ∈ ℝ : L(θ) < ∞}.
Let Θ = int D, and assume Θ is non-empty. Then, the NEF generated by h is defined by the densities of the form

f(y; θ) = exp{θy − k(θ)} h(y), θ ∈ Θ,   (1)

where k(θ) = log L(θ) is the cumulant transform of L. The cumulant transform is real analytic on Θ, implying that the r-th cumulant of (1) is given by k^(r)(θ). In particular, the mean, mean parameter space, and variance corresponding to (1) are given, respectively, by m = k′(θ), Ω = k′(Θ), and k″(θ). As k′ is strictly increasing, its inverse mapping ψ = (k′)^(−1) : Ω → Θ is well-defined. So, we denote by V(m) = k″(ψ(m)) the variance function (VF) corresponding to (1). The pair (V, Ω) uniquely defines the NEF generated by h within the class of NEFs (cf., [14]). Also, V is called the unit VF.
Mean value parameterization. For GLM applications and various other statistical aspects, it is necessary to express the NEF with densities (1) in terms of its mean m rather than in terms of the artificial parameter θ (for details, see [3]). Indeed, given a VF (V, Ω), the functions ψ(m) and k(ψ(m)) are primitives of 1/V(m) and m/V(m), respectively, and thus are given by

ψ(m) = ∫ dm/V(m) and k(ψ(m)) = ∫ m dm/V(m),   (2)

implying that the mean value representation of (1) is given by

f(y; m) = exp{yψ(m) − k(ψ(m))} h(y), m ∈ Ω.
Additive EDMs. The Jorgensen set Λ related to (1) is defined by (cf., [2])

Λ = {λ > 0 : L(θ)^λ is the Laplace transform of some positive Radon measure h_λ}.

The set Λ is not empty, due to convolution. Moreover, Λ = (0, ∞) if h is infinitely divisible, a valid property for all TBE members. Accordingly, the additive EDM (cf., [2]) is defined by densities of the form

f(z; θ, λ) = exp{θz − λk(θ)} h_λ(z), θ ∈ Θ, λ ∈ Λ,   (3)

where the VF corresponding to the additive EDM is given by (λV(m/λ), λΩ).
Reproductive EDMs. In general, for various statistical aspects, and particularly for GLM applications, it is more effective to represent (3) in a form resembling the normal structure. Such a representation, called the reproductive EDM, is obtained by the mapping Y = Z/λ, where Z has density (3). Then, the densities of this mapping have the form (cf., [2,6,15])

f(y; m, φ) = c(y; φ) exp{[yψ(m) − k(ψ(m))]/φ}, m ∈ Ω, φ ∈ Φ,   (4)

where φ = 1/λ is the dispersion parameter, Φ = {1/λ : λ ∈ Λ}, and S is the support of h. It is crucial to note that the structure in (4) is not suitable for the discrete case (counting measures on ℕ). This is because, for different φ's, it alters the support of h. In contrast, for the absolutely continuous case, the structure in (4) is appropriate. The VF of the reproductive EDM (4) is given by

φV(m), m ∈ Ω,   (5)

where if Λ = (0, ∞) or Λ = ℕ, then Φ = (0, ∞) or Φ = {1/n : n ∈ ℕ}, respectively.
3. GLM Applications for the EDM Generated by the Landau Distribution—Some Basics
In this section, we provide some required components for GLM applications. We first give an expression for the density. We then present the related link function and scaled deviance.
3.1. Density Function
The EDM-EVF is the EDM generated by the Landau distribution, known as the Tweedie model with power infinity, and it possesses a simple unit VF of the form V(m) = e^m, m ∈ ℝ. It is steep (Ω = int C = ℝ), infinitely divisible, skewed to the right, leptokurtic (i.e., it has fatter tails), and absolutely continuous, supported on the whole real line. It was surveyed in detail and further developed by [3], and was named there EDM-EVF (exponential VF). Its reproductive EDM density, of the form (4), is

f(y; m, φ) = c(y; φ) exp{[yψ(m) − k(ψ(m))]/φ}, y ∈ ℝ,   (6)

where the normalizing function c(y; φ) (7) can only be expressed in integral form (see [3]), and VF

φV(m) = φe^m, m ∈ ℝ.

The expressions for ψ(m) and k(ψ(m)), needed for its mean value parameterization, are obtained from (2) as

ψ(m) = −e^(−m) and k(ψ(m)) = −(m + 1)e^(−m).   (8)

Thus, the density (6) can be written as

f(y; m, φ) = c(y; φ) exp{(m + 1 − y)e^(−m)/φ}, y ∈ ℝ.   (9)
If Y follows (9), then we write Y ~ EDM-EVF(m, φ), or we use the standard EDM notation and write Y ~ ED(m, φ). The mean, variance, and cumulants of such a Y are

E(Y) = m, Var(Y) = φV(m) = φe^m, and κ_r = (r − 2)! φ^(r−1) e^((r−1)m), r ≥ 2.
3.2. Scaled Deviance and Link Function
We shall now consider two essential ingredients needed for GLM applications of the EDM-EVF (9): namely, the scaled deviance and the link function. These were introduced by [2,15] (see also [6,16]). For GLM applications, we need the following ingredients. Consider

t(y, m) = yψ(m) − k(ψ(m)) = (m + 1 − y)e^(−m).

It is evident that t(y, m) ≤ t(y, y) = e^(−y). Indeed, by taking the partial derivative of t(y, m) with respect to m and setting it to zero, we obtain

∂t(y, m)/∂m = (y − m)e^(−m) = 0,

implying that m = y maximizes t(y, m) (since the EDM-EVF is steep). Hence, the unit deviance

d(y, m) = 2[t(y, y) − t(y, m)] = 2[e^(−y) − (m + 1 − y)e^(−m)]

can be considered as a distance measure with two properties: d(y, y) = 0 and d(y, m) > 0 for y ≠ m. GLMs assume a systematic component with the linear predictor

η = Σ_{j=1}^{p} x_j β_j.

This is linked to the mean m through a link function g, such that η = g(m). For the EDM-EVF, we choose the canonical (and simple) link function

g(m) = ψ(m) = −e^(−m).
Let (y_i, x_i), i = 1, …, n, be a set of independent observations, where y_i ~ ED(m_i, φ) (assuming a single dispersion parameter) is associated with the link function g(m_i) = η_i = x_i^T β. Write X = (x_1, …, x_n)^T for the set of covariates, in which case η = Xβ, and the total and scaled deviances are given, respectively, by

D(y, m) = Σ_{i=1}^{n} d(y_i, m_i)

and

D*(y, m) = D(y, m)/φ.

Consequently, the log-likelihood is

ℓ(β, φ) = Σ_{i=1}^{n} log c(y_i; φ) + (1/φ) Σ_{i=1}^{n} e^(−y_i) − D*(y, m)/2.

Let β̂ be the MLE of β. As in linear models, we aim to estimate β and obtain its asymptotic behavior.
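The ingredients above can be coded directly (a Python sketch, not the authors' TBEinf package; the closed forms for ψ, k(ψ(m)), and the unit deviance are our own derivation under the exponential unit VF V(m) = e^m, obtained by integrating 1/V and m/V):

```python
import math

def psi(m):
    # Canonical parameter: psi(m) = integral of 1/V(m) dm = -exp(-m)
    return -math.exp(-m)

def link(m):
    # Canonical link g(m) = psi(m); note that eta = g(m) is always negative
    return psi(m)

def link_inv(eta):
    # Inverse canonical link, defined for eta < 0
    return -math.log(-eta)

def unit_deviance(y, m):
    # d(y, m) = 2 [ t(y, y) - t(y, m) ] with t(y, m) = y*psi(m) - k(psi(m))
    # and k(psi(m)) = -(m + 1) exp(-m), which yields:
    return 2.0 * (math.exp(-y) - (m + 1.0 - y) * math.exp(-m))

def scaled_deviance(y, mu, phi):
    # Total deviance summed over observations, divided by the dispersion
    return sum(unit_deviance(yi, mi) for yi, mi in zip(y, mu)) / phi

# Sanity checks: d(y, y) = 0, d(y, m) > 0 for m != y, link round-trips
print(unit_deviance(1.2, 1.2))         # ~0 (up to rounding)
print(unit_deviance(1.2, 0.5) > 0.0)
print(abs(link_inv(link(0.7)) - 0.7))  # ~0
```

The canonical link constrains the linear predictor to be negative, a point that matters for the fitting algorithm later on.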
4. Asymptotic Properties
This section deals with the saddlepoint approximation and asymptotic behavior of the MLEs of the parameters involved. The section establishes the central core of the asymptotic behavior of all the statistics required for the appropriate analysis of the deviance.
4.1. Asymptotic Properties of MLE
Let us start with the saddlepoint approximation (14) below, which is essential in the asymptotic theory of GLMs. The exact distribution (9) is challenging to handle, due to the cumbersome form of (7). The saddlepoint approximation neatly circumvents it. For more details on this point, see Sections 1.5.3 and 3.5.1 in [2] and Section 5.4.3 in [6]. The following proposition presents the saddlepoint approximation for the EDM-EVF.
Proposition 1.
Let Y ~ ED(m, φ). Then, for sufficiently small φ, the saddlepoint approximation for the density of Y is given by

f(y; m, φ) ≈ (2πφV(y))^(−1/2) exp{−d(y, m)/(2φ)},   (14)

where V(y) = e^y and d(y, m) is the unit deviance.
Proof.
See Appendix A. □
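A minimal numerical sketch of the approximation (Python; it again assumes the exponential unit VF V(y) = e^y and the unit deviance d(y, m) = 2[e^(−y) − (m + 1 − y)e^(−m)] derived from it, so the formulas are ours, not the paper's stripped displays):

```python
import math

def unit_deviance(y, m):
    # Unit deviance under the assumed exponential unit VF V(m) = exp(m)
    return 2.0 * (math.exp(-y) - (m + 1.0 - y) * math.exp(-m))

def dens_saddle(y, m, phi):
    # Saddlepoint density: the intractable normalizer c(y; phi) is replaced
    # by the analytic factor (2*pi*phi*V(y))^(-1/2)
    V = math.exp(y)
    return math.exp(-unit_deviance(y, m) / (2.0 * phi)) / math.sqrt(2.0 * math.pi * phi * V)

# For small phi the approximate density concentrates near y = m and its
# total mass is close to one (crude Riemann-sum check on a wide grid)
phi, m = 0.01, 0.0
grid = [i * 0.001 for i in range(-2000, 2001)]
mass = sum(dens_saddle(y, m, phi) for y in grid) * 0.001
print(mass)
```

The mass deviates from one only by an O(φ) term, consistent with the small-dispersion character of the approximation.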
The following corollary, an immediate consequence of Proposition 1, implies convergence to normality:
Corollary 1.
Let Y ~ ED(m, φ); then,

(Y − m)/√(φV(m)) →_d N(0, 1) as φ → 0,   (15)

where V(m) = e^m and →_d denotes convergence in distribution.
Proof.
See Appendix A. □
Corollary 1 provides the asymptotic normality for a single observation y. For independent observations y = (y_1, …, y_n)^T with Y_i ~ ED(m_i, φ), we have, as φ → 0,

(Y_i − m_i)/√(φV(m_i)) →_d N(0, 1), i = 1, …, n,   (16)

where V(m_i) = e^(m_i).
Using (16), the following theorem shows that the MLE of is asymptotically normally distributed.
Theorem 1.
Let β̂ be the MLE of β and let X be the n × p design matrix. If X^T X has bounded eigenvalues, then, as φ → 0,

β̂ →_d N(β_0, φ(X^T WX)^(−1)),   (17)

where β_0 is the true parameter and W = diag(w_1, …, w_n), with w_i = 1/[V(m_i)g′(m_i)²], is the matrix of working weights.
Proof.
See Appendix A. □
4.2. Analysis of the Deviance
With m and φ known, we consider the distribution of the deviance. We claim that, when the saddlepoint approximation holds (and it does for small φ), the scaled deviance follows an approximate chi-square distribution.
Theorem 2.
Let y = (y_1, …, y_n)^T, where the Y_i ~ ED(m_i, φ) are independent and φ is known. Then, as φ → 0, the scaled deviance satisfies D*(y, m) = D(y, m)/φ →_d χ²_n.
Proof.
See Appendix A. □
When φ is unknown, it is replaced by its MLE φ̂. Thus, we define the residual and scaled residual deviances as

D(y, m̂) = Σ_{i=1}^{n} d(y_i, m̂_i)

and

D*(y, m̂) = D(y, m̂)/φ.

As the GLM considered in Section 3 involves p regression parameters, it follows that

D*(y, m̂) →_d χ²_{n−p} as φ → 0.   (18)
Generally, the deviance is most useful not as an absolute measure of goodness-of-fit, but rather for comparing two nested models. For example, one may want to test whether incorporating an additional covariate significantly improves the model fit. In this case, the deviance can be employed to compare two nested GLMs that are based on the same EDM but have different fitted systematic components:
Model A: g(m) = Σ_{j=1}^{p} x_j β_j,

and

Model B: g(m) = Σ_{j=1}^{p} x_j β_j + x_{p+1} β_{p+1},

where β̂_A denotes the MLE of β under Model A, β̂_B denotes the MLE of β under Model B, and x_{p+1} is an additional covariate. Note that Model A is a special case of Model B, with β_{p+1} = 0. Accordingly, we consider the following hypotheses, to determine if the simpler Model A is adequate to model the data:

H_0: β_{p+1} = 0 versus H_1: β_{p+1} ≠ 0.   (19)
We have previously observed that the total deviance captures that part of the log-likelihood that depends on . Therefore, the following theorem holds, from which it can be seen that (18) is a special case of Theorem 3:
Theorem 3.
If φ is known, the likelihood ratio test (LRT) statistic for comparing Models A and B is

LRT = [D(y, m̂_A) − D(y, m̂_B)]/φ.

Then, under the null hypothesis in (19), LRT →_d χ²_1 as φ → 0.
Proof.
See Appendix A. □
Consider the two models in Theorem 3 with both and unknown. Then, an estimate of is required. This is done in Theorem 4, which is deduced from Theorem 3:
Theorem 4.
If φ is unknown, the appropriate statistic for comparing Model A with Model B is

F = [D(y, m̂_A) − D(y, m̂_B)]/φ̃,

where φ̃ is an estimate of φ based on Model B. Then, under the null hypothesis in (19), F converges in distribution as φ → 0.
Proof.
It suffices to prove the asymptotic independence of and . The proof is similar to Theorem 4.3 in reference [17]. □
Note that our statements above about asymptotic distributions are all based on the assumption that φ → 0. Such results are called small-dispersion asymptotics and hold regardless of the sample size n. Large-sample asymptotics are also well-known and, hence, no further explanations are provided.
5. Simulation Studies
5.1. Implementation
Herein, we discuss the estimation of the unknown parameters in the GLM: the covariate coefficients β and the dispersion parameter φ. For the estimation of β, we use iteratively reweighted least squares (IRLS). The score vector for β is

U(β) = (1/φ) X^T WM(y − m),

where W = diag(w_1, …, w_n) and w_i = 1/[V(m_i)g′(m_i)²] are called the working weights, and M is the diagonal matrix of the link derivatives g′(m_i). The Fisher information matrix for β is

I(β) = (1/φ) X^T WX.

Thus, an iterative technique using the Newton–Raphson method yields

β^(r+1) = β^(r) + I(β^(r))^(−1) U(β^(r)),

where the Fisher information matrix is used in place of the observed information matrix, and the superscript (r) denotes the r-th iterate. The iteration can be re-organized as IRLS (cf., [6]):

β^(r+1) = (X^T WX)^(−1) X^T Wz,

where z, the working response vector, is given by

z = η + M(y − m),

and all other quantities on the right-hand side are evaluated at β^(r).
For the estimation of φ, we use the mean deviance estimator of [6]. It is easy to show that, under the saddlepoint approximation density, the MLE of φ is the simple mean deviance D(y, m̂)/n. Taking into account the estimation of β and the residual degrees of freedom, we obtain the mean deviance estimator of φ as

φ̄ = D(y, m̂)/(n − p).
We summarize all of the above as Algorithm 1.
Algorithm 1: Estimating β and φ based on iteratively reweighted least squares estimation (IRLSE) and the mean deviance.
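The IRLS update and the mean deviance estimator can be sketched end to end (Python, not the authors' TBEinf package; the canonical link g(m) = −e^(−m) and the weights w_i = 1/[V(m_i)g′(m_i)²] = e^(m_i) follow from the exponential-VF assumption, and the clamp on η is a purely numerical safeguard we added):

```python
import numpy as np

def irls_edm_evf(X, y, tol=1e-8, max_iter=50):
    """IRLS for a GLM with canonical link g(m) = -exp(-m) and V(m) = exp(m)."""
    mu = y.astype(float).copy()                 # start from the saturated fit
    eta = -np.exp(-mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        g_prime = np.exp(-mu)                   # g'(m) = exp(-m)
        w = np.exp(mu)                          # w = 1 / (V(m) g'(m)^2) = exp(m)
        z = eta + g_prime * (y - mu)            # working response
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        eta = np.minimum(X @ beta_new, -1e-10)  # keep eta < 0 (numerical safeguard)
        mu = -np.log(-eta)                      # inverse canonical link
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    # Mean-deviance estimate of the dispersion: D(y, mu_hat) / (n - p)
    dev = 2.0 * np.sum(np.exp(-y) - (mu + 1.0 - y) * np.exp(-mu))
    return beta, dev / (len(y) - X.shape[1])

# Small-dispersion synthetic check: by the normal approximation,
# Y is roughly N(m, phi * exp(m)) for small phi
rng = np.random.default_rng(0)
n, beta_true, phi = 500, np.array([-1.0, -0.5]), 0.01
X = np.column_stack([np.ones(n), rng.uniform(0.0, 1.0, n)])
m = -np.log(-(X @ beta_true))
y = m + np.sqrt(phi * np.exp(m)) * rng.standard_normal(n)
beta_hat, phi_hat = irls_edm_evf(X, y)
print(beta_hat, phi_hat)
```

Starting from the saturated fit μ = y makes the initial working response well-defined without any extra tuning, and the recovered coefficients and dispersion should sit close to the true values when φ is small.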
We have developed an R package named TBEinf [18], which is used in our numerical experiments and is publicly available at https://github.com/xliusufe/TBEinf (accessed on 28 April 2024).
The package includes a program for computing the density of the EDM-EVF by direct calculation (cf., [3]), saddlepoint approximation (cf., [6]), Fourier inversion (cf., [2,16]), and the modified W-transformation (cf., [16,19]). Specifically, the function dTBEinf in the package computes the exact density when method = "real", the saddlepoint approximation when method = "saddle", the Fourier inversion when method = "finverse", and the modified W-transformation when method = "mWtrans".
Also, the package applies GLM methodology to the EDM-EVF for estimation and prediction. The estimates of the covariate coefficients β and the dispersion parameter φ are obtained through Algorithm 1.
5.2. Simulation Studies
Firstly, we generated simulated data using (16). We considered four sample sizes, up to n = 800, with fixed true values of β and φ. The first column of X was a vector of ones, and all the other elements were randomly generated. We ran 1000 repetitions, generating 1000 datasets for each n, and estimated β and φ according to Algorithm 1.

By applying Algorithm 1, we obtained the average value of the estimated φ for each sample size. These averages were relatively small compared to the true value of φ, because the total deviance was very small.
Table 1 lists the simulation results calculated by applying Algorithm 1 with varying sample sizes n. Herein, sd denotes the standard deviation (SD) of the 1000 estimates β̂_k^(j), j = 1, …, 1000, of each coefficient β_k around their average. Also, se denotes the standard error (SE) of β̂_k, calculated from (17) as the square root of the k-th diagonal element of the estimated covariance matrix, using the average value of the estimated φ. From Table 1, we can see that the average bias and the SDs were small, which demonstrates that the estimation procedure performs well and stably. It is also observed that the SDs were all close to the SEs, and that both decreased as n increased, which verifies that the asymptotic properties are reasonable.
Table 1.
Average bias, sd, and se of the estimated β for each sample size n. β_0 denotes the true value of β; the first component is the intercept term.
6. Real Data Analysis
We present the proposed estimation procedure through applications to three real datasets. The first and second datasets, grazing and hcrabs, are both from the R package 'GLMsData' (see [6,20]). The last one is the Boston housing data.
6.1. Dataset “Grazing”
This dataset reveals the density of understorey birds across a series of sites located on either side of a stockproof fence, in two distinct areas. It has the potential to provide insights into the impact of habitat fragmentation on bird populations (cf., [20]):
- Sample size: n = 62;
- The number of variables: ;
- Variables description; see Table 2.
Table 2. Variables description of grazing dataset.
To verify the appropriateness of the GLM for the data, we evaluated the prediction performance of the GLM and compared it with a linear model. We conducted 500 random splits of the 62 observations. In each split, we randomly selected 80% of the observations as the training set and the remaining 13 as the testing set, where 13 is the result of multiplying 62 by 20% and rounding up. We applied both the GLM and the linear model to the training set and estimated the coefficients.
By applying Algorithm 1 for the GLM, we obtained the estimates of β and φ on each training set. We then predicted the responses on the testing set through the inverse link, ŷ_i = g^(−1)(x_i^T β̂), and calculated the mean squared error (MSE) over the 13 testing observations. In the linear model, we estimated β using least squares (without φ), predicted the testing responses by ŷ_i = x_i^T β̂, and calculated the MSE in the same way.
Thus, we could compute the average and sd of the prediction MSEs of both models over the 500 random splits. For the GLM, the average and sd of the MSEs were 0.111 and 0.017, respectively. For the linear model, they were 0.760 and 0.238, respectively. It can be seen that the GLM performed much better than the linear model in terms of both the average and the sd. Additionally, we calculated the Bayesian information criterion (BIC) for both models; the BIC for the GLM was significantly lower than that for the linear model (LM), indicating a better model fit.
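The evaluation protocol used for all three datasets can be expressed generically (a Python sketch with synthetic stand-in data, since the actual datasets live in the R packages; the function names and the OLS stand-in fitter are ours, while the 500 splits and the 80/20 proportion mirror the text):

```python
import numpy as np

def split_mse(X, y, fit, predict, n_splits=500, test_frac=0.2, seed=1):
    """Average and sd of test MSE over repeated random train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(np.ceil(test_frac * n))  # e.g. ceil(0.2 * 62) = 13
    mses = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        model = fit(X[train], y[train])
        resid = y[test] - predict(model, X[test])
        mses.append(np.mean(resid ** 2))
    return np.mean(mses), np.std(mses)

# Stand-in linear-model fit/predict pair via least squares
ols_fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
ols_pred = lambda b, X: X @ b

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(62), rng.uniform(size=(62, 2))])
y = X @ np.array([1.0, -0.5, 0.3]) + 0.1 * rng.standard_normal(62)
avg, sd = split_mse(X, y, ols_fit, ols_pred, n_splits=100)
print(avg, sd)
```

Passing the fit/predict pair as arguments lets the same harness score both the EDM-EVF GLM and the linear model, which is exactly the comparison the three data analyses report.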
6.2. Dataset “Hcrabs”
This dataset describes the number of male crabs attached to female horseshoe crabs (cf., [20]):
- Sample size: n = 173;
- The number of variables: ;
- Variables description; see Table 3 below.
Table 3. Variables description of hcrabs dataset.
As with the first dataset, we conducted 500 random splits of the 173 observations. In each split, we selected 80% as the training set and the remaining 35 observations as the testing set, where 35 is the result of multiplying 173 by 20% and rounding up. We then applied both the GLM and the linear model.
By applying Algorithm 1 for the GLM, we obtained the estimates of β and φ on each training set, predicted the testing responses through the inverse link, and computed the MSE over the 35 testing observations. In the linear model, we estimated β using least squares, predicted the testing responses by ŷ_i = x_i^T β̂, and computed the MSE in the same way.
For the GLM, the average and sd of the MSEs were 0.011 and 0.004, respectively. For the linear model, they were 0.837 and 0.276, respectively. Here, again, the GLM performed much better than the linear model in terms of both the average and the sd. We also calculated the BIC for both models; the BIC for the GLM was lower than that for the linear model, indicating a superior model fit.
6.3. Dataset “Boston Housing”
This dataset is taken from Harrison Jr and Rubinfeld (1978) and includes 14 variables measured across 506 census tracts in the Boston Standard Metropolitan Statistical Area. The response variable is the logarithm of the median value of the houses in those census tracts:
- Sample size: n = 506;
- The number of variables: 14;
- Variables description; see Table 4.
Table 4. Variables description of Boston housing dataset.
Again, we conducted 500 random splits of the 506 observations. In each split, we selected 80% as the training set and the remaining 102 observations as the testing set, where 102 is the result of multiplying 506 by 20% and rounding up. We applied both the GLM and the linear model and compared their performance.
For the GLM, we obtained the estimates of β and φ on each training set, predicted the testing responses through the inverse link, and computed the MSE over the 102 testing observations. In the linear model, we estimated β using least squares, predicted the testing responses by ŷ_i = x_i^T β̂, and computed the MSE in the same way.
For the GLM, the average and sd of the MSEs were 0.031 and 0.009, respectively. For the linear model, they were 0.041 and 0.010, respectively. This dataset is thus well suited to the linear model, yet our GLM also fitted it well, which, to some extent, reflects the wide applicability of the GLM; indeed, the GLM results were slightly better than those of the linear model in both average and sd. We also computed the BIC for both models; the lower BIC of the GLM indicated a superior model fit.
7. Conclusions
In this paper, we were interested in GLM methodology applied to the EDM-EVF—the EDM generated by the Landau distribution, an EDM supported on the whole real line. We introduced its density function, deviance, and link function. We considered the saddlepoint approximation approach for its density and then deduced the convergence of Y to normality. Based on the small-dispersion and saddlepoint approximations, we derived the asymptotic normality of the MLE of β. The analysis of deviance was also studied, considering the cases of known and unknown φ. In the numerical studies, we first estimated β and φ using Algorithm 1 and then evaluated the estimation performance, reporting averages of bias, standard deviations (SDs), and standard errors (SEs) in a simulation study. We demonstrated that the biases and SDs were relatively small and that the SDs were close to the SEs. As for the applications to the three real datasets, the GLM showed much better performance than the linear models, which, to some extent, indicates the wide applicability of the EDM-EVF. We also composed an R package for GLM applications of the EDM-EVF.
We trust that the proposed GLM will be well utilized in modeling more real data and various statistical purposes.
Author Contributions
Conceptualization, S.K.B.-L.; methodology, S.K.B.-L., X.L. and Z.X.; software, X.L. and Z.X.; validation, X.L. and Z.X.; formal analysis, X.L. and Z.X.; investigation, S.K.B.-L., X.L., A.R. and Z.X.; resources, S.K.B.-L. and X.L.; data curation, S.K.B.-L., X.L. and Z.X.; writing—original draft preparation, S.K.B.-L., X.L. and Z.X.; writing—review and editing, S.K.B.-L., X.L., A.R. and Z.X.; visualization, X.L. and Z.X.; supervision, S.K.B.-L. and X.L.; project administration, S.K.B.-L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.
Funding
The research of Liu and Xiang was funded by the National Natural Science Foundation of China (12271329, 72331005), the Program for Innovative Research Team of SUFE, the Shanghai Research Center for Data Science and Decision Technology, the Open Research Fund of the Yunnan Key Laboratory of Statistical Modeling and Data Analysis, Yunnan University, and the Open Research Fund of the Key Laboratory of Analytical Mathematics and Applications (Fujian Normal University), Ministry of Education, P. R. China. The research of Bar-Lev and Ridder was funded by STAR (Stochastics—Theoretical and Applied Research), one of the four mathematics clusters within the Dutch Research Council (NWO).
Data Availability Statement
All real datasets used in this manuscript are explicitly displayed in the paper.
Acknowledgments
We thank two reviewers for helpful comments and suggestions.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| EDM | exponential dispersion model |
| EVF | exponential variance function |
| VF | variance function |
| TBE | Tweedie, Bar-Lev, and Enis |
| NEF | natural exponential family |
| LRT | likelihood ratio test |
| IRLS | iteratively reweighted least squares |
| SD | standard deviation |
| SE | standard error |
Appendix A. Proofs
Proof of Proposition 1.
For , by steepness and (6), the characteristic function of Y is
where . The last equation holds, since the integrand is an EDM density function written in terms of rather than . If is absolutely integrable, then by the Fourier inversion theorem the probability density function of Y is
where i is the complex imaginary unit and .
Since and is the inverse mapping of , we have . Let ; then, . Since the integrand is analytic, we may move the path of integration from to . The density then becomes
We introduce the unit deviance,
Let ; then, for every fixed t, . By expanding k around , we obtain
where , since and .
We now consider the term in curly brackets in (A1). By introducing the unit deviance and expanding k around , this term becomes
where high-order terms of are discarded. From the result
we obtain the approximation for small enough,
This completes the proof of Proposition 1 (cf., [2]). □
Proof of Corollary 1.
Before proving this, we prove a lemma: the unit scaled deviance behaves approximately as the normal unit deviance near its minimum (cf., [2]). Let
For , we first show that
By a simple calculation, we have
Thus, (A2) holds. Then, the unit variance function satisfies the relationship
Furthermore, (A3) implies the following second-order Taylor expansion of near its minimum ():
This expansion shows that the unit deviance behaves approximately as does the normal unit deviance near its minimum.
Proof of Theorem 1.
First, we prove the consistency of , i.e., as , where is the true parameter. We shall consider the behavior of the log-likelihood on the sphere with center at the true point and radius h. We will show that for any sufficiently small h, the probability of
tends to 1 at all points on the surface of , i.e., (cf., [17]). Note that this method also handles the proof of the MLE’s consistency in large-sample asymptotics.
We denote
Through differential operations, we obtain
Let
where . Obviously, is negative definite. By (16), we know as (by use of the Chebyshev inequality); then
For sufficiently small h, by expanding around the true point and multiplying by , we have
We now consider . Suppose that has bounded eigenvalues and its maximum eigenvalue is . By (A5) and the Cauchy–Schwarz inequality, we have
Consider . For each , by (16) and the Chebyshev inequality we obtain, by letting ,
that is,
implying that
The last term of the above inequality tends to 0, since , and the other terms are constants. Thus, we have
That is, with probability tending to 1. Returning to , we have, with probability tending to 1,
The above argument is based on convergence in distribution. In fact, the same conclusion can be achieved through convergence in probability. It is crucial to note that the argument above demonstrates how convergence in probability can be obtained from convergence in distribution as φ → 0. This allows us to apply convergence in probability directly below and omit any further mention of convergence in distribution.
We now consider . We have
For the first term, we note that by (A6), . We use an argument analogous to that used for but replace the Chebyshev inequality with the definition of convergence in probability. Thus, the absolute value of the first term is less than a constant multiple of with probability tending to 1. The second term is a negative quadratic form in . Let . Then, by a straightforward calculation, we have
Thus, is negative and we obtain
So, with (A7)–(A9), we have
for sufficiently small h.
Because is continuous and differentiable on , there must be a local maximum point that satisfies
Combining this with (A10), we obtain
So, when , we have
Now, we will show that , where S is a covariance matrix. Denote
By expanding around the true point , we obtain
where higher-order terms are ignored. Replace with (we can do this since ), and then note that the left side of the equation is . Rearranging this equation, we have
Proof of Theorem 2.
First, we show the unit deviance follows an approximate . By Proposition 1, the moment-generating function (MGF) of the unit deviance is approximately
where . Since the integrand is the (saddlepoint) density of the distribution with , we have
which is identical to the MGF of a . So, as we have
For the set of observations y = (y_1, …, y_n)^T, where the Y_i ~ ED(m_i, φ) are independent, we have d(Y_i, m_i)/φ →_d χ²_1 for each i. Then, by independence, we obtain

D(y, m)/φ = Σ_{i=1}^{n} d(Y_i, m_i)/φ →_d χ²_n,

which completes the proof of Theorem 2 (cf., [6]). □
Proof of Theorem 3.
We consider the four nested hypotheses (cf., [17]):
- (the saturated hypothesis);
- ;
- ;
of dimensions respectively, where .
Since we proved the asymptotic normality of in Theorem 1, just as in Theorems 10.3.1 and 10.3.3 of [21], we can prove that the likelihood ratio test (LRT) statistic follows asymptotically a chi-square distribution by starting from the simple hypothesis and moving on to the composite hypothesis. That is, for LRT , we have , where q is the corresponding degrees of freedom.
References
- Bar-Lev, S.K. Independent tough identical results: The Tweedie class on power variance functions and the class of Bar-Lev and Enis on reproducible natural exponential families. Int. J. Stat. Probab. 2020, 9, 30–35. [Google Scholar] [CrossRef]
- Jørgensen, B. The Theory of Dispersion Models; Chapman and Hall: London, UK, 1997. [Google Scholar]
- Bar-Lev, S.K. The Exponential Dispersion Model Generated by the Landau Distribution—A Comprehensive Review and Further Developments. Mathematics 2023, 11, 4343. [Google Scholar] [CrossRef]
- Dunn, P.K.; Smyth, G.K. Tweedie Family Densities: Methods of Evaluation. In Proceedings of the 16th International Workshop on Statistical Modelling, Odense, Denmark, 2–6 July 2001. [Google Scholar]
- Dunn, P.K.; Smyth, G.K. Series Evaluation of Tweedie Exponential Dispersion Model Densities. Stat. Comput. 2005, 15, 267–280. [Google Scholar] [CrossRef]
- Dunn, P.K.; Smyth, G.K. Generalized Linear Models with Examples in R; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
- Hougaard, P. Nonlinear Regression and Curved Exponential Families. Improvement of the Approximation to the Asymptotic Distribution. Metrika 1995, 42, 191–202. [Google Scholar] [CrossRef]
- Chen, Z.; Pan, E.; Xia, T.; Li, Y. Optimal degradation-based burn-in policy using Tweedie exponential-dispersion process model with measurement errors. Reliab. Syst. Saf. 2020, 195, 106748. [Google Scholar] [CrossRef]
- Ricci, L.; Martínez, R. Adjusted R2-type measures for Tweedie models. Comput. Stat. Data Anal. 2008, 52, 1650–1660. [Google Scholar] [CrossRef]
- Dunn, P.K. Tweedie: Evaluation of Tweedie Exponential Family Models, R Package Version 2.3.5; 2022. Available online: https://cran.r-project.org/web/packages/tweedie/tweedie.pdf (accessed on 12 September 2023).
- Smyth, G.K. Statmod: Statistical Modeling, R Package Version 1.4.30; 2017. Available online: https://CRAN.R-project.org/package=statmod (accessed on 3 April 2024).
- Barndorff-Nielsen, O. Information and Exponential Families in Statistical Theory; Wiley: New York, NY, USA, 1978. [Google Scholar]
- Merz, M.; Wüthrich, M.V. Statistical Foundations of Actuarial Learning and Its Applications; Springer: Berlin/Heidelberg, Germany, 2023. [Google Scholar]
- Morris, C.N. Natural exponential families with quadratic variance functions. Ann. Statist. 1982, 10, 65–80. [Google Scholar] [CrossRef]
- McCullagh, P.; Nelder, J.A. Generalized Linear Models, 2nd ed.; Chapman and Hall: London, UK, 1989. [Google Scholar]
- Dunn, P.K.; Smyth, G.K. Evaluation of Tweedie exponential dispersion model densities by Fourier inversion. Stat. Comput. 2008, 18, 73–86. [Google Scholar] [CrossRef]
- Jørgensen, B. Small dispersion asymptotics. Braz. J. Probab. Stat. 1987, 1, 59–90. [Google Scholar]
- Liu, X.; Xiang, Z.; Bar-Lev, S.K.; Ridder, A. TBEinf, R Package Version 0.0.1; 2024. Available online: https://github.com/xliusufe/TBEinf (accessed on 28 April 2024).
- Sidi, A. A user-friendly extrapolation method for oscillatory infinite integrals. Math. Comput. 1988, 51, 249–266. [Google Scholar] [CrossRef]
- Dunn, P.K.; Smyth, G.K. GLMsData: Generalized Linear Model Data Sets, R Package Version 1.0.0; 2017. Available online: https://CRAN.R-project.org/package=GLMsData (accessed on 12 April 2024).
- Casella, G.; Berger, R.L. Statistical Inference; Thomson Learning Inc.: Duxbury, MA, USA, 2002. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).