Article

Robust Statistical Inference in Generalized Linear Models Based on Minimum Rényi's Pseudodistance Estimators

Department of Statistics and Operation Research, Faculty of Mathematics, Complutense University of Madrid, Plaza Ciencias, 3, 28040 Madrid, Spain
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2022, 24(1), 123; https://doi.org/10.3390/e24010123
Submission received: 11 December 2021 / Revised: 8 January 2022 / Accepted: 11 January 2022 / Published: 13 January 2022
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

Minimum Rényi's pseudodistance estimators (MRPEs) enjoy good robustness properties without a significant loss of efficiency in general statistical models and, in particular, in linear regression models (LRMs). In this line, Castilla et al. considered robust Wald-type test statistics in LRMs based on these MRPEs. In this paper, we extend the theory of MRPEs to Generalized Linear Models (GLMs) using independent and nonidentically distributed observations (INIDO). We derive asymptotic properties of the proposed estimators and analyze their influence function to assess their robustness properties. Additionally, we define robust Wald-type test statistics for testing linear hypotheses and theoretically study their asymptotic distribution, as well as their influence function. The performance of the proposed MRPEs and Wald-type test statistics is empirically examined for Poisson regression models through a simulation study, focusing on their robustness properties. We finally test the proposed methods on a real dataset related to the treatment of epilepsy, illustrating the superior performance of the robust MRPEs as well as the Wald-type tests.

1. Introduction

Generalized linear models (GLMs) were first introduced by Nelder and Wedderburn [1] and later expanded upon by McCullagh and Nelder [2]. GLMs represent a natural extension of the standard linear regression models that encompasses a large variety of response variable distributions, including distributions of count, binary, or positive values. Let $Y_1,\ldots,Y_n$ be independent response variables. The classical GLM assumes that the density function of each random variable $Y_i$ belongs to the exponential family, having the form

$$f(y, \theta_i, \phi) = \exp\left\{ \frac{y\theta_i - b(\theta_i)}{a(\phi)} + c(y, \phi) \right\}, \qquad (1)$$

for $i = 1,\ldots,n$, where the functions $a(\phi)$, $b(\theta_i)$ and $c(y,\phi)$ are known. Therefore, the observations are independent but not identically distributed, depending on a location parameter $\theta_i$, $i = 1,\ldots,n$, and a nuisance parameter $\phi$. Further, we denote by $\mu_i$ the expectation of the random variable $Y_i$ and assume that there exists a monotone differentiable function $g$, the so-called link function, verifying

$$g(\mu_i) = \boldsymbol{x}_i^T \boldsymbol{\beta},$$

with $\boldsymbol{\beta} = (\beta_1,\ldots,\beta_k)^T \in \mathbb{R}^k$ ($k < n$) the regression parameter vector. The $k \times 1$ vector of explanatory variables, $\boldsymbol{x}_i$, is assumed to be nonrandom, i.e., the design matrix is fixed. Correspondingly, the location parameter depends on the explanatory variables, $\theta_i = \theta(\boldsymbol{x}_i^T \boldsymbol{\beta})$, and the density function given in (1) can be written as $f_i(y, \boldsymbol{\beta}, \phi)$, emphasizing its dependence on $\boldsymbol{\beta}$ and $\boldsymbol{x}_i$.
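For instance, the Poisson distribution fits the form (1) with $\theta = \log\mu$, $b(\theta) = e^{\theta}$, $a(\phi) = 1$ and $c(y, \phi) = -\log(y!)$. A minimal numerical check of this identity (the helper names are ours, not from the paper):

```python
import math

# Poisson written in the exponential-family form (1):
# theta = log(mu), b(theta) = e^theta, a(phi) = 1, c(y, phi) = -log(y!)
def poisson_expfam(y, mu):
    theta = math.log(mu)
    return math.exp(y * theta - math.exp(theta) - math.lgamma(y + 1))

# Usual Poisson pmf, for comparison
def poisson_pmf(y, mu):
    return mu ** y * math.exp(-mu) / math.factorial(y)
```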
The maximum likelihood estimator (MLE) and the quasilikelihood estimators have been well studied for GLMs, and it is well known that they are asymptotically efficient but lack robustness in the presence of outliers, which can result in a significant estimation bias. Jaenada and Pardo [3] reviewed the different robust estimators in the statistical literature and studied the lack of robustness of the MLE as well. Among others, Stefanski et al. [4] studied optimally bounded score functions for the GLM and generalized the results obtained by Krasker and Welsch [5] for classical LRMs. Künsch et al. [6] introduced the so-called conditionally unbiased bounded-influence estimate, and Morgenthaler [7], Cantoni and Ronchetti [8], Bianco and Yohai [9], Croux and Hesbroeck [10], Bianco et al. [11], and Valdora and Yohai [12] continued the development of robust estimators for GLMs based on general M-estimators. Later, Ghosh and Basu [13] proposed robust estimators for the GLM based on the density power divergence (DPD) introduced in Basu et al. [14].
Few papers have considered robust tests for GLMs. In this sense, Basu et al. [15] considered robust Wald-type tests based on the minimum DPD estimator, but assuming random explanatory variables for the GLM. The main purpose of this paper is to introduce new robust Wald-type tests based on the MRPE under fixed (not random) explanatory variables.
Broniatowski et al. [16] presented robust estimators for the parameters of the linear regression model (LRM) with random explanatory variables, and Castilla et al. [17] considered Wald-type test statistics, based on MRPEs, for the LRM. Toma and Leoni-Aubin [18] defined new robustness and efficiency measures based on the RP, and Toma et al. [19] considered the MRPE for general parametric models and constructed a model selection criterion for regression models. The term "Rényi pseudodistance" (RP) was adopted in Broniatowski et al. [16] because of its similarity with the Rényi divergence (Rényi [20]), although this family of divergences was considered previously in Jones et al. [21]. Fujisawa and Eguchi [22] used the RP under the name of $\gamma$-cross entropy, introduced robust estimators obtained by minimizing the empirical estimate of the $\gamma$-cross entropy (or the $\gamma$-divergence associated with the $\gamma$-cross entropy), and studied their properties. Further, Hirose and Masuda [23] considered the $\gamma$-likelihood function for robust estimation. Using the $\gamma$-divergence, Kawashima and Fujisawa [24,25] presented robust estimators for sparse regression and sparse GLMs with random covariates. The robustness of all the previous estimators is based on a density power weight, $f(y, \theta)^{\alpha}$, which downweights outlying observations. This idea was also developed by Basu et al. [15] for the minimum DPD estimator and had been considered some years earlier by Windham [26]. More concretely, Basu et al. [14] considered the density power function multiplied by the score function.
The outline of the paper is as follows: in Section 2, some results in relation to the MRPEs for GLMs, previously obtained in Jaenada and Pardo [3], are presented. Section 3 introduces and studies Wald-type tests based on the MRPE for testing linear null hypotheses for GLMs. In Section 4, the influence function of the MRPE as well as the influence functions of the Wald-type tests are derived. Finally, we empirically examine the performance of the proposed robust estimators and Wald-type test statistics for the Poisson regression model through a simulation study in Section 5, and we illustrate their applicability with real datasets for binomial and Poisson regression.

2. Asymptotic Distribution of the MRPEs for the GLMs

In this section, we revise some of the results presented in Jaenada and Pardo [3] in relation to the MRPE. Let $Y_1,\ldots,Y_n$ be INIDO random variables with density functions $g_1,\ldots,g_n$, respectively, with respect to some common dominating measure. The true densities $g_i$ are modeled by the density functions given in (1), belonging to the exponential family. Such densities are denoted by $f_i(y, \boldsymbol{\beta}, \phi)$, highlighting their dependence on the regression vector $\boldsymbol{\beta}$, the nuisance parameter $\phi$ and the observation $i$, $i = 1,\ldots,n$. In the following, we assume that the explanatory variables $\boldsymbol{x}_i$ are fixed, and therefore the response variables verify the INIDO setup studied in Castilla et al. [27].
For each of the response variables $Y_i$, the RP between the theoretical density function belonging to the exponential family, $f_i(y, \boldsymbol{\gamma})$, and the true density underlying the data, $g_i$, can be defined, for $\alpha > 0$, as

$$R_{\alpha}\big(f_i(y, \boldsymbol{\gamma}), g_i\big) = \frac{1}{\alpha+1} \log \int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy - \frac{1}{\alpha} \log \int f_i(y, \boldsymbol{\gamma})^{\alpha} g_i(y)\,dy + k, \qquad (2)$$

where

$$k = \frac{1}{\alpha(\alpha+1)} \log \int g_i(y)^{\alpha+1}\,dy$$

does not depend on $\boldsymbol{\gamma} = (\boldsymbol{\beta}^T, \phi)^T$.
We consider $(y_1,\ldots,y_n)$ a random sample of independent but nonhomogeneous observations of the response variables with fixed predictors $(\boldsymbol{x}_1,\ldots,\boldsymbol{x}_n)$. Since only one observation of each variable $Y_i$ is available, a natural estimate of its true density $g_i$ is the degenerate distribution at the observation $y_i$. Consequently, in the following we denote by $\widehat{g}_i$ the density function of the degenerate variable at the point $y_i$. Then, substituting the theoretical and empirical densities in (2) yields the loss

$$R_{\alpha}\big(f_i(y, \boldsymbol{\gamma}), \widehat{g}_i\big) = \frac{1}{\alpha+1} \log \int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy - \frac{1}{\alpha} \log f_i(Y_i, \boldsymbol{\gamma})^{\alpha} + k. \qquad (3)$$

If we consider the limit as $\alpha$ tends to zero, we get

$$R_{0}\big(f_i(y, \boldsymbol{\gamma}), \widehat{g}_i\big) = \lim_{\alpha \downarrow 0} R_{\alpha}\big(f_i(y, \boldsymbol{\gamma}), \widehat{g}_i\big) = -\log f_i(Y_i, \boldsymbol{\gamma}) + k.$$

The last expression coincides with the Kullback–Leibler divergence, except for the constant $k$. More details about the Kullback–Leibler divergence can be found in Pardo [28].
For the sake of simplicity, let us denote

$$L_{\alpha,i}(\boldsymbol{\gamma}) = \left( \int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy \right)^{\frac{\alpha}{\alpha+1}}$$

and

$$V_i(Y_i, \boldsymbol{\gamma}) = \frac{f_i(Y_i, \boldsymbol{\gamma})^{\alpha}}{L_{\alpha,i}(\boldsymbol{\gamma})}.$$

The expression (3) can then be rewritten as

$$R_{\alpha}\big(f_i(y, \boldsymbol{\gamma}), \widehat{g}_i\big) = -\frac{1}{\alpha} \log \frac{f_i(Y_i, \boldsymbol{\gamma})^{\alpha}}{\left( \int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy \right)^{\frac{\alpha}{\alpha+1}}} + k = -\frac{1}{\alpha} \log V_i(Y_i, \boldsymbol{\gamma}) + k.$$

Based on the previous idea, we shall define an objective function averaging all the RPs. Since minimizing $R_{\alpha}(f_i(y, \boldsymbol{\gamma}), \widehat{g}_i)$ in $\boldsymbol{\gamma}$ is equivalent to maximizing $\log V_i(Y_i, \boldsymbol{\gamma})$, we define a loss function averaging those quantities as

$$T_n^{\alpha}(\boldsymbol{\gamma}) = \frac{1}{n} \sum_{i=1}^{n} \frac{f_i(Y_i, \boldsymbol{\gamma})^{\alpha}}{\left( \int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy \right)^{\frac{\alpha}{\alpha+1}}} = \frac{1}{n} \sum_{i=1}^{n} \frac{f_i(Y_i, \boldsymbol{\gamma})^{\alpha}}{L_{\alpha,i}(\boldsymbol{\gamma})} = \frac{1}{n} \sum_{i=1}^{n} V_i(Y_i, \boldsymbol{\gamma}). \qquad (5)$$
Based on (5), we can define the MRPE of the unknown parameter $\boldsymbol{\gamma}$, $\widehat{\boldsymbol{\gamma}}_{\alpha}$, by

$$\widehat{\boldsymbol{\gamma}}_{\alpha} = \arg\max_{\boldsymbol{\gamma} \in \Gamma} T_n^{\alpha}(\boldsymbol{\gamma}),$$

with $T_n^{\alpha}(\boldsymbol{\gamma})$ defined in (5) for $\alpha > 0$, and

$$T_n^{0}(\boldsymbol{\gamma}) = \frac{1}{n} \sum_{i=1}^{n} \log f_i(y_i, \boldsymbol{\gamma})$$

at $\alpha = 0$. The MRPE coincides with the MLE at $\alpha = 0$, and therefore the proposed family can be considered a natural extension of the classical MLE.
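To make the objective concrete, the following hedged sketch evaluates (5) for a Poisson regression model with log link, replacing the integrals by sums over the discrete support (the truncation bound `y_max` and all function names are our choices):

```python
import numpy as np
from scipy.stats import poisson

def L_alpha_i(mu, alpha, y_max=200):
    # L_{alpha,i}(gamma) = ( sum_y f_i(y)^{alpha+1} )^{alpha/(alpha+1)};
    # the sum replaces the integral because the Poisson support is discrete
    grid = np.arange(y_max)
    s = np.sum(poisson.pmf(grid, mu) ** (alpha + 1))
    return s ** (alpha / (alpha + 1))

def T_n_alpha(beta, X, Y, alpha):
    # Objective (5): average of V_i(Y_i, gamma) = f_i(Y_i)^alpha / L_{alpha,i}
    mu = np.exp(X @ beta)  # log link
    V = [poisson.pmf(y, m) ** alpha / L_alpha_i(m, alpha) for y, m in zip(Y, mu)]
    return float(np.mean(V))
```

Since $f^{\alpha} = 1 + \alpha \log f + O(\alpha^2)$ and $L_{\alpha,i} \to 1$, the quantity $(T_n^{\alpha} - 1)/\alpha$ tends to the average log-likelihood $T_n^0$ as $\alpha \to 0$, matching the claim that the MRPE extends the MLE.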
Now, since the MRPE is defined as a maximum, it must annul the first derivatives of the loss function given in (5). The estimating equations of the parameters $\boldsymbol{\beta}$ and $\phi$ are given by

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\partial V_i(Y_i, \boldsymbol{\gamma})}{\partial \boldsymbol{\beta}} = \boldsymbol{0}_k, \qquad \frac{1}{n} \sum_{i=1}^{n} \frac{\partial V_i(Y_i, \boldsymbol{\gamma})}{\partial \phi} = 0.$$

For the first equation, we have

$$\frac{\partial V_i(Y_i, \boldsymbol{\gamma})}{\partial \boldsymbol{\beta}} = \frac{1}{L_{\alpha,i}(\boldsymbol{\gamma})^2} \left[ \alpha f_i(Y_i, \boldsymbol{\gamma})^{\alpha} \frac{\partial \log f_i(Y_i, \boldsymbol{\gamma})}{\partial \boldsymbol{\beta}} L_{\alpha,i}(\boldsymbol{\gamma}) - \alpha \left( \int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy \right)^{\frac{\alpha}{\alpha+1} - 1} \left( \int f_i(y, \boldsymbol{\gamma})^{\alpha+1} \frac{\partial \log f_i(y, \boldsymbol{\gamma})}{\partial \boldsymbol{\beta}}\,dy \right) f_i(Y_i, \boldsymbol{\gamma})^{\alpha} \right].$$

The previous partial derivatives can be simplified as

$$\frac{\partial \log f_i(Y_i, \boldsymbol{\gamma})}{\partial \boldsymbol{\beta}} = \frac{Y_i - \mu_i}{\mathrm{Var}(Y_i)\, g'(\mu_i)}\, \boldsymbol{x}_i = K_{1i}(Y_i, \boldsymbol{\gamma})\, \boldsymbol{x}_i$$

and

$$\frac{\partial \log f_i(Y_i, \boldsymbol{\gamma})}{\partial \phi} = -\frac{\big( Y_i \theta_i - b(\theta_i) \big)\, a'(\phi)}{a(\phi)^2} + \frac{\partial c(Y_i, \phi)}{\partial \phi} = K_{2i}(Y_i, \boldsymbol{\gamma}).$$
See Ghosh and Basu [13] for more details. Now, using the simplified expressions, we can write the estimating equation for $\boldsymbol{\beta}$ as

$$\sum_{i=1}^{n} \frac{\boldsymbol{x}_i}{L_{\alpha,i}(\boldsymbol{\gamma})} \left[ M_i(Y_i, \boldsymbol{\gamma}) - N_i(Y_i, \boldsymbol{\gamma}) \right] = \boldsymbol{0}_k,$$

where

$$M_i(Y_i, \boldsymbol{\gamma}) = f_i(Y_i, \boldsymbol{\gamma})^{\alpha} K_{1i}(Y_i, \boldsymbol{\gamma})$$

and

$$N_i(Y_i, \boldsymbol{\gamma}) = \frac{f_i(Y_i, \boldsymbol{\gamma})^{\alpha}}{\int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy} \int f_i(y, \boldsymbol{\gamma})^{\alpha+1} K_{1i}(y, \boldsymbol{\gamma})\,dy.$$
Subsequently, for the estimating equation for $\phi$, we have

$$\frac{\partial V_i(Y_i, \boldsymbol{\gamma})}{\partial \phi} = \frac{1}{L_{\alpha,i}(\boldsymbol{\gamma})^2} \left[ \alpha f_i(Y_i, \boldsymbol{\gamma})^{\alpha} \frac{\partial \log f_i(Y_i, \boldsymbol{\gamma})}{\partial \phi} L_{\alpha,i}(\boldsymbol{\gamma}) - \alpha \left( \int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy \right)^{\frac{\alpha}{\alpha+1} - 1} \left( \int f_i(y, \boldsymbol{\gamma})^{\alpha+1} \frac{\partial \log f_i(y, \boldsymbol{\gamma})}{\partial \phi}\,dy \right) f_i(Y_i, \boldsymbol{\gamma})^{\alpha} \right] = \frac{\alpha}{L_{\alpha,i}(\boldsymbol{\gamma})} \left[ f_i(Y_i, \boldsymbol{\gamma})^{\alpha} \frac{\partial \log f_i(Y_i, \boldsymbol{\gamma})}{\partial \phi} - \frac{f_i(Y_i, \boldsymbol{\gamma})^{\alpha}}{\int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy} \int f_i(y, \boldsymbol{\gamma})^{\alpha+1} \frac{\partial \log f_i(y, \boldsymbol{\gamma})}{\partial \phi}\,dy \right],$$

and thus the estimating equation for $\phi$ is given by

$$\sum_{i=1}^{n} \frac{1}{L_{\alpha,i}(\boldsymbol{\gamma})} \left[ M_i^{*}(Y_i, \boldsymbol{\gamma}) - N_i^{*}(Y_i, \boldsymbol{\gamma}) \right] = 0,$$

where

$$M_i^{*}(Y_i, \boldsymbol{\gamma}) = f_i(Y_i, \boldsymbol{\gamma})^{\alpha} K_{2i}(Y_i, \boldsymbol{\gamma})$$

and

$$N_i^{*}(Y_i, \boldsymbol{\gamma}) = \frac{f_i(Y_i, \boldsymbol{\gamma})^{\alpha}}{\int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy} \int f_i(y, \boldsymbol{\gamma})^{\alpha+1} K_{2i}(y, \boldsymbol{\gamma})\,dy.$$
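For the Poisson model with the canonical log link, $\mathrm{Var}(Y_i) = \mu_i$ and $g'(\mu_i) = 1/\mu_i$, so $K_{1i}(y, \boldsymbol{\gamma})$ collapses to $y - \mu_i$. A small sketch checking the identity $\partial \log f_i/\partial \boldsymbol{\beta} = K_{1i}(y, \boldsymbol{\gamma})\, \boldsymbol{x}_i$ against a finite difference (the example values are ours):

```python
import numpy as np
from scipy.stats import poisson

def K1(y, mu):
    # K_{1i}(y, gamma) = (y - mu) / (Var(Y) g'(mu)); for Poisson with log
    # link, Var(Y) = mu and g'(mu) = 1/mu, so it collapses to y - mu
    return y - mu

# finite-difference check of the score d log f / d beta = K1(y, mu) * x
x = np.array([1.0, 0.4])
beta = np.array([0.2, 0.9])
y = 3

def logf(b):
    return poisson.logpmf(y, np.exp(x @ b))

eps = 1e-6
grad_fd = np.array([(logf(beta + eps * e) - logf(beta - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
grad_exact = K1(y, np.exp(x @ beta)) * x
```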
Under certain regularity conditions, Castilla et al. [27] established the consistency and asymptotic normality of the MRPEs under the INIDO setup. Before stating the consistency and asymptotic distribution of the MRPEs for the GLM, let us introduce some useful notation. We define

$$S_{\alpha,i} = \int f_i(y, \boldsymbol{\beta}, \phi)^{\alpha+1}\,dy,$$
$$m_{jl}^{i}(\boldsymbol{\gamma}) = \frac{1}{\int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy} \int f_i(y, \boldsymbol{\gamma})^{\alpha+1} K_{ji}(y, \boldsymbol{\gamma})\, K_{li}(y, \boldsymbol{\gamma})\,dy,$$
$$m_{j}^{i}(\boldsymbol{\gamma}) = \frac{1}{\int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy} \int f_i(y, \boldsymbol{\beta}, \phi)^{\alpha+1} K_{ji}(y, \boldsymbol{\gamma})\,dy,$$
$$l_{jl}^{i}(\boldsymbol{\gamma}) = \int \frac{f_i(y, \boldsymbol{\gamma})^{2\alpha+1}}{L_{\alpha,i}(\boldsymbol{\gamma})^2} \left( K_{ji}(y, \boldsymbol{\gamma}) - m_{j}^{i}(\boldsymbol{\gamma}) \right) \left( K_{li}(y, \boldsymbol{\gamma}) - m_{l}^{i}(\boldsymbol{\gamma}) \right)\,dy,$$

for all $j, l = 1, 2$ and $i = 1,\ldots,n$.
Theorem 1.
Let $Y_1,\ldots,Y_n$ be a random sample from the GLM defined in (1). The MRPE $\widehat{\boldsymbol{\gamma}}_{\alpha} = (\widehat{\boldsymbol{\beta}}_{\alpha}^T, \widehat{\phi}_{\alpha})^T$ is consistent, and its asymptotic distribution is given by

$$\sqrt{n}\, \boldsymbol{\Omega}_n(\boldsymbol{\gamma})^{-\frac{1}{2}}\, \boldsymbol{\Psi}_n(\boldsymbol{\gamma}) \left( (\widehat{\boldsymbol{\beta}}_{\alpha}^T, \widehat{\phi}_{\alpha})^T - (\boldsymbol{\beta}^T, \phi)^T \right) \xrightarrow[n \to \infty]{L} N\big( \boldsymbol{0}_{k+1}, \boldsymbol{I}_{k+1} \big),$$

where $\boldsymbol{X}$ denotes the design matrix, $\boldsymbol{I}_{k+1}$ is the $(k+1)$-dimensional identity matrix, and the matrices $\boldsymbol{\Psi}_n$ and $\boldsymbol{\Omega}_n$ are defined by

$$\boldsymbol{\Omega}_n(\boldsymbol{\gamma}) = \frac{1}{n} \begin{pmatrix} \boldsymbol{X}^T \boldsymbol{D}_{11} \boldsymbol{X} & \boldsymbol{X}^T \boldsymbol{D}_{12} \boldsymbol{1} \\ \boldsymbol{1}^T \boldsymbol{D}_{12} \boldsymbol{X} & \boldsymbol{1}^T \boldsymbol{D}_{22} \boldsymbol{1} \end{pmatrix},$$

$$\boldsymbol{\Psi}_n(\boldsymbol{\gamma}) = \frac{1}{n} \begin{pmatrix} \boldsymbol{X}^T \big( \boldsymbol{D}_{11}^{*} - \boldsymbol{D}_{1}^{*T} \boldsymbol{D}_{1}^{*} \big) \boldsymbol{X} & \boldsymbol{X}^T \big( \boldsymbol{D}_{12}^{*} - \boldsymbol{D}_{1}^{*T} \boldsymbol{D}_{2}^{*} \big) \boldsymbol{1} \\ \boldsymbol{1}^T \big( \boldsymbol{D}_{12}^{*} - \boldsymbol{D}_{1}^{*T} \boldsymbol{D}_{2}^{*} \big) \boldsymbol{X} & \boldsymbol{1}^T \big( \boldsymbol{D}_{22}^{*} - \boldsymbol{D}_{2}^{*T} \boldsymbol{D}_{2}^{*} \big) \boldsymbol{1} \end{pmatrix},$$

with

$$\boldsymbol{D}_{jk} = \mathrm{diag}\big( l_{jk}^{i}(\boldsymbol{\gamma}) \big)_{i=1,\ldots,n}, \quad \boldsymbol{D}_{jk}^{*} = \mathrm{diag}\big( m_{jk}^{i}(\boldsymbol{\gamma}) \big)_{i=1,\ldots,n}, \quad j, k = 1, 2,$$

and

$$\boldsymbol{D}_{j}^{*} = \mathrm{diag}\big( m_{j}^{i}(\boldsymbol{\gamma}) \big)_{i=1,\ldots,n}, \quad j = 1, 2.$$
Proof. 
The consistency is proved for general statistical models in Castilla et al. [27] and the asymptotic distribution of the MRPEs for GLM is derived in Jaenada and Pardo [3]. □

3. Wald-Type Tests for the GLMs

In this section, we define Wald-type tests for linear null hypotheses of the form

$$H_0: \boldsymbol{M}^T \boldsymbol{\gamma} = \boldsymbol{m} \quad \text{vs.} \quad H_1: \boldsymbol{M}^T \boldsymbol{\gamma} \neq \boldsymbol{m}, \qquad (11)$$

where $\boldsymbol{\gamma} = (\boldsymbol{\beta}^T, \phi)^T$, $\boldsymbol{M}$ is a $(k+1) \times r$ full rank matrix, and

$$\boldsymbol{m} = (m_1,\ldots,m_r)^T \qquad (12)$$

is an $r$-dimensional vector ($r \leq k+1$). If the nuisance parameter $\phi$ is known, as with logistic and Poisson regression, the matrix $\boldsymbol{M}$ reduces to a $k \times r$ matrix $\boldsymbol{L}_{k \times r}$. Additionally, choosing

$$\boldsymbol{M} = \begin{pmatrix} \boldsymbol{L}_{k \times r} \\ \boldsymbol{O}_{1 \times r} \end{pmatrix}$$

gives rise to a null hypothesis defined by a linear combination of the regression coefficients, $\boldsymbol{\beta}$, with $\phi$ known or unknown. Further, the simple null hypothesis is a particular case obtained by choosing $\boldsymbol{M}$ as the identity matrix of rank $k$,

$$H_0: \boldsymbol{\beta} = \boldsymbol{\beta}_0 \quad \text{vs.} \quad H_1: \boldsymbol{\beta} \neq \boldsymbol{\beta}_0,$$

with $\boldsymbol{m} = \boldsymbol{\beta}_0 = (\beta_1^0,\ldots,\beta_k^0)^T$.
In the following, we assume that there exists a matrix $\boldsymbol{A}_{\alpha}(\boldsymbol{\gamma})$ verifying

$$\lim_{n \to \infty} \boldsymbol{\Psi}_n(\boldsymbol{\gamma})\, \boldsymbol{\Omega}_n(\boldsymbol{\gamma})^{-1}\, \boldsymbol{\Psi}_n(\boldsymbol{\gamma}) = \boldsymbol{A}_{\alpha}(\boldsymbol{\gamma}).$$
Definition 1.
Let $\widehat{\boldsymbol{\gamma}}_{\alpha} = (\widehat{\boldsymbol{\beta}}_{\alpha}^T, \widehat{\phi}_{\alpha})^T$ be the MRPE of $\boldsymbol{\gamma} = (\boldsymbol{\beta}^T, \phi)^T$ for the GLM. The Wald-type tests, based on the MRPE, for testing (11) are defined by

$$W_n(\widehat{\boldsymbol{\gamma}}_{\alpha}) = n\, \big( \boldsymbol{M}^T \widehat{\boldsymbol{\gamma}}_{\alpha} - \boldsymbol{m} \big)^T \left[ \boldsymbol{M}^T \boldsymbol{\Psi}_n(\widehat{\boldsymbol{\gamma}}_{\alpha})^{-1}\, \boldsymbol{\Omega}_n(\widehat{\boldsymbol{\gamma}}_{\alpha})\, \boldsymbol{\Psi}_n(\widehat{\boldsymbol{\gamma}}_{\alpha})^{-1} \boldsymbol{M} \right]^{-1} \big( \boldsymbol{M}^T \widehat{\boldsymbol{\gamma}}_{\alpha} - \boldsymbol{m} \big). \qquad (13)$$
The following theorem presents the asymptotic distribution of the Wald-type test statistics, W n γ ^ α .
Theorem 2.
Under the null hypothesis given in (11), the Wald-type test statistic $W_n(\widehat{\boldsymbol{\gamma}}_{\alpha})$ asymptotically follows a chi-square distribution with $r$ degrees of freedom, $r$ being the dimension of the vector $\boldsymbol{m}$ in (12).
Proof. 
We know that

$$\sqrt{n} \left( (\widehat{\boldsymbol{\beta}}_{\alpha}^T, \widehat{\phi}_{\alpha})^T - (\boldsymbol{\beta}^T, \phi)^T \right) \xrightarrow[n \to \infty]{L} N\big( \boldsymbol{0}_{k+1}, \boldsymbol{A}_{\alpha}(\boldsymbol{\gamma})^{-1} \big).$$

Therefore,

$$\sqrt{n} \big( \boldsymbol{M}^T \widehat{\boldsymbol{\gamma}}_{\alpha} - \boldsymbol{m} \big) = \sqrt{n}\, \boldsymbol{M}^T \big( \widehat{\boldsymbol{\gamma}}_{\alpha} - \boldsymbol{\gamma} \big) \xrightarrow[n \to \infty]{L} N\big( \boldsymbol{0}_{r}, \boldsymbol{M}^T \boldsymbol{A}_{\alpha}(\boldsymbol{\gamma})^{-1} \boldsymbol{M} \big).$$

Now, the result follows taking into account that $\widehat{\boldsymbol{\gamma}}_{\alpha}$ is a consistent estimator of $\boldsymbol{\gamma}_0$. □
Based on the previous convergence, the null hypothesis in (11) is rejected if

$$W_n(\widehat{\boldsymbol{\gamma}}_{\alpha}) > \chi^2_{r,\alpha}, \qquad (14)$$

where $\chi^2_{r,\alpha}$ is the $100(1-\alpha)$ percentile of a chi-square distribution with $r$ degrees of freedom. Note that in (14), $\alpha$ denotes the significance level of the test, not the tuning parameter of the MRPE.
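Given estimates of $\boldsymbol{\Psi}_n$ and $\boldsymbol{\Omega}_n$, the statistic (13) and the $\chi^2_r$ p-value of Theorem 2 take only a few lines to compute; a hedged sketch (the inputs below are placeholders, not output of the paper's estimation procedure):

```python
import numpy as np
from scipy.stats import chi2

def wald_test(gamma_hat, Psi, Omega, M, m, n):
    # W_n (13): n (M^T g - m)^T [M^T Psi^{-1} Omega Psi^{-1} M]^{-1} (M^T g - m),
    # compared against a chi-square with r degrees of freedom (Theorem 2)
    Psi_inv = np.linalg.inv(Psi)
    V = M.T @ Psi_inv @ Omega @ Psi_inv @ M      # r x r covariance of M^T g
    d = M.T @ gamma_hat - m
    W = float(n * d @ np.linalg.solve(V, d))
    r = M.shape[1]
    return W, float(chi2.sf(W, df=r))
```

With $\boldsymbol{M}$ selecting a single coordinate, this reproduces the familiar squared z-type test for one coefficient.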
Finally, let $\boldsymbol{\gamma}_1$ be a parameter point verifying $\boldsymbol{M}^T \boldsymbol{\gamma}_1 \neq \boldsymbol{m}$, i.e., $\boldsymbol{\gamma}_1$ does not satisfy the null hypothesis. The next result establishes that the Wald-type tests given in (14) are consistent (see Fraser [29]).
Theorem 3.
Let $\boldsymbol{\gamma}_1$ be a parameter point verifying $\boldsymbol{M}^T \boldsymbol{\gamma}_1 \neq \boldsymbol{m}$. Then, the Wald-type tests given in (14) are consistent, i.e.,

$$\lim_{n \to \infty} P_{\boldsymbol{\gamma}_1}\big( W_n(\widehat{\boldsymbol{\gamma}}_{\alpha}) > \chi^2_{r,\alpha} \big) = 1.$$
Proof. 
See Appendix A. □
Remark 1.
In the proof of the previous theorem, the approximate power function of the Wald-type tests defined in (13) was established as

$$\pi_{W_n(\widehat{\boldsymbol{\gamma}}_{\alpha})}(\boldsymbol{\gamma}_1) \approx 1 - \Phi\left( \frac{1}{\sigma(\boldsymbol{\gamma}_1)} \left( \frac{\chi^2_{r,\alpha}}{\sqrt{n}} - \frac{W_n(\boldsymbol{\gamma}_1)}{\sqrt{n}} \right) \right),$$

where $\Phi$ denotes the standard normal distribution function,

$$\sigma^2(\boldsymbol{\gamma}_1) = \left. \frac{\partial l_{\widehat{\boldsymbol{\gamma}}_{\alpha}}(\boldsymbol{\zeta})}{\partial \boldsymbol{\zeta}} \right|_{\boldsymbol{\gamma} = \boldsymbol{\gamma}_1}^T \boldsymbol{A}_{\alpha}(\boldsymbol{\gamma}_1)^{-1} \left. \frac{\partial l_{\widehat{\boldsymbol{\gamma}}_{\alpha}}(\boldsymbol{\zeta})}{\partial \boldsymbol{\zeta}} \right|_{\boldsymbol{\gamma} = \boldsymbol{\gamma}_1}$$

and

$$l_{\widehat{\boldsymbol{\gamma}}_{\alpha}}(\boldsymbol{\zeta}) = \big( \boldsymbol{M}^T \widehat{\boldsymbol{\gamma}}_{\alpha} - \boldsymbol{m} \big)^T \left[ \boldsymbol{M}^T \boldsymbol{A}_{\alpha}(\boldsymbol{\zeta})^{-1} \boldsymbol{M} \right]^{-1} \big( \boldsymbol{M}^T \widehat{\boldsymbol{\gamma}}_{\alpha} - \boldsymbol{m} \big).$$

From the above expression, the necessary sample size $n$ for the Wald-type tests to achieve a predetermined power, $\pi_0$, is given by $n = [n^*] + 1$, with

$$n^* = \frac{A + B + \sqrt{A(A + 2B)}}{2\, l_{\boldsymbol{\gamma}_1}^2(\boldsymbol{\gamma}_1)},$$

where

$$A = \sigma^2(\boldsymbol{\gamma}_1) \left( \Phi^{-1}(1 - \pi_0) \right)^2, \qquad B = 2\, \chi^2_{r,\alpha}\, l_{\boldsymbol{\gamma}_1}(\boldsymbol{\gamma}_1),$$

and $[\cdot]$ denotes the integer part.
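The sample-size rule of Remark 1 is straightforward to implement; a sketch under the assumption that $\sigma^2(\boldsymbol{\gamma}_1)$ and $l_{\boldsymbol{\gamma}_1}(\boldsymbol{\gamma}_1)$ have already been computed (the argument names are ours):

```python
import math
from scipy.stats import norm, chi2

def required_sample_size(sigma2, ell, r, alpha_level, power):
    # Remark 1: n = [n*] + 1 with
    #   n* = (A + B + sqrt(A (A + 2B))) / (2 ell^2),
    #   A  = sigma^2 (Phi^{-1}(1 - pi_0))^2,
    #   B  = 2 chi^2_{r, alpha} ell
    A = sigma2 * norm.ppf(1.0 - power) ** 2
    B = 2.0 * chi2.ppf(1.0 - alpha_level, df=r) * ell
    n_star = (A + B + math.sqrt(A * (A + 2.0 * B))) / (2.0 * ell ** 2)
    return int(n_star) + 1
```

As expected, a weaker signal (smaller $l_{\boldsymbol{\gamma}_1}(\boldsymbol{\gamma}_1)$) demands a larger sample size for the same target power.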
In accordance with Maronna et al. [30], the breakdown point of an estimator $\widehat{\boldsymbol{\gamma}}_{\alpha}$ of a parameter $\boldsymbol{\gamma}$ is the largest amount of contamination that the data may contain such that $\widehat{\boldsymbol{\gamma}}_{\alpha}$ still gives meaningful information about $\boldsymbol{\gamma}$. The derivation of a general breakdown point is in general not easy, so it may deserve a separate paper in which the replacement finite-sample breakdown point introduced by Donoho and Huber [31] may be jointly considered. Although the breakdown point is an important theoretical concept in robust statistics, its finite-sample counterpart, the replacement finite-sample breakdown point, is perhaps more useful in practice. More details can be found in Section 3.2.5 of Maronna et al. [30].

4. Influence Function

In this section, we derive the IF of the MRPEs of the parameters $\boldsymbol{\gamma} = (\boldsymbol{\beta}^T, \phi)^T$ and of the Wald-type statistics based on these MRPEs, $W_n(\widehat{\boldsymbol{\gamma}}_{\alpha})$. The influence function (IF) of an estimator quantifies the impact of an infinitesimal perturbation of the true distribution of the data on the asymptotic value of the resulting parameter estimate (in terms of the corresponding statistical functional). An estimator is said to be robust if its IF is bounded. If we denote by $\boldsymbol{G} = (G_1,\ldots,G_n)$ the true distributions underlying the data, the functional $\boldsymbol{T}_{\alpha}(\boldsymbol{G})$ associated with the MRPE of the parameters $\boldsymbol{\gamma}$ is such that

$$\frac{1}{n} \sum_{i=1}^{n} R_{\alpha}\big( f_i(y, \boldsymbol{T}_{\alpha}(\boldsymbol{G})), g_i(y) \big) = \min_{\boldsymbol{\gamma}} \frac{1}{n} \sum_{i=1}^{n} R_{\alpha}\big( f_i(y, \boldsymbol{\gamma}), g_i(y) \big).$$

The IF of an estimator is defined as the limiting standardized bias due to infinitesimal contamination. That is, given a contaminated distribution at the point $(y_t, \boldsymbol{x}_t)$, $G_{\varepsilon} = (1 - \varepsilon) G + \varepsilon \Delta_{(y_t, \boldsymbol{x}_t)}$, with $\Delta_{(y_t, \boldsymbol{x}_t)}$ the degenerate distribution at $(y_t, \boldsymbol{x}_t)$, the IF of the estimator $\widehat{\boldsymbol{\gamma}}_{\alpha}$ in terms of its associated functional $\boldsymbol{T}_{\alpha}(\boldsymbol{G})$ is computed as

$$\mathrm{IF}\big( (y_t, \boldsymbol{x}_t), \boldsymbol{T}_{\alpha}, \boldsymbol{G} \big) = \lim_{\varepsilon \to 0} \frac{\boldsymbol{T}_{\alpha}(G_{\varepsilon}) - \boldsymbol{T}_{\alpha}(\boldsymbol{G})}{\varepsilon}.$$

In the following, let us denote $\boldsymbol{T}_{\alpha}(\boldsymbol{G}) = \big( \boldsymbol{T}_{\alpha}^{\boldsymbol{\beta}}(\boldsymbol{G}), T_{\alpha}^{\phi}(\boldsymbol{G}) \big)$, where $\boldsymbol{T}_{\alpha}^{\boldsymbol{\beta}}(\boldsymbol{G})$ and $T_{\alpha}^{\phi}(\boldsymbol{G})$ are the functionals associated with the parameters $\boldsymbol{\beta}$ and $\phi$, respectively. Then, they must satisfy the estimating equations of the MRPE, given by

$$\sum_{i=1}^{n} \frac{\boldsymbol{x}_i}{L_{\alpha,i}\big( \boldsymbol{T}_{\alpha}^{\boldsymbol{\beta}}(\boldsymbol{G}), T_{\alpha}^{\phi}(\boldsymbol{G}) \big)} \left[ M_i\big( y_i, (\boldsymbol{T}_{\alpha}^{\boldsymbol{\beta}}(\boldsymbol{G}), T_{\alpha}^{\phi}(\boldsymbol{G})) \big) - N_i\big( y_i, (\boldsymbol{T}_{\alpha}^{\boldsymbol{\beta}}(\boldsymbol{G}), T_{\alpha}^{\phi}(\boldsymbol{G})) \big) \right] = \boldsymbol{0}_k,$$
$$\sum_{i=1}^{n} \frac{1}{L_{\alpha,i}\big( \boldsymbol{T}_{\alpha}^{\boldsymbol{\beta}}(\boldsymbol{G}), T_{\alpha}^{\phi}(\boldsymbol{G}) \big)} \left[ M_i^{*}\big( y_i, (\boldsymbol{T}_{\alpha}^{\boldsymbol{\beta}}(\boldsymbol{G}), T_{\alpha}^{\phi}(\boldsymbol{G})) \big) - N_i^{*}\big( y_i, (\boldsymbol{T}_{\alpha}^{\boldsymbol{\beta}}(\boldsymbol{G}), T_{\alpha}^{\phi}(\boldsymbol{G})) \big) \right] = 0,$$

where the quantities $L_{\alpha,i}(\boldsymbol{\gamma})$, $M_i(y_i, \boldsymbol{\gamma})$, $N_i(y_i, \boldsymbol{\gamma})$, $M_i^{*}(y_i, \boldsymbol{\gamma})$ and $N_i^{*}(y_i, \boldsymbol{\gamma})$ are defined in Section 2. Now, evaluating the previous equations at the contaminated distribution $\boldsymbol{G}_{\varepsilon}$, implicitly differentiating the estimating equations with respect to $\varepsilon$, and evaluating them at $\varepsilon = 0$, we can obtain the expression of the IF for the GLM.
We first derive the expression of the IF of the MRPEs in the $i_0$-th direction. For this purpose, we consider the contaminated distributions

$$\boldsymbol{G}_{i_0, \varepsilon} = \big( G_1,\ldots,G_{i_0 - 1}, G_{i_0, \varepsilon}, G_{i_0 + 1},\ldots,G_n \big),$$

with $G_{i_0, \varepsilon} = (1 - \varepsilon) G_{i_0} + \varepsilon \Delta_{(y_{i_0}, \boldsymbol{x}_{i_0})}$. Here, only the $i_0$-th component of the vector of distributions is contaminated. If the true density function $g_i$ of each variable belongs to the exponential model, we have that

$$g_i(y) = \begin{cases} f_i(y, \boldsymbol{\gamma}), & i \neq i_0, \\ (1 - \varepsilon) f_i(y, \boldsymbol{\gamma}) + \varepsilon \Delta_{(y_{i_0}, \boldsymbol{x}_{i_0})}(y), & i = i_0. \end{cases}$$

Accordingly, we define

$$\boldsymbol{\gamma}_{\varepsilon}^{i_0} = \boldsymbol{T}_{\alpha}\big( G_1,\ldots,G_{i_0 - 1}, G_{i_0, \varepsilon}, G_{i_0 + 1},\ldots,G_n \big)$$

as the MRPE when the true distribution underlying the data is $\boldsymbol{G}_{i_0, \varepsilon}$. Based on Remark 5.2 in Castilla et al. [27], the IF of the MRPE in the $i_0$-th direction, with $(y_{i_0}, \boldsymbol{x}_{i_0})$ the point of contamination, is given by

$$\mathrm{IF}\big( (y_{i_0}, \boldsymbol{x}_{i_0}), \boldsymbol{T}_{\alpha}, \boldsymbol{G} \big) = \left. \frac{\partial \boldsymbol{T}_{\alpha}(\boldsymbol{G}_{i_0, \varepsilon})}{\partial \varepsilon} \right|_{\varepsilon = 0} = \boldsymbol{\Psi}_n(\boldsymbol{\gamma})^{-1}\, \frac{f_{i_0}(y_{i_0}, \boldsymbol{\gamma})^{\alpha}}{\int f_{i_0}(y, \boldsymbol{\gamma})^{\alpha+1}\,dy} \begin{pmatrix} \big( K_{1 i_0}(y_{i_0}, \boldsymbol{\gamma})\, f_{i_0}(y_{i_0}, \boldsymbol{\gamma})^{\alpha} - N_{i_0}(y_{i_0}, \boldsymbol{\gamma}) \big)\, \boldsymbol{x}_{i_0} \\ K_{2 i_0}(y_{i_0}, \boldsymbol{\gamma})\, f_{i_0}(y_{i_0}, \boldsymbol{\gamma})^{\alpha} - N_{i_0}^{*}(y_{i_0}, \boldsymbol{\gamma}) \end{pmatrix}.$$
In a similar manner, the IF in all directions (i.e., when all components of the vector of distributions are contaminated) has the following expression

$$\mathrm{IF}\big( (y_1, \boldsymbol{x}_1),\ldots,(y_n, \boldsymbol{x}_n), \boldsymbol{T}_{\alpha}, \boldsymbol{G} \big) = \left. \frac{\partial \boldsymbol{T}_{\alpha}(\boldsymbol{G}_{\varepsilon})}{\partial \varepsilon} \right|_{\varepsilon = 0} = \boldsymbol{\Psi}_n(\boldsymbol{\gamma})^{-1} \sum_{i=1}^{n} \frac{f_i(y_i, \boldsymbol{\gamma})^{\alpha}}{\int f_i(y, \boldsymbol{\gamma})^{\alpha+1}\,dy} \begin{pmatrix} \big( K_{1i}(y_i, \boldsymbol{\gamma})\, f_i(y_i, \boldsymbol{\gamma})^{\alpha} - N_i(y_i, \boldsymbol{\gamma}) \big)\, \boldsymbol{x}_i \\ K_{2i}(y_i, \boldsymbol{\gamma})\, f_i(y_i, \boldsymbol{\gamma})^{\alpha} - N_i^{*}(y_i, \boldsymbol{\gamma}) \end{pmatrix},$$

with $(y_1, \boldsymbol{x}_1),\ldots,(y_n, \boldsymbol{x}_n)$ the points of contamination. We next derive the expression of the IF for the Wald-type tests presented in Section 3. The statistical functional associated with the Wald-type tests for the linear null hypothesis (11) at the distributions $\boldsymbol{G} = (G_1,\ldots,G_n)$, ignoring the constant $n$, is given by

$$W_{\alpha}(\boldsymbol{G}) = \big( \boldsymbol{M}^T \boldsymbol{T}_{\alpha}(\boldsymbol{G}) - \boldsymbol{m} \big)^T \left[ \boldsymbol{M}^T \boldsymbol{A}_{\alpha}\big( \boldsymbol{T}_{\alpha}(\boldsymbol{G}) \big)^{-1} \boldsymbol{M} \right]^{-1} \big( \boldsymbol{M}^T \boldsymbol{T}_{\alpha}(\boldsymbol{G}) - \boldsymbol{m} \big).$$
Again, evaluating the Wald-type test functional at the contaminated distribution $\boldsymbol{G}_{\varepsilon}$ and implicitly differentiating the expression, we can obtain its IF. In particular, the IF of the Wald-type test statistics in the $i_0$-th direction and at the contamination point $(y_{i_0}, \boldsymbol{x}_{i_0})$ is given by

$$\mathrm{IF}_1\big( (y_{i_0}, \boldsymbol{x}_{i_0}), W_{\alpha}, \boldsymbol{G} \big) = \left. \frac{\partial W_{\alpha}(\boldsymbol{G}_{i_0, \varepsilon})}{\partial \varepsilon} \right|_{\varepsilon = 0} = 2 \big( \boldsymbol{M}^T \boldsymbol{T}_{\alpha}(\boldsymbol{G}) - \boldsymbol{m} \big)^T \left[ \boldsymbol{M}^T \boldsymbol{A}_{\alpha}\big( \boldsymbol{T}_{\alpha}(\boldsymbol{G}) \big)^{-1} \boldsymbol{M} \right]^{-1} \boldsymbol{M}^T\, \mathrm{IF}\big( (y_{i_0}, \boldsymbol{x}_{i_0}), \boldsymbol{T}_{\alpha}, \boldsymbol{G} \big).$$

Evaluating the previous expression at the null hypothesis, $\boldsymbol{M}^T \boldsymbol{T}_{\alpha}(\boldsymbol{G}) = \boldsymbol{m}$, the IF becomes identically zero,

$$\mathrm{IF}_1\big( (y_{i_0}, \boldsymbol{x}_{i_0}), W_{\alpha}, \boldsymbol{G} \big) = 0.$$

Therefore, it is necessary to consider the second-order IF of the proposed Wald-type tests. Differentiating $W_{\alpha}(\boldsymbol{G}_{\varepsilon})$ twice, we get

$$\mathrm{IF}_2\big( (y_{i_0}, \boldsymbol{x}_{i_0}), W_{\alpha}, \boldsymbol{G} \big) = \left. \frac{\partial^2 W_{\alpha}(\boldsymbol{G}_{i_0, \varepsilon})}{\partial \varepsilon^2} \right|_{\varepsilon = 0} = 2\, \mathrm{IF}\big( (y_{i_0}, \boldsymbol{x}_{i_0}), \boldsymbol{T}_{\alpha}, \boldsymbol{G} \big)^T \boldsymbol{M} \left[ \boldsymbol{M}^T \boldsymbol{A}_{\alpha}\big( \boldsymbol{T}_{\alpha}(\boldsymbol{G}) \big)^{-1} \boldsymbol{M} \right]^{-1} \boldsymbol{M}^T\, \mathrm{IF}\big( (y_{i_0}, \boldsymbol{x}_{i_0}), \boldsymbol{T}_{\alpha}, \boldsymbol{G} \big).$$

Finally, the second-order IF of the Wald-type tests in all directions is given by

$$\mathrm{IF}_2\big( (y_1, \boldsymbol{x}_1),\ldots,(y_n, \boldsymbol{x}_n), W_{\alpha}, \boldsymbol{G} \big) = \left. \frac{\partial^2 W_{\alpha}(\boldsymbol{G}_{\varepsilon})}{\partial \varepsilon^2} \right|_{\varepsilon = 0} = 2\, \mathrm{IF}\big( (y_1, \boldsymbol{x}_1),\ldots,(y_n, \boldsymbol{x}_n), \boldsymbol{T}_{\alpha}, \boldsymbol{G} \big)^T \boldsymbol{M} \left[ \boldsymbol{M}^T \boldsymbol{A}_{\alpha}\big( \boldsymbol{T}_{\alpha}(\boldsymbol{G}) \big)^{-1} \boldsymbol{M} \right]^{-1} \boldsymbol{M}^T\, \mathrm{IF}\big( (y_1, \boldsymbol{x}_1),\ldots,(y_n, \boldsymbol{x}_n), \boldsymbol{T}_{\alpha}, \boldsymbol{G} \big).$$
To assess the robustness of the MRPEs and Wald-type test statistics, we must discuss the boundedness of the corresponding IFs. The boundedness of the second-order IF of the Wald-type test statistics is determined by the boundedness of the IF of the MRPEs. Further, the matrix $\boldsymbol{\Psi}_n(\boldsymbol{\gamma})$ is assumed to be bounded, so the robustness of the estimators only depends on the second factor of the IF. Most standard GLMs enjoy such boundedness for positive values of $\alpha$, but the influence function is unbounded at $\alpha = 0$, corresponding to the MLE. As an illustrative example, Figure 1 plots the IF of the MRPEs for the Poisson regression model in one direction for $\alpha = 0$ and $\alpha = 0.5$. The model is fitted with only one covariate, the parameter $\phi$ is known for Poisson regression ($\phi = 1$), and the true regression coefficient is fixed at $\beta = 1$. As shown, the IFs of the MRPEs with positive values of $\alpha$ are bounded, whereas the IF of the MLE is not, indicating its lack of robustness.
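The boundedness mechanism behind Figure 1 can be illustrated numerically: up to bounded matrix factors, the IF of the MRPE is driven by the density-power-weighted score $f_i(y, \boldsymbol{\gamma})^{\alpha}(y - \mu_i)$. A sketch for a Poisson mean $\mu = 3$ (an illustrative value of ours, not from the paper):

```python
import numpy as np
from scipy.stats import poisson

def if_driver(y, mu, alpha):
    # density-power weight times the Poisson score factor (y - mu);
    # for alpha = 0 (the MLE) this is the raw, unbounded score
    return poisson.pmf(y, mu) ** alpha * (y - mu)

y = np.arange(0, 500)
mle_curve = if_driver(y, 3.0, 0.0)    # grows without bound in y
mrpe_curve = if_driver(y, 3.0, 0.5)   # reweighted: vanishes for outlying y
```

The MLE curve grows linearly in the contamination point, while for $\alpha = 0.5$ the weight $f(y, \gamma)^{\alpha}$ pulls the curve back to zero for outlying $y$, mirroring the bounded IFs in Figure 1.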

5. Numerical Analysis: Poisson Regression Model

We illustrate the proposed robust method for the Poisson regression model. As pointed out in Section 1, the Poisson regression model belongs to the GLM family with known shape parameter $\phi = 1$, location parameter $\theta_i = \boldsymbol{x}_i^T \boldsymbol{\beta}$, and known functions $b(\theta_i) = \exp(\boldsymbol{x}_i^T \boldsymbol{\beta})$ and $c(y_i) = -\log(y_i!)$. Since the nuisance parameter is known, for the sake of simplicity in the following we write $\boldsymbol{\gamma} = \boldsymbol{\beta}$. In Poisson regression, the mean of the response variable is linked to the linear predictor through the natural logarithm, i.e., $\mu_i = \exp(\boldsymbol{x}_i^T \boldsymbol{\beta})$. Thus, we can apply the previously proposed method to estimate the vector of regression parameters $\boldsymbol{\beta}$ with the objective function given in Equation (5).
The results provided are computed in the software R. The minimization of the objective function is performed using the built-in optim() function, which applies the Nelder–Mead iterative algorithm (Nelder and Mead [32]). The Nelder–Mead optimization algorithm is robust although relatively slow. The corresponding objective function $T_n^{\alpha}(\boldsymbol{\gamma})$ given in (5) is highly nonlinear and requires the evaluation of nontrivial quantities. Further, the computation of the Wald-type test statistics defined in (13) requires evaluating the covariance matrix of the MRPEs, involving nontrivial integrals. Some simplified expressions of the main quantities defined throughout the paper for the Poisson regression model, such as $L_{\alpha,i}(\boldsymbol{\beta})$, $K_{1i}(y, \boldsymbol{\beta})$, $N_i(y, \boldsymbol{\beta})$, $m_1^i(\boldsymbol{\beta})$, $m_{11}^i(\boldsymbol{\beta})$ or $l_{11}^i(\boldsymbol{\beta})$, are given in Appendix B. There is no closed-form expression for these quantities, and they need to be approximated numerically. Since the minimization is performed iteratively, computing such expressions at each step of the algorithm and for each observation may entail an increased computational burden. Nonetheless, the complexity is not significant for low-dimensional data. On the other hand, the optimum in (5) need not be uniquely defined, since the objective function may have several local optima. Consequently, the choice of the initial value of the iterative algorithm is crucial. Ideally, a good initial point should be consistent and robust. In our results, the MLE is used as the initial estimate for the algorithm.
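A Python analogue of this procedure, with scipy's Nelder–Mead standing in for R's optim() (the truncation bound, the clipping of the linear predictor, and the starting value are our choices; the paper itself starts from the MLE):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

def neg_objective(beta, X, Y, alpha, y_max=300):
    # Negative of (5) for Poisson regression; the integrals become sums
    # over the (truncated) discrete Poisson support
    mu = np.exp(np.clip(X @ beta, -20.0, 20.0))  # log link, guarded
    grid = np.arange(y_max)
    total = 0.0
    for yi, mi in zip(Y, mu):
        s = np.sum(poisson.pmf(grid, mi) ** (alpha + 1))
        L = max(s, 1e-300) ** (alpha / (alpha + 1))
        total += poisson.pmf(yi, mi) ** alpha / L
    return -total / len(Y)

def mrpe_poisson(X, Y, alpha, beta0):
    # Nelder-Mead mirrors the paper's use of R's optim(); beta0 is
    # whatever starting value is supplied (the paper uses the MLE)
    res = minimize(neg_objective, beta0, args=(X, Y, alpha),
                   method="Nelder-Mead")
    return res.x
```

On clean simulated data, the estimate for moderate $\alpha$ should land close to both the MLE and the true coefficients, at the cost of a small efficiency loss.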
We analyze the performance of the proposed methods in Poisson regression through a simulation study. We assess the behavior of the MRPE under a sparse Poisson regression model with $k = 12$ covariates but only 3 significant variables. We set the 12-dimensional regression parameter $\boldsymbol{\beta} = (1.8, 1, 0, 0, 1.5, 0, \ldots, 0)^T$ and we generate the explanatory variables, $\boldsymbol{x}_i$, from the standard uniform distribution with variance–covariance matrix having a Toeplitz structure, with the $(j, l)$-th element being $0.5^{|j - l|}$, $j, l = 1,\ldots,k$. The response variables are generated from the Poisson regression model with mean $\mu_i = \exp(\boldsymbol{x}_i^T \boldsymbol{\beta})$, $Y_i \sim \mathcal{P}(\mu_i)$. To evaluate the robustness of the proposed estimators, we contaminate the responses using a perturbed distribution of the form $(1 - b)\,\mathcal{P}(\mu_i) + b\,\mathcal{P}(2\mu_i)$, where $b$ is a realization of a Bernoulli variable with parameter $\varepsilon$, the so-called contamination level. That is, the distribution of the contaminated responses lies in a small neighbourhood of the assumed model. We repeat the process $R = 1000$ times for each value of $\alpha$.
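The contamination scheme above can be sketched as follows (the function name is ours):

```python
import numpy as np

def contaminated_poisson(mu, eps, rng):
    # Draw responses from (1 - b) P(mu_i) + b P(2 mu_i), where
    # b ~ Bernoulli(eps) independently for each observation
    b = rng.random(mu.shape) < eps
    return rng.poisson(np.where(b, 2.0 * mu, mu))
```

With $\varepsilon = 0$ the clean model is recovered, while increasing $\varepsilon$ inflates an increasing fraction of the responses.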
Figure 2 presents the mean squared error (MSE) of the estimate, $\mathrm{MSE} = \lVert \widehat{\boldsymbol{\beta}}_{\alpha} - \boldsymbol{\beta} \rVert^2$ (left), and the MSE of the prediction (right) against the contamination level of the data for different values of $\alpha = 0, 0.1, 0.3, 0.5$ and $0.7$. The sample size is fixed at $n = 200$, and the MSE of the prediction is calculated using $n = 200$ new observations following the true model. As shown, greater values of $\alpha$ correspond to more robust estimators, revealing the role of the tuning parameter in the robustness gain. Most strikingly, the MSE grows linearly for the MLE, while the proposed estimators manage to maintain a low error in all contaminated scenarios.
Furthermore, it is to be expected that the error of the estimate decreases with larger sample sizes. In this regard, Figure 3 shows the MSE for different values of $\alpha = 0, 0.1, 0.3, 0.5$ and $0.7$ against the sample size in the absence of contamination (left) and under $5\%$ contamination (right). Our proposed estimators are more robust than the classical MLE in almost all contaminated scenarios, since the MSE is lower for all positive values of $\alpha$ than for $\alpha = 0$ (corresponding to the MLE), except for very small sample sizes. Conversely, the MLE is, as expected, the most efficient estimator in the absence of contamination, closely followed by our proposed estimators with $\alpha = 0.1, 0.3$, highlighting the importance of $\alpha$ in controlling the trade-off between efficiency and robustness. In this regard, values of $\alpha$ around $0.3$ perform best, taking into account the low loss of efficiency and the gain in robustness. Finally, note that small sample sizes adversely affect the estimators with greater values of $\alpha$.
On the other hand, one could be interested in testing the significance of the selected variables. For this purpose, we simplify the true model and examine the performance of the proposed Wald-type test statistics under different true coefficient values. In particular, let us consider a Poisson regression model with only two covariates, generated from the uniform distribution as before, and the linear null hypothesis

$$H_0: \beta_2 = 0.$$

That is, we are interested in assessing the significance of the second variable. The sample size is fixed at $n = 200$, and the true value of the first component of the regression vector is set to $\beta_1 = 1$. We study the power of the tests under increasing signal of the second parameter $\beta_2$ and increasing contamination level. Here, the model is contaminated by perturbing the true distribution with $(1 - b)\,\mathcal{P}(\mu_i) + b\,\mathcal{P}(\widetilde{\mu}_i)$, where $\mu_i = \exp(\boldsymbol{x}_i^T \boldsymbol{\beta})$ is the mean of the Poisson variable in the absence of contamination, $\widetilde{\mu}_i = \exp(\boldsymbol{x}_i^T \widetilde{\boldsymbol{\beta}})$ is the contaminated mean, with $\widetilde{\boldsymbol{\beta}} = (1, 0)$, and $b$ is a realization of a Bernoulli variable with probability of success $\varepsilon$. Table 1 presents the rejection rate of the Wald-type test statistics for different true values of $\beta_2$ under different contaminated scenarios. As expected, stronger signals produce higher power for all Wald-type tests. Moreover, the power of the Wald-type test statistics based on the MLE decreases when increasing the contamination, whereas the power of the statistics based on the MRPEs with positive values of $\alpha$ remains sufficiently high. Thus, our proposed robust estimators are able to detect the significance of the variable even in heavily contaminated scenarios.

6. Real Data Applications

6.1. Example I: Poisson Regression

We now apply our proposed estimators to a real dataset arising from Crohn's disease. The data were first studied in Lô and Ronchetti [33] to assess the adverse events of a drug. The clinical study included 117 patients affected by the disease, for whom information was recorded on 7 explanatory variables: BMI (body mass index), HEIGHT, COUNTRY (one of the two countries where the patient lives), SEX, AGE, WEIGHT, and TREAT (the drug taken by the patient in factor form: placebo, Dose 1, Dose 2), in addition to the response variable AE (number of adverse events). Lô and Ronchetti [33] considered a Poisson regression model for the Crohn data and determined that only the variables Dose 1, BMI, HEIGHT, SEX, AGE, and COUNTRY may be essentially significant. Further, they flagged observations 23, 49, and 51 as highly influential on the classical analysis. Table 2 presents the estimated coefficients of the explanatory variables when fitting the Poisson regression model. Robust methods suggest higher coefficients for the variables BMI and AGE, and lower values for the coefficients of the categorical variables COUNTRY, SEX, and Dose 1.
Following the discussion in Lô and Ronchetti [33], classical tests may not select the variable AGE as significant. We therefore propose testing the significance of that variable using Wald-type test statistics based on different values of $\alpha$. Table 3 shows the p-values of the corresponding tests with null hypothesis $H_0$: AGE $= 0$, with the original data and after removing the outlying observations.
The MLE rejects the significance of the variable AGE when the original data are used, whereas the Wald-type test statistics with positive values of $\alpha$ indicate strong evidence against the null hypothesis. In contrast, if the influential observations are removed, all Wald-type test statistics agree on the significance of the variable. This example illustrates the robustness of the proposed statistics.

6.2. Example II: Binomial Regression

We finally illustrate the applicability of the MRPE for robust inference in the binomial regression model. We examine the damaged carrots dataset, first studied in Phelps [34] and later discussed by Cantoni and Ronchetti [8] and Ghosh and Basu [13] to illustrate robust procedures for binomial regression. The data contain 24 samples, among which the 14th observation was flagged as an outlier in the y-space but not a leverage point. The data are issued from a soil experiment and give the proportion of carrots showing insect damage in a trial with three blocks and eight dose levels of insecticide. The explanatory variables are the logarithm transform of the dose (Logdose) and two dummy variables for Blocks 1 and 2.
Binomial regression is a natural extension of logistic regression to the case where the response variable Y does not follow a Bernoulli distribution but a binomial distribution counting the number of successes in a series of m independent Bernoulli trials. The binomial regression model belongs to the GLM family with known shape parameter $\phi = 1$, location parameter $\theta_i = x_i^T \beta$ and functions $b(\theta_i) = m \log\left( 1 + \exp( x_i^T \beta ) \right)$ and $c(y_i) = \log \binom{m}{y_i}$. The mean of the response variable is then linked to the linear predictor through the logit function, i.e.,
$$ \log\left( \frac{\mu_i}{m - \mu_i} \right) = x_i^T \beta . $$
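The logit link can be sketched numerically as follows; this is our own minimal illustration, and the helper `binomial_mean` is hypothetical rather than taken from the paper.

```python
import numpy as np
from scipy.special import expit, logit

def binomial_mean(X, beta, m):
    # mu_i = m * exp(x_i' beta) / (1 + exp(x_i' beta)): inverse logit link scaled by m
    return m * expit(X @ beta)

# Sanity check of the link: log(mu / (m - mu)) recovers the linear predictor.
X = np.array([[1.0, 0.2], [1.0, -1.5]])
beta = np.array([0.4, 2.0])
mu = binomial_mean(X, beta, m=5)
assert np.allclose(logit(mu / 5), X @ beta)
```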
Table 4 presents the estimated coefficients of the regression vector for the carrots data using the MLE and robust MRPEs, both when the model is fitted with the original data and when it is fitted without the outlying observation. The results are computed in the same manner as in Section 5, adapting the corresponding quantities in Equation (5) to the binomial model. All integrals involved were numerically approximated, and the MLE was used as the initial estimate for the optimization algorithm. The influence of observation 14 on the MLE stands out: the estimated coefficients differ remarkably when fitting the model with and without it. In contrast, all methods estimate similar coefficients after removing the outlying observation, coinciding with the robust estimates obtained from the full data for moderately large values of the tuning parameter α .

7. Conclusions

In this paper, we presented MRPEs and Wald-type test statistics for GLMs. The proposed MRPEs and test statistics have appealing robustness properties when the data are contaminated with outliers or leverage points. MRPEs are consistent and asymptotically normal, and they represent an attractive alternative to the classical nonrobust methods. Additionally, robust Wald-type test statistics based on the MRPEs were developed. Through the study of their IFs and an extensive simulation study, we established their robustness from a theoretical and an empirical point of view, respectively. In particular, we illustrated the superior performance of the MRPEs and the corresponding Wald-type tests for the Poisson regression model.

Author Contributions

Conceptualization, M.J. and L.P.; methodology, M.J. and L.P.; software, M.J. and L.P.; validation, M.J. and L.P.; formal analysis, M.J. and L.P.; investigation, M.J. and L.P.; resources, M.J. and L.P.; data curation, M.J. and L.P.; writing—original draft preparation, M.J. and L.P.; writing—review and editing, M.J. and L.P.; visualization, M.J. and L.P.; supervision, M.J. and L.P.; project administration, M.J. and L.P.; funding acquisition, M.J. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Spanish Grants PGC2018-095194-B-100 (L. Pardo and M. Jaenada) and FPU/018240 (M. Jaenada).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The real datasets are publicly available in the R package robustbase on CRAN under the names CrohnD (Poisson regression example) and carrots (binomial regression example).

Acknowledgments

We are very grateful to the referees and associate editor for their helpful comments and suggestions. This research is supported by the Spanish Grants PGC2018-095194-B-100 (L. Pardo and M. Jaenada) and FPU/018240 (M. Jaenada). M. Jaenada and L. Pardo are members of the Instituto de Matematica Interdisciplinar, Complutense University of Madrid.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
DPD     Density Power Divergence
IF      Influence Function
GLM     Generalized Linear Model
LRM     Linear Regression Model
MLE     Maximum Likelihood Estimator
MRPE    Minimum Rényi Pseudodistance Estimator
RP      Rényi Pseudodistance

Appendix A. Proof of Theorem 3

Let us define
$$ l_{\eta}(\zeta) = \left( M^{T}\eta - m \right)^{T} \left( M^{T} A_{\alpha}(\zeta)^{-1} M \right)^{-1} \left( M^{T}\eta - m \right), $$
so that the Wald-type test statistic satisfies
$$ n\, l_{\hat{\gamma}_{\alpha}}(\hat{\gamma}_{\alpha}) = W_{n}(\hat{\gamma}_{\alpha}). $$
We know that $\hat{\gamma}_{\alpha} \xrightarrow[n\to\infty]{P} \gamma_{1}$, and therefore $l_{\hat{\gamma}_{\alpha}}(\gamma_{1})$ and $l_{\gamma_{1}}(\gamma_{1})$ have the same asymptotic distribution. A first-order Taylor expansion of $g(\zeta) = l_{\hat{\gamma}_{\alpha}}(\zeta)$ at $\hat{\gamma}_{\alpha}$ around $\gamma_{1}$ gives
$$ l_{\hat{\gamma}_{\alpha}}(\hat{\gamma}_{\alpha}) = l_{\hat{\gamma}_{\alpha}}(\gamma_{1}) + \left.\frac{\partial l_{\hat{\gamma}_{\alpha}}(\zeta)}{\partial \zeta^{T}}\right|_{\zeta=\gamma_{1}} \left( \hat{\gamma}_{\alpha} - \gamma_{1} \right) + o_{p}\left( \left\| \hat{\gamma}_{\alpha} - \gamma_{1} \right\| \right). $$
Based on the asymptotic distribution of $\hat{\gamma}_{\alpha}$, we have
$$ \sqrt{n}\, o_{p}\left( \left\| \hat{\gamma}_{\alpha} - \gamma_{1} \right\| \right) = o_{p}(1); $$
therefore,
$$ \sqrt{n}\left( l_{\hat{\gamma}_{\alpha}}(\hat{\gamma}_{\alpha}) - l_{\gamma_{1}}(\gamma_{1}) \right) \quad \text{and} \quad \sqrt{n}\, \left.\frac{\partial l_{\hat{\gamma}_{\alpha}}(\zeta)}{\partial \zeta^{T}}\right|_{\zeta=\gamma_{1}} \left( \hat{\gamma}_{\alpha} - \gamma_{1} \right) $$
have asymptotically the same distribution, i.e.,
$$ \sqrt{n}\left( l_{\hat{\gamma}_{\alpha}}(\hat{\gamma}_{\alpha}) - l_{\gamma_{1}}(\gamma_{1}) \right) \xrightarrow[n\to\infty]{L} N\left( 0,\; \left.\frac{\partial l_{\hat{\gamma}_{\alpha}}(\zeta)}{\partial \zeta^{T}}\right|_{\zeta=\gamma_{1}} A_{\alpha}(\gamma_{1})^{-1} \left.\frac{\partial l_{\hat{\gamma}_{\alpha}}(\zeta)}{\partial \zeta}\right|_{\zeta=\gamma_{1}} \right). $$
Now, we shall denote
$$ \sigma^{2}(\gamma_{1}) = \left.\frac{\partial l_{\hat{\gamma}_{\alpha}}(\zeta)}{\partial \zeta^{T}}\right|_{\zeta=\gamma_{1}} A_{\alpha}(\gamma_{1})^{-1} \left.\frac{\partial l_{\hat{\gamma}_{\alpha}}(\zeta)}{\partial \zeta}\right|_{\zeta=\gamma_{1}}. $$
Then, we have
$$ P_{\gamma_{1}}\left( W_{n}(\hat{\gamma}_{\alpha}) > \chi^{2}_{r,\alpha} \right) = P_{\gamma_{1}}\left( W_{n}(\hat{\gamma}_{\alpha}) - n\, l_{\gamma_{1}}(\gamma_{1}) > \chi^{2}_{r,\alpha} - n\, l_{\gamma_{1}}(\gamma_{1}) \right) $$
$$ = P_{\gamma_{1}}\left( \frac{\sqrt{n}}{\sigma(\gamma_{1})}\left( l_{\hat{\gamma}_{\alpha}}(\hat{\gamma}_{\alpha}) - l_{\gamma_{1}}(\gamma_{1}) \right) > \frac{1}{\sigma(\gamma_{1})}\left( \frac{\chi^{2}_{r,\alpha}}{\sqrt{n}} - \sqrt{n}\, l_{\gamma_{1}}(\gamma_{1}) \right) \right) \approx 1 - \Phi_{N(0,1)}\left( \frac{1}{\sigma(\gamma_{1})}\left( \frac{\chi^{2}_{r,\alpha}}{\sqrt{n}} - \sqrt{n}\, l_{\gamma_{1}}(\gamma_{1}) \right) \right), $$
where $\Phi_{N(0,1)}(t)$ represents the distribution function of a standard normal distribution evaluated at $t$. Finally,
$$ \lim_{n\to\infty} P_{\gamma_{1}}\left( W_{n}(\hat{\gamma}_{\alpha}) > \chi^{2}_{r,\alpha} \right) = 1. $$

Appendix B. Poisson Regression Model

We derive here some explicit expressions for the particular case of Poisson regression. Following the discussion in Section 5, we denote here γ = β, since the nuisance parameter is known, $\phi = 1$. The Poisson distribution with mean parameter $e^{x_i^T \beta}$ is given by
$$ f_{i}(y, \beta) = \frac{1}{y!}\, e^{-e^{x_{i}^{T}\beta}}\, e^{y\, x_{i}^{T}\beta}, \qquad y = 0, 1, \ldots $$
Differentiating its logarithm with respect to the regression vector, we get
$$ \frac{\partial \log f_{i}(y,\beta)}{\partial \beta} = \left( y - e^{x_{i}^{T}\beta} \right) x_{i}^{T}, $$
so we can write
$$ K_{1i}(y,\beta) = y - e^{x_{i}^{T}\beta}. $$
Further, we have that
$$ N_{i}(y,\beta) = \frac{f_{i}(y,\beta)^{\alpha}}{\sum_{y=0}^{\infty} f_{i}(y,\beta)^{\alpha+1}} \sum_{y=0}^{\infty} f_{i}(y,\beta)^{\alpha+1} \left( y - e^{x_{i}^{T}\beta} \right), $$
so the estimating equations of the Poisson regression model are given by
$$ \sum_{i=1}^{n} \frac{1}{L_{\alpha i}(\beta)} \left[ f_{i}(y_{i},\beta)^{\alpha} \left( y_{i} - e^{x_{i}^{T}\beta} \right) - N_{i}(y_{i},\beta) \right] x_{i} = 0_{k}. $$
For α = 0, we have
$$ N_{i}(y_{i},\beta) = 0 \quad \text{and} \quad L_{\alpha i}(\beta) = 1, $$
so the estimating equations are given by
$$ \sum_{i=1}^{n} \left( y_{i} - e^{x_{i}^{T}\beta} \right) x_{i} = 0_{k}, $$
yielding the maximum likelihood estimating equations.
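To make the estimating equations concrete, here is a small sketch of our own (not the authors' code) that evaluates them for Poisson regression, truncating the infinite sums at a large `ymax`. The normalizing factor $L_{\alpha i}(\beta)$ is taken here as $\sum_y f_i(y,\beta)^{\alpha+1}$, a stand-in consistent with $L_{\alpha i}(\beta) = 1$ at α = 0; the exact definition follows Equation (5) of the paper. At α = 0 the equations reduce to the MLE score above.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import poisson

def mrpe_equations(beta, X, y, alpha, ymax=200):
    """Estimating equations for the Poisson MRPE; alpha = 0 gives the MLE score."""
    beta = np.asarray(beta, dtype=float)
    mu = np.exp(X @ beta)                  # mu_i = exp(x_i' beta)
    ys = np.arange(ymax + 1)
    psi = np.zeros(X.shape[1])
    for i in range(len(y)):
        f = poisson.pmf(ys, mu[i])         # f_i(y, beta), truncated at ymax
        w = f ** (alpha + 1)
        L = w.sum()                        # stand-in for L_{alpha,i}(beta); equals 1 at alpha = 0
        m1 = (w * (ys - mu[i])).sum() / L  # recentering term m_{1i}(beta)
        fa = poisson.pmf(y[i], mu[i]) ** alpha
        psi += (fa / L) * ((y[i] - mu[i]) - m1) * X[i]
    return psi

# Solving the equations for a tiny synthetic dataset:
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1, 3, 0, 2])
beta_hat = fsolve(lambda b: mrpe_equations(b, X, y, alpha=0.5), x0=np.zeros(2))
```

The downweighting factor $f_i(y_i,\beta)^{\alpha}$ is what discounts observations that are unlikely under the fitted model, which is the source of the robustness discussed in the simulation study.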
On the other hand, the asymptotic distribution of $\hat{\beta}_{\alpha}$ is given by
$$ \sqrt{n}\left[ \left( \tfrac{1}{n} X^{T} D_{11} X \right)^{-1} \tfrac{1}{n} X^{T}\left( D_{11}^{*} - D_{1}^{*T} D_{1}^{*} \right) X \left( \tfrac{1}{n} X^{T} D_{11} X \right)^{-1} \right]^{-1/2} \left( \hat{\beta}_{\alpha} - \beta \right) \xrightarrow[n\to\infty]{L} N\left( 0_{k}, I_{k} \right), $$
being $D_{11} = \mathrm{diag}\left( l_{11i}(\beta) \right)$, with
$$ l_{11i}(\beta) = \frac{1}{L_{\alpha i}(\beta)^{2}} \sum_{y=0}^{\infty} f_{i}(y,\beta)^{2\alpha+1} \left( K_{1i}(y,\beta) - m_{1i}(\beta) \right), $$
and $D_{1}^{*} = \mathrm{diag}\left( m_{1i}(\beta) \right)$, with
$$ m_{1i}(\beta) = \frac{1}{\sum_{y=0}^{\infty} f_{i}(y,\beta)^{\alpha+1}} \sum_{y=0}^{\infty} f_{i}(y,\beta)^{\alpha+1} \left( y - e^{x_{i}^{T}\beta} \right). $$
Finally, $D_{11}^{*} = \mathrm{diag}\left( m_{11i}(\beta) \right)$, with
$$ m_{11i}(\beta) = \frac{1}{\sum_{y=0}^{\infty} f_{i}(y,\beta)^{\alpha+1}} \sum_{y=0}^{\infty} f_{i}(y,\beta)^{\alpha+1} \left( y - e^{x_{i}^{T}\beta} \right)^{2}. $$

References

1. Nelder, J.A.; Wedderburn, R.W.M. Generalized linear models. J. R. Stat. Soc. 1972, 135, 370–384.
2. McCullagh, P.; Nelder, J.A. Generalized Linear Models. In Monographs on Statistics and Applied Probability; Chapman and Hall: London, UK, 1983.
3. Jaenada, M.; Pardo, L. The minimum Renyi's pseudodistances estimators for generalized linear models. In Data Analysis and Related Applications: Theory and Practice; Proceedings of the ASMDA; Wiley: Athens, Greece, 2021.
4. Stefanski, L.A.; Carroll, R.J.; Ruppert, D. Optimally bounded score functions for generalized linear models with applications to logistic regression. Biometrika 1986, 73, 413–424.
5. Krasker, W.S.; Welsch, R.E. Efficient bounded-influence regression estimation. J. Am. Stat. Assoc. 1982, 77, 595–604.
6. Künsch, H.R.; Stefanski, L.A.; Carroll, R.J. Conditionally unbiased bounded-influence estimation in general regression models, with applications to generalized linear models. J. Am. Stat. Assoc. 1989, 84, 460–466.
7. Morgenthaler, S. Least-absolute-deviations fits for generalized linear models. Biometrika 1992, 79, 747–754.
8. Cantoni, E.; Ronchetti, E. Robust inference for generalized linear models. J. Am. Stat. Assoc. 2001, 96, 1022–1030.
9. Bianco, A.M.; Yohai, V.J. Robust estimation in the logistic regression model. In Robust Statistics, Data Analysis, and Computer Intensive Methods; Springer: New York, NY, USA, 1996; pp. 17–34.
10. Croux, C.; Haesbroeck, G. Implementing the Bianco and Yohai estimator for logistic regression. Comput. Stat. Data Anal. 2003, 44, 273–295.
11. Bianco, A.M.; Boente, G.; Rodrigues, I.M. Robust tests in generalized linear models with missing responses. Comput. Stat. Data Anal. 2013, 65, 80–97.
12. Valdora, M.; Yohai, V.J. Robust estimators for generalized linear models. J. Stat. Plan. Inference 2014, 146, 31–48.
13. Ghosh, A.; Basu, A. Robust estimation in generalized linear models: The density power divergence approach. Test 2016, 25, 269–290.
14. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and efficient estimation by minimising a density power divergence. Biometrika 1998, 85, 549–559.
15. Basu, A.; Ghosh, A.; Mandal, A.; Martin, N.; Pardo, L. Robust Wald-type tests in GLM with random design based on minimum density power divergence estimators. Stat. Methods Appl. 2021, 3, 933–1005.
16. Broniatowski, M.; Toma, A.; Vajda, I. Decomposable pseudodistances and applications in statistical estimation. J. Stat. Plan. Inference 2012, 142, 2574–2585.
17. Castilla, E.; Martín, N.; Muñoz, S.; Pardo, L. Robust Wald-type tests based on minimum Rényi pseudodistance estimators for the multiple regression model. J. Stat. Comput. Simul. 2020, 14, 2592–2613.
18. Toma, A.; Leoni-Aubin, S. Optimal robust M-estimators using Rényi pseudodistances. J. Multivar. Anal. 2013, 115, 259–273.
19. Toma, A.; Karagrigoriou, A.; Trentou, P. Robust model selection criteria based on pseudodistances. Entropy 2020, 22, 304.
20. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1961; pp. 547–561.
21. Jones, M.C.; Hjort, N.L.; Harris, I.R.; Basu, A. A comparison of related density-based minimum divergence estimators. Biometrika 2001, 88, 865–873.
22. Fujisawa, H.; Eguchi, S. Robust parameter estimation with a small bias against heavy contamination. J. Multivar. Anal. 2008, 99, 2053–2081.
23. Hirose, K.; Masuda, H. Robust relative error estimation. Entropy 2018, 20, 632.
24. Kawashima, T.; Fujisawa, H. Robust and sparse regression via γ-divergence. Entropy 2017, 19, 608.
25. Kawashima, T.; Fujisawa, H. Robust and sparse regression in generalized linear model by stochastic optimization. Jpn. J. Stat. Data Sci. 2019, 2, 465–489.
26. Windham, M.P. Robustifying model fitting. J. R. Stat. Soc. Ser. B 1995, 57, 599–609.
27. Castilla, E.; Jaenada, M.; Pardo, L. Estimation and testing on independent not identically distributed observations based on Rényi's pseudodistances. arXiv 2021, arXiv:2102.12282.
28. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018.
29. Fraser, D.A.S. Nonparametric Methods in Statistics; John Wiley & Sons: New York, NY, USA, 1957.
30. Maronna, R.A.; Martin, R.D.; Yohai, V.J. Robust Statistics: Theory and Methods; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006.
31. Donoho, D.L.; Huber, P.J. The notion of breakdown point. In A Festschrift for Erich L. Lehmann; CRC Press: Boca Raton, FL, USA, 1983.
32. Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313.
33. Lô, S.N.; Ronchetti, E. Robust and accurate inference for generalized linear models. J. Multivar. Anal. 2009, 100, 2126–2136.
34. Phelps, K. Use of the complementary log-log function to describe dose response relationships in insecticide evaluation field trials. In Lecture Notes in Statistics, No. 14: Proceedings of the International Conference on Generalized Linear Models; Gilchrist, R., Ed.; Springer: Berlin, Germany, 1982.
Figure 1. IF of MRPEs with α = 0 (left) and α = 0.5 (right) for the Poisson regression model.
Figure 2. Mean Squared Error (MSE) in estimation (left) and prediction (right) against the contamination level in the data.
Figure 3. MSE in estimation of β in the absence of contamination (left) and under 5% contamination in the data (right) for different values of α, against the sample size, for the Poisson regression model.
Table 1. Rejection rates of Wald-type test statistics based on MRPEs for different true values of β2 and contamination levels.

β2      α      Contamination level
               0%      5%      10%     15%     20%     25%
0.3     0      0.332   0.264   0.227   0.187   0.157   0.141
        0.1    0.435   0.376   0.328   0.285   0.251   0.223
        0.3    0.557   0.511   0.483   0.416   0.390   0.360
        0.5    0.617   0.563   0.533   0.493   0.467   0.427
        0.7    0.638   0.590   0.568   0.536   0.513   0.476
0.5     0      0.756   0.730   0.683   0.621   0.551   0.493
        0.1    0.833   0.798   0.775   0.736   0.681   0.622
        0.3    0.885   0.870   0.864   0.829   0.792   0.752
        0.5    0.895   0.891   0.886   0.867   0.842   0.814
        0.7    0.901   0.897   0.893   0.879   0.854   0.832
0.7     0      0.971   0.979   0.968   0.948   0.915   0.862
        0.1    0.980   0.988   0.983   0.973   0.962   0.932
        0.3    0.988   0.995   0.992   0.987   0.985   0.969
        0.5    0.989   0.995   0.995   0.992   0.992   0.977
        0.7    0.989   0.995   0.993   0.995   0.990   0.983
Table 2. Estimated coefficients for Crohn's disease data for different values of α with original data and clean data (after removing influential observations).

              Intercept   BMI     Height   Age     Country   Sex      Dose 1
Original Data
MLE (α = 0)   6.261       0.026   −0.037   0.012   −0.394    −0.646   −0.533
α = 0.1       5.197       0.037   −0.033   0.014   −0.489    −0.800   −0.469
α = 0.3       4.798       0.058   −0.036   0.021   −0.545    −1.284   −0.832
α = 0.5       4.391       0.067   −0.037   0.028   −0.557    −1.535   −1.036
α = 0.7       5.699       0.067   −0.047   0.036   −0.737    −1.759   −1.157
Table 3. p-values of the test with null hypothesis H 0 : AGE = 0 with original and clean data (after removing influential observations).

              Original Data   Clean Data
MLE (α = 0)   0.059           0.011
α = 0.1       0.018           0.004
α = 0.3       0.001           0.000
α = 0.5       0.000           0.000
α = 0.7       0.000           0.000
Table 4. Estimated coefficients for damaged carrots data for different values of α with original data and clean data (after outlier removal).

              Intercept   Logdose   B1      B2
Original Data
MLE (α = 0)   1.480       −1.817    0.542   0.843
α = 0.1       1.729       −1.949    0.527   0.755
α = 0.3       2.017       −2.100    0.479   0.652
α = 0.5       2.090       −2.134    0.386   0.625
α = 0.7       2.150       −2.161    0.258   0.615
Clean Data
MLE (α = 0)   2.141       −2.179    0.546   0.636
α = 0.1       2.126       −2.167    0.529   0.633
α = 0.3       2.105       −2.149    0.479   0.627
α = 0.5       2.108       −2.144    0.385   0.621
α = 0.7       2.154       −2.163    0.257   0.614
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

Jaenada, M.; Pardo, L. Robust Statistical Inference in Generalized Linear Models Based on Minimum Renyi's Pseudodistance Estimators. Entropy 2022, 24, 123. https://doi.org/10.3390/e24010123

