1. Introduction
It is well known that the likelihood function is one of the most important tools in classical inference; the resulting estimator, the maximum likelihood estimator (MLE), has excellent efficiency properties, although its robustness properties are rather poor.
Tests based on the MLE (the likelihood ratio test, the Wald test, Rao's score test, etc.) usually have good efficiency properties, but they behave poorly in the presence of outliers. To remedy this, many robust estimators have been introduced in the statistical literature, some of them based on distance or divergence measures. In particular, the density power divergence measures introduced in [1] have yielded good robust estimators, the minimum density power divergence estimators (MDPDE), and, based on them, some robust test statistics have been considered for testing simple and composite null hypotheses. Some of these tests are based on divergence measures (see [2,3]), and some others extend the classical Wald test; see [4,5,6] and the references therein.
The classical likelihood function requires the exact specification of the probability density function, but in most applications, the true distribution is unknown. In other cases, the data distribution is available in an analytic form, but the likelihood function is still mathematically intractable due to the complexity of the probability density function. There are many alternatives to the classical likelihood function; in this paper, we focus on the composite likelihood. A composite likelihood is an inference function derived by multiplying a collection of component likelihoods, the particular collection used being determined by the context. The composite likelihood therefore reduces the computational complexity, so that it is possible to deal with large datasets and very complex models even when the use of standard likelihood methods is not feasible. Asymptotic normality of the composite maximum likelihood estimator (CMLE) still holds, with the Godambe information matrix replacing the expected information in the expression of the asymptotic variance-covariance matrix. This allows the construction of composite likelihood ratio test statistics, Wald-type test statistics, as well as score-type statistics. A review of composite likelihood methods is given in [7]. We have to mention at this point that the CMLE, as well as the respective test statistics, are seriously affected by the presence of outliers in the set of available data.
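To fix ideas, here is a minimal sketch (in Python; the paper itself contains no code) of a pairwise composite log-likelihood for an $m$-variate normal with unit variances, a common mean and a common pairwise correlation. The function names and the equal-weight default are our own choices, not the paper's notation.

```python
import math
from itertools import combinations

def bvn_logpdf(y1, y2, mu, rho):
    """Log-density of a bivariate normal with common mean mu, unit variances
    and correlation rho."""
    z1, z2 = y1 - mu, y2 - mu
    return (-math.log(2 * math.pi) - 0.5 * math.log(1 - rho ** 2)
            - (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (2 * (1 - rho ** 2)))

def pairwise_cloglik(y, mu, rho, weights=None):
    """Composite log-likelihood: weighted sum of the log-likelihoods of all
    bivariate (pairwise) margins of the observation vector y."""
    pairs = list(combinations(range(len(y)), 2))
    if weights is None:
        weights = [1.0] * len(pairs)  # equal weights can simply be ignored
    return sum(w * bvn_logpdf(y[j], y[k], mu, rho)
               for w, (j, k) in zip(weights, pairs))
```

A quick sanity check: with `rho = 0` each pairwise density factorizes, so the composite log-likelihood equals $(m-1)$ times the sum of the univariate log-densities, since each coordinate appears in $m-1$ pairs.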
The main purpose of the paper is to introduce a new robust family of estimators, namely, composite minimum density power divergence estimators (CMDPDE), as well as a new family of Wald-type test statistics based on the CMDPDE in order to get broad classes of robust estimators and test statistics.
In Section 2, we introduce the CMDPDE, and we provide the associated system of estimating equations. The asymptotic distribution of the CMDPDE is obtained in Section 2.1. Section 2.2 is devoted to the definition of a family of Wald-type test statistics, based on the CMDPDE, for testing simple and composite null hypotheses. The asymptotic distribution of these Wald-type test statistics is obtained, as well as some asymptotic approximations to the power function. A numerical example, presented previously in [8], is studied in Section 3; a simulation study based on this example is also presented there, in order to study the robustness of the CMDPDE, as well as the performance of the Wald-type test statistics based on the CMDPDE. Proofs of the results are presented in Appendix A.
2. Composite Minimum Density Power Divergence Estimator
We adopt here the notation of [9] regarding the composite likelihood function and the respective CMLE. In this regard, let $\{f(\cdot;\theta);\,\theta\in\Theta\subseteq\mathbb{R}^{p},\,p\geq 1\}$ be a parametric identifiable family of distributions for an observation $y$, a realization of a random $m$-vector $Y$. In this setting, the composite density based on $K$ different marginal or conditional distributions has the form:
$$\mathcal{CL}(\theta,y)=\prod_{k=1}^{K}f_{A_{k}}(y;\theta)^{w_{k}},$$
and the corresponding composite log-density has the form:
$$c\ell(\theta,y)=\sum_{k=1}^{K}w_{k}\,\ell_{A_{k}}(\theta,y),$$
with:
$$\ell_{A_{k}}(\theta,y)=\log f_{A_{k}}(y;\theta),$$
where $\{A_{k}\}_{k=1}^{K}$ is a family of random variables associated either with marginal or conditional distributions involving some components of $y$, and $w_{k}$, $k=1,\dots,K$, are non-negative and known weights. If the weights are all equal, then they can be ignored; in this case, all the statistical procedures produce equivalent results.
Let $y_{1},\dots,y_{n}$ also be independent and identically distributed replications of $Y$. We denote by:
$$c\ell(\theta,y_{1},\dots,y_{n})=\sum_{i=1}^{n}c\ell(\theta,y_{i})$$
the composite log-likelihood function for the whole sample. In complete accordance with the classical MLE, the CMLE, $\widehat{\theta}_{c}$, is defined by:
$$\widehat{\theta}_{c}=\arg\max_{\theta\in\Theta}\sum_{i=1}^{n}c\ell(\theta,y_{i}).\tag{1}$$
It can also be obtained by solving the system of equations:
$$\sum_{i=1}^{n}u(\theta,y_{i})=0_{p},\tag{2}$$
where:
$$u(\theta,y)=\frac{\partial c\ell(\theta,y)}{\partial\theta}$$
denotes the composite score function.
We are going to see how it is possible to obtain the CMLE, $\widehat{\theta}_{c}$, on the basis of the Kullback–Leibler divergence measure. We shall denote by $g$ the density generating the data, with the respective distribution function denoted by $G$. The Kullback–Leibler divergence between the density function $g$ and the composite density function $\mathcal{CL}(\theta,\cdot)$ is given by:
$$d_{KL}\left(g,\mathcal{CL}(\theta,\cdot)\right)=\int g(y)\log\frac{g(y)}{\mathcal{CL}(\theta,y)}\,dy.$$
The term:
$$\int g(y)\log g(y)\,dy$$
can be removed because it does not depend on $\theta$; hence, we can define the following estimator of $\theta$ based on the Kullback–Leibler divergence:
$$\widetilde{\theta}=\arg\min_{\theta\in\Theta}\left(-\int\log\mathcal{CL}(\theta,y)\,dG(y)\right),$$
or equivalently:
$$\widetilde{\theta}=\arg\max_{\theta\in\Theta}\int\log\mathcal{CL}(\theta,y)\,dG(y).\tag{3}$$
If we replace in (3) the distribution function $G$ by the empirical distribution function $G_{n}$, we have:
$$\widetilde{\theta}=\arg\max_{\theta\in\Theta}\frac{1}{n}\sum_{i=1}^{n}\log\mathcal{CL}(\theta,y_{i}),$$
and this expression is equivalent to Expression (1). Therefore, the estimator $\widetilde{\theta}$ coincides with the CMLE. Based on the previous idea, we are going to introduce, in a natural way, the composite minimum density power divergence estimator (CMDPDE).
The CMLE, $\widehat{\theta}_{c}$, obeys asymptotic normality (see [9]), and in particular:
$$\sqrt{n}\,(\widehat{\theta}_{c}-\theta)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0_{p},G^{*}(\theta)^{-1}\right),$$
where $G^{*}(\theta)$ denotes the Godambe information matrix, defined by:
$$G^{*}(\theta)=H(\theta)J(\theta)^{-1}H(\theta),$$
with $H(\theta)$ being the sensitivity or Hessian matrix and $J(\theta)$ being the variability matrix, defined, respectively, by:
$$H(\theta)=E_{\theta}\left[-\frac{\partial u(\theta,Y)}{\partial\theta^{T}}\right]\quad\text{and}\quad J(\theta)=\operatorname{Var}_{\theta}\left[u(\theta,Y)\right]=E_{\theta}\left[u(\theta,Y)u(\theta,Y)^{T}\right],$$
where the superscript $T$ denotes the transpose of a vector or a matrix.
The matrix $J(\theta)$ is non-negative definite by definition. In the following, we shall assume that the matrix $H(\theta)$ is of full rank. Since the component score functions can be correlated, we have $H(\theta)\neq J(\theta)$ in general. If $c\ell(\theta,y)$ is a true log-likelihood function, then $H(\theta)=J(\theta)=I_{F}(\theta)$, $I_{F}(\theta)$ being the Fisher information matrix of the model. Using a multivariate version of the Cauchy–Schwarz inequality, we have that the matrix $I_{F}(\theta)-G^{*}(\theta)$ is non-negative definite, i.e., the full likelihood function is more efficient than any other composite likelihood function (cf. [10], Lemma 4A).
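The fact that $H(\theta)\neq J(\theta)$ for a genuine composite likelihood can be checked numerically. The sketch below (our own toy construction, not from the paper) uses the independence composite likelihood of an equicorrelated bivariate normal with common mean $\theta$ and unit variances: the composite score is $u(\theta,y)=(y_{1}-\theta)+(y_{2}-\theta)$, so $H(\theta)=2$ exactly, while $J(\theta)=\operatorname{Var}(Y_{1}+Y_{2})=2(1+\rho)$.

```python
import random

def simulate_H_and_J(theta=0.0, rho=0.5, n=20000, seed=1):
    """Sensitivity H (exact for this model) and a Monte Carlo estimate of the
    variability J, for the independence composite likelihood of an
    equicorrelated bivariate normal with common mean theta."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n):
        # draw (y1, y2) with correlation rho via a Cholesky-style construction
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y1 = theta + z1
        y2 = theta + rho * z1 + (1 - rho ** 2) ** 0.5 * z2
        scores.append((y1 - theta) + (y2 - theta))  # composite score u(theta, y)
    H = 2.0                             # -d u / d theta is constant here
    J = sum(s * s for s in scores) / n  # variance of the score (its mean is 0)
    return H, J
```

With `rho = 0.5` the returned values should be close to $H=2$ and $J=3$, so $H\neq J$ and the Godambe information $HJ^{-1}H=4/3$ differs from the naive value $H=2$.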
We are now going to proceed to the definition of the CMDPDE, which is based on the density power divergence measure, defined as follows. For two densities $p$ and $q$ associated with two $m$-dimensional random variables, respectively, the density power divergence (DPD) between $p$ and $q$ was defined in [1] by:
$$d_{\alpha}(p,q)=\int_{\mathbb{R}^{m}}\left\{q^{1+\alpha}(y)-\left(1+\frac{1}{\alpha}\right)q^{\alpha}(y)\,p(y)+\frac{1}{\alpha}p^{1+\alpha}(y)\right\}dy\tag{4}$$
for $\alpha>0$, while for $\alpha=0$, it is defined by:
$$d_{0}(p,q)=\lim_{\alpha\rightarrow 0^{+}}d_{\alpha}(p,q)=d_{KL}(p,q).$$
For $\alpha=1$, Expression (4) reduces to the squared $L_{2}$ distance:
$$d_{1}(p,q)=\int_{\mathbb{R}^{m}}\left(p(y)-q(y)\right)^{2}dy.$$
It is also interesting to note that (4) is a special case of the so-called Bregman divergence. If we consider the convex function $B(x)=x^{1+\alpha}$, we get $\alpha$ times $d_{\alpha}(p,q)$. The parameter $\alpha$ controls the trade-off between robustness and asymptotic efficiency of the parameter estimates (see the Simulation Section), which are the minimizers of this family of divergences. For more details about this family of divergence measures, we refer to [11].
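For intuition, the divergence in (4) can be evaluated numerically for two given densities. A small sketch (ours; the integration bounds and grid size are arbitrary choices, adequate for light-tailed densities):

```python
import math

def norm_pdf(x, mu=0.0, sigma=1.0):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def dpd(p, q, alpha, lo=-10.0, hi=10.0, n=4000):
    """Density power divergence d_alpha(p, q), alpha > 0, computed by the
    trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        px, qx = p(x), q(x)
        val = (qx ** (1 + alpha)
               - (1 + 1 / alpha) * qx ** alpha * px
               + (1 / alpha) * px ** (1 + alpha))
        total += val * (0.5 if i in (0, n) else 1.0)
    return total * h
```

A quick check: $d_{\alpha}(p,p)=0$ for any $\alpha>0$ (the integrand vanishes pointwise), and for $\alpha=1$ the value agrees with the squared $L_{2}$ distance $\int(p-q)^{2}$, as noted above.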
In this paper, we are going to consider DPD measures between the density function $g$ and the composite density function $\mathcal{CL}(\theta,\cdot)$, i.e.,
$$d_{\alpha}\left(g,\mathcal{CL}(\theta,\cdot)\right)=\int_{\mathbb{R}^{m}}\left\{\mathcal{CL}(\theta,y)^{1+\alpha}-\left(1+\frac{1}{\alpha}\right)\mathcal{CL}(\theta,y)^{\alpha}g(y)+\frac{1}{\alpha}g^{1+\alpha}(y)\right\}dy\tag{5}$$
for $\alpha>0$, while for $\alpha=0$, we have:
$$d_{0}\left(g,\mathcal{CL}(\theta,\cdot)\right)=d_{KL}\left(g,\mathcal{CL}(\theta,\cdot)\right).$$
The CMDPDE, $\widehat{\theta}^{\alpha}_{c}$, is defined by:
$$\widehat{\theta}^{\alpha}_{c}=\arg\min_{\theta\in\Theta}d_{\alpha}\left(g,\mathcal{CL}(\theta,\cdot)\right).$$
The term:
$$\frac{1}{\alpha}\int_{\mathbb{R}^{m}}g^{1+\alpha}(y)\,dy$$
does not depend on $\theta$, and consequently, the minimization of (5) with respect to $\theta$ is equivalent to minimizing:
$$\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy-\left(1+\frac{1}{\alpha}\right)\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{\alpha}g(y)\,dy,$$
or:
$$\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy-\left(1+\frac{1}{\alpha}\right)\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{\alpha}\,dG(y).$$
Now, we replace the distribution function $G$ by the empirical distribution function $G_{n}$, and we get:
$$\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy-\left(1+\frac{1}{\alpha}\right)\frac{1}{n}\sum_{i=1}^{n}\mathcal{CL}(\theta,y_{i})^{\alpha}.\tag{6}$$
As a consequence, for a fixed value of $\alpha$, the CMDPDE of $\theta$ can be obtained by minimizing the expression given in (6), or equivalently, by maximizing the expression:
$$\left(1+\frac{1}{\alpha}\right)\frac{1}{n}\sum_{i=1}^{n}\mathcal{CL}(\theta,y_{i})^{\alpha}-\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy.\tag{7}$$
Under the differentiability of the model, the maximization of the function in Equation (7) leads to an estimating system of equations of the form:
$$\frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\mathcal{CL}(\theta,y_{i})^{\alpha}-\frac{\alpha}{1+\alpha}\,\frac{\partial}{\partial\theta}\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}dy=0_{p}.\tag{8}$$
Taking into account that $\frac{\partial}{\partial\theta}\mathcal{CL}(\theta,y)=\mathcal{CL}(\theta,y)\,u(\theta,y)$, the system of Equations (8) can be written as:
$$\frac{1}{n}\sum_{i=1}^{n}\mathcal{CL}(\theta,y_{i})^{\alpha}u(\theta,y_{i})-\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)^{1+\alpha}u(\theta,y)\,dy=0_{p},\tag{9}$$
and the CMDPDE $\widehat{\theta}^{\alpha}_{c}$ of $\theta$ is obtained as the solution of (9). For $\alpha=0$ in (9), we have:
$$\frac{1}{n}\sum_{i=1}^{n}u(\theta,y_{i})-\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)\,u(\theta,y)\,dy=0_{p},$$
but:
$$\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)\,u(\theta,y)\,dy=\int_{\mathbb{R}^{m}}\frac{\partial\mathcal{CL}(\theta,y)}{\partial\theta}\,dy=\frac{\partial}{\partial\theta}\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,y)\,dy=0_{p},$$
and we recover the estimating equation for the CMLE, $\widehat{\theta}_{c}$, presented in (2).
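To make the passage from (7) to an estimator concrete, consider a toy location model (our construction, not the example of Section 3): two independent unit-variance normal components with common mean $\theta$, so $\mathcal{CL}(\theta,y)=\phi(y_{1}-\theta)\,\phi(y_{2}-\theta)$ and $\int\mathcal{CL}(\theta,y)^{1+\alpha}dy$ does not depend on $\theta$. Maximizing (7) then reduces to maximizing $\sum_{i}\mathcal{CL}(\theta,y_{i})^{\alpha}$, which a simple grid search handles:

```python
import math

def cl_alpha_objective(theta, data, alpha):
    """Data-dependent part of (7) for CL(theta,y) = phi(y1-theta)*phi(y2-theta);
    the integral term is constant in theta for this location model."""
    total = 0.0
    for y1, y2 in data:
        log_cl = -math.log(2 * math.pi) - 0.5 * ((y1 - theta) ** 2 + (y2 - theta) ** 2)
        total += math.exp(alpha * log_cl)
    return total

def cmdpde_location(data, alpha, grid=None):
    """Grid-search CMDPDE of the common mean theta."""
    if grid is None:
        grid = [i / 1000 for i in range(-2000, 4001)]  # theta in [-2, 4]
    return max(grid, key=lambda t: cl_alpha_objective(t, data, alpha))

# three clean pairs near theta = 0 plus one gross outlier
data = [(-0.1, 0.2), (0.1, -0.2), (0.0, 0.1), (10.0, 10.0)]
cmle = sum(y for pair in data for y in pair) / (2 * len(data))  # alpha -> 0 limit
robust = cmdpde_location(data, alpha=0.5)
```

With the gross outlier included, the CMLE (the overall sample mean, about 2.51 here) is dragged far from the true value 0, while the $\alpha=0.5$ CMDPDE stays close to 0 — the robustness trade-off governed by $\alpha$.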
2.1. Asymptotic Distribution of the Composite Minimum Density Power Divergence Estimator
Equation (9) can be written as follows:
$$\frac{1}{n}\sum_{i=1}^{n}\Psi(y_{i},\theta)=0_{p},$$
with:
$$\Psi(y,\theta)=\mathcal{CL}(\theta,y)^{\alpha}u(\theta,y)-\int_{\mathbb{R}^{m}}\mathcal{CL}(\theta,t)^{1+\alpha}u(\theta,t)\,dt.$$
Therefore, the CMDPDE, $\widehat{\theta}^{\alpha}_{c}$, is an M-estimator. In this case, it is well known (cf. [12]) that the asymptotic distribution of $\widehat{\theta}^{\alpha}_{c}$ is given by:
$$\sqrt{n}\,(\widehat{\theta}^{\alpha}_{c}-\theta)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0_{p},H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}\right),$$
being:
$$H_{\alpha}(\theta)=E\left[-\frac{\partial\Psi(Y,\theta)}{\partial\theta^{T}}\right]\tag{10}$$
and:
$$J_{\alpha}(\theta)=E\left[\Psi(Y,\theta)\Psi(Y,\theta)^{T}\right],\tag{11}$$
the expectations being taken with respect to the density $g$ generating the data. Explicit expressions for $H_{\alpha}(\theta)$ and $J_{\alpha}(\theta)$ follow by direct computation of these expectations. Based on the previous results, we have the following theorem.
Theorem 1. Under suitable regularity conditions, we have:
$$\sqrt{n}\,(\widehat{\theta}^{\alpha}_{c}-\theta)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0_{p},H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}\right),$$
where the matrices $H_{\alpha}(\theta)$ and $J_{\alpha}(\theta)$ were defined in (10) and (11), respectively.

Remark 1. If we apply the previous theorem for $\alpha=0$, then we get the CMLE, and the asymptotic variance-covariance matrix coincides with the inverse of the Godambe information matrix, because $H_{0}(\theta)=H(\theta)$ and $J_{0}(\theta)=J(\theta)$ for $\alpha=0$.

2.2. Wald-Type Test Statistics Based on the Composite Minimum Density Power Divergence Estimator
Wald-type test statistics based on the MDPDE have been considered, with excellent results in relation to robustness, in different statistical problems; see for instance [4,5,6].
Motivated by those works, we focus in this section on the definition and the study of Wald-type test statistics defined by means of the CMDPDE instead of the MDPDE. In this context, if we are interested in testing:
$$H_{0}:\ \theta=\theta_{0}\quad\text{versus}\quad H_{1}:\ \theta\neq\theta_{0},$$
we can consider the family of Wald-type test statistics:
$$W_{n}=n\,(\widehat{\theta}^{\alpha}_{c}-\theta_{0})^{T}H_{\alpha}(\widehat{\theta}^{\alpha}_{c})J_{\alpha}(\widehat{\theta}^{\alpha}_{c})^{-1}H_{\alpha}(\widehat{\theta}^{\alpha}_{c})\,(\widehat{\theta}^{\alpha}_{c}-\theta_{0}).\tag{14}$$
For $\alpha=0$, we get the classical Wald-type test statistic considered in the composite likelihood methods (see for instance [7]).
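For $p=1$, the statistic above reduces to a scalar quadratic form. A sketch (our notation; `H` and `J` stand for consistent estimates of the sensitivity and variability) that also computes the asymptotic p-value, using the identity $P(\chi^{2}_{1}>w)=\operatorname{erfc}(\sqrt{w/2})$:

```python
import math

def wald_statistic(theta_hat, theta0, n, H, J):
    """Scalar Wald-type statistic n (theta_hat - theta0)^2 H J^{-1} H."""
    return n * (theta_hat - theta0) ** 2 * H * H / J

def chi2_1_sf(w):
    """Survival function of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(w / 2.0))

w = wald_statistic(theta_hat=0.30, theta0=0.0, n=100, H=2.0, J=3.0)
p_value = chi2_1_sf(w)
```

Here `w` equals 12, well beyond the 5% critical value 3.84, so the null hypothesis would be rejected at the usual levels.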
In the following theorem, we present the asymptotic null distribution of the family of Wald-type test statistics $W_{n}$.
Theorem 2. The asymptotic distribution of the Wald-type test statistics given in (14) is a chi-square distribution with $p$ degrees of freedom.

Theorem 3. Let $\theta^{*}$ be the true value of the parameter $\theta$, with $\theta^{*}\neq\theta_{0}$. Then, it holds:
$$\sqrt{n}\left(\frac{W_{n}}{n}-\ell_{\alpha}(\theta^{*})\right)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0,\sigma^{2}_{\alpha}(\theta^{*})\right),$$
being:
$$\ell_{\alpha}(\theta^{*})=(\theta^{*}-\theta_{0})^{T}H_{\alpha}(\theta^{*})J_{\alpha}(\theta^{*})^{-1}H_{\alpha}(\theta^{*})\,(\theta^{*}-\theta_{0})$$
and:
$$\sigma^{2}_{\alpha}(\theta^{*})=4\,(\theta^{*}-\theta_{0})^{T}H_{\alpha}(\theta^{*})J_{\alpha}(\theta^{*})^{-1}H_{\alpha}(\theta^{*})\,\Sigma_{\alpha}(\theta^{*})\,H_{\alpha}(\theta^{*})J_{\alpha}(\theta^{*})^{-1}H_{\alpha}(\theta^{*})\,(\theta^{*}-\theta_{0}),$$
with $\Sigma_{\alpha}(\theta)=H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}$.

Remark 2. Based on the previous result, we can approximate the power of the Wald-type test statistics at $\theta^{*}$ by:
$$\pi_{n,\alpha}(\theta^{*})\approx 1-\Phi_{n}\left(\frac{\sqrt{n}}{\sigma_{\alpha}(\theta^{*})}\left(\frac{\chi^{2}_{p,\tau}}{n}-\ell_{\alpha}(\theta^{*})\right)\right),$$
where $\chi^{2}_{p,\tau}$ denotes the upper $\tau$-quantile of the chi-square distribution with $p$ degrees of freedom and $\Phi_{n}$ is a sequence of distribution functions tending uniformly to the standard normal distribution function $\Phi$. It is clear that:
$$\lim_{n\rightarrow\infty}\pi_{n,\alpha}(\theta^{*})=1$$
for all $\theta^{*}\neq\theta_{0}$. Therefore, the Wald-type test statistics are consistent in the sense of Fraser.

In many practical hypothesis testing problems, the restricted parameter space $\Theta_{0}\subseteq\Theta$ is defined by a set of $r$ restrictions of the form:
$$m(\theta)=0_{r}\tag{16}$$
on $\Theta$, where $m:\Theta\rightarrow\mathbb{R}^{r}$ is a vector-valued function such that the $p\times r$ matrix:
$$M(\theta)=\frac{\partial m(\theta)^{T}}{\partial\theta}\tag{17}$$
exists and is continuous in $\theta$, with $\operatorname{rank}\left(M(\theta)\right)=r$; here, $0_{r}$ denotes the null vector of dimension $r$.
Now, we are going to consider composite null hypotheses, $H_{0}$, of the form considered in (16), and our interest is in testing:
$$H_{0}:\ m(\theta)=0_{r}\quad\text{versus}\quad H_{1}:\ m(\theta)\neq 0_{r}\tag{18}$$
on the basis of a random sample $y_{1},\dots,y_{n}$ of size $n$.
Definition 1. The family of Wald-type test statistics for testing (18) is given by:
$$W_{n}=n\,m(\widehat{\theta}^{\alpha}_{c})^{T}\left(M(\widehat{\theta}^{\alpha}_{c})^{T}\Sigma_{\alpha}(\widehat{\theta}^{\alpha}_{c})M(\widehat{\theta}^{\alpha}_{c})\right)^{-1}m(\widehat{\theta}^{\alpha}_{c}),\tag{19}$$
where $\Sigma_{\alpha}(\theta)=H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}$, the matrices $M(\theta)$, $H_{\alpha}(\theta)$ and $J_{\alpha}(\theta)$ were defined in (17), (10) and (11), respectively, and the function $m(\theta)$ in (16). If we consider $\alpha=0$, then $\widehat{\theta}^{0}_{c}$ coincides with the CMLE, $\widehat{\theta}_{c}$, of $\theta$, $\Sigma_{0}(\theta)$ coincides with the inverse of the Godambe information matrix, and then, we get the classical Wald test statistic considered in the composite likelihood methods.
In the next theorem, we present the asymptotic null distribution of $W_{n}$.

Theorem 4. The asymptotic distribution of the Wald-type test statistics, given in (19), is a chi-square distribution with $r$ degrees of freedom.

Consider the null hypothesis $H_{0}:m(\theta)=0_{r}$. By Theorem 4, the null hypothesis should be rejected if $W_{n}>\chi^{2}_{r,\tau}$, the upper $\tau$-quantile of the chi-square distribution with $r$ degrees of freedom. The following theorem can be used to approximate the power function. Assume that $\theta^{*}$ is the true value of the parameter, so that $m(\theta^{*})\neq 0_{r}$.
Theorem 5. Let $\theta^{*}$ be the true value of the parameter, with $m(\theta^{*})\neq 0_{r}$. Then, it holds:
$$\sqrt{n}\left(\frac{W_{n}}{n}-\ell^{*}_{\alpha}(\theta^{*})\right)\xrightarrow[n\rightarrow\infty]{\mathcal{L}}\mathcal{N}\left(0,\sigma^{*2}_{\alpha}(\theta^{*})\right),$$
being:
$$\ell^{*}_{\alpha}(\theta^{*})=m(\theta^{*})^{T}\left(M(\theta^{*})^{T}\Sigma_{\alpha}(\theta^{*})M(\theta^{*})\right)^{-1}m(\theta^{*})$$
and:
$$\sigma^{*2}_{\alpha}(\theta^{*})=4\,m(\theta^{*})^{T}\left(M(\theta^{*})^{T}\Sigma_{\alpha}(\theta^{*})M(\theta^{*})\right)^{-1}M(\theta^{*})^{T}\Sigma_{\alpha}(\theta^{*})M(\theta^{*})\left(M(\theta^{*})^{T}\Sigma_{\alpha}(\theta^{*})M(\theta^{*})\right)^{-1}m(\theta^{*}),$$
with $\Sigma_{\alpha}(\theta)=H_{\alpha}(\theta)^{-1}J_{\alpha}(\theta)H_{\alpha}(\theta)^{-1}$.

3. Numerical Example
In this section, we consider an example, studied previously in [8], in order to examine the robustness of the CMLE. The aim of this section is to illustrate the different issues discussed in the previous sections.
Consider the random vector $Y=(Y_{1},Y_{2},Y_{3},Y_{4})^{T}$, which follows a four-dimensional normal distribution with mean vector $\mu=(\mu_{1},\mu_{2},\mu_{3},\mu_{4})^{T}$ and variance-covariance matrix $\Sigma$; i.e., we suppose that the correlation between $Y_{1}$ and $Y_{2}$ is the same as the correlation between $Y_{3}$ and $Y_{4}$, both equal to $\rho$. Taking into account that $\Sigma$ should be positive semi-definite, a corresponding condition on $\rho$ is imposed. In order to avoid several problems regarding the consistency of the CMLE of the parameter $\rho$ (cf. [8]), we shall consider the composite likelihood function:
$$\mathcal{CL}(\theta,y)=f_{A_{1}}(y_{1},y_{2};\theta)\,f_{A_{2}}(y_{3},y_{4};\theta),$$
where $f_{A_{1}}$ and $f_{A_{2}}$ are the densities of the corresponding bivariate marginals of $Y$, i.e., bivariate normal distributions with mean vectors $(\mu_{1},\mu_{2})^{T}$ and $(\mu_{3},\mu_{4})^{T}$, respectively, and common variance-covariance matrix:
$$\Sigma^{*}=\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix},$$
with densities given by:
$$f_{A_{j}}(y_{2j-1},y_{2j};\theta)=\frac{1}{2\pi\sqrt{1-\rho^{2}}}\exp\left\{-\frac{q_{j}}{2(1-\rho^{2})}\right\},\quad j=1,2,$$
being:
$$q_{j}=(y_{2j-1}-\mu_{2j-1})^{2}-2\rho\,(y_{2j-1}-\mu_{2j-1})(y_{2j}-\mu_{2j})+(y_{2j}-\mu_{2j})^{2}.$$
By $\theta$, we denote the parameter vector of our model, i.e., $\theta=(\mu_{1},\mu_{2},\mu_{3},\mu_{4},\rho)^{T}$. The system of equations that must be solved in order to obtain the CMDPDE, $\widehat{\theta}^{\alpha}_{c}$, is derived in Appendix A.4. After some heavy algebraic manipulations, specified in Appendix A.5, explicit expressions for the sensitivity and variability matrices are also obtained.
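As a simplified illustration of this composite likelihood in practice (our own sketch: the means are taken known and equal to zero, unit variances are assumed, and $\alpha=0$, so this computes the CMLE of $\rho$ rather than the full five-parameter CMDPDE), $\rho$ can be recovered by maximizing $\log f_{A_{1}}+\log f_{A_{2}}$ over a grid:

```python
import math, random

def block_cloglik(sample, rho):
    """Composite log-likelihood CL = f(y1, y2) * f(y3, y4), with standard
    bivariate normal blocks sharing the correlation rho (zero means)."""
    c = -math.log(2 * math.pi) - 0.5 * math.log(1 - rho ** 2)
    total = 0.0
    for y in sample:
        for a, b in ((y[0], y[1]), (y[2], y[3])):
            total += c - (a * a - 2 * rho * a * b + b * b) / (2 * (1 - rho ** 2))
    return total

def draw_sample(rho, n, seed=7):
    """Draw n four-vectors: two independent blocks, each with correlation rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        y = []
        for _ in range(2):
            z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
            y += [z1, rho * z1 + (1 - rho ** 2) ** 0.5 * z2]
        out.append(tuple(y))
    return out

sample = draw_sample(rho=0.5, n=2000)
grid = [i / 100 for i in range(-95, 96)]
rho_hat = max(grid, key=lambda r: block_cloglik(sample, r))
```

With 2000 simulated four-vectors, `rho_hat` should land very close to the true value 0.5.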
Simulation Study
A simulation study, carried out in the R statistical programming environment, is presented in order to study the behavior of the CMDPDE, as well as the behavior of the Wald-type test statistics based on them. The theoretical model studied in the previous example is considered, with parameter vector $\theta=(\mu_{1},\mu_{2},\mu_{3},\mu_{4},\rho)^{T}$, and we are interested in studying the behavior of the CMDPDE, $\widehat{\theta}^{\alpha}_{c}$, as well as the behavior of the Wald-type test statistics for testing the null hypothesis stated in (29).
Through R = 10,000 replications of the simulation experiment, we compare, for different values of $\alpha$, the corresponding CMDPDEs through the root mean square error (RMSE), for fixed true values of the parameters. We pay special attention to the problem of the existence of outliers in the sample, generating a fraction of each sample from a contaminated version of the model in which the mean vector and the correlation coefficient, respectively, are modified. Notice that, although the case $\rho=0$ has been considered, this case is less important taking into account the theoretical model under consideration: for independent observations, the composite likelihood theory is not needed. Results are presented in Table 1 and Table 2. Two points deserve our attention. The first one is that, as expected, RMSEs for contaminated data are always greater than RMSEs for pure data, and the RMSEs decrease when the sample size $n$ increases. The second is that, while for pure data RMSEs are greater for big values of $\alpha$, when working with contaminated data, the CMDPDEs with medium-low values of $\alpha$ present the best behavior in terms of efficiency. These statements are also true for larger levels of contamination, noting that, when larger contamination percentages are considered, larger values of $\alpha$ also become advisable in terms of efficiency (see Table 3, Table 4 and Table 5, corresponding to increasing levels of contamination). Considering the mean absolute error (MAE) for the evaluation of the accuracy, we obtain similar results (Table 6).
For a fixed nominal size, with the model under the null hypothesis given in (29), the estimated significance levels for the different Wald-type test statistics are given by:
$$\widehat{\alpha}_{n}=\frac{1}{R}\sum_{j=1}^{R}I\left(W_{n}^{(j)}>c\right),$$
with $I(S)$ being the indicator function (with a value of one if $S$ is true and zero otherwise), $W_{n}^{(j)}$ the Wald-type test statistic computed in the $j$-th replication and $c$ the corresponding chi-square critical value. Empirical levels with the same previous parameter values are presented in Table 7 (pure data) and Table 8 (data with outliers). While medium-high values of $\alpha$ are not recommended at all, the CMLE is generally the best choice when working with pure data. However, the lack of robustness of the CMLE-based test is striking, as can be seen in Table 8. The effect of contamination for medium-low values of $\alpha$ is much lighter, while for medium-high values of $\alpha$, it can even turn out to be deceptively beneficial.
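The estimated level above is plain Monte Carlo and can be reproduced for any test. A self-contained sketch (ours), with a simple one-sample Wald test on a normal mean standing in for the composite Wald-type statistics:

```python
import random

def empirical_level(R=2000, n=100, mu0=0.0, seed=11):
    """Fraction of replications in which the Wald statistic exceeds the
    chi-square(1) critical value at nominal size 0.05."""
    crit = 3.841458820694124  # upper 0.05 quantile of chi-square with 1 df
    rng = random.Random(seed)
    rejections = 0
    for _ in range(R):
        xs = [rng.gauss(mu0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
        w = n * (xbar - mu0) ** 2 / s2  # Wald statistic for H0: mu = mu0
        rejections += w > crit
    return rejections / R
```

With pure data, the returned fraction should be close to the nominal 0.05; repeating the experiment with contaminated draws is the analogue of Table 8.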
For finite sample sizes and a fixed nominal size, the simulated powers are obtained under alternative hypotheses to (29) (Table 9 and Table 10). The (simulated) power for the different composite Wald-type test statistics is obtained by:
$$\widehat{\pi}_{n}=\frac{1}{R}\sum_{j=1}^{R}I\left(W_{n}^{(j)}>c\right),$$
with the replications now generated under the alternative. As expected, the power decreases when we get closer to the null hypothesis and when the sample size decreases. With pure data, the best behavior is obtained for low values of $\alpha$, while for contaminated data, the best results are obtained for medium values of $\alpha$.