Minimum Penalized ϕ-Divergence Estimation under Model Misspecification

Alba-Fernández, M. Virtudes; Jiménez-Gamero, M. Dolores; Ariza-López, F. Javier

doi:10.3390/e20050329


by M. Virtudes Alba-Fernández ¹,*, M. Dolores Jiménez-Gamero ² and F. Javier Ariza-López ³

¹ Departamento de Estadística e Investigación Operativa, Universidad de Jaén, 23071 Jaén, Spain

² Departamento de Estadística e Investigación Operativa, Universidad de Sevilla, 41012 Sevilla, Spain

³ Departamento de Ingeniería Cartográfica, Geodésica y Fotogrametría, Universidad de Jaén, 23071 Jaén, Spain

* Author to whom correspondence should be addressed.

Entropy 2018, 20(5), 329; https://doi.org/10.3390/e20050329

Submission received: 8 March 2018 / Revised: 22 April 2018 / Accepted: 27 April 2018 / Published: 30 April 2018

(This article belongs to the Special Issue New Developments in Statistical Information Theory Based on Entropy and Divergence Measures)


Abstract: This paper focuses on the consequences of assuming a wrong model for multinomial data when using minimum penalized ϕ-divergence estimators, also known as minimum penalized disparity estimators, to estimate the model parameters. These estimators are shown to converge to a well-defined limit. An application of the results obtained shows that a parametric bootstrap consistently estimates the null distribution of a certain class of test statistics for model misspecification detection. An illustrative application to the accuracy assessment of the thematic quality in a global land cover map is included.

Keywords: minimum penalized ϕ-divergence estimator; consistency; asymptotic normality; goodness-of-fit; bootstrap distribution estimator; thematic quality assessment

1. Introduction

In many practical settings, individuals are classified into a finite number of nonoverlapping categories, and the experimenter records the number of observations falling in each category. In statistics, this sort of data is called multinomial data. Examples arise in many scientific disciplines: in economics, when dealing with the number of different types of industries observed in a geographical area; in biology, when counting the number of individuals belonging to one of k species (see, for example, Pardo [1], pp. 94–95); in sports, when considering the number of injured players in soccer matches (see, for example, Pardo [1], p. 146); and many others.

When dealing with multinomial data, one often finds zero cell frequencies, even for large samples. Although many examples can be given, we will focus on the following one, since two related data sets will be analyzed in Section 4. Zero cell frequencies are usually observed when the quality of geographic information data is assessed and, specifically, when we pay attention to the thematic component of this quality. Roughly speaking, the thematic quality refers to the correctness of the qualitative aspect of an element (pixel, feature, etc.). To assess the thematic accuracy, the label considered as true for a feature must be compared with the label assigned to the same feature after a classification (among a number of labels previously stated). This way, each element/feature, which really belongs to a particular category, can be classified as belonging to the same category (correct assignment), or as belonging to another one (incorrect assignment). Given a sample of n elements belonging to a particular category, after collecting the number of elements correctly classified, $X_{1}$, and the number of incorrect classifications in a set of $k - 1$ possible categories, $X_{i}$, $i = 2, \dots, k$, we obtain a multinomial vector ${(X_{1}, X_{2}, \dots, X_{k})}^{t}$, for which small or zero cell frequencies are often observed for the incorrect classifications, $X_{i}$, $i = 2, \dots, k$.

Motivated by this example in the geographic information data context, as well as many others, throughout this paper it will be assumed that the available information can be summarized by means of a random vector $X = {(X_{1}, \dots, X_{k})}^{t}$ having a k-cell multinomial distribution with parameters n and $π = {(π_{1}, \dots, π_{k})}^{t} \in Δ_{0 k} = \{{(π_{1}, \dots, π_{k})}^{t} : π_{i} \geq 0, 1 \leq i \leq k, \sum_{i = 1}^{k} π_{i} = 1\}$, $X \sim M_{k} (n; π)$ in short. Notice that, if $π \in Δ_{0 k}$, then some components of $π$ may equal 0, implying that some cell frequencies can be equal to zero, even for large samples. In many instances, it is assumed that $π$ belongs to a parametric family $π \in P = \{P (θ) = {(p_{1} (θ), \dots, p_{k} (θ))}^{t}, θ \in Θ\} \subset Δ_{k} = \{{(π_{1}, \dots, π_{k})}^{t} : π_{i} > 0, 1 \leq i \leq k, \sum_{i = 1}^{k} π_{i} = 1\}$, where $Θ \subseteq R^{s}$, $k - s - 1 > 0$ and $p_{1} (\cdot)$, …, $p_{k} (\cdot)$ are known real functions.

When it is assumed that $π \in P$, $π$ is usually estimated through $P (\hat{θ}) = {(p_{1} (\hat{θ}), \dots, p_{k} (\hat{θ}))}^{t}$ for some estimator $\hat{θ}$ of $θ$. A common choice for $\hat{θ}$ is the maximum likelihood estimator (MLE), which is known to have good asymptotic properties. Basu and Sarkar [2] and Morales et al. [3] have shown that these properties are shared by a larger class of estimators: the minimum ϕ-divergence estimators (MϕE). This class includes MLEs as a particular case. However, as illustrated in Mandal et al. [4], the finite sample performance of these estimators can be improved by modifying the weight that each ϕ-divergence assigns to the empty cells. The resulting estimator is called the minimum penalized ϕ-divergence estimator (MPϕE). Moreover, Mandal et al. [4] have shown that such estimators have the same asymptotic properties as the MϕEs. Specifically, they are strongly consistent and, suitably normalized, asymptotically normal. To derive these asymptotic properties, it is assumed that the probability model is correctly specified, that is to say, that $π \in P$ holds.

If the parametric model is not correctly specified, Jiménez-Gamero et al. [5] have shown that, under certain assumptions, the MϕEs still have a well-defined limit and, suitably normalized, they are asymptotically normal. For the MLE, these results were known from those in [6]. Because, as argued before, the use of penalized ϕ-divergences may lead to better performance of the resulting estimators, the aim of this paper is to investigate the asymptotic properties of the MPϕEs under model misspecification. If the model considered is true, we obtain as a particular case the results in [4].

The usefulness of the results obtained is illustrated by applying them to the problem of testing goodness-of-fit to the parametric family $P$,

$H_{0} : π \in P,$

against the alternative

$H_{1} : π \notin P,$

using as a test statistic a penalized $ϕ_{1}$-divergence between a nonparametric estimator of $π$, the relative frequencies, and a parametric estimator of $π$ obtained by assuming that the null hypothesis is true, $P (\hat{θ})$, $\hat{θ}$ being an MP$ϕ_{2}$E. Here, $ϕ_{1}$ and $ϕ_{2}$ may differ. The convenience of using this type of test statistic is justified in Mandal et al. [7]. Although these authors show that, under $H_{0}$, such test statistics are asymptotically distribution free, the asymptotic approximation to the null distribution of the test statistics in this class is rather poor. Some numerical examples illustrate this unsatisfactory behavior of the asymptotic approximation. By using the fact that the MPϕE always converges to a well-defined limit, whether the model in $H_{0}$ is true or not, we prove that the bootstrap consistently estimates the null distribution of these test statistics. We then revisit the previously cited numerical examples to show the usefulness of the bootstrap approximation which, despite demanding more computing time, is more accurate than that yielded by the asymptotic null distribution for small and moderate sample sizes.

The rest of the paper is organized as follows. Section 2 studies certain asymptotic properties of MP$ϕ_{2}$Es; specifically, conditions are given for the strong consistency and asymptotic normality. Section 3 uses such results to prove that a parametric bootstrap provides a consistent estimator of the null distribution of test statistics based on penalized ϕ-divergences for testing $H_{0}$. Section 4 displays an application of the results obtained in the context of a classification work in a land cover map.

Before ending this section we introduce some notation: all limits in this paper are taken when $n \to \infty$; $\overset{L}{\to}$ denotes convergence in distribution; $\overset{P}{\to}$ denotes convergence in probability; $\overset{a.s.}{\to}$ denotes almost sure convergence; let $\{A_{n}\}$ be a sequence of random variables and let $ϵ \in R$, then $A_{n} = O_{P} (n^{- ϵ})$ means that $n^{ϵ} A_{n}$ is bounded in probability, $A_{n} = o_{P} (n^{- ϵ})$ means that $n^{ϵ} A_{n} \overset{P}{\to} 0$, and $A_{n} = o (n^{- ϵ})$ means that $n^{ϵ} A_{n} \overset{a.s.}{\to} 0$; $N_{k} (μ, Σ)$ denotes the k-variate normal law with mean $μ$ and variance matrix $Σ$; all vectors are column vectors; the superscript $^{t}$ denotes transpose; if $x \in R^{k}$, with $x^{t} = (x_{1}, \dots, x_{k})$, then $\mathrm{Diag} (x)$ is the $k \times k$ diagonal matrix whose $(i, i)$ entry is $x_{i}$, $1 \leq i \leq k$, and

$Σ_{x} = \mathrm{Diag} (x) - x x^{t};$

$I_{k}$ denotes the $k \times k$ identity matrix; to simplify notation, all 0s appearing in the paper represent vectors of the appropriate dimension.

2. Some Asymptotic Properties of MPϕEs

Let $X \sim M_{k} (n; π)$, with $π \in Δ_{0 k}$, and let $\hat{π} = {({\hat{π}}_{1}, {\hat{π}}_{2}, \dots, {\hat{π}}_{k})}^{t}$ be the vector of relative frequencies,

${\hat{π}}_{i} = \frac{X_{i}}{n}, \quad 1 \leq i \leq k .$

(1)

Let $P$ be a parametric model satisfying Assumption 1 below.

Assumption 1.

$P = \{P (θ) = {(p_{1} (θ), \dots, p_{k} (θ))}^{t}, θ \in Θ\} \subset Δ_{k}$, where $Θ \subseteq R^{s}$, $k - s - 1 > 0$ and $p_{1} (\cdot)$, …, $p_{k} (\cdot) : Θ \to R$ are known functions, twice continuously differentiable in $\mathrm{int}\, Θ$.

Let $ϕ : [0, \infty) \to R \cup \{\infty\}$ be a continuous convex function. For arbitrary $Q = {(q_{1}, \dots, q_{k})}^{t} \in Δ_{0 k}$ and $P = {(p_{1}, \dots, p_{k})}^{t} \in Δ_{k}$, the ϕ-divergence between Q and P is defined by (Csiszár [8])

$D_{ϕ} (Q, P) = \sum_{i = 1}^{k} p_{i} ϕ (q_{i} / p_{i}) .$

Note that

$D_{ϕ} (Q, P) = \sum_{i : q_{i} > 0} p_{i} ϕ (q_{i} / p_{i}) + ϕ (0) \sum_{i : q_{i} = 0} p_{i} .$

The penalized ϕ-divergence for the tuning parameter h between Q and P is defined from the above expression by replacing $ϕ (0)$ with h as follows (see Mandal et al. [4]):

$D_{ϕ, h} (Q, P) = \sum_{i : q_{i} > 0} p_{i} ϕ (q_{i} / p_{i}) + h \sum_{i : q_{i} = 0} p_{i} .$

If

${\hat{θ}}_{ϕ, h} = \arg \min_{θ} D_{ϕ, h} (\hat{π}, P (θ)),$

then ${\hat{θ}}_{ϕ, h}$ is called the MPϕE of $θ$.
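The definitions above can be sketched in a few lines of Python (the authors report using R; Python is used here only for illustration). The sketch evaluates $D_{ϕ, h}$ and minimizes it over a scalar $θ$ by golden-section search. The choice $ϕ (x) = x \log (x) - x + 1$, the four-cell family, the counts, and all function names are assumptions made for this illustration, not part of the original computations.

```python
import math

def phi_kl(x):
    """Illustrative choice phi(x) = x log x - x + 1 for x > 0 (an assumption)."""
    return x * math.log(x) - x + 1

def penalized_divergence(q, p, phi, h):
    """D_{phi,h}(Q, P): phi terms on cells with q_i > 0, weight h on empty cells."""
    return sum(pi * phi(qi / pi) if qi > 0 else h * pi for qi, pi in zip(q, p))

def model(theta):
    """Hypothetical one-parameter family with four cells, 0 < theta < 1."""
    return [theta ** 2, theta * (1 - theta), theta * (1 - theta), (1 - theta) ** 2]

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimise a unimodal scalar function on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if f(c) < f(d):          # minimum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:                    # minimum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2

def mp_phi_estimate(counts, phi, h):
    """Minimum penalized phi-divergence estimate of theta for the family above."""
    n = sum(counts)
    q = [x / n for x in counts]  # relative frequencies
    objective = lambda t: penalized_divergence(q, model(t), phi, h)
    return golden_section_min(objective, 1e-6, 1 - 1e-6)

counts = [160, 22, 18, 0]        # hypothetical data with one empty cell
for h in (0.5, 1.0, 2.0):
    print(h, round(mp_phi_estimate(counts, phi_kl, h), 4))
```

For this particular $ϕ$, $ϕ (0) = 1$, so $h = 1$ reproduces the unpenalized divergence, whose minimizer is the MLE; for these counts the MLE of this family is $(2 \cdot 160 + 22 + 18) / 400 = 0.9$. Larger values of h put more weight on the empty cell and push the estimate towards parameter values that give that cell less probability.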

In order to study some of the properties of ${\hat{θ}}_{ϕ, h}$, we will assume that $ϕ$ satisfies Assumption 2 below.

Assumption 2.

$ϕ : [0, \infty) \to R$ is a strictly convex function, twice continuously differentiable in $(0, \infty)$.

Assumption 2 is standard when dealing with estimators based on minimum divergence, since it lets us take Taylor series expansions of $D_{ϕ} (\hat{π}, P (θ))$, which is useful to derive asymptotic properties of the MϕEs. For example, Section 3 of Lindsay [9] assumes that the function $ϕ$ (he calls G what we call $ϕ$) is thrice differentiable (which is stronger than Assumption 2); Theorem 3 in Morales et al. [3] requires, among other conditions, $ϕ$ to meet Assumption 2 to derive the consistency and asymptotic normality of MϕEs.

Assumption 2 is also made in Mandal et al. [4] (they call G what we call $ϕ$) to study the consistency and asymptotic normality of MPϕEs. Specifically, these authors show that, if $π \in P$ and $θ_{0}$ is the true parameter value, then, under suitable regularity conditions including Assumption 2, the MPϕE is consistent for $θ_{0}$, and $\sqrt{n} ({\hat{θ}}_{ϕ, h} - θ_{0})$ is asymptotically normal with a mean of 0 and a variance matrix equal to the inverse of the information matrix.

Next we will only assume that $π \in Δ_{0 k}$, that is, the assumption that $π \in P$ is dropped. In this context, we prove that the MPϕE is consistent for $θ_{0}$, where now $θ_{0}$ is the parameter vector that minimizes $D_{ϕ, h} (π, P (θ))$, that is to say, $θ_{0} = \arg \min_{θ} D_{ϕ, h} (π, P (θ))$. Note that $θ_{0}$ also depends on $ϕ$ and h, so to be rigorous we should denote it by $θ_{0, ϕ, h}$, but to simplify notation we will simply denote it as $θ_{0}$. We also show that $\sqrt{n} ({\hat{θ}}_{ϕ, h} - θ_{0})$ is asymptotically normal with a mean of 0. With this aim, we will also assume the following.

Assumption 3.

$D_{ϕ, h} (π, P (θ))$ has a unique minimum at $θ_{0} \in \mathrm{int}\, Θ$.

Assumption 3 is common in papers on minimum divergence estimation. For example, it is Assumption A3(b) in [6], where it is stated to be the fundamental identification condition for quasi-maximum likelihood estimators to have a well-defined limit; it is contained in Assumptions 7 and 9 in [10], required for minimum chi-square estimators to have a well-defined limit; and it also coincides with Assumption 30 in [9], imposed for the same reason.

Let $θ_{0}$ be as defined in Assumption 3. Then $P (θ_{0})$ is the $(ϕ, h)$-projection of $π$ on $P$. Section 3 in [11] shows that Assumption 3 holds for two-way tables when $P$ is the uniform association model, so the $(ϕ, h)$-projection always exists for such a model. Nevertheless, this projection may not exist, or may not be uniquely defined. See Example 2 in [12] for an instance where there is no unique minimum (because, although $Θ$ in that example is convex, the family $\{P (θ), θ \in Θ\}$ is not convex, so the uniqueness of the projection is not guaranteed). Let $Δ_{k} (ϕ, P, h) = \{π \in Δ_{0 k} \text{ such that Assumption 3 holds}\}$.

From now on, we will assume that the components of $π$ are sorted so that $π_{1}, \dots, π_{m} > 0$, and $π_{m + 1} = \dots = π_{k} = 0$, for some $1 < m \leq k$, where, if $m = k$, then it is understood that all components of $π$ are positive. We will write $π^{+} = {(π_{1}, \dots, π_{m})}^{t}$ and ${\hat{π}}^{+} = {({\hat{π}}_{1}, \dots, {\hat{π}}_{m})}^{t}$. The next result shows the strong consistency and asymptotic normality of the MPϕE.

Theorem 1.

Let $P$ be a parametric family satisfying Assumption 1. Let ϕ be a real function satisfying Assumption 2. Let $X \sim M_{k} (n; π)$ with $π \in Δ_{k} (ϕ, P, h)$. Then

(a): ${\hat{θ}}_{ϕ, h} \overset{a.s.}{⟶} θ_{0}$.
(b): $\sqrt{n} \begin{pmatrix} {\hat{π}}^{+} - π^{+} \\ {\hat{θ}}_{ϕ, h} - θ_{0} \end{pmatrix} \overset{L}{⟶} N_{m + s} (0, A Σ_{π^{+}} A^{t}),$ where $A^{t} = (I_{m}, G^{t})$ and G is defined in Equation (7). In particular,

$\sqrt{n} ({\hat{θ}}_{ϕ, h} - θ_{0}) \overset{L}{⟶} N_{s} (0, G Σ_{π^{+}} G^{t}) .$

(2)
(c): $\sqrt{n} \begin{pmatrix} {\hat{π}}^{+} - π^{+} \\ P ({\hat{θ}}_{ϕ, h}) - P (θ_{0}) \end{pmatrix} \overset{L}{⟶} N_{2 m} (0, B Σ_{π^{+}} B^{t})$, where $B^{t} = (I_{m}, G^{t} D_{1} (P (θ_{0})))$, with $D_{1} (P (θ))$ defined in Equation (8).

Remark 1.

Observe that, if $m = k$, then the penalization has no effect asymptotically; by contrast, if $m < k$, then the presence of the tuning parameter h influences the covariance matrix of the asymptotic law of $\sqrt{n} ({\hat{θ}}_{ϕ, h} - θ_{0})$ and $\sqrt{n} (P ({\hat{θ}}_{ϕ, h}) - P (θ_{0}))$.

Remark 2.

If $π \in P$, we obtain as a particular case the results in Mandal et al. [4]. Our conditions are weaker than those in [4]. The reason is that they allow an infinite number of categories, while we are assuming that such a number is finite, k. Therefore, when the number of categories is finite, the assumptions in [4] for the consistency and asymptotic normality of the MPϕE can be weakened.

As a consequence of Theorem 1, the following corollary gives the asymptotic behavior of $D_{ϕ_{1}, h_{1}} (\hat{π}, P ({\hat{θ}}_{ϕ_{2}, h_{2}}))$, for arbitrary $ϕ_{1}$, $ϕ_{2}$, and $h_{1}$, $h_{2}$, that may or may not coincide. Part (a) of Corollary 1, which assumes that the model $P$ is correctly specified, has been previously proven in [7]. It is included here for the sake of completeness. Part (b), which describes the limit in law under alternatives, is, to the best of our knowledge, new.

Corollary 1.

Let $P$ be a parametric family satisfying Assumption 1. Let $ϕ_{1}$ and $ϕ_{2}$ be two real functions satisfying Assumption 2. Let $X \sim M_{k} (n; π)$ with $π \in Δ_{k} (ϕ_{2}, P, h_{2})$.

(a): For $π \in P$,

$T = \frac{2 n}{ϕ_{1}'' (1)} \left\{D_{ϕ_{1}, h_{1}} (\hat{π}, P ({\hat{θ}}_{ϕ_{2}, h_{2}})) - ϕ_{1} (1)\right\} \overset{L}{⟶} χ_{k - s - 1}^{2} .$
(b): For $π \in Δ_{k} (ϕ_{2}, P, h_{2}) - P$, let $θ_{0} = \arg \min_{θ} D_{ϕ_{2}, h_{2}} (π, P (θ))$. Then

$W = \sqrt{n} \left\{D_{ϕ_{1}, h_{1}} (\hat{π}, P ({\hat{θ}}_{ϕ_{2}, h_{2}})) - D_{ϕ_{1}, h_{1}} (π, P (θ_{0}))\right\} \overset{L}{⟶} N (0, ϱ^{2}),$

where $ϱ^{2} = a^{t} B Σ_{π} B^{t} a$, with B as defined in Theorem 1 with $ϕ = ϕ_{2}$ and $h = h_{2}$,

$a^{t} = \left(ϕ_{1}' \left(\frac{π_{1}}{p_{1} (θ_{0})}\right), \dots, ϕ_{1}' \left(\frac{π_{m}}{p_{m} (θ_{0})}\right), v_{1}, \dots, v_{m}, \underbrace{h_{1}, \dots, h_{1}}_{k - m \text{ times}}\right),$

and $v_{i}$, $1 \leq i \leq m$, are as defined in Equation (5) with $ϕ = ϕ_{1}$ and $h = h_{1}$.

Remark 3.

If $π \in P$, the asymptotic behavior of the statistic T does not depend either on $ϕ_{1}$, $ϕ_{2}$, or on $h_{1}$, $h_{2}$. In fact, the asymptotic law of T is the same as if non-penalized divergences were used.

Remark 4.

When $π \in Δ_{k} (ϕ_{2}, P, h_{2}) - P$, if $m = k$, then the asymptotic distribution of W does not depend on $h_{1}$, $h_{2}$; by contrast, if $m < k$, then the asymptotic distribution of W does depend on $h_{1}$ and $h_{2}$.

Remark 5.

(Properties of the asymptotic test) As a consequence of Corollary 1(a), we have that for testing $H_{0}$ vs. $H_{1}$, the test that rejects the null hypothesis when $T \geq χ_{k - s - 1, 1 - α}^{2}$ is asymptotically correct, in the sense that $P_{0} (T \geq χ_{k - s - 1, 1 - α}^{2}) \to α$, where $χ_{k - s - 1, 1 - α}^{2}$ stands for the $1 - α$ quantile of the $χ_{k - s - 1}^{2}$ distribution and $P_{0}$ stands for the probability when the null hypothesis is true. From Corollary 1(b), it follows that such a test is consistent against fixed alternatives $π \in Δ_{k} (ϕ_{2}, P, h_{2}) - P$, in the sense that $P (T \geq χ_{k - s - 1, 1 - α}^{2}) \to 1$.
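As a small illustration of the decision rule in Remark 5, the following Python sketch computes the asymptotic p-value of an observed value of T. It is only a sketch: the function names are hypothetical, and the closed-form tail probabilities are implemented only for 1 and 2 degrees of freedom, which are the values of $k - s - 1$ for three-cell and four-cell models with a scalar parameter.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi^2_df >= x), closed form for df = 1 or df = 2 only."""
    if x <= 0:
        return 1.0
    if df == 1:
        # chi^2_1 is the square of a standard normal variable
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        # chi^2_2 is an exponential law with mean 2
        return math.exp(-x / 2)
    raise ValueError("only df = 1 and df = 2 are implemented in this sketch")

def asymptotic_test(t_value, k, s, alpha=0.05):
    """Return (p-value, reject) for the test rejecting when T >= chi^2_{k-s-1, 1-alpha}."""
    p_value = chi2_upper_tail(t_value, k - s - 1)
    # Rejecting when T exceeds the 1 - alpha quantile is the same as p-value <= alpha
    return p_value, p_value <= alpha

# Hypothetical observed value T = 4.2 for a three-cell and a four-cell model
print(asymptotic_test(4.2, k=3, s=1))
print(asymptotic_test(4.2, k=4, s=1))
```

The same observed value can lead to different decisions depending on the number of cells, since the reference distribution changes with $k - s - 1$.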

3. Application to Bootstrapping Goodness-Of-Fit Tests

As observed in Remark 5, the test that rejects $H_{0}$ when $T \geq χ_{k - s - 1, 1 - α}^{2}$ is asymptotically correct and consistent against fixed alternatives. Nevertheless, the $χ^{2}$ approximation to the null distribution of the test statistic is rather poor. Next we illustrate this fact with three examples. The last one is motivated by a real data set application in Section 4. All computations have been performed using programs written in the R language [13].

Example 1.

Let $X \sim M_{3} (n; π)$, with $π \in P$ so that

$p_{1} (θ) = \frac{1}{3} - θ, \quad p_{2} (θ) = \frac{2}{3} - θ, \quad p_{3} (θ) = 2 θ, \quad 0 < θ < 1 / 3 .$

The problem of testing goodness-of-fit to this family is dealt with by considering as test statistic a penalized $ϕ_{1}$-divergence and an MP$ϕ_{2}$E, with $ϕ_{1}$ and $ϕ_{2}$ two members of the power-divergence family, defined as follows:

$PD_{λ} (x) = \frac{1}{λ (λ + 1)} \left(x^{λ + 1} - x - λ (x - 1)\right), \quad λ \neq 0, - 1,$

$PD_{0} (x) = x \log (x) - x + 1$, for $λ = 0$, and $PD_{- 1} (x) = - \log (x) + x - 1$, for $λ = - 1$. We thank an anonymous referee for pointing out that the power divergence family is also known as the $α$-divergence family (see, for example, Section 4 of Amari [14]).
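The power-divergence family can be written as a single function factory. The sketch below is illustrative only (the name `power_divergence_phi` is an assumption); it returns $PD_{λ}$ for $x > 0$ and handles the two limiting cases $λ = 0$ and $λ = - 1$ separately.

```python
import math

def power_divergence_phi(lam):
    """Return the function PD_lambda(x), defined here for x > 0."""
    if lam == 0:
        return lambda x: x * math.log(x) - x + 1
    if lam == -1:
        return lambda x: -math.log(x) + x - 1
    return lambda x: (x ** (lam + 1) - x - lam * (x - 1)) / (lam * (lam + 1))

pearson = power_divergence_phi(1)      # PD_1(x) = (x - 1)^2 / 2
neyman = power_divergence_phi(-2)      # PD_{-2}(x) = (x - 1)^2 / (2 x)

for x in (0.5, 1.0, 3.0):
    print(x, pearson(x), neyman(x), power_divergence_phi(0)(x))
```

Every member satisfies $PD_{λ} (1) = 0$ and $PD_{λ}'' (1) = 1$, so for this family the statistic T in Corollary 1 reduces to $2 n$ times the penalized divergence. For $λ < - 1$ the value at $x = 0$ is infinite, which is one place where replacing $ϕ (0)$ by a finite h matters.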

In order to evaluate the performance of the $χ^{2}$ approximation to the null distribution of T, we carried out an extensive simulation experiment. As a preliminary part of the simulation experiment, we evaluated the possible effect of the tuning parameter $h_{2}$ on the accuracy of the MP$ϕ_{2}$E. For this goal, we generated 10,000 samples of size 200 from the parametric family with $θ = 0.3333$, and calculated the MP$ϕ_{2}$E with $h_{2} = 0.5, 1, 2, 5, 10$ and $ϕ_{2} = PD_{- 2}$, which corresponds to the modified chi-square test statistic (see, for example, [1], p. 114). We calculated the root mean square deviation (RMSD) of the resulting estimates ${\hat{θ}}_{- 2, h_{2}}^{(i)}$, $i = 1, \dots, 10{,}000$,

$RMSD = \sqrt{\frac{\sum_{i = 1}^{10{,}000} {({\hat{θ}}_{- 2, h_{2}}^{(i)} - θ)}^{2}}{10{,}000}},$

obtaining 0.00156, 0.00128, 0.00128, 0.00128, and 0.00128, respectively. According to these results, there are rather small differences in the performance of the MP$ϕ_{2}$E for the values of $h_{2}$ considered. Because of this, in what follows we restricted attention to $ϕ_{2} = PD_{- 2}$ and $h_{2} = 0.5, 1, 2$.

Next, to study the goodness of the asymptotic approximation, we generated 10,000 samples of size $n = 100$ from the parametric family with $θ = 0.3333$, and calculated the test statistic T with $h_{1} = h_{2} = 0.5$ and $ϕ_{1} (x) = ϕ_{2} (x) = PD_{- 2} (x)$, as well as the associated p-values corresponding to the asymptotic null distribution. We then computed the fraction of these p-values that are less than or equal to the nominal values $α = 0.05, 0.10$ (top and bottom in the tables). This experiment was repeated for $n = 150, 200$, $h_{1} = h_{2} = 1, 2$, $ϕ_{1} = PD_{1}$ (which corresponds to the chi-square test statistic) and $ϕ_{1} = PD_{2}$. Table 1 shows the results obtained. We also considered the case $h_{1} \neq h_{2}$, obtaining quite close outcomes. Table 2 displays the results obtained for $n = 200$ and $ϕ_{1} = ϕ_{2} = PD_{- 2}$. Looking at these tables, we conclude that the asymptotic null distribution does not provide an accurate estimation of the null distribution of T, since the type I error probabilities are much greater than the nominal values, 0.05 and 0.10. Therefore, other approximations of the null distribution should be studied.

Example 2.

Let $X \sim M_{3} (n; π)$, with $π \in P$ so that

$p_{1} (θ) = 0.5 - 2 θ, \quad p_{2} (θ) = 0.5 + θ, \quad p_{3} (θ) = θ, \quad 0 < θ < 1 / 4 .$

We repeated the simulation schedule described in Example 1 for this law with $θ = 0.24$. Table 3 and Table 4 report the obtained results. In contrast to the results for Example 1, where the asymptotic approximation gives a rather liberal test, in this case the resulting test is very conservative. Therefore, we again conclude that the asymptotic null distribution does not provide an accurate estimation of the null distribution of T.

Example 3.

Let $X \sim M_{4} (n; π)$, with $π \in P$ so that

$p_{1} (θ) = θ^{2}, \quad p_{2} (θ) = θ (1 - θ), \quad p_{3} (θ) = θ (1 - θ), \quad p_{4} (θ) = {(1 - θ)}^{2}, \quad 0 < θ < 1 .$

(3)

We repeated the simulation schedule described in Example 1 for this law with $θ = 0.8$. Table 5 and Table 6 report the results obtained. Looking at these tables, we see that the test based on the asymptotic approximation is liberal, and conclude, as in the previous examples, that other approximations of the null distribution should be considered.

The reason for the unsatisfactory results in the three examples is that the asymptotic approximation requires impractically large sample sizes when some cells have extremely small probabilities, which leads to zero cell frequencies. To appreciate this fact, notice that in Example 1 with $θ = 0.3333$ the smallest cell probability is $p_{1} (θ) = 1 / 3 - 0.3333 \approx 3.3 \times 10^{- 5}$, so $n > 300{,}000$ is required to obtain expected cell frequencies greater than 10.
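The expected counts behind this remark can be tabulated directly from the three parametric families and the parameter values used in the simulations. The following Python sketch is illustrative only; the helper names are assumptions.

```python
def example1(theta):
    return [1 / 3 - theta, 2 / 3 - theta, 2 * theta]

def example2(theta):
    return [0.5 - 2 * theta, 0.5 + theta, theta]

def example3(theta):
    return [theta ** 2, theta * (1 - theta), theta * (1 - theta), (1 - theta) ** 2]

def smallest_cell_summary(p, n):
    """Expected count of the least likely cell and probability that it is empty."""
    p_min = min(p)
    return n * p_min, (1 - p_min) ** n

settings = [
    ("Example 1", example1(0.3333)),
    ("Example 2", example2(0.24)),
    ("Example 3", example3(0.8)),
]

for name, p in settings:
    for n in (100, 150, 200):
        expected, p_empty = smallest_cell_summary(p, n)
        print(f"{name}, n={n}: expected count {expected:.4f}, P(empty cell) {p_empty:.4f}")
```

For the sample sizes considered in the simulations, the least likely cell of Example 1 has an expected count far below 1 and is empty in almost every sample.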

Motivated by these examples, the aim of this section is to study another way of approximating the null distribution of T, the bootstrap. The null bootstrap distribution of T is the conditional distribution of

$T^{*} = \frac{2 n}{ϕ_{1}'' (1)} \left\{D_{ϕ_{1}, h_{1}} ({\hat{π}}^{*}, P ({\hat{θ}}_{ϕ_{2}, h_{2}}^{*})) - ϕ_{1} (1)\right\},$

given $(X_{1}, \dots, X_{k})$, where ${\hat{π}}^{*}$ is defined as $\hat{π}$ with $(X_{1}, \dots, X_{k})$ replaced by $(X_{1}^{*}, \dots, X_{k}^{*}) \sim M_{k} (n; P ({\hat{θ}}_{ϕ_{2}, h_{2}}))$, and ${\hat{θ}}_{ϕ_{2}, h_{2}}^{*} = \arg \min_{θ} D_{ϕ_{2}, h_{2}} ({\hat{π}}^{*}, P (θ))$.

Let $P_{*}$ denote the bootstrap conditional probability law, given $(X_{1}, \dots, X_{k})$. The next theorem gives the weak limit of $T^{*}$.

Theorem 2.

Let $P$ be a parametric family satisfying Assumption 1. Let $ϕ_{1}$ and $ϕ_{2}$ be two real functions satisfying Assumption 2. Let $X \sim M_{k} (n; π)$ with $π \in Δ_{k} (ϕ_{2}, P, h_{2})$. Then

$\sup_{x} |P_{*} (T^{*} \leq x) - P (Y \leq x)| \overset{P}{⟶} 0,$

where $Y \sim χ_{k - s - 1}^{2}$.

Recall that, from Corollary 1(a), when $H_{0}$ is true, the test statistic T converges in law to a $χ_{k - s - 1}^{2}$ law. Thus, the result in Theorem 2 implies the consistency of the null bootstrap distribution of T as an estimator of the null distribution of T. It is important to remark that the result in Theorem 2 holds whether $H_{0}$ is true or not, that is, the bootstrap properly estimates the null distribution, even if the available data do not obey the law in the null hypothesis. This is due to the fact that, under the assumed conditions, the MPϕE always converges to a well-defined limit.

Remark 6.

(Properties of the bootstrap test) Similarly to Remark 5, as a consequence of Corollary 1(a) and Theorem 2, we have that, for testing $H_{0}$ vs. $H_{1}$, the test that rejects the null hypothesis when $T \geq T_{1 - α}^{*}$ is asymptotically correct, in the sense that $P_{0} (T \geq T_{1 - α}^{*}) \to α$, where $T_{1 - α}^{*}$ stands for the $1 - α$ quantile of the bootstrap distribution of T. From Corollary 1(b) and Theorem 2, it follows that such a test is consistent against fixed alternatives $π \in Δ_{k} (ϕ_{2}, P, h_{2}) - P$, in the sense that $P (T \geq T_{1 - α}^{*}) \to 1$.

In practice, the bootstrap p-value must be approximated by simulation as follows:

1. Calculate the observed value of the test statistic for the available data $(X_{1}, \dots, X_{k})$, $T_{obs}$.
2. Generate B bootstrap samples $(X_{1}^{* b}, \dots, X_{k}^{* b}) \sim M_{k} (n; P ({\hat{θ}}_{ϕ_{2}, h_{2}}))$, $b = 1, \dots, B$, and calculate the test statistic for each bootstrap sample, obtaining $T^{* b}$, $b = 1, \dots, B$.
3. Approximate the p-value by means of the expression

${\hat{p}}_{boot} = \frac{\mathrm{card} \{b : T^{* b} \geq T_{obs}\}}{B} .$
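The three steps above can be sketched in Python as follows (the authors used R; this is not their code). The sketch assumes the four-cell family of Equation (3), takes $ϕ_{1} = ϕ_{2} = PD_{- 2}$ so that $T = 2 n D$, and finds the estimator with a coarse grid search followed by a golden-section refinement, which is a heuristic chosen here for simplicity. The counts, the value of B, and all function names are hypothetical.

```python
import math
import random

def model(theta):
    """Cell probabilities of the family in Equation (3)."""
    return [theta ** 2, theta * (1 - theta), theta * (1 - theta), (1 - theta) ** 2]

def pd_minus2(x):
    """PD_{-2}(x) = (x - 1)^2 / (2x), valid for x > 0."""
    return (x - 1) ** 2 / (2 * x)

def penalized_divergence(q, p, phi, h):
    return sum(pi * phi(qi / pi) if qi > 0 else h * pi for qi, pi in zip(q, p))

def mp_phi_estimate(counts, phi, h, grid=200):
    """Grid search plus golden-section refinement (a heuristic optimiser)."""
    n = sum(counts)
    q = [x / n for x in counts]
    f = lambda t: penalized_divergence(q, model(t), phi, h)
    step = 1.0 / grid
    best = min((i * step for i in range(1, grid)), key=f)
    a, b = max(best - step, 1e-9), min(best + step, 1 - 1e-9)
    r = (math.sqrt(5) - 1) / 2
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > 1e-9:
        if f(c) < f(d):
            b, d = d, c
            c = b - r * (b - a)
        else:
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2

def statistic(counts, phi1, h1, phi2, h2):
    """Return (T, theta_hat); uses phi1(1) = 0 and phi1''(1) = 1, so T = 2 n D."""
    n = sum(counts)
    theta = mp_phi_estimate(counts, phi2, h2)
    q = [x / n for x in counts]
    return 2 * n * penalized_divergence(q, model(theta), phi1, h1), theta

def bootstrap_p_value(counts, phi1, h1, phi2, h2, B=200, seed=0):
    rng = random.Random(seed)
    n, k = sum(counts), len(counts)
    t_obs, theta_hat = statistic(counts, phi1, h1, phi2, h2)      # step 1
    p_hat = model(theta_hat)
    exceed = 0
    for _ in range(B):                                            # step 2
        draws = rng.choices(range(k), weights=p_hat, k=n)
        boot = [draws.count(j) for j in range(k)]
        t_star, _ = statistic(boot, phi1, h1, phi2, h2)
        exceed += t_star >= t_obs
    return t_obs, theta_hat, exceed / B                           # step 3

# Hypothetical counts with one empty cell
print(bootstrap_p_value([160, 22, 18, 0], pd_minus2, 1.0, pd_minus2, 1.0))
```

Note that each bootstrap sample is drawn from the fitted model $P ({\hat{θ}}_{ϕ_{2}, h_{2}})$, not from the relative frequencies, and that the estimator is recomputed on every bootstrap sample.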

For the numerical experiments previously described, whose results are displayed in Tables 1–6, we also calculated the bootstrap p-values. This was done by generating $B = 1000$ bootstrap samples to approximate each p-value, and calculating the fraction of these p-values that are less than or equal to 0.05 and 0.10 (top and bottom in the tables). Tables 7–12 display the estimated type I error probabilities obtained by using the bootstrap approximation as well as those obtained with the asymptotic approximation (bootstrap, B, and asymptotic, A, in the tables), taken from Tables 1–6 in order to facilitate the comparison between them. Looking at Tables 7–12, we conclude that the bootstrap approximation is superior to the asymptotic one for small and moderate sample sizes, since in all cases the bootstrap type I error probabilities were closer to the nominal values than those obtained using the asymptotic null distribution. This superior performance of the bootstrap null distribution estimator has been noticed in other inferential problems where ϕ-divergences are used as test statistics (see, for example, [5,12,15,16]).

4. Application to the Evaluation of the Thematic Classification in Global Land Cover Maps

This section displays the results of an application of our proposal to two real data sets related to the thematic quality assessment of a global land cover (GLC) map. The data comprise the results of two thematic classifications of the land cover category “Evergreen Broadleaf Trees” (EBL) and summarize the number of sample units correctly classified in this class, and the number of confusions with other land cover classes: “Deciduous Broadleaf Trees” (DBL), “Evergreen Needleleaf Trees” (ENL), and “Urban/Built Up” (U). The results of these two classifications were collected from two different global land cover maps, the Globcover map and the LC-CCI map (see Tsendbazar et al. [17] for additional details), and they are displayed in Table 13.

Parametric specifications of the multinomial vector of probabilities are quite attractive since they describe the classification pattern concisely. Because of this, given the similarity between the two observed classifications in Table 13, we are interested in finding a parametric model suitable to describe the thematic accuracy of this class in both GLC maps. For this purpose, we consider the parametric family in Equation (3) of Example 3. The presence of a zero cell frequency in each data set leads us to consider a penalized ϕ-divergence as a test statistic for testing goodness-of-fit to such a parametric family.

Table 14 displays the observed values of the test statistic T and the associated bootstrap p-values for the goodness-of-fit test with respect to the parametric family in Equation (3) for the two observed classifications of the EBL class in Table 13. Looking at this table, it can be concluded that the null hypothesis cannot be rejected in either case. Therefore, the parametric model in Equation (3) provides an adequate description of the thematic classification of the EBL class.

5. Proofs

Notice that

$\begin{aligned} D_{ϕ, h} (π, P (θ)) &= \sum_{i = 1}^{m} p_{i} (θ) ϕ \left(\frac{π_{i}}{p_{i} (θ)}\right) + h \sum_{i = m + 1}^{k} p_{i} (θ) \\ &= h I (m < k) + \sum_{i = 1}^{m} p_{i} (θ) ϕ_{h} \left(\frac{π_{i}}{p_{i} (θ)}\right), \end{aligned}$

where $I$ stands for the indicator function, $ϕ_{h} (x) = ϕ (x) - h$, if $m < k$, and $ϕ_{h} (x) = ϕ (x)$, if $m = k$. Let

$D_{ϕ, h}^{+} (π, P (θ)) = \sum_{i = 1}^{m} p_{i} (θ) ϕ_{h} \left(\frac{π_{i}}{p_{i} (θ)}\right) .$

Clearly,

$\arg \min_{θ} D_{ϕ, h} (\hat{π}, P (θ)) = \arg \min_{θ} D_{ϕ, h}^{+} (\hat{π}, P (θ)) .$

Note that, if Assumptions 1 and 2 hold, then Assumption 3 implies that

$\frac{\partial}{\partial θ} D_{ϕ, h}^{+} (π, P (θ_{0})) = \sum_{i = 1}^{m} \frac{\partial}{\partial θ} p_{i} (θ_{0}) v_{i} = 0,$

(4)

where

$v_{i} = ϕ \left(\frac{π_{i}}{p_{i} (θ_{0})}\right) - \frac{π_{i}}{p_{i} (θ_{0})} ϕ' \left(\frac{π_{i}}{p_{i} (θ_{0})}\right) - h I (m < k),$

(5)

$1 \leq i \leq m$, and $ϕ' (x) = \frac{\partial}{\partial x} ϕ (x)$. The $s \times s$ matrix

$D_{2} = \frac{\partial^{2}}{\partial θ \partial θ^{t}} D_{ϕ, h}^{+} (π, P (θ_{0})) = \sum_{i = 1}^{m} \frac{\partial^{2}}{\partial θ \partial θ^{t}} p_{i} (θ_{0}) v_{i} + \sum_{i = 1}^{m} \frac{\partial}{\partial θ} p_{i} (θ_{0}) \frac{\partial}{\partial θ} p_{i} {(θ_{0})}^{t} w_{i}$

(6)

is positive definite, where

$w_{i} = \frac{π_{i}^{2}}{p_{i}^{3} (θ_{0})} ϕ'' \left(\frac{π_{i}}{p_{i} (θ_{0})}\right),$

$1 \leq i \leq m$, and $ϕ'' (x) = \frac{\partial^{2}}{\partial x^{2}} ϕ (x)$. Therefore, by the Implicit Function Theorem (see, for example, Dieudonné [18], p. 272), there is an open neighborhood $U \subseteq {(0, 1)}^{m}$ of $π^{+}$ and s unique functions, $g_{i} : U \to R$, $1 \leq i \leq s$, so that

(i): ${\hat{θ}}_{ϕ} = {(g_{1} ({\hat{π}}^{+}), \dots, g_{s} ({\hat{π}}^{+}))}^{t}$, $\forall n \geq n_{0}$, for some $n_{0} \in \mathbb{N}$;
(ii): $θ_{0} = {(g_{1} (π^{+}), \dots, g_{s} (π^{+}))}^{t}$;
(iii): $g = {(g_{1}, \dots, g_{s})}^{t}$ is continuously differentiable in $U$ and the $s \times m$ Jacobian matrix of $g$ at $(π_{1}, \dots, π_{m})$ is given by

$G = D_{2}^{- 1} D_{1} (P (θ_{0})) \mathrm{Diag} (ϖ),$

(7)

where

$D_{1} (P (θ)) = \left(\frac{\partial}{\partial θ} p_{1} (θ), \dots, \frac{\partial}{\partial θ} p_{m} (θ)\right),$

(8)

$ϖ = {(ϖ_{1}, \dots, ϖ_{m})}^{t}$, and

$ϖ_{i} = \frac{π_{i}}{p_{i}^{2} (θ_{0})} ϕ'' \left(\frac{π_{i}}{p_{i} (θ_{0})}\right), \quad 1 \leq i \leq m.$
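The Jacobian formula in Equation (7) can be checked numerically in a small case. The sketch below is only an illustration under stated assumptions: a hypothetical one-parameter Binomial(3, θ) model on $k = 4$ cells with the last cell empty ($m = 3$, $s = 1$), the $PD_{1}$ generator in its Cressie-Read form, $h = 1$, and made-up values of $π^{+}$. Solving Equation (4) by bisection is a convenience chosen here, not the procedure used in the paper, and all function names are illustrative.

```python
# Hypothetical setting: Binomial(3, theta) on k = 4 cells, last cell empty (m = 3).
LAM, H = 1.0, 1.0  # PD_1 generator and penalty h (illustrative choices)


def phi(x):    # assumed Cressie-Read power-divergence generator
    return (x ** (LAM + 1) - x - LAM * (x - 1.0)) / (LAM * (LAM + 1.0))


def dphi(x):   # phi'(x)
    return (x ** LAM - 1.0) / LAM


def d2phi(x):  # phi''(x)
    return x ** (LAM - 1.0)


def model(theta):
    """p_i, dp_i/dtheta and d2p_i/dtheta2 for the first m = 3 cells."""
    t = theta
    p = [(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t)]
    dp = [-3 * (1 - t) ** 2, 3 - 12 * t + 9 * t ** 2, 6 * t - 9 * t ** 2]
    d2p = [6 - 6 * t, -12 + 18 * t, 6 - 18 * t]
    return p, dp, d2p


def score(theta, pi):
    """Left-hand side of Equation (4): sum_i p_i'(theta) * v_i, with I(m < k) = 1."""
    p, dp, _ = model(theta)
    total = 0.0
    for pi_i, p_i, dp_i in zip(pi, p, dp):
        x = pi_i / p_i
        total += dp_i * (phi(x) - x * dphi(x) - H)
    return total


def g(pi, lo=1e-3, hi=1 - 1e-3):
    """Solve Equation (4) for theta by bisection (assumes a single sign change)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid, pi) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def jacobian(pi):
    """G = D_2^{-1} D_1 Diag(varpi), Equations (6)-(8), for scalar theta."""
    theta0 = g(pi)
    p, dp, d2p = model(theta0)
    d2, row = 0.0, []
    for pi_i, p_i, dp_i, d2p_i in zip(pi, p, dp, d2p):
        x = pi_i / p_i
        v = phi(x) - x * dphi(x) - H                   # Equation (5)
        w = pi_i ** 2 / p_i ** 3 * d2phi(x)
        d2 += d2p_i * v + dp_i ** 2 * w                # Equation (6)
        row.append(dp_i * pi_i / p_i ** 2 * d2phi(x))  # column of D_1 Diag(varpi)
    return [r / d2 for r in row]


pi_plus = [0.5, 0.3, 0.2]
G = jacobian(pi_plus)

# Central finite differences of g, perturbing one coordinate of pi^+ at a time.
eps = 1e-5
G_fd = []
for j in range(3):
    up = list(pi_plus)
    dn = list(pi_plus)
    up[j] += eps
    dn[j] -= eps
    G_fd.append((g(up) - g(dn)) / (2 * eps))

print(G)
print(G_fd)
```

The coordinates of $π^{+}$ are perturbed one at a time without renormalizing, because $g$ is defined on an open subset of ${(0, 1)}^{m}$ rather than on the simplex.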

Proof of Theorem 1.

Part (a) follows from (i) and (ii) above and the fact that ${\hat{π}}^{+} \to π^{+}$ a.s. From (i)–(iii), and taking into account that $\sqrt{n} ({\hat{π}}^{+} - π^{+})$ is asymptotically normal, it follows that

${\hat{θ}}_{ϕ} = θ_{0} + G (π, P (θ_{0}), ϕ) (\hat{π} - π) + o_{P} (n^{- 1 / 2}).$

(9)

Parts (b) and (c) follow from Equation (9) and the asymptotic normality of $\sqrt{n} ({\hat{π}}^{+} - π^{+})$. ☐
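For orientation, the step from Equation (9) to asymptotic normality is the usual delta-method calculation. The display below is a sketch only: it reads the increment in Equation (9) as the $m$-vector ${\hat{π}}^{+} - π^{+}$, and it writes $Σ_{π^{+}}$, a symbol introduced here for illustration, for the multinomial covariance matrix of the first $m$ cells. The exact statement and notation are those of Theorem 1.

```latex
\sqrt{n}\,({\hat{θ}}_{ϕ} - θ_{0})
  = G \sqrt{n}\,({\hat{π}}^{+} - π^{+}) + o_{P}(1)
  \overset{L}{⟶} N\!\left(0,\; G\, Σ_{π^{+}}\, G^{t}\right),
\qquad
Σ_{π^{+}} = \mathrm{Diag}(π^{+}) - π^{+} {π^{+}}^{t}.
```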

Proof of Corollary 1.

Part (a) was shown in Theorem 5.1 in [7]. To prove (b), we first demonstrate that

$W = W_{0} + r_{n},$

(10)

where

$W_{0} = \sqrt{n} \left\{\sum_{j = 1}^{m} p_{j} ({\hat{θ}}_{ϕ_{2}, h_{2}}) ϕ_{1} \left(\frac{{\hat{π}}_{j}}{p_{j} ({\hat{θ}}_{ϕ_{2}, h_{2}})}\right) + h_{1} \sum_{j = m + 1}^{k} p_{j} ({\hat{θ}}_{ϕ_{2}, h_{2}}) - D_{ϕ_{1}, h_{1}} (π, P (θ_{0}))\right\}$

and $r_{n} = o_{P} (1)$. Notice that

$\begin{aligned} r_{n} &= \sqrt{n} \{h_{1} - ϕ_{1} (0)\} \sum_{j : {\hat{π}}_{j} = 0, π_{j} > 0} p_{j} ({\hat{θ}}_{ϕ_{2}, h_{2}}) \\ &= \sqrt{n} \{h_{1} - ϕ_{1} (0)\} \sum_{j = 1}^{m} p_{j} ({\hat{θ}}_{ϕ_{2}, h_{2}}) I ({\hat{π}}_{j} = 0). \end{aligned}$

Therefore, since $0 \leq p_{j} (θ) \leq 1$,

$0 \leq E | r_{n} | \leq \sqrt{n} | h_{1} - ϕ_{1} (0) | \sum_{j = 1}^{m} P ({\hat{π}}_{j} = 0) = \sqrt{n} | h_{1} - ϕ_{1} (0) | \sum_{j = 1}^{m} {(1 - π_{j})}^{n} \to 0,$

which, by Markov's inequality, implies $r_{n} = o_{P} (1)$. From Theorem 1 and Taylor expansion, it follows that $W_{0} \overset{L}{⟶} N (0, ϱ^{2})$; hence, the result in part (b) is proven. ☐
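To see how quickly the bound on $E | r_{n} |$ vanishes, the following sketch evaluates $\sqrt{n} | h_{1} - ϕ_{1} (0) | \sum_{j = 1}^{m} {(1 - π_{j})}^{n}$ for a few sample sizes. The cell probabilities, the penalty $h_{1} = 1$ and the function name are illustrative, and $ϕ_{1} (0) = 1 / 2$ corresponds to $PD_{1}$ under the assumed Cressie-Read form of the generator.

```python
import math


def remainder_bound(n, pi_plus, h1, phi1_at_zero):
    """Upper bound on E|r_n|: sqrt(n) * |h1 - phi1(0)| * sum_j (1 - pi_j)^n."""
    tail = sum((1.0 - q) ** n for q in pi_plus)
    return math.sqrt(n) * abs(h1 - phi1_at_zero) * tail


# Illustrative values only: made-up cell probabilities, h1 = 1, phi1(0) = 1/2.
pi_plus = [0.5, 0.3, 0.2]
sizes = (10, 50, 100, 200)
bounds = [remainder_bound(n, pi_plus, h1=1.0, phi1_at_zero=0.5) for n in sizes]
for n, b in zip(sizes, bounds):
    print(n, b)
```

The geometric decay of ${(1 - π_{j})}^{n}$ dominates the $\sqrt{n}$ factor, which is why the remainder is negligible even for moderate $n$ as long as no $π_{j}$, $1 \leq j \leq m$, is very close to zero.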

Proof of Theorem 2.

The proof of Theorem 2 is parallel to that of Theorem 2 in [5], so we omit it. ☐

Author Contributions

M.V. Alba-Fernández and M.D. Jiménez-Gamero conceived and designed the experiments; M.V. Alba-Fernández performed the experiments; M.V. Alba-Fernández and F.J. Ariza-López analyzed the data; F.J. Ariza-López contributed materials; M.V. Alba-Fernández and M.D. Jiménez-Gamero wrote the paper.

Acknowledgments

The authors thank the anonymous referees for their valuable time and careful comments, which improved the presentation of this paper. The research in this paper has been partially funded by grants: CTM2015–68276–R of the Spanish Ministry of Economy and Competitiveness (M.V. Alba-Fernández and F.J. Ariza-López) and MTM2017-89422-P of the Spanish Ministry of Economy, Industry and Competitiveness, ERDF support included (M.D. Jiménez-Gamero).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MLE	maximum likelihood estimator
MϕE	minimum ϕ-divergence estimator
MPϕE	minimum penalized ϕ-divergence estimator
RMSD	root mean square deviation
B	bootstrap
A	asymptotic
GLC	global land cover
EBL	evergreen broadleaf trees
DBL	deciduous broadleaf trees
ENL	evergreen needleleaf trees
U	urban/built up

References

1. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall: London, UK; CRC Press: Boca Raton, FL, USA, 2006.
2. Basu, A.; Sarkar, S. On disparity based goodness-of-fit tests for multinomial models. Stat. Probab. Lett. 1994, 19, 307–312.
3. Morales, D.; Pardo, L.; Vajda, I. Asymptotic divergence of estimates of discrete distributions. J. Stat. Plann. Inference 1995, 48, 347–369.
4. Mandal, A.; Basu, A.; Pardo, L. Minimum disparity inference and the empty cell penalty: Asymptotic results. Sankhya Ser. A 2010, 72, 376–406.
5. Jiménez-Gamero, M.D.; Pino-Mejías, R.; Alba-Fernández, M.V.; Moreno-Rebollo, J.L. Minimum ϕ-divergence estimation in misspecified multinomial models. Comput. Stat. Data Anal. 2011, 55, 3365–3378.
6. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25.
7. Mandal, A.; Basu, A. Minimum disparity inference and the empty cell penalty: Asymptotic results. Electron. J. Stat. 2011, 5, 1846–1875.
8. Csiszár, I. Information type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 1967, 2, 299–318.
9. Lindsay, B.G. Efficiency versus robustness: The case for minimum Hellinger distance and related methods. Ann. Stat. 1994, 22, 1081–1114.
10. Vuong, Q.H.; Wang, W. Minimum χ-square estimation and tests for model selection. J. Econom. 1993, 56, 141–168.
11. Alba-Fernández, M.V.; Jiménez-Gamero, M.D.; Lagos-Álvarez, B. Divergence statistics for testing uniform association in cross-classifications. Inf. Sci. 2010, 180, 4557–4571.
12. Jiménez-Gamero, M.D.; Pino-Mejías, R.; Rufián-Lizana, A. Minimum K_ϕ-divergence estimators for multinomial models and applications. Comput. Stat. 2014, 29, 363–401.
13. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2017; Available online: https://www.R-project.org/ (accessed on 29 April 2018).
14. Amari, S. Integration of stochastic models by minimizing α-divergence. Neural Comput. 2007, 19, 2780–2796.
15. Alba-Fernández, M.V.; Jiménez-Gamero, M.D. Bootstrapping divergence statistics for testing homogeneity in multinomial populations. Math. Comput. Simul. 2009, 79, 3375–3384.
16. Jiménez-Gamero, M.D.; Alba-Fernández, M.V.; Barranco-Chamorro, I.; Muñoz-García, J. Two classes of divergence statistics for testing uniform association. Statistics 2014, 48, 367–387.
17. Tsendbazar, N.E.; de Bruin, S.; Mora, B.; Schouten, L.; Herold, M. Comparative assessment of thematic accuracy of GLC maps for specific applications using existing reference data. Int. J. Appl. Earth Obs. Geoinf. 2016, 44, 124–135.
18. Dieudonne, J. Foundations of Modern Analysis; Academic Press: New York, NY, USA; London, UK, 1969.

Table 1. Type I error probabilities obtained using asymptotic approximation for Example 1 with $θ = 0.3333$, $ϕ_{1} = {PD}_{λ}$, $λ \in \{- 2, 1, 2\}$, $ϕ_{2} = {PD}_{- 2}$, and $h_{1} = h_{2} \in \{0.5, 1, 2\}$.

	$ϕ_{1} = {PD}_{- 2}$			$ϕ_{1} = {PD}_{1}$			$ϕ_{1} = {PD}_{2}$
	$h_{1} = h_{2}$			$h_{1} = h_{2}$			$h_{1} = h_{2}$
n	0.5	1	2	0.5	1	2	0.5	1	2
100	0.996	0.996	0.998	0.995	0.997	0.996	0.995	0.997	0.997
	0.996	0.996	0.998	0.995	0.997	0.996	0.995	0.997	0.997
150	0.995	0.995	0.996	0.994	0.995	0.996	0.994	0.994	0.995
	0.995	0.995	0.996	0.994	0.995	0.996	0.994	0.994	0.995
200	0.992	0.993	0.994	0.992	0.994	0.991	0.993	0.993	0.994
	0.992	0.994	0.994	0.992	0.994	0.991	0.993	0.993	0.994

Table 2. Type I error probabilities obtained using asymptotic approximation for Example 1 with $n = 200$, $θ = 0.3333$, $ϕ_{1} = ϕ_{2} = {PD}_{- 2}$, $h_{1} \neq h_{2}$, and $h_{1}, h_{2} \in \{0.5, 1, 2\}$.

$(h_{1}, h_{2})$	(0.5, 1)	(1, 0.5)	(0.5, 2)	(2, 0.5)	(1, 2)	(2, 1)
	0.989	0.997	0.998	0.998	0.994	0.998
	0.999	0.997	0.998	0.998	0.994	0.999

Table 3. Type I error probabilities obtained using asymptotic approximation for Example 2 with $θ = 0.24$, $ϕ_{1} = {PD}_{λ}$, $λ \in \{- 2, 1, 2\}$, $ϕ_{2} = {PD}_{- 2}$, and $h_{1} = h_{2} \in \{0.5, 1, 2\}$.

	$ϕ_{1} = {PD}_{- 2}$			$ϕ_{1} = {PD}_{1}$			$ϕ_{1} = {PD}_{2}$
	$h_{1} = h_{2}$			$h_{1} = h_{2}$			$h_{1} = h_{2}$
n	0.5	1	2	0.5	1	2	0.5	1	2
100	0.016	0.017	0.017	0.013	0.013	0.014	0.013	0.014	0.015
	0.034	0.036	0.036	0.031	0.030	0.031	0.030	0.033	0.033
150	0.018	0.019	0.017	0.014	0.014	0.014	0.013	0.015	0.016
	0.035	0.039	0.037	0.031	0.033	0.032	0.035	0.033	0.032
200	0.024	0.022	0.022	0.014	0.016	0.016	0.014	0.015	0.016
	0.043	0.042	0.040	0.032	0.034	0.032	0.032	0.035	0.033

Table 4. Type I error probabilities obtained using asymptotic approximation for Example 2 with $n = 200$, $θ = 0.24$, $ϕ_{1} = ϕ_{2} = {PD}_{- 2}$, $h_{1} \neq h_{2}$, and $h_{1}, h_{2} \in \{0.5, 1, 2\}$.

$(h_{1}, h_{2})$	(0.5, 1)	(1, 0.5)	(0.5, 2)	(2, 0.5)	(1, 2)	(2, 1)
	0.017	0.017	0.018	0.019	0.018	0.016
	0.035	0.033	0.035	0.040	0.036	0.034

Table 5. Type I error probabilities obtained using asymptotic approximation for Example 3 with $θ = 0.8$, $ϕ_{1} = {PD}_{λ}$, $λ \in \{- 2, 1, 2\}$, $ϕ_{2} = {PD}_{- 2}$, and $h_{1} = h_{2} \in \{0.5, 1, 2\}$.

	$ϕ_{1} = {PD}_{- 2}$			$ϕ_{1} = {PD}_{1}$			$ϕ_{1} = {PD}_{2}$
	$h_{1} = h_{2}$			$h_{1} = h_{2}$			$h_{1} = h_{2}$
n	0.5	1	2	0.5	1	2	0.5	1	2
100	0.063	0.066	0.074	0.095	0.107	0.111	0.122	0.136	0.131
	0.122	0.120	0.125	0.157	0.165	0.161	0.181	0.190	0.182
150	0.063	0.064	0.066	0.083	0.082	0.084	0.099	0.105	0.100
	0.114	0.118	0.113	0.137	0.134	0.136	0.153	0.159	0.152
200	0.062	0.061	0.061	0.075	0.079	0.074	0.086	0.091	0.086
	0.111	0.111	0.115	0.129	0.137	0.123	0.145	0.148	0.144

Table 6. Type I error probabilities obtained using asymptotic approximation for Example 3 with $n = 200$, $θ = 0.8$, $ϕ_{1} = ϕ_{2} = {PD}_{- 2}$, $h_{1} \neq h_{2}$, and $h_{1}, h_{2} \in \{0.5, 1, 2\}$.

$(h_{1}, h_{2})$	(0.5, 1)	(1, 0.5)	(0.5, 2)	(2, 0.5)	(1, 2)	(2, 1)
	0.060	0.062	0.063	0.062	0.063	0.058
	0.108	0.114	0.113	0.112	0.113	0.109

Table 7. Asymptotic and bootstrap type I error probabilities for Example 1 with $θ = 0.3333$, $ϕ_{1} = {PD}_{λ}$, $λ \in \{- 2, 1, 2\}$, $ϕ_{2} = {PD}_{- 2}$, and $h_{1} = h_{2} \in \{0.5, 1, 2\}$.

	$h_{1} = h_{2}$	$0.5$		1		2
$ϕ_{1}$	n	B	A	B	A	B	A
$P D_{- 2}$	100	0.051	0.996	0.048	0.996	0.048	0.998
		0.110	0.996	0.103	0.996	0.109	0.998
	150	0.055	0.995	0.050	0.995	0.056	0.996
		0.106	0.995	0.101	0.995	0.109	0.996
	200	0.053	0.992	0.053	0.993	0.056	0.994
		0.103	0.992	0.106	0.994	0.108	0.994
$P D_{1}$	100	0.057	0.995	0.056	0.997	0.055	0.996
		0.110	0.995	0.110	0.997	0.107	0.996
	150	0.054	0.994	0.052	0.995	0.055	0.996
		0.110	0.994	0.104	0.995	0.114	0.996
	200	0.055	0.992	0.051	0.994	0.052	0.991
		0.106	0.992	0.103	0.994	0.106	0.991
$P D_{2}$	100	0.055	0.995	0.056	0.997	0.054	0.997
		0.110	0.995	0.109	0.997	0.107	0.997
	150	0.054	0.994	0.055	0.994	0.056	0.995
		0.107	0.994	0.106	0.994	0.110	0.995
	200	0.054	0.993	0.053	0.993	0.055	0.994
		0.107	0.993	0.105	0.993	0.108	0.994

Table 8. Asymptotic and bootstrap type I error probabilities for Example 1 with $n = 200$, $θ = 0.3333$, $ϕ_{1} = ϕ_{2} = {PD}_{- 2}$, $h_{1} \neq h_{2}$, and $h_{1}, h_{2} \in \{0.5, 1, 2\}$.

(0.5, 1)		(1, 0.5)		(0.5, 2)		(2, 0.5)		(1, 2)		(2, 1)
B	A	B	A	B	A	B	A	B	A	B	A
0.061	0.989	0.050	0.997	0.059	0.996	0.042	0.998	0.044	0.994	0.063	0.998
0.107	0.999	0.113	0.997	0.106	0.996	0.095	0.998	0.105	0.994	0.115	0.999

Table 9. Asymptotic and bootstrap type I error probabilities for Example 2 with $θ = 0.24$, $ϕ_{1} = {PD}_{λ}$, $λ \in \{- 2, 1, 2\}$, $ϕ_{2} = {PD}_{- 2}$, and $h_{1} = h_{2} \in \{0.5, 1, 2\}$.

	$h_{1} = h_{2}$	$0.5$		1		2
$ϕ_{1}$	n	B	A	B	A	B	A
$P D_{- 2}$	100	0.057	0.016	0.055	0.017	0.051	0.017
		0.111	0.034	0.110	0.036	0.102	0.036
	150	0.049	0.018	0.048	0.019	0.051	0.017
		0.097	0.035	0.103	0.039	0.101	0.036
	200	0.051	0.024	0.055	0.022	0.051	0.022
		0.099	0.043	0.102	0.042	0.099	0.040
$P D_{1}$	100	0.058	0.013	0.054	0.013	0.051	0.014
		0.114	0.031	0.113	0.030	0.106	0.031
	150	0.050	0.014	0.051	0.014	0.052	0.014
		0.098	0.031	0.103	0.031	0.100	0.032
	200	0.049	0.014	0.054	0.016	0.052	0.016
		0.099	0.032	0.104	0.034	0.099	0.032
$P D_{2}$	100	0.055	0.013	0.053	0.014	0.050	0.015
		0.110	0.030	0.108	0.033	0.104	0.033
	150	0.050	0.013	0.052	0.015	0.051	0.016
		0.097	0.032	0.103	0.033	0.098	0.032
	200	0.049	0.014	0.051	0.015	0.051	0.016
		0.100	0.032	0.102	0.035	0.098	0.033

Table 10. Asymptotic and bootstrap type I error probabilities for Example 2 with $n = 200$, $θ = 0.24$, $ϕ_{1} = ϕ_{2} = {PD}_{- 2}$, $h_{1} \neq h_{2}$, and $h_{1}, h_{2} \in \{0.5, 1, 2\}$.

(0.5, 1)		(1, 0.5)		(0.5, 2)		(2, 0.5)		(1, 2)		(2, 1)
B	A	B	A	B	A	B	A	B	A	B	A
0.048	0.017	0.051	0.017	0.052	0.018	0.053	0.019	0.050	0.018	0.049	0.016
0.101	0.035	0.099	0.033	0.100	0.035	0.105	0.040	0.103	0.036	0.101	0.034

Table 11. Asymptotic and bootstrap type I error probabilities for Example 3 with $θ = 0.8$, $ϕ_{1} = {PD}_{λ}$, $λ \in \{- 2, 1, 2\}$, $ϕ_{2} = {PD}_{- 2}$, and $h_{1} = h_{2} \in \{0.5, 1, 2\}$.

	$h_{1} = h_{2}$	$0.5$		1		2
$ϕ_{1}$	n	B	A	B	A	B	A
$P D_{- 2}$	100	0.066	0.063	0.058	0.066	0.044	0.074
		0.119	0.122	0.101	0.120	0.086	0.125
	150	0.053	0.063	0.050	0.064	0.045	0.066
		0.098	0.114	0.095	0.118	0.093	0.113
	200	0.051	0.062	0.047	0.061	0.046	0.061
		0.099	0.111	0.096	0.111	0.100	0.115
$P D_{1}$	100	0.049	0.095	0.049	0.107	0.041	0.111
		0.103	0.157	0.098	0.165	0.084	0.161
	150	0.050	0.083	0.040	0.082	0.040	0.084
		0.098	0.137	0.090	0.134	0.087	0.136
	200	0.046	0.075	0.048	0.079	0.044	0.074
		0.095	0.129	0.102	0.137	0.092	0.123
$P D_{2}$	100	0.043	0.122	0.045	0.136	0.037	0.131
		0.099	0.181	0.046	0.190	0.077	0.182
	150	0.040	0.099	0.047	0.105	0.035	0.100
		0.041	0.153	0.093	0.159	0.081	0.152
	200	0.043	0.086	0.048	0.091	0.043	0.086
		0.092	0.145	0.097	0.148	0.090	0.144

Table 12. Asymptotic and bootstrap type I error probabilities for Example 3 with $n = 200$, $θ = 0.8$, $ϕ_{1} = ϕ_{2} = {PD}_{- 2}$, $h_{1} \neq h_{2}$, and $h_{1}, h_{2} \in \{0.5, 1, 2\}$.

(0.5, 1)		(1, 0.5)		(0.5, 2)		(2, 0.5)		(1, 2)		(2, 1)
B	A	B	A	B	A	B	A	B	A	B	A
0.047	0.060	0.048	0.062	0.051	0.063	0.049	0.062	0.048	0.063	0.044	0.058
0.095	0.108	0.099	0.114	0.099	0.113	0.097	0.112	0.099	0.113	0.092	0.109

Table 13. Thematic classification of the Evergreen Broadleaf Trees (EBL) class.

		Globcover Map	LC-CCI Map
Classified Data	EBL	165	172
	DBL	13	5
	ENL	7	5
	U	0	0

Table 14. Results of the goodness-of-fit test applied to the thematic classification of the EBL class.

	Globcover Map			LC-CCI Map
	${\hat{θ}}_{- 2, 0.5} = 0.9490$			${\hat{θ}}_{- 2, 0.5} = 0.9721$
$ϕ_{1}$	$P D_{- 2}$	$P D_{1}$	$P D_{2}$	$P D_{- 2}$	$P D_{1}$	$P D_{2}$
$T_{o b s}$	2.3015	2.7618	3.0111	0.1432	0.1432	0.1433
${\hat{p}}_{b o o t}$	0.1700	0.2253	0.2926	0.9283	0.9200	0.9148
	${\hat{θ}}_{- 2, 1} = 0.9503$			${\hat{θ}}_{- 2, 1} = 0.9725$
$T_{o b s}$	2.7686	3.3752	3.6962	0.2821	0.2823	0.2826
${\hat{p}}_{b o o t}$	0.1801	0.2325	0.2671	0.8431	0.9162	0.9182
	${\hat{θ}}_{- 2, 2} = 0.9527$			${\hat{θ}}_{- 2, 2} = 0.9732$
$T_{o b s}$	3.6352	4.5400	5.0219	0.5492	0.5508	0.5514
${\hat{p}}_{b o o t}$	0.1300	0.2492	0.2584	0.7526	0.8144	0.8291

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alba-Fernández, M.V.; Jiménez-Gamero, M.D.; Ariza-López, F.J. Minimum Penalized ϕ-Divergence Estimation under Model Misspecification. Entropy 2018, 20, 329. https://doi.org/10.3390/e20050329
