This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In this note we introduce some divergence-based model selection criteria. These criteria are defined by estimators of the expected overall discrepancy between the true unknown model and the candidate model, using dual representations of divergences and associated minimum divergence estimators. It is shown that the proposed criteria are asymptotically unbiased. The influence functions of these criteria are also derived and some comments on robustness are provided.

The minimum divergence approach is a useful technique in statistical inference. In recent years, the literature on divergence-based statistical methods has grown substantially, and the monographs of Pardo [

In this paper we apply estimators of divergences in dual form and corresponding minimum dual divergence estimators, as presented by Broniatowski and Keziou [

Model selection is the task of choosing the best model among a set of candidate models. A model selection criterion can be regarded as an approximately unbiased estimator of the expected overall discrepancy, a nonnegative quantity that measures the distance between the true unknown model and a fitted approximating model. A candidate model with a small value of the criterion is preferred.
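As a point of reference for how such a criterion is used in practice, the following minimal sketch ranks three nested normal candidate models with the classical Akaike criterion, AIC = −2 log L + 2p, where p is the number of estimated parameters. It is only an illustration of the selection rule (smaller is better), not one of the divergence-based criteria of this paper; the data values and the helper names `normal_loglik` and `aic` are hypothetical.

```python
import math

def normal_loglik(data, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - ss / (2.0 * sigma2)

def aic(loglik, n_params):
    """Classical AIC: -2 * maximized log-likelihood + 2 * number of parameters."""
    return -2.0 * loglik + 2.0 * n_params

# Hypothetical data, used only for illustration.
data = [0.8, 1.4, 0.3, 1.9, 1.1, 0.6, 1.7, 1.2, 0.9, 1.5]
n = len(data)
mean = sum(data) / n                               # MLE of the mean
var_mle = sum((x - mean) ** 2 for x in data) / n   # MLE of the variance

# Three nested candidates with 0, 1 and 2 estimated parameters.
candidates = {
    "N(0, 1)": aic(normal_loglik(data, 0.0, 1.0), 0),
    "N(mu, 1)": aic(normal_loglik(data, mean, 1.0), 1),
    "N(mu, sigma^2)": aic(normal_loglik(data, mean, var_mle), 2),
}

# The candidate with the smallest criterion value is selected.
best = min(candidates, key=candidates.get)
for name, value in candidates.items():
    print(f"{name:>15}: AIC = {value:.3f}")
print("selected:", best)
```

The penalty term 2p is what corrects the optimistic bias of the maximized log-likelihood; richer models always fit at least as well, so the fit term alone cannot be used for selection.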

Many model selection criteria have been proposed so far. Classical model selection criteria using least squares error and log-likelihood include the C_{p}

In the present paper, we apply the same methodology used for AIC, and also for DIC, to a general class of divergences including the Cressie–Read divergences [

The paper is organized as follows. In Section 2 we recall the duality formula for divergences, the definitions of the associated dual divergence estimators and minimum dual divergence estimators, and their asymptotic properties, all of which are needed in Section 3. In Section 3, we apply the methodology used for AIC to divergences in dual form in order to develop criteria for model selection. We define criteria based on estimators of the expected overall discrepancy, prove their asymptotic unbiasedness and derive their influence functions. Section 4 presents some conclusions.

Let

When

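A commonly used family of divergences is the so-called “power divergences” or Cressie–Read divergences. This family is defined by the class of functions

φ_{γ}(x) := (x^{γ} − γx + γ − 1) / (γ(γ − 1))

for γ ∈ ℝ \ {0,1} and φ_{0}(x) := −log x + x − 1, φ_{1}(x) := x log x − x + 1. The Kullback–Leibler (KL) divergence is associated to φ_{1}, the modified Kullback–Leibler (KL_{m}) divergence to φ_{0}, the χ^{2} divergence to φ_{2}, the modified χ^{2} divergence to φ_{−1} and the Hellinger distance to φ_{1/2}. We refer to […] for the modified versions of the χ^{2} and KL divergences.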
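The power divergence family can be checked numerically on a small example. The sketch below assumes the standard Cressie–Read generator φ_{γ}(x) = (x^{γ} − γx + γ − 1)/(γ(γ − 1)), with the limiting cases φ_{0}(x) = −log x + x − 1 and φ_{1}(x) = x log x − x + 1, and the usual definition D(Q, P) = Σ φ(q_{i}/p_{i}) p_{i} for discrete distributions with common support. The function names and the two probability vectors are hypothetical.

```python
import math

def phi(gamma, x):
    """Cressie-Read generator phi_gamma(x) for x > 0 (assumed standard form)."""
    if gamma == 0:
        return -math.log(x) + x - 1.0          # modified Kullback-Leibler
    if gamma == 1:
        return x * math.log(x) - x + 1.0       # Kullback-Leibler
    return (x ** gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))

def divergence(gamma, q, p):
    """D(Q, P) = sum_i phi(q_i / p_i) * p_i for discrete Q, P with full support."""
    return sum(phi(gamma, qi / pi) * pi for qi, pi in zip(q, p))

# Two hypothetical discrete distributions on three points.
q = [0.2, 0.5, 0.3]
p = [0.3, 0.4, 0.3]

kl = divergence(1, q, p)        # sum q log(q / p)
kl_m = divergence(0, q, p)      # sum p log(p / q)
chi2 = divergence(2, q, p)      # (1/2) sum (q - p)^2 / p
chi2_m = divergence(-1, q, p)   # (1/2) sum (q - p)^2 / q
hell = divergence(0.5, q, p)    # 2 sum (sqrt(q) - sqrt(p))^2

for name, value in [("KL", kl), ("KL_m", kl_m), ("chi2", chi2),
                    ("chi2_m", chi2_m), ("Hellinger", hell)]:
    print(f"{name:>9}: {value:.6f}")
```

Each generator satisfies φ_{γ}(1) = 0, so every divergence in the family vanishes when the two distributions coincide.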

Some applied models using divergence and entropy measures can be found in Toma and Leoni-Aubin [

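Let {F_{θ}, θ ∈ Θ} be an identifiable parametric model, where Θ is a subset of ℝ^{p}. We assume that, for any θ in Θ, F_{θ} has density f_{θ} with respect to some dominating σ-finite measure λ. Consider the problem of estimating the unknown true value of the parameter θ_{0} on the basis of an i.i.d. sample X_{1},…, X_{n}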

In the following,
_{θ}

Using a Fenchel duality technique, Broniatowski and Keziou [

We consider divergences, defined through differentiable functions _{1}, _{2}, _{3} such that

Condition (C.0) holds for all power divergences, including KL and KL_{m}

Assuming that

with

where
_{0}, independently of

We mention that the dual representation

Naturally, for fixed

the supremum being attained for

_{0} called dual divergence estimators. Further, since

and since the infimum in the above display is unique, a natural definition of estimators of the parameter _{0}, called minimum dual divergence estimators, is provided by

For more details on the dual representation of divergences and associated minimum dual divergence estimators, we refer to Broniatowski and Keziou [
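The two estimators can be illustrated on the normal location model N(θ, 1). The sketch below assumes the usual form of the dual integrand for Cressie–Read divergences, m(α, θ, x) = (1/(γ − 1)) [∫ f_{θ}^{γ} f_{α}^{1−γ} dz − 1] − (1/γ) [(f_{θ}(x)/f_{α}(x))^{γ} − 1], which for γ = 0 reduces to log f_{α}(x) − log f_{θ}(x). For unit-variance normals the integral has the closed form exp(−γ(1 − γ)(θ − α)^{2}/2). The supremum and infimum are approximated by a plain grid search, which is a simplification chosen for readability; all function names, the grid and the simulated sample are hypothetical.

```python
import math
import random

def m(gamma, alpha, theta, x):
    """Dual integrand m(alpha, theta, x) for the N(theta, 1) model (assumed form)."""
    # log of the density ratio f_theta(x) / f_alpha(x) for unit-variance normals
    log_ratio = 0.5 * (x - alpha) ** 2 - 0.5 * (x - theta) ** 2
    if gamma == 0:   # KL_m: m = log f_alpha(x) - log f_theta(x)
        return -log_ratio
    if gamma == 1:   # KL: integral term equals (theta - alpha)^2 / 2
        return 0.5 * (theta - alpha) ** 2 - (math.exp(log_ratio) - 1.0)
    # closed form of the integral of f_theta^gamma * f_alpha^(1 - gamma)
    integral = math.exp(-0.5 * gamma * (1.0 - gamma) * (theta - alpha) ** 2)
    return ((integral - 1.0) / (gamma - 1.0)
            - (math.exp(gamma * log_ratio) - 1.0) / gamma)

def dual_divergence_estimate(gamma, theta, sample, grid):
    """Grid approximation of sup over alpha of the empirical mean of m."""
    n = len(sample)
    return max(sum(m(gamma, a, theta, x) for x in sample) / n for a in grid)

def min_dual_divergence_estimator(gamma, sample, grid):
    """Grid approximation of arg inf over theta of the dual divergence estimate."""
    return min(grid, key=lambda t: dual_divergence_estimate(gamma, t, sample, grid))

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(20)]  # simulated data, theta_0 = 0
grid = [-3.0 + 0.05 * k for k in range(121)]          # candidate values in [-3, 3]

xbar = sum(sample) / len(sample)
theta_hat = min_dual_divergence_estimator(0, sample, grid)
print("sample mean:", round(xbar, 4), " estimate (gamma = 0):", round(theta_hat, 4))
```

For γ = 0 the inner supremum is attained at the sample mean and the estimated divergence equals (θ − x̄)^{2}/2, so the minimum dual divergence estimator agrees with the maximum likelihood estimator up to the grid resolution. For other values of γ a numerical optimizer would normally replace the grid search.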

Broniatowski and Keziou [

(C.1) The estimates

(C.2)

(a) for any positive _{0} || >

(b) there exists some neighborhood
_{0} such that for any positive _{0}|| >

(C.3) There exists some neighborhood
_{0} and a positive function _{0},

(C.4) There exists a neighborhood
_{0} such that the first and the second order partial derivatives with respect to

(C.5) The integrals

Assume that conditions (C.1)–(C.3) hold. Then

(a)

(b)
_{0} in probability. If (C.1)–(C.5) are fulfilled, then

(c)
_{0})^{−1}.

For discussions and examples about the fulfillment of conditions (C.1)–(C.5), we refer to Broniatowski and Keziou [

In this section, we apply the same methodology used for AIC to the divergences in dual form in order to develop model selection criteria. Consider a random sample _{1}, …, _{n}_{θ}_{θ}^{p}_{θ}_{θ}

The target theoretical quantity that will be approximated by an asymptotically unbiased estimator is given by

where
_{θ}_{θ}

The next Lemma gives the gradient vector and the Hessian matrix of _{θ}_{θ}

(C.6) There exists a neighborhood _{θ} of

(C.7) There exists a neighborhood _{θ}

Assume that conditions (C.6) and (C.7) hold. Then, the gradient vector
_{θ} is given by

and the Hessian matrix

The proof of this Lemma is straightforward and is therefore omitted.

Particularly, when using Cressie–Read divergences, the gradient vector
_{θ}

and the Hessian matrix

When the true model _{θ}_{θ}_{0} simplify to

The hypothesis that the true model _{θ}

When the true model g belongs to the parametric model (f_{θ}), assuming that conditions (C.6) and (C.7) are fulfilled for
_{0}, the expected overall discrepancy is given by

where
_{0} is the true value of the parameter.

By applying a Taylor expansion to _{θ}_{0} and taking

Then

In this section we construct an asymptotically unbiased estimator of the expected overall discrepancy, under the hypothesis that the true model _{θ}

For a given _{θ}

where

using the sample analogue of the dual representation of the divergence.

The following conditions allow differentiation under the integral sign for the integral term of Q_{θ}.

(C.8) There exists a neighborhood _{θ}

(C.9) There exists a neighborhood _{θ}

Under (C.8) and (C.9), the gradient vector and the Hessian matrix of Q_{θ} are

Since

differentiation yields

Note that, by its very definition,

taken with respect to

On the other hand,

Under conditions (C.1)–(C.3) and (C.8)–(C.9) and assuming that the integrals
_{θ} evaluated in

By the very definition of

A Taylor expansion of
_{0}, _{0}) yields

Using the fact that

Then, since

Thus we obtain

In the following, we suppose that the conditions of Propositions 1, 2 and 3 are all satisfied. These conditions make it possible to obtain an asymptotically unbiased estimator of the expected overall discrepancy.

When the true model g belongs to the parametric model (f_{θ}), the expected overall discrepancy evaluated at

where

A Taylor expansion of _{θ}

and using Proposition 3, we have

Taking _{0}, for large

and consequently

where

According to Proposition 2, it holds that

Note that

Then, combining

Proposition 4 shows that an asymptotically unbiased estimator of the expected overall discrepancy is given by

According to Proposition 1,
_{p}_{0})^{−1})

In the following, we compute the influence function of the statistics
_{n}_{θ}

where

Since

the statistical functional corresponding to

where _{θ}

The influence function of

For the contaminated model

Differentiation with respect to

Note that _{0}, _{0},

On the other hand, according to the results presented in [

Consequently, we obtain

Note that, for Cressie–Read divergences, it holds

irrespective of the divergence used, since

Generally,

The dual representation of divergences and corresponding minimum dual divergence estimators are useful tools in statistical inference. The presented theoretical results show that, in the context of model selection, these tools provide asymptotically unbiased criteria. These criteria are not robust in the sense of the bounded influence function, but this fact does not exclude the stability of the criteria with respect to other robustness measures. The computation of

The author thanks the referees for their careful reading of the paper and for suggestions that led to an improved version. This work was supported by a grant of the Romanian National Authority for Scientific Research, CNCS-UEFISCDI, project number PN-II-RU-TE-2012-3-0007.

The author declares no conflict of interest.