1. Introduction
There are values of unknown parameters of interest in data analysis that cannot be determined even in the most favorable situation, where the maximum amount of data is available, i.e., when the distribution of the population is known. This difficulty has been tackled either by introducing criteria ensuring that the parameter of interest is (locally) identifiable or by delineating the set of observationally equivalent values of the parameter of interest; for a review of these approaches, see, e.g., Paulino and Pereira (1998) or Lewbel (2019). This note contributes to these efforts by studying the criterion for identifiability based on the minimization of the Hellinger distance, which was introduced by Beran (1977), and by exhibiting its relationship to the criterion for local identifiability based on the non-singularity of the Fisher matrix, which was introduced by Rothenberg (1971). The similarities and differences between these two criteria for identifiability have so far not been studied.
The main result of this note shows that the Hellinger distance criterion can be used to verify the (local) identifiability of a parameter of interest, or the lack of it, in models or at points in the parameter space where the Fisher matrix criterion does not apply. This note illustrates this result with several examples, including a parametric procurement auction model and the uniform, normal squared, and Laplace location models. These models are either irregular, because the support of the observed variables depends on the parameter of interest, or their parameter space contains irregular points of the Fisher matrix. Additional examples of irregular models and of models with irregular points of the Fisher matrix are referenced below, after the concepts of a regular point of the Fisher matrix and of a regular model are defined according to conventional usage; see, e.g., Rothenberg (1971).
Let Y be a vector-valued random variable with probability density function f(y; θ0). Let the available data be a sample of independent and identically distributed replications of Y. Consider a family F of probability density functions defined with respect to a common dominating measure μ, which allows us to dispense with the need to distinguish between continuous and discrete random variables.1 Let F_Θ denote the subset of densities in F indexed by θ, where the parameter space Θ is a subset of R^K, with K a positive integer. Let f(·; θ) denote an element of F_Θ.
Definition 1 (Identifiability). A parameter point θ0 in Θ is said to be identifiable if there is no other θ in Θ such that f(y; θ) = f(y; θ0) for μ-a.s. y.
Definition 2 (Local Identifiability). A parameter point θ0 in Θ is said to be locally identifiable if there exists an open neighborhood of θ0 containing no other θ such that f(y; θ) = f(y; θ0) for μ-a.s. y.
Definition 3 (Regular Points). The Fisher matrix I(θ) is the variance–covariance matrix of the score ∂ log f(y; θ)/∂θ. The point θ0 is said to be a regular point of the Fisher matrix if there exists an open neighborhood of θ0 in which I(θ) has constant rank.
The (local) identifiability of regular points of the Fisher matrix in parametric models has been extensively studied; see, e.g., Rothenberg (1971). In contrast, the identifiability of irregular points has been less studied, and the literature is rather unclear about what may happen regarding the (local) identifiability of irregular points of the Fisher matrix. The study of irregular points is worthy of consideration because, first, there are several models of interest with this type of point in the parameter space (see the list below) and, second, because irregular points may correspond either to:
points in the parameter space that are not locally identifiable and for which a consistent estimator cannot exist, e.g., the measurement error model studied by Reiersol (1950), or for which a consistent estimator can only exist after a normalization; or
points in the parameter space that are locally identifiable and for which a √n-consistent estimator cannot exist (and some algorithms, e.g., the Newton–Raphson method based on the Fisher matrix, will face difficulties in converging), or for which a √n-consistent estimator can only exist after a reparametrization of the model; see, e.g., the bivariate probit model in Han and McCloskey (2019).
Hinkley (1973) noted that an irregular point of the Fisher matrix arises in the normal unsigned location model when the location parameter is zero. Sargan (1983) constructed simultaneous equation models with irregular points of the Fisher matrix. Lee and Chesher (1986) showed that the normal regression model with non-ignorable non-response has irregular points of the Fisher matrix in the vicinity of ignorable non-response. Li et al. (2009) noted that finite-mixture density models have irregular points of the Fisher matrix in the vicinity of homogeneity. Hallin and Ley (2012) showed that skew-symmetric density models have irregular points of the Fisher matrix in the vicinity of symmetry. We use below the normal squared location model (see Example 3) to illustrate in a transparent way the notion of an irregular point of the Fisher matrix.
The next section shows that the criterion for local identifiability based on minimizing the Hellinger distance, unlike the criterion based on the non-singularity of the Fisher matrix, applies both to regular and irregular points of the Fisher matrix and to regular and irregular models, to be defined below in Section 3. Section 3 shows that, for regular points of the Fisher matrix in the class of regular models studied by Rothenberg (1971), the criterion based on the Fisher matrix is a particular case of the criterion based on minimizing the Hellinger distance (but not for irregular models or irregular points of the Fisher matrix). Section 4 relates the minimum Hellinger distance criterion to the criterion based on the reversed Kullback–Leibler divergence, introduced by Bowden (1973), by showing that both are particular cases of the criterion for identifiability based on the minimization of a φ-divergence.
2. The Minimum Hellinger Distance Criterion
Identifying θ0 is the problem of distinguishing f(·; θ0) from the other members of F_Θ. It is then convenient to begin by introducing a notion of how densities differ from each other. The squared Hellinger distance for the pair of densities f(·; θ), f(·; θ′) in F_Θ is, up to the normalizing factor 1/2, the square of the L2(μ)-norm of the difference between the square roots of the densities:

ρ(θ, θ′) = (1/2) ∫ (√f(y; θ) − √f(y; θ′))² dμ(y).
The squared Hellinger distance has the following well-known properties (see, e.g., Pardo 2005, p. 51), which are going to be used later.
Lemma 1. ρ takes values between 0 and 1, which are independent of the choice of the dominating measure μ, and ρ(θ, θ′) = 0 if and only if f(y; θ) = f(y; θ′) for μ-a.s. y.
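As a quick numerical illustration of Lemma 1 (a sketch, not part of the original analysis), the following stdlib-only Python code evaluates the squared Hellinger distance ρ = (1/2)∫(√f − √g)² dμ by the midpoint rule for two unit-variance normal densities; the closed form 1 − exp(−(m − m′)²/8) for such normals is a standard fact used only as a cross-check.

```python
import math

# Sketch: midpoint-rule evaluation of the squared Hellinger distance
# rho(f, g) = 0.5 * integral (sqrt(f) - sqrt(g))^2 dmu for two normal densities.

def normal_pdf(y, mean):
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

def squared_hellinger(f, g, lo, hi, n=200_000):
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += (math.sqrt(f(y)) - math.sqrt(g(y))) ** 2
    return 0.5 * total * h

rho_same = squared_hellinger(lambda y: normal_pdf(y, 0.0),
                             lambda y: normal_pdf(y, 0.0), -25.0, 25.0)
rho_far = squared_hellinger(lambda y: normal_pdf(y, 0.0),
                            lambda y: normal_pdf(y, 12.0), -25.0, 25.0)

print(rho_same)                               # 0.0: identical densities
print(rho_far, 1.0 - math.exp(-144.0 / 8.0))  # both close to 1, never above it
```

The value is zero exactly when the densities coincide, and stays below 1 however far apart they are, as Lemma 1 states.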
(All the proofs are in Appendix A.) Alternative notions of divergence between densities, other than the squared Hellinger distance, are studied in Section 4. Since ρ(θ, θ0) is equal to zero if and only if f(·; θ) and f(·; θ0) are equal, one has the following characterization of identifiability.
Lemma 2. The parameter θ0 is identifiable in the model F_Θ if and only if, for all θ in Θ such that θ ≠ θ0, ρ(θ, θ0) > 0.
Moreover, since ρ(θ, θ0) is non-negative and reaches a minimum at θ = θ0, one obtains the following criterion for identifiability based on minimizing the squared Hellinger distance.
Proposition 1. The parameter θ0 is identifiable in the model F_Θ if and only if the set of minimizers of ρ(·, θ0) over Θ is the singleton {θ0}. This criterion applies to models where:
the support of Y depends on the parameter of interest (see Examples 1 and 2 below);
θ0 is not a regular point of the Fisher matrix (see Example 3 below);
some elements of the Fisher matrix are not defined (see Example 5 below);
the Fisher matrix is not continuous (see Example 6 below);
Θ is infinite-dimensional, as in semiparametric models (which are outside the scope of this note).2
The following examples illustrate the use of Proposition 1 and the definitions introduced so far. They are also going to illustrate, in the next section, the regularity conditions employed by Rothenberg (1971) to obtain a criterion for local identifiability based on the Fisher matrix. In these examples, μ denotes the Lebesgue measure. The Supplementary Materials present step-by-step calculations of the squared Hellinger distance in Examples 1–5.
Example 1 (Uniform Location Model). Set Θ = R and let θ0 be a point in Θ. Consider the uniform location model f(y; θ) = 1{θ ≤ y ≤ θ + 1}. The Hellinger distance is ρ(θ, θ0) = min(1, |θ − θ0|). Since the unique solution to ρ(θ, θ0) = 0 is θ = θ0, see Figure 1a, the set of minimizers of ρ(·, θ0) is {θ0} and θ0 is identifiable. The Fisher matrix is I(θ) = 0, which is a singular matrix.
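The claims of Example 1 can be checked numerically, assuming the standard specification f(y; θ) = 1{θ ≤ y ≤ θ + 1} of the uniform location model (an assumption of this sketch; the example can be formulated with other supports):

```python
import math

# Sketch for Example 1, assuming f(y; theta) = 1{theta <= y <= theta + 1}:
# numerically, rho(theta, theta0) = min(1, |theta - theta0|), with a unique zero
# at theta = theta0, even though the Fisher matrix is identically zero.

def uniform_pdf(y, theta):
    return 1.0 if theta <= y <= theta + 1.0 else 0.0

def rho(theta, theta0, lo=-3.0, hi=3.0, n=600_000):
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += (math.sqrt(uniform_pdf(y, theta)) - math.sqrt(uniform_pdf(y, theta0))) ** 2
    return 0.5 * total * h

for delta in (0.0, 0.4, 1.5):
    print(delta, round(rho(delta, 0.0), 4), min(1.0, abs(delta)))
```

The identifiability of θ0 here comes entirely from the moving support, which the Fisher matrix cannot detect.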
Example 2 (First-Price Auction Model). Consider the first-price procurement auction model with m bidders introduced in Paarsch (1992, Section 4.2.2). For bidders with independent private valuations following an exponential distribution with parameter θ, Paarsch (1992, Display 4.18) derives the density of the winning bid in the i-th auction. Let θ0 be a point in the parameter space Θ. For θ > θ0 (resp. θ < θ0), ρ(θ, θ0) is strictly increasing (resp. strictly decreasing), see Figure 1b. Hence, by continuity of ρ(·, θ0), the set of minimizers is {θ0} and θ0 is identifiable. The Fisher matrix is singular.
Example 3 (Normal Squared Location Model)
Set Θ = R and let θ0 be a point in Θ. Consider the normal squared location model f(y; θ) = φ(y − θ²), where φ denotes the standard normal density. This model would arise, for example, if Y is the difference between a matched pair of random variables whose control and treatment labels are not observed. The Hellinger distance is ρ(θ, θ0) = 1 − exp(−(θ² − θ0²)²/8). The parameter point θ0 = 0 is identifiable because ρ(·, 0) is a strictly convex function, see Figure 2a. The parameter points θ0 ≠ 0 are not identifiable because ρ(−θ0, θ0) = 0, see Figure 2b. The Fisher matrix is I(θ) = 4θ², which implies that I(0) is a singular matrix and θ0 = 0 is an irregular point of the Fisher matrix.
Example 4 (Demand-and-Supply Model)
Let (P, Q) denote the observed price and quantity of a good transacted in a market at a given period of time. Linear approximations to the demand and supply functions involve unknown slope and intercept parameters and an unobserved random vector (U, V). Assume that U and V are independent and jointly normally distributed with mean zero and unknown variances, and collect the unknown parameters in θ. The density of the observed variables then follows from the normality of (U, V), with the determinant of the matrix in parentheses appearing as a normalizing factor. To show that θ0 is not identifiable, by Proposition 1, it suffices to verify that the set of minimizers of the squared Hellinger distance ρ(·, θ0) is not a singleton. We elaborate on this point in the Supplementary Material.
Example 5 (Laplace Location Model)
Set Θ = R and let θ0 be a point in Θ. Consider the Laplace location model f(y; θ) = (1/2) exp(−|y − θ|). The squared Hellinger distance is ρ(θ, θ0) = 1 − (1 + |θ − θ0|/2) exp(−|θ − θ0|/2). For any θ > θ0 (resp. θ < θ0), ρ(θ, θ0) is strictly increasing (resp. strictly decreasing) in θ. By continuity, ρ(·, θ0) then has a unique minimizer at θ = θ0 and, by Proposition 1, θ0 is identifiable. The Fisher matrix is I(θ) = 1, which is a non-singular matrix.
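The closed form claimed for Example 5 can be verified numerically, assuming the standard Laplace location density f(y; θ) = (1/2)exp(−|y − θ|); under that assumption, ρ(θ, θ0) = 1 − (1 + |d|/2)exp(−|d|/2) with d = θ − θ0, which grows strictly with |d|:

```python
import math

# Sketch for Example 5, assuming f(y; theta) = 0.5 * exp(-|y - theta|).
# Closed form: rho = 1 - (1 + |d|/2) * exp(-|d|/2), d = theta - theta0,
# strictly increasing in |d|, so theta0 is the unique minimizer.

def laplace_pdf(y, theta):
    return 0.5 * math.exp(-abs(y - theta))

def rho(theta, theta0, lo=-40.0, hi=40.0, n=400_000):
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += (math.sqrt(laplace_pdf(y, theta)) - math.sqrt(laplace_pdf(y, theta0))) ** 2
    return 0.5 * total * h

d = 2.0
rho_exact = 1.0 - (1.0 + d / 2.0) * math.exp(-d / 2.0)
print(rho(d, 0.0), rho_exact)                          # ~0.2642 each
print(rho(0.5, 0.0) < rho(1.0, 0.0) < rho(2.0, 0.0))   # True: rho grows with |d|
```

Note that the kink of |y − θ| at y = θ, which breaks differentiability for the Fisher matrix criterion, poses no difficulty for the direct evaluation of ρ.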
Example 6 (Exponential Mixture). Set θ = (p, λ1, λ2), with mixing weight p in (0, 1) and rates λ1, λ2 > 0. Consider the finite mixture of exponentials model f(y; θ) = pλ1 exp(−λ1 y) + (1 − p)λ2 exp(−λ2 y) for y ≥ 0. Consider θ0 = (p, λ1, λ2) and θ = (1 − p, λ2, λ1). Since the two parameter points swap the component labels, one has f(y; θ) = f(y; θ0) for all y and ρ(θ, θ0) = 0. Since the set of minimizers of ρ(·, θ0) is then not a singleton, it follows from Proposition 1 that θ0 is not identifiable.
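The label-switching argument behind Example 6 can be made concrete with a small check, assuming the mixture density f(y; p, λ1, λ2) = pλ1·exp(−λ1y) + (1 − p)λ2·exp(−λ2y) and the illustrative parameter values below (assumptions of this sketch, not values from the paper):

```python
import math

# Sketch for Example 6: swapping the component labels of the exponential mixture,
# (0.3, 1, 2) -> (0.7, 2, 1), leaves the density unchanged pointwise, so the
# squared Hellinger distance between the two parameter points is zero.

def mixture_pdf(y, p, l1, l2):
    return p * l1 * math.exp(-l1 * y) + (1.0 - p) * l2 * math.exp(-l2 * y)

max_gap = max(abs(mixture_pdf(0.01 * k, 0.3, 1.0, 2.0) - mixture_pdf(0.01 * k, 0.7, 2.0, 1.0))
              for k in range(5000))
print(max_gap)  # ~0 (up to floating-point rounding)
```

Since the two densities coincide, ρ = 0 by Lemma 1 and, by Proposition 1, the parameter point is not identifiable.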
The previous examples also illustrate the difference between identifiable and locally identifiable points in the parameter space.
Example 7 (Normal Squared Location Model, Continued). In this example, any θ0 in Θ is locally identifiable, even the irregular point of the Fisher matrix, and only θ0 = 0 is identifiable, see Figure 2. We also have the following criterion for local identifiability based on minimizing the squared Hellinger distance.
Proposition 2. The parameter θ0 is locally identifiable in the model F_Θ if and only if there exists an open set containing θ0 within which the set of minimizers of ρ(·, θ0) is the singleton {θ0}. This criterion, unlike the criterion based on the Fisher matrix by Rothenberg (1971) and re-stated below as Lemma 3 for the sake of completeness, applies to the cases when:
the support of Y depends on the parameter of interest;
θ0 is not a regular point of the Fisher matrix;
some elements of the Fisher matrix are not defined;
the Fisher matrix is not continuous;
Θ is infinite-dimensional.
Proposition 2 reduces local identifiability to the uniqueness of the solution of a well-defined minimization problem. One general criterion to check in advance for the uniqueness of a minimizer of an optimization problem, and, as argued in, e.g., Rockafellar and Wets (1998), virtually the only available one, is the strict convexity of the objective function. The application of this general criterion to the characterization of local identifiability in Proposition 2 yields the following result:
Proposition 3. If ρ(·, θ0) is a locally strictly convex function around θ0 (i.e., if there is an open convex set containing θ0 on which ρ(·, θ0) is a strictly convex function), then θ0 is locally identifiable.
Proposition 3 leads to the observation that local identifiability is related to the local convexity of the Hellinger distance. As with our earlier propositions, it holds when the support of Y depends on the parameter of interest, when θ0 is not a regular point of the Fisher matrix, when some elements of the Fisher matrix are not defined, or when the Fisher matrix is not continuous.
3. The Fisher Matrix Criterion
Rothenberg (1971) gives a criterion for local identifiability in terms of the non-singularity of the Fisher matrix. Additional insight about the relevance, and the limitations, of the Fisher matrix criterion for local identifiability may then be gained by relating it to the criterion based on minimizing the Hellinger distance. To study this relationship, we now focus on the regular models studied by Rothenberg (1971).
Assumption R (Regular Models). The model F_Θ is such that:
(A1) Θ is an open set in R^K.
(A2) f(y; θ) ≥ 0 and ∫ f(y; θ) dμ(y) = 1 for all θ in Θ.
(A3) The support set {y : f(y; θ) > 0} is the same for all θ in Θ.
(A4) For all θ in a convex set containing Θ and for all y, the functions f(y; θ) and log f(y; θ) are continuously differentiable in θ.
(A5) The elements of the Fisher matrix I(θ) are finite and are continuous functions of θ everywhere in Θ.
We now replicate the characterization of local identifiability in Rothenberg (1971, Theorem 1) based on the non-singularity of the Fisher matrix.
Lemma 3. Let the regularity conditions in Assumption R hold. Let θ0 be a regular point of the Fisher matrix I(θ). Then, θ0 is locally identifiable if and only if I(θ0) is non-singular.
This characterization of local identifiability only applies to the regular models defined by Assumption R and to the regular points of the Fisher matrix, which may be a strict subset of the parameter space (see Example 3). These conditions do not themselves have any direct statistical or economic interpretation: their role is just to permit a characterization of local identifiability.3 We have already referenced in the introduction a list of models with irregular points of the Fisher matrix, for which the characterization in Lemma 3 does not apply. We now use Examples 1–5 to illustrate the notions of regular and irregular models and their implications for the analysis of identifiability. The richness of the possibilities that follow is a reminder of the care needed in using the Fisher matrix criterion for showing local identifiability (or the lack of it). It also highlights the convenience of the identifiability criterion based on minimizing the Hellinger distance as a unifying approach to studying the identifiability of regular or irregular points of the Fisher matrix in either regular or irregular models. Specifically:
The uniform location model in Example 1 and the first-price auction model in Example 2 both have supports that depend on the parameter of interest, which means that these models violate the regularity condition (A3). We have seen that θ0 is identifiable in Examples 1 and 2, which implies that (A3) is not necessary for identifiability. These models also have a singular Fisher matrix, which implies that, in irregular models violating (A3), the non-singularity of the Fisher matrix is not a necessary condition for (local) identifiability.
One can verify that the normal squared location model in Example 3 and the normal supply-and-demand model in Example 4 both satisfy the regularity conditions in Assumption R. We have seen that, in Example 3, the parameter of interest is locally identifiable while, in Example 4, it is not, which means that the regularity conditions in Assumption R are neither sufficient nor necessary for (local) identifiability; they are just convenient. In Example 3, moreover, θ0 = 0 is not a regular point of the Fisher matrix and is locally identifiable, which implies that, at irregular points of the Fisher matrix, the non-singularity of the Fisher matrix is not a necessary condition for (local) identifiability.
In Example 5, the function log f(y; θ) = log(1/2) − |y − θ| is not differentiable when θ = y, which means that the Laplace location model is an irregular model because it violates (A4). To illustrate a failure of (A4) and (A5), consider again the finite mixture of exponentials model in Example 6: at some parameter points, an element of the Fisher matrix is not finite.
We also have the following result linking the Hellinger distance to the Fisher matrix, which we are going to use to show that, in regular models with irregular points of the Fisher matrix, the non-singularity of the Fisher matrix is only a sufficient condition for local identifiability.
Lemma 4. Let the regularity conditions in Assumption R hold and assume that √f(y; θ) is continuously differentiable μ-a.e. Then, the Hellinger distance and the Fisher matrix are related by
ρ(θ, θ0) = (1/8)(θ − θ0)′ I(θ0) (θ − θ0) + o(‖θ − θ0‖²).
Though this result is known, see, e.g., Borovkov (1998), its implications for local identifiability have so far not been drawn.
Since the Fisher matrix is a variance–covariance matrix, I(θ) is, under Assumption R, a real symmetric positive semi-definite matrix for every θ in Θ, and then the following result follows from Lemma 4 and the characterization of a convex function in terms of its Hessian, see, e.g., Rockafellar and Wets (1998, Theorem 2.14).
Proposition 4. Let the regularity conditions in Assumption R hold and assume that √f(y; θ) is continuously differentiable μ-a.e. Then, ρ(·, θ0) is a locally convex function around θ0. Furthermore, if I(θ0) is non-singular, then ρ(·, θ0) is a locally strictly convex function around θ0 and θ0 is locally identifiable.
Two remarks are in order. First, notice that, unlike in Lemma 3, the result in Proposition 4 also applies when θ0 is not a regular point of the Fisher matrix, in which case the non-singularity of the Fisher matrix becomes only sufficient for local identifiability. Second, if I(θ0) is singular, the function ρ(·, θ0) is still locally convex (because I(θ0) is positive semi-definite) and its set of minimizers is a convex, but not necessarily bounded, set, a result that can be used to delineate the set of observationally equivalent values of θ0. This note does not pursue this interesting direction.
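A small numerical sketch can illustrate the two remarks, assuming the normal squared location model of Example 3 takes the form Y ~ N(θ², 1) (an assumption of this sketch): at a regular point, the local quadratic approximation of ρ by the Fisher matrix holds, while at the irregular point θ0 = 0 the Fisher matrix vanishes even though ρ(·, 0) still has a unique zero.

```python
import math

# Sketch, assuming Example 3's model is Y ~ N(theta^2, 1), for which
# rho(theta, theta0) = 1 - exp(-(theta^2 - theta0^2)^2 / 8) and the (scalar)
# Fisher matrix is I(theta) = 4*theta^2, the variance of the score 2*theta*(y - theta^2).

def rho_nsq(theta, theta0):
    return 1.0 - math.exp(-((theta ** 2 - theta0 ** 2) ** 2) / 8.0)

def fisher_nsq(theta):
    return 4.0 * theta ** 2

# Regular point theta0 = 1: the quadratic approximation rho ~ I(theta0) * h^2 / 8 holds.
theta0, h = 1.0, 1e-3
print(rho_nsq(theta0 + h, theta0), fisher_nsq(theta0) * h ** 2 / 8.0)  # ~5.0e-07 each

# Irregular point theta0 = 0: I(0) = 0 (singular), yet rho(., 0) vanishes only at 0,
# so non-singularity is sufficient but not necessary for local identifiability.
print(fisher_nsq(0.0), rho_nsq(0.0, 0.0), rho_nsq(0.5, 0.0) > 0.0)
```

At θ0 = 0 the distance behaves like θ⁴/8 rather than quadratically, which is exactly the flat-Fisher behavior that makes the point irregular.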
Table 1 summarizes the information in this note about the necessity and sufficiency of the non-singularity of the Fisher matrix for local identifiability.
We conclude this section by mentioning that, in response to the misbehavior of the Fisher matrix when informing about the difficulty of estimating parameters of interest in parametric models, alternative notions of information, other than the Fisher matrix, have been proposed in the literature (see, e.g., Donoho and Liu 1987). Without further elaboration, these alternative notions of information are not directly applicable to construct new criteria of identifiability. In particular, the geometric information based on the modulus of continuity of the parameter with respect to the Hellinger distance, introduced by Donoho and Liu (1987) to geometrize convergence rates, cannot be used to construct a criterion for local identifiability because this modulus of continuity, in its current format, is not defined for parameters that are not locally identifiable.4
4. The Kullback–Leibler Divergence and Other Divergences
Some of the examples where we have had success in using the Hellinger distance to analyze identifiability share the same structure: the Hellinger distance is a locally convex function, see Figure 2, and so the results from convex optimization become available. If the Hellinger distance proves difficult to analyze, one can set out a criterion for identifiability based on another divergence function, such as the reversed Kullback–Leibler divergence (see, e.g., Bowden 1973).
One can unify the identification criteria based on the Hellinger distance and the reversed Kullback–Leibler divergence by using the family of φ-divergences defined as

D_φ(θ, θ0) = ∫ f(y; θ0) φ(λ(y)) dμ(y),

where λ(y) = f(y; θ)/f(y; θ0) is the likelihood ratio and φ is a proper closed convex function with φ(1) = 0 and such that φ is strictly convex on a neighborhood of t = 1. The squared Hellinger distance corresponds to the member of this family with φ(t) = (1/2)(1 − √t)², whereas the reversed Kullback–Leibler divergence corresponds to φ(t) = −log t. The following result is an immediate consequence of the property that D_φ(θ, θ0) is non-negative and is equal to zero if and only if λ(y) = 1 for μ-a.s. y (see, e.g., Pardo 2005, Proposition 1.1).
Proposition 5. The parameter θ0 is locally identifiable in the model F_Θ if and only if there exists an open set containing θ0 within which the set of minimizers of D_φ(·, θ0) is the singleton {θ0}.
This result, which is a generalization of Proposition 2, shows that the choice of a φ-divergence for analyzing the identifiability of a parameter of interest only hinges on the difficulty of characterizing the set of minimizers of D_φ(·, θ0) for a given φ-divergence. The choice of the Hellinger distance over the reversed Kullback–Leibler divergence is, however, not inconsequential when choosing a φ-divergence to construct an estimator for the parameter of interest. The use of the Hellinger distance may lead to an estimator that is more robust than the maximum likelihood estimator and equally efficient; see, e.g., Beran (1977) and Jimenez and Shao (2002).5
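The unification through φ-divergences can be illustrated numerically, assuming the two standard generator choices, φ(t) = (1/2)(1 − √t)² for the squared Hellinger distance and φ(t) = −log t for a Kullback–Leibler divergence; the unit-variance normal densities below are illustrative only:

```python
import math

# Sketch of the phi-divergence family D_phi(theta, theta0) = int f0 * phi(f/f0) dmu,
# evaluated by the midpoint rule: phi(t) = 0.5*(1 - sqrt(t))^2 recovers the squared
# Hellinger distance, and phi(t) = -log(t) recovers a Kullback-Leibler divergence.

def normal_pdf(y, mean):
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2.0 * math.pi)

def phi_divergence(phi, mean, mean0, lo=-15.0, hi=15.0, n=300_000):
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        f0 = normal_pdf(y, mean0)
        lam = normal_pdf(y, mean) / f0   # likelihood ratio
        total += f0 * phi(lam)
    return total * h

hell = phi_divergence(lambda t: 0.5 * (1.0 - math.sqrt(t)) ** 2, 1.0, 0.0)
kl = phi_divergence(lambda t: -math.log(t), 1.0, 0.0)

print(hell, 1.0 - math.exp(-1.0 / 8.0))  # squared Hellinger: ~0.1175 each
print(kl, 0.5)                            # KL between the two normals: ~0.5 each
```

Both divergences vanish only when the likelihood ratio is identically one, which is what drives Proposition 5 regardless of the generator chosen.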
We conclude this section with the following result showing that, for the regular models analyzed by Rothenberg (1971), the Hellinger distance and the reversed Kullback–Leibler divergence are both locally convex around a minimizer.
Lemma 5. Let the regularity conditions in Assumption R hold and assume that √f(y; θ) is continuously differentiable μ-a.e. Assume, furthermore, that, in a neighborhood of θ0, f(y; θ) and log f(y; θ) are twice differentiable in θ, with derivatives continuous in θ. Then, the Hellinger distance and the Kullback–Leibler divergence are related by
D_KL(θ, θ0) = 4ρ(θ, θ0) + o(‖θ − θ0‖²).