Article

A Robust Version of the Empirical Likelihood Estimator

1 Laboratoire de Mathématiques de Reims, UMR9008 CNRS et Université de Reims Champagne-Ardenne, UFR SEN, Moulin de la Housse, B.P. 1039, 51687 Reims, France
2 Department of Applied Mathematics, Bucharest University of Economic Studies, Piaţa Romană no. 6, 010374 Bucharest, Romania
3 "Gheorghe Mihoc-Caius Iacob" Institute of Mathematical Statistics and Applied Mathematics of the Romanian Academy, Calea 13 Septembrie no. 13, 050711 Bucharest, Romania
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2021, 9(8), 829; https://doi.org/10.3390/math9080829
Submission received: 1 March 2021 / Revised: 3 April 2021 / Accepted: 7 April 2021 / Published: 10 April 2021
(This article belongs to the Special Issue Stochastic Models and Methods with Applications)

Abstract:
In this paper, we introduce a robust version of the empirical likelihood estimator for semiparametric moment condition models. This estimator is obtained by minimizing the modified Kullback–Leibler divergence, in its dual form, using truncated orthogonality functions. We prove the robustness and the consistency of the new estimator. The performance of the robust empirical likelihood estimator is illustrated through examples based on Monte Carlo simulations.

1. Introduction

A moment condition model is a family $\mathcal{M}^{(1)}$ of probability measures (p.m.), all defined on the same measurable space $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$, such that (s.t.)
$$\int_{\mathbb{R}^m} g(x,\theta)\,dQ(x)=0 \quad\text{for all } Q\in\mathcal{M}^{(1)}, \tag{1}$$
where $\mathcal{B}(\mathbb{R}^m)$ is the Borel $\sigma$-field. The parameter of interest $\theta$ belongs to a compact set $\Theta\subset\mathbb{R}^d$, and the function $g:=(g_1,\dots,g_\ell)^\top$, with $\ell\geq d$, is defined on $\mathbb{R}^m\times\Theta$, each component $g_i$ being a real-valued function. Denote by $M^{(1)}$ the set of all probability measures on $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$, and for each $\theta\in\Theta$ define
$$\mathcal{M}^{(1)}_\theta := \left\{ Q\in M^{(1)}\ \text{s.t.}\ \int_{\mathbb{R}^m} g(x,\theta)\,dQ(x)=0 \right\},$$
so that
$$\mathcal{M}^{(1)} = \bigcup_{\theta\in\Theta} \mathcal{M}^{(1)}_\theta.$$
Let $X_1,\dots,X_n$ be an i.i.d. sample with unknown p.m. $P_0$. We assume that the equation $\int g(x,\theta)\,dP_0(x)=0$ has a unique solution (in $\theta$), which will be denoted $\theta_0$. We consider the problem of estimating $\theta_0$ from the data $X_1,\dots,X_n$. The traditional way of estimating the parameter $\theta_0$ is the Generalized Method of Moments (GMM) [1]. GMM estimators are consistent and asymptotically normal. Despite these desirable asymptotic properties, their finite sample performance is not satisfactory. Several alternative methods have been proposed in the literature; the Continuous Updating (CU) estimator [2], the Empirical Likelihood (EL) estimator [3,4,5], and the Exponential Tilting (ET) estimator [6] are three of the best-known examples. Imbens [7] showed that the EL and ET estimators have lower bias than GMM in nonlinear models. Newey and Smith [8] studied the theoretical properties of the EL, ET, and CU estimators by including them in the Generalized Empirical Likelihood (GEL) family of estimators, and showed that all GEL estimators have lower asymptotic bias than GMM. The information and entropy econometric (IEE) techniques have been proposed in order to improve the finite sample performance of GMM estimators and tests [4,6]. Ronchetti and Trojani [9] proposed robust alternatives to GMM estimators and tests, and Lô and Ronchetti [10] did the same for the IEE techniques, preserving also their finite sample accuracy. Felipe et al. [11] proposed empirical divergence test statistics, based on the exponentially tilted empirical likelihood estimator, with good robustness properties. Broniatowski and Keziou [12] proposed a general approach for estimation and testing in moment condition models, which includes some of the above-mentioned methods. This approach, based on minimizing divergences in their dual forms, allows the asymptotic study of the estimators (called minimum empirical divergence estimators) and of the associated test statistics, both under the model and under misspecification of the model. The approach based on divergences and duality was first considered for parametric models, for example, in [13,14,15]. Applications of the minimum dual divergence estimators to model selection problems are considered in [16].
The EL paradigm is a special case of the general methodology of Broniatowski and Keziou [12], namely, the one obtained when using the modified Kullback–Leibler divergence. Although the EL estimator is preferable to other estimators due to its higher-order asymptotic properties, these properties are valid only in the case of correct specification of the moment conditions. Moreover, when the support of the p.m. corresponding to the model and the orthogonality functions are not bounded, the EL estimator may cease to be root-$n$ consistent under misspecification (Schennach [17]). It is a known fact that the EL estimator for moment condition models is not robust. This fact is also justified by the results from [18], where it is shown that the influence function of a minimum dual divergence estimator, and in particular that of the EL estimator, is linearly related to the orthogonality function $g$ of the model. Thus, the influence function of the EL estimator is bounded if and only if the function $g$ of the underlying model is bounded. Hence, the EL estimator is usually not robust, since $g$ is often unbounded in the observations. For this reason, in practice, the classical EL estimator, like the minimum dual divergence estimators and the GMM estimators, is unstable even under small deviations from the assumed model.
As examples in this context, we mention models whose orthogonality functions are unbounded [9]. Autoregressive models with heteroscedastic errors [19] can be written in the form of moment condition models, but the functions defining the orthogonality conditions are unbounded. Likewise, nonlinear empirical asset pricing models [20] can be written in the form of moment condition models with natural orthogonality conditions (given by the asset pricing equations), again defined by unbounded orthogonality functions. We also recall the following classical example, which is used in the last section of the paper, in the Monte Carlo simulation study.
Example 1
([3] p. 302). Consider a random variable $X$ with unbounded support ($\mathbb{R}$ or $\mathbb{R}_+$, for instance). Let $E(X)=\theta$, and assume that $E(X^2)=h(\theta)$, with $h(\cdot)$ a known function. The aim is to estimate the parameter $\theta$ using an i.i.d. sample $X_1,\dots,X_n$ of $X$. The information on the probability distribution $P_0$ of $X$ can be expressed in the context of model (1), with $d=1$ and $\ell=2$, by taking $g(x,\theta) := (g_1(x,\theta),\,g_2(x,\theta))^\top = (x-\theta,\ x^2-h(\theta))^\top$. One can see that the orthogonality functions $g_1(\cdot,\theta)$ and $g_2(\cdot,\theta)$ are unbounded (with respect to $x$).
For such models, the lack of robustness of the EL estimator, as well as that of other classical estimators, motivates the study of robust alternatives.
In the present paper, we propose a locally robust version of the EL estimator for moment condition models. Locally robust means that the functional associated with the estimator is locally approximated by means of the influence function, so that the boundedness of the influence function implies that, in a neighborhood of the model, the asymptotic bias of the estimator cannot become arbitrarily large (see [21]). The new estimator is defined by minimizing an empirical version of the modified Kullback–Leibler divergence in dual form, using truncated orthogonality functions. This leads to a robust EL estimate. Moreover, we prove the consistency of this estimator. Finally, we present an example based on Monte Carlo simulations illustrating the performance of the robust EL estimator in the case of contaminated data.

2. A Robust Version of the Empirical Likelihood Estimator

2.1. Statistical Divergences

Let $\varphi$ be a convex function defined on $\mathbb{R}$, taking values in $[0,+\infty]$, and satisfying $\varphi(1)=0$. Let $P$ be some p.m. on the measurable space $(\mathbb{R}^m,\mathcal{B}(\mathbb{R}^m))$. For any signed finite measure $Q$, on the same measurable space, absolutely continuous with respect to $P$, the $\varphi$-divergence (sometimes we simply say divergence) between $Q$ and $P$ is defined by
$$D_\varphi(Q,P) := \int_{\mathbb{R}^m} \varphi\!\left(\frac{dQ}{dP}(x)\right)\,dP(x),$$
where $\frac{dQ}{dP}$ is the Radon–Nikodym derivative. When $Q$ is not absolutely continuous with respect to $P$, we set $D_\varphi(Q,P)=+\infty$. This definition extends the one given in [22] for divergences between p.m.'s. A known class of divergences between p.m.'s is the class of Cressie–Read divergences, introduced in [23] and defined by the functions
$$\varphi_\gamma : x\in\mathbb{R}_+\mapsto\varphi_\gamma(x) := \frac{x^\gamma-\gamma x+\gamma-1}{\gamma(\gamma-1)}$$
for $\gamma\in\mathbb{R}\setminus\{0,1\}$, together with $\varphi_0(x):=-\log x+x-1$ and $\varphi_1(x):=x\log x-x+1$. For any $\gamma\in\mathbb{R}$, if $\varphi_\gamma(0)$ is not defined, we set $\varphi_\gamma(0):=\lim_{x\downarrow 0}\varphi_\gamma(x)$, which may be finite or infinite. The Kullback–Leibler divergence ($KL$) is associated with $\varphi_1$, the modified Kullback–Leibler divergence ($KL_m$) with $\varphi_0$, the $\chi^2$ divergence with $\varphi_2$, the modified $\chi^2$ divergence ($\chi^2_m$) with $\varphi_{-1}$, and the Hellinger distance ($H$) with $\varphi_{1/2}$. The $\varphi$-divergence between a set $\Omega$ of probability measures and a probability measure $P$ is defined by
$$D_\varphi(\Omega,P) := \inf_{Q\in\Omega} D_\varphi(Q,P).$$
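To fix ideas, the following short Python sketch (our own illustration, not part of the original text) evaluates $D_{\varphi_\gamma}(Q,P)$ for two discrete distributions given as probability vectors; the cases $\gamma=0$ and $\gamma=1$ recover $KL_m$ and $KL$, respectively.

```python
import numpy as np

def phi_gamma(x, gamma):
    """Cressie-Read convex function phi_gamma, evaluated elementwise."""
    x = np.asarray(x, dtype=float)
    if gamma == 0.0:                  # modified Kullback-Leibler (KL_m)
        return -np.log(x) + x - 1.0
    if gamma == 1.0:                  # Kullback-Leibler (KL)
        return x * np.log(x) - x + 1.0
    return (x**gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))

def cressie_read(q, p, gamma):
    """D_phi(Q, P) = sum_i p_i * phi_gamma(q_i / p_i), for discrete Q << P."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    return float(np.sum(p * phi_gamma(q / p, gamma)))

p = np.array([0.25, 0.25, 0.25, 0.25])
q = np.array([0.40, 0.30, 0.20, 0.10])
for gamma, name in [(0.0, "KL_m"), (1.0, "KL"), (2.0, "chi^2"), (0.5, "Hellinger")]:
    print(name, cressie_read(q, p, gamma))
```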

2.2. Definition of the Estimator

We consider a reference identifiable model $\{P_\theta;\ \theta\in\Theta\}$ of probability measures such that, for each $\theta\in\Theta$, $P_\theta\in\mathcal{M}^{(1)}_\theta$, which means that $\int_{\mathbb{R}^m} g(x,\theta)\,dP_\theta(x)=0$, and we assume that $\theta$ is the unique solution of this equation. We assume that the p.m. $P_0$ of the data, corresponding to the true unknown value $\theta_0$ of the parameter to be estimated, belongs to this reference model. The reference model will be associated with the truncated orthogonality function that will be used to define the robust version of the EL estimator of the parameter $\theta_0$. We will use the notation $\|\cdot\|$ for the Euclidean norm. As in [9], using the reference model $\{P_\theta;\ \theta\in\Theta\}$, we define the function $g^c:\mathbb{R}^m\times\Theta\to\mathbb{R}^\ell$,
$$g^c(x,\theta) := H_c\!\left(A_\theta\,[\,g(x,\theta)-\tau_\theta\,]\right),$$
where $H_c:\mathbb{R}^\ell\to\mathbb{R}^\ell$ is the (multivariate) Huber function
$$H_c(y) := \begin{cases} y\cdot\min\!\left(1,\dfrac{c}{\|y\|}\right) & \text{if } y\neq 0,\\[4pt] 0 & \text{if } y=0,\end{cases}$$
and $A_\theta$, $\tau_\theta$ are, respectively, an $\ell\times\ell$ matrix and an $\ell$-vector, defined as the solutions of the system of implicit equations
$$\int g^c(x,\theta)\,dP_\theta(x)=0,\qquad \int g^c(x,\theta)\,g^c(x,\theta)^\top\,dP_0(x)=I_\ell, \tag{7}$$
where $I_\ell$ is the $\ell\times\ell$ identity matrix and $c>0$ is a given positive constant. Therefore, we have $\|g^c(x,\theta)\|\leq c$, for all $x$ and $\theta$. We also use the function
$$h_c(x,\theta,A,\tau) := H_c\!\left(A\,[\,g(x,\theta)-\tau\,]\right)$$
when we need to make the dependence on the matrix $A$ and the vector $\tau$ explicit. Therefore,
$$g^c(x,\theta) = h_c(x,\theta,A_\theta,\tau_\theta).$$
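As a small illustration (a sketch of ours, with hypothetical helper names; it is not code from the paper), the multivariate Huber function and the truncated function $h_c$ can be coded directly from the definitions above:

```python
import numpy as np

def huber(y, c):
    """Multivariate Huber function H_c: shrinks y onto the ball of radius c."""
    y = np.asarray(y, dtype=float)
    norm = np.linalg.norm(y)
    if norm == 0.0:
        return y
    return y * min(1.0, c / norm)

def h_c(x, theta, A, tau, g, c):
    """h_c(x, theta, A, tau) = H_c( A [ g(x, theta) - tau ] )."""
    return huber(A @ (g(x, theta) - tau), c)

# With the orthogonality function of Example 1 and h(theta) = theta^2 + 2*theta:
g = lambda x, theta: np.array([x - theta, x**2 - theta**2 - 2.0 * theta])
print(h_c(3.0, 1.0, np.eye(2), np.zeros(2), g, c=2.0))  # norm capped at c = 2
```

For any choice of $A$ and $\tau$, the output has Euclidean norm at most $c$, which is exactly the boundedness property used below.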
For a given $P_\theta$ from the reference model, the triplet $(\theta,A_\theta,\tau_\theta)$ is the unique solution of the system
$$\int g(x,\theta)\,dP_\theta(x)=0,\qquad \int g^c(x,\theta)\,dP_\theta(x)=0,\qquad \int g^c(x,\theta)\,g^c(x,\theta)^\top\,dP_0(x)=I_\ell;$$
see [9], p. 48.
Consider the problem of estimating the triplet $(\theta_0,A_{\theta_0},\tau_{\theta_0})$ on the basis of a sample $X_1,\dots,X_n\sim P_0$, with $P_0\in\mathcal{M}^{(1)}_{\theta_0}$. For each $\theta\in\Theta$, using the p.m. $P_\theta$ from the reference model, we define $\widehat A_\theta$ and $\widehat\tau_\theta$ as the solutions of the system
$$\int h_c(x,\theta,\widehat A_\theta,\widehat\tau_\theta)\,dP_\theta(x)=0,\qquad \int h_c(x,\theta,\widehat A_\theta,\widehat\tau_\theta)\,h_c(x,\theta,\widehat A_\theta,\widehat\tau_\theta)^\top\,dP_n(x)=I_\ell, \tag{11}$$
where $P_n(\cdot)$ is the empirical measure associated with the sample,
$$P_n(\cdot) := \frac{1}{n}\sum_{i=1}^n \delta_{X_i}(\cdot),$$
with $\delta_x(\cdot)$ the Dirac measure at the point $x$. We denote
$$g_n^c(x,\theta) := h_c(x,\theta,\widehat A_\theta,\widehat\tau_\theta) = H_c\!\left(\widehat A_\theta\,[\,g(x,\theta)-\widehat\tau_\theta\,]\right).$$
Note that $g_n^c(x,\theta)$ depends on both the data and the reference probability $P_\theta$. We now consider the moment condition model associated with the function $g_n^c(x,\theta)$, namely,
$$\mathcal{M}^{c,n} := \bigcup_{\theta\in\Theta}\mathcal{M}^{c,n}_\theta,$$
where
$$\mathcal{M}^{c,n}_\theta := \left\{Q\in M^{(1)}\ \text{s.t.}\ \int g_n^c(x,\theta)\,dQ(x)=0\right\},\qquad\theta\in\Theta.$$
The p.m. $P_0$ belongs to $\mathcal{M}^{c,n}_{\theta_0}$.
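A possible numerical strategy for solving the system (11) (a rough fixed-point sketch of ours, under the assumption that one can simulate from the reference p.m. $P_\theta$; this is only one natural iteration, not the authors' algorithm) alternates a centering step for $\widehat\tau_\theta$ and a rescaling step for $\widehat A_\theta$:

```python
import numpy as np

def solve_A_tau(x_data, ref_draws, g, theta, c, n_iter=200, tol=1e-8):
    """Fixed-point iteration for (A_hat, tau_hat) solving (approximately):
       E_{P_theta}[ H_c(A(g - tau)) ] = 0      (Monte Carlo over ref_draws ~ P_theta)
       (1/n) sum_i H_c(A(g(X_i) - tau)) H_c(...)^T = I   (over the data x_data)."""
    ell = len(g(x_data[0], theta))
    A, tau = np.eye(ell), np.zeros(ell)

    def huber_rows(Y):                     # apply H_c to each row of Y
        norms = np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
        return Y * np.minimum(1.0, c / norms)

    G_ref = np.array([g(x, theta) for x in ref_draws])   # reference model draws
    G_dat = np.array([g(x, theta) for x in x_data])      # observed sample
    for _ in range(n_iter):
        H_ref = huber_rows((G_ref - tau) @ A.T)
        H_dat = huber_rows((G_dat - tau) @ A.T)
        # centering: shift tau so that the reference-model mean of h_c moves to 0
        tau_new = tau + np.linalg.solve(A, H_ref.mean(axis=0))
        # rescaling: whiten the empirical second moment of h_c toward the identity
        L = np.linalg.cholesky(H_dat.T @ H_dat / len(H_dat))
        A_new = np.linalg.solve(L, A)      # A <- L^{-1} A
        if np.linalg.norm(A_new - A) + np.linalg.norm(tau_new - tau) < tol:
            return A_new, tau_new
        A, tau = A_new, tau_new
    return A, tau
```

When no truncation is active, one rescaling step already makes the empirical second moment of $h_c$ equal to $I_\ell$, which motivates this update.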
In what follows, we consider the modified Kullback–Leibler divergence, which corresponds to the strictly convex function $\varphi(x):=-\log x+x-1$ if $x>0$, and $\varphi(x):=+\infty$ if $x\leq 0$. The convex conjugate, also called the Fenchel–Legendre transform, of a function $f:x\in\mathbb{R}\mapsto f(x)\in\overline{\mathbb{R}}$ is the function defined by $f^*(u):=\sup_{x\in\mathbb{R}}\{ux-f(x)\}$, for all $u\in\mathbb{R}$. A straightforward calculation shows that the convex conjugate of the convex function $\varphi$, denoted $\psi$, is given by $\psi(u)=-\log(1-u)$ if $u<1$, and $\psi(u)=+\infty$ if $u\geq 1$. For a given $\theta\in\Theta$, we define the set
$$\overline{\Lambda}^{c,n}_\theta(P_0) := \left\{\bar t:=(t_0,t_1,\dots,t_\ell)^\top\in\mathbb{R}^{1+\ell}\ \text{s.t.}\ \int\left|\psi\!\left(\bar t^\top\bar g_n^c(x,\theta)\right)\right|dP_0(x)<\infty\right\}, \tag{15}$$
where $\bar g_n^c := (\mathbb{1}_{\mathbb{R}^m},\,(g_n^c)^\top)^\top$. We denote $\overline{\Lambda}^{c,n}_\theta := \overline{\Lambda}^{c,n}_\theta(P_0)$ and $\overline{\Lambda}^{c,n,n}_\theta := \overline{\Lambda}^{c,n}_\theta(P_n)$.
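For the reader's convenience, we record the short computation behind the stated form of the conjugate (a standard convex-analysis step added here; it is not spelled out in the original text). For $u<1$,
$$\psi(u) = \sup_{x>0}\{ux-\varphi(x)\} = \sup_{x>0}\{ux+\log x-x+1\};$$
the first-order condition $u+\tfrac{1}{x}-1=0$ gives $x=\tfrac{1}{1-u}$, which is feasible exactly when $u<1$, and substituting back yields
$$\psi(u) = \frac{u}{1-u}-\log(1-u)-\frac{1}{1-u}+1 = -\log(1-u).$$
For $u\geq 1$, the objective $ux+\log x-x+1$ tends to $+\infty$ as $x\to\infty$, so $\psi(u)=+\infty$.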
Since $g_n^c(x,\theta)$ is bounded (with respect to $x$), on the basis of Theorem 1.1 in [24] and Proposition 4.2 in [12], the following dual representation of the $KL_m$-divergence holds:
$$KL_m(\mathcal{M}^{c,n}_\theta,P_0) = \sup_{\bar t\in\overline{\Lambda}^{c,n}_\theta}\int m_n^c(x,\theta,\bar t)\,dP_0(x), \tag{16}$$
where
$$m_n^c(x,\theta,\bar t) := t_0-\psi\!\left(\bar t^\top\bar g_n^c(x,\theta)\right) = t_0+\log\!\left(1-\bar t^\top\bar g_n^c(x,\theta)\right), \tag{17}$$
and the supremum in (16) is reached, provided that $KL_m(\mathcal{M}^{c,n}_\theta,P_0)$ is finite. Note that $m_n^c(x,\theta,\bar t)$ depends on the reference p.m. $P_\theta$, since $g_n^c(x,\theta)$ depends on $P_\theta$. We then denote by $\bar t^{c,n}_\theta = \bar t^{c,n}_\theta(P_0)$ any vector such that
$$\bar t^{c,n}_\theta := \arg\sup_{\bar t\in\overline{\Lambda}^{c,n}_\theta}\int m_n^c(x,\theta,\bar t)\,dP_0(x). \tag{18}$$
Furthermore, according to Proposition 4.2 from [12], for each $\theta\in\Theta$, the condition
$$P_0\left\{x\in\mathbb{R}^m\ \text{s.t.}\ \bar t^\top\bar g_n^c(x,\theta)\neq 0\right\}>0,\quad\text{for all }\bar t\in\mathbb{R}^{1+\ell}\setminus\{0\}, \tag{19}$$
ensures that $\bar t^{c,n}_\theta$, defined as the solution of the optimization problem (18), is unique. Notice that the linear independence of the functions $\mathbb{1}_{\mathbb{R}^m}, g^c_{n,1}(\cdot,\theta),\dots,g^c_{n,\ell}(\cdot,\theta)$ implies condition (19) whenever $P_0$ is not degenerate.
Moreover, using again Proposition 4.2 and Remark 4.4 from [12], for each $\theta\in\Theta$, one can show that the first component of the optimal solution $\bar t^{c,n}_\theta$ in (18) equals zero. One can then omit the first component of the vector $\bar t$ in displays (15)–(18). Therefore, they will be replaced by
$$\Lambda^{c,n}_\theta(P_0) := \left\{t:=(t_1,\dots,t_\ell)^\top\in\mathbb{R}^\ell\ \text{s.t.}\ \int\left|\psi\!\left(t^\top g_n^c(x,\theta)\right)\right|dP_0(x)<\infty\right\},$$
$$KL_m(\mathcal{M}^{c,n}_\theta,P_0) = \sup_{t\in\Lambda^{c,n}_\theta}\int m_n^c(x,\theta,t)\,dP_0(x),$$
where
$$m_n^c(x,\theta,t) := -\psi\!\left(t^\top g_n^c(x,\theta)\right) = \log\!\left(1-t^\top g_n^c(x,\theta)\right)$$
and
$$t^{c,n}_\theta := \arg\sup_{t\in\Lambda^{c,n}_\theta}\int m_n^c(x,\theta,t)\,dP_0(x). \tag{23}$$
Denote
$$\Lambda^{c,n,n}_\theta := \Lambda^{c,n}_\theta(P_n) = \left\{t:=(t_1,\dots,t_\ell)^\top\in\mathbb{R}^\ell\ \text{s.t.}\ \frac{1}{n}\sum_{i=1}^n\left|\log\!\left(1-\sum_{j=1}^\ell t_j\,g^c_{n,j}(X_i,\theta)\right)\right|<\infty\right\}.$$
In view of relation (23), a natural estimator of $t^{c,n}_\theta$ is defined by
$$\widehat{t^c_\theta} := \arg\sup_{t\in\Lambda^{c,n,n}_\theta}\int m_n^c(x,\theta,t)\,dP_n(x).$$
Then, a "dual" plug-in estimator of the modified Kullback–Leibler divergence between $\mathcal{M}^{c,n}_\theta$ and $P_0$ can be defined by
$$\widehat{KL_m}(\mathcal{M}^{c,n}_\theta,P_0) := \sup_{t\in\Lambda^{c,n,n}_\theta}\int m_n^c(x,\theta,t)\,dP_n(x) = \sup_{(t_1,\dots,t_\ell)\in\Lambda^{c,n,n}_\theta}\int\log\!\left(1-\sum_{j=1}^\ell t_j\,g^c_{n,j}(x,\theta)\right)dP_n(x) = \sup_{(t_1,\dots,t_\ell)\in\mathbb{R}^\ell}\frac{1}{n}\sum_{i=1}^n\overline{\log}\!\left(1-\sum_{j=1}^\ell t_j\,g^c_{n,j}(X_i,\theta)\right),$$
where $\overline{\log}(\cdot)$ is the extended logarithm function, i.e., the function defined by $\overline{\log}(u)=\log(u)$ if $u>0$, and $\overline{\log}(u)=-\infty$ if $u\leq 0$, for any $u\in\mathbb{R}$. Finally, we define the following estimator of $\theta_0$:
$$\widehat{\theta^c} := \arg\inf_{\theta\in\Theta}\sup_{t\in\Lambda^{c,n,n}_\theta}\int m_n^c(x,\theta,t)\,dP_n(x) = \arg\inf_{\theta\in\Theta}\sup_{(t_1,\dots,t_\ell)\in\mathbb{R}^\ell}\frac{1}{n}\sum_{i=1}^n\overline{\log}\!\left(1-\sum_{j=1}^\ell t_j\,g^c_{n,j}(X_i,\theta)\right), \tag{26}$$
which can be seen as a "robust" version of the well-known EL estimator.
Recall that the EL estimator can be written as (see, e.g., [5])
$$\widehat{\theta} = \arg\inf_{\theta\in\Theta}\sup_{(t_1,\dots,t_\ell)\in\mathbb{R}^\ell}\frac{1}{n}\sum_{i=1}^n\overline{\log}\!\left(1-\sum_{j=1}^\ell t_j\,g_j(X_i,\theta)\right). \tag{27}$$
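To make the saddle-point structure of (26) and (27) concrete, here is a minimal Python sketch (an illustration of ours, using a generic scipy optimizer; the authors compute the estimator with an Uzawa-type algorithm, see Section 3, and `g_fun` below stands for either the truncated function $g_n^c$ or, for the classical EL estimator, the original $g$):

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def inner_sup(theta, X, g_fun, ell):
    """sup over t of (1/n) sum_i log(1 - t' g_fun(X_i, theta))."""
    G = np.array([g_fun(x, theta) for x in X])        # n x ell matrix

    def neg_obj(t):
        u = 1.0 - G @ t
        if np.any(u <= 1e-10):                        # outside the domain of log
            return 1e10
        return -np.mean(np.log(u))

    res = minimize(neg_obj, np.zeros(ell), method="Nelder-Mead")
    return -res.fun

def el_type_estimate(X, g_fun, ell, bounds):
    """arg inf over theta of the inner sup (scalar theta, i.e., d = 1)."""
    res = minimize_scalar(lambda th: inner_sup(th, X, g_fun, ell),
                          bounds=bounds, method="bounded")
    return res.x
```

Passing the truncated $g_n^c$ as `g_fun` gives the robust estimator (26), while passing the original $g$ gives the classical EL estimator (27).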
For establishing the asymptotic properties of the proposed estimators, we need some additional notation. Consider the moment condition model associated with the truncated function
$$g^c(x,\theta) = H_c\!\left(A_\theta\,[\,g(x,\theta)-\tau_\theta\,]\right),$$
where $A_\theta$ and $\tau_\theta$ are the solution to the system (7). Note that $g^c(x,\theta)$ depends only on the reference model $P_\theta$, and not on the data. This model is defined by
$$\mathcal{M}^c := \bigcup_{\theta\in\Theta}\mathcal{M}^c_\theta,$$
where
$$\mathcal{M}^c_\theta := \left\{Q\in M^{(1)}\ \text{s.t.}\ \int g^c(x,\theta)\,dQ(x)=0\right\},\qquad\theta\in\Theta.$$
Let
$$\Lambda^c_\theta(P_0) := \left\{t:=(t_1,\dots,t_\ell)^\top\in\mathbb{R}^\ell\ \text{s.t.}\ \int\left|\psi\!\left(t^\top g^c(x,\theta)\right)\right|dP_0(x)<\infty\right\}.$$
Therefore, as above, we have the following dual representation for $KL_m(\mathcal{M}^c_\theta,P_0)$:
$$KL_m(\mathcal{M}^c_\theta,P_0) = \sup_{t\in\Lambda^c_\theta}\int m^c(x,\theta,t)\,dP_0(x), \tag{31}$$
where
$$m^c(x,\theta,t) := -\psi\!\left(t^\top g^c(x,\theta)\right) = \log\!\left(1-t^\top g^c(x,\theta)\right),$$
and the supremum in (31) is reached, provided that $KL_m(\mathcal{M}^c_\theta,P_0)$ is finite. Moreover, the supremum in (31) is unique under the following assumption
$$P_0\left\{x\in\mathbb{R}^m\ \text{s.t.}\ \bar t^\top\bar g^c(x,\theta)\neq 0\right\}>0,\quad\text{for all }\bar t\in\mathbb{R}^{1+\ell}\setminus\{0\},$$
(with $\bar g^c := (\mathbb{1}_{\mathbb{R}^m},\,(g^c)^\top)^\top$), which is satisfied if the functions $\mathbb{1}_{\mathbb{R}^m}, g^c_1(\cdot,\theta),\dots,g^c_\ell(\cdot,\theta)$ are linearly independent and $P_0$ is not degenerate. We then denote
$$t^c_\theta := \arg\sup_{t\in\Lambda^c_\theta}\int m^c(x,\theta,t)\,dP_0(x).$$
Finally, we have
$$\theta_0 = \arg\inf_{\theta\in\Theta} KL_m(\mathcal{M}^c_\theta,P_0) = \arg\inf_{\theta\in\Theta}\sup_{t\in\Lambda^c_\theta}\int m^c(x,\theta,t)\,dP_0(x) = \arg\inf_{\theta\in\Theta}\int m^c(x,\theta,t^c_\theta)\,dP_0(x).$$
We also use the function
$$n^c(x,\theta,t,A,\tau) := \log\!\left(1-t^\top h_c(x,\theta,A,\tau)\right)$$
when we need to make the dependence on the matrix $A$ and the vector $\tau$ explicit. With the above notation, we then have $m^c(x,\theta,t)=n^c(x,\theta,t,A_\theta,\tau_\theta)$, where $A_\theta$ and $\tau_\theta$ are the solution of the system of Equation (7).

2.3. Robustness Property

In order to prove the robustness of the estimator $\widehat{\theta^c}$, we use the following well-known tools from the theory of robust statistics; see, e.g., [21]. A functional $T$, defined on a set of probability measures and taking values in the parameter space, is called a statistical functional associated with an estimator $\widehat\theta$ of the parameter $\theta$ from the model $P_\theta$ if $\widehat\theta=T(P_n)$. The influence function of $T$ at $P_\theta$ is defined by
$$\operatorname{IF}(x;T,P_\theta) := \left.\frac{\partial T(\widetilde P_{\varepsilon x})}{\partial\varepsilon}\right|_{\varepsilon=0},$$
where $\widetilde P_{\varepsilon x} := (1-\varepsilon)P_\theta+\varepsilon\delta_x$. A natural robustness requirement on the statistical functional corresponding to an estimator is the boundedness of its influence function.
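Numerically, the influence function can be approximated by a finite difference along the contamination path (a generic diagnostic we sketch here; `estimator` is assumed to accept a weighted sample, which is a convention of this illustration, not of the paper):

```python
import numpy as np

def influence_fd(estimator, sample, x, eps=1e-3):
    """Finite-difference influence function:
       [ T((1 - eps) P_n + eps delta_x) - T(P_n) ] / eps."""
    n = len(sample)
    base_w = np.full(n, 1.0 / n)
    t0 = estimator(sample, base_w)
    pts = np.append(sample, x)
    w = np.append((1.0 - eps) * base_w, eps)
    return (estimator(pts, w) - t0) / eps

# Example: for the mean, IF(x) = x - E(X), which is unbounded in x.
mean_est = lambda pts, w: float(np.sum(w * pts))
sample = np.random.default_rng(0).chisquare(1, size=500)
print(influence_fd(mean_est, sample, x=50.0))   # grows linearly with x
```

Plotting such a finite difference against $x$ gives a quick empirical check of whether an estimator's influence function is bounded.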
The statistical functional corresponding to the estimator $\widehat{\theta^c}$ given by (26) is defined by
$$T^c(P) := \arg\inf_{\theta\in\Theta}\sup_{t\in\Lambda^c_\theta(P)}\int m^c(y,\theta,t,P)\,dP(y),$$
where
$$\Lambda^c_\theta(P) := \left\{t\in\mathbb{R}^\ell\ \text{s.t.}\ \int\left|\psi\!\left(t^\top g^c(x,\theta,P)\right)\right|dP(x)<\infty\right\},$$
$g^c(x,\theta,P) := h_c(x,\theta,A_\theta(P),\tau_\theta(P))$, with $A_\theta(P)$ and $\tau_\theta(P)$ solutions of the system
$$\int h_c(x,\theta,A_\theta(P),\tau_\theta(P))\,dP_\theta(x)=0,\qquad \int h_c(x,\theta,A_\theta(P),\tau_\theta(P))\,h_c(x,\theta,A_\theta(P),\tau_\theta(P))^\top\,dP(x)=I_\ell,$$
and
$$m^c(x,\theta,t,P) := -\psi\!\left(t^\top g^c(x,\theta,P)\right) = \log\!\left(1-t^\top g^c(x,\theta,P)\right).$$
Note that, for a given $\theta$, the function $g^c(x,\theta,P)$, as well as $m^c(x,\theta,t,P)$, depends on the p.m. $P_\theta$. In addition, note that $g^c(x,\theta,P_\theta)$ coincides with $g^c(x,\theta)$ defined in the preceding section. We denote
$$t^c_\theta(P) := \arg\sup_{t\in\Lambda^c_\theta(P)}\int m^c(y,\theta,t,P)\,dP(y).$$
Then
$$T^c(P) = \arg\inf_{\theta\in\Theta}\int m^c(y,\theta,t^c_\theta(P),P)\,dP(y).$$
Proposition 1.
The influence function of the estimator $\widehat{\theta^c}$ is given by
$$\operatorname{IF}(x;T^c,P_0) = -\left[\left(\int \tfrac{\partial}{\partial\theta}g^c(y,\theta_0)\,dP_0(y)\right)^{\!\top}\left(\int \tfrac{\partial}{\partial\theta}g^c(y,\theta_0)\,dP_0(y)\right)\right]^{-1}\cdot\left(\int \tfrac{\partial}{\partial\theta}g^c(y,\theta_0)\,dP_0(y)\right)^{\!\top} g^c(x,\theta_0). \tag{43}$$
Proof. 
Using the definitions of $T^c(P)$ and $t^c_\theta(P)$, we have
$$\int \frac{\partial}{\partial t} m^c(y,\theta,t^c_\theta(P),P)\,dP(y)=0, \tag{44}$$
$$\int \frac{\partial}{\partial t} m^c\!\left(y,T^c(P),t^c_{T^c(P)}(P),P\right)dP(y)=0, \tag{45}$$
$$\int \frac{\partial}{\partial\theta}\left[m^c(y,\theta,t^c_\theta(P),P)\right]_{\theta=T^c(P)}dP(y)=0. \tag{46}$$
Using (46), since
$$m^c(y,\theta,t^c_\theta(P),P) = \log\!\left(1-t^c_\theta(P)^\top h_c(y,\theta,A_\theta(P),\tau_\theta(P))\right),$$
$T^c(P)$ is a solution of the equation
$$\left[\frac{\partial}{\partial\theta}t^c_\theta(P)\right]^{\!\top}\int\frac{\partial}{\partial t}m^c(y,\theta,t^c_\theta(P),P)\,dP(y) - \int\frac{\left[\frac{\partial}{\partial\theta}h_c(y,\theta,A_\theta(P),\tau_\theta(P))\right]^{\!\top}t^c_\theta(P)}{1-t^c_\theta(P)^\top h_c(y,\theta,A_\theta(P),\tau_\theta(P))}\,dP(y)=0.$$
Since the first integral in the above display is 0, according to (44), the equation simplifies to
$$\int\frac{\left[\frac{\partial}{\partial\theta}h_c(y,\theta,A_\theta(P),\tau_\theta(P))\right]^{\!\top}t^c_\theta(P)}{1-t^c_\theta(P)^\top h_c(y,\theta,A_\theta(P),\tau_\theta(P))}\,dP(y)=0. \tag{49}$$
Replacing $P$ with the contaminated model $\widetilde P_{\varepsilon x} := (1-\varepsilon)P_0+\varepsilon\delta_x$ in Equation (49), and then differentiating the resulting equation with respect to $\varepsilon$ at $\varepsilon=0$ (where $t^c_{\theta_0}(P_0)=0$, since $P_0$ belongs to the model), we obtain
$$\left(\int\frac{\partial}{\partial\theta}h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,dP_0(y)\right)^{\!\top}\cdot\left.\frac{\partial}{\partial\varepsilon}t^c_{T^c(\widetilde P_{\varepsilon x})}(\widetilde P_{\varepsilon x})\right|_{\varepsilon=0}=0. \tag{50}$$
On the other hand,
$$\left.\frac{\partial}{\partial\varepsilon}t^c_{T^c(\widetilde P_{\varepsilon x})}(\widetilde P_{\varepsilon x})\right|_{\varepsilon=0} = \frac{\partial}{\partial\theta}t^c_{\theta_0}(P_0)\cdot\operatorname{IF}(x;T^c,P_0)+\operatorname{IF}(x;t^c_{\theta_0},P_0). \tag{51}$$
Combining (50) and (51), we get
$$\left(\int\frac{\partial}{\partial\theta}h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,dP_0(y)\right)^{\!\top}\cdot\left[\frac{\partial}{\partial\theta}t^c_{\theta_0}(P_0)\cdot\operatorname{IF}(x;T^c,P_0)+\operatorname{IF}(x;t^c_{\theta_0},P_0)\right]=0. \tag{52}$$
Differentiating (44) with respect to $\theta$ leads to
$$\frac{\partial}{\partial\theta}t^c_{\theta_0}(P_0) = -\left[\int\frac{\partial^2}{\partial t^2}m^c(y,\theta_0,t^c_{\theta_0}(P_0),P_0)\,dP_0(y)\right]^{-1}\int\frac{\partial}{\partial\theta}\frac{\partial}{\partial t}m^c(y,\theta_0,t^c_{\theta_0}(P_0),P_0)\,dP_0(y). \tag{53}$$
Some simple calculations yield
$$\frac{\partial}{\partial t}m^c(y,\theta,t,P) = -\frac{h_c(y,\theta,A_\theta(P),\tau_\theta(P))}{1-t^\top h_c(y,\theta,A_\theta(P),\tau_\theta(P))}, \tag{54}$$
$$\frac{\partial^2}{\partial t^2}m^c(y,\theta,t,P) = -\frac{h_c(y,\theta,A_\theta(P),\tau_\theta(P))\,h_c(y,\theta,A_\theta(P),\tau_\theta(P))^\top}{\left(1-t^\top h_c(y,\theta,A_\theta(P),\tau_\theta(P))\right)^2}$$
and
$$\frac{\partial}{\partial\theta}\frac{\partial}{\partial t}m^c(y,\theta,t,P) = -\frac{\frac{\partial}{\partial\theta}h_c\cdot\left(1-t^\top h_c\right)+h_c\,t^\top\frac{\partial}{\partial\theta}h_c}{\left(1-t^\top h_c\right)^2},$$
where, in the last display, $h_c$ stands for $h_c(y,\theta,A_\theta(P),\tau_\theta(P))$. Then, evaluating at $t=t^c_{\theta_0}(P_0)=0$ and using $\int h_c\,h_c^\top\,dP_0(y)=I_\ell$, we obtain
$$\int\frac{\partial^2}{\partial t^2}m^c(y,\theta_0,t^c_{\theta_0}(P_0),P_0)\,dP_0(y) = -I_\ell$$
and
$$\int\frac{\partial}{\partial\theta}\frac{\partial}{\partial t}m^c(y,\theta_0,t^c_{\theta_0}(P_0),P_0)\,dP_0(y) = -\int\frac{\partial}{\partial\theta}h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,dP_0(y).$$
Replacing these in (53), we obtain
$$\frac{\partial}{\partial\theta}t^c_{\theta_0}(P_0) = -\int\frac{\partial}{\partial\theta}h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,dP_0(y). \tag{57}$$
Using (52) and (57), we obtain
$$-\left(\int\frac{\partial}{\partial\theta}h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,dP_0(y)\right)^{\!\top}\left(\int\frac{\partial}{\partial\theta}h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,dP_0(y)\right)\cdot\operatorname{IF}(x;T^c,P_0) + \left(\int\frac{\partial}{\partial\theta}h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,dP_0(y)\right)^{\!\top}\cdot\operatorname{IF}(x;t^c_{\theta_0},P_0) = 0. \tag{58}$$
In order to calculate $\operatorname{IF}(x;t^c_{\theta_0},P_0)$, we first write Equation (44) for $P=\widetilde P_{\varepsilon x}$, using also (54). Thus, we obtain
$$(1-\varepsilon)\int\frac{h_c(y,\theta,A_\theta(\widetilde P_{\varepsilon x}),\tau_\theta(\widetilde P_{\varepsilon x}))}{1-t^c_\theta(\widetilde P_{\varepsilon x})^\top h_c(y,\theta,A_\theta(\widetilde P_{\varepsilon x}),\tau_\theta(\widetilde P_{\varepsilon x}))}\,dP_0(y) + \varepsilon\,\frac{h_c(x,\theta,A_\theta(\widetilde P_{\varepsilon x}),\tau_\theta(\widetilde P_{\varepsilon x}))}{1-t^c_\theta(\widetilde P_{\varepsilon x})^\top h_c(x,\theta,A_\theta(\widetilde P_{\varepsilon x}),\tau_\theta(\widetilde P_{\varepsilon x}))} = 0. \tag{59}$$
By differentiating with respect to $\varepsilon$ at $\varepsilon=0$, and taking $\theta=\theta_0$, we get
$$\int\left.\frac{\partial}{\partial\varepsilon}\left[h_c(y,\theta_0,A_{\theta_0}(\widetilde P_{\varepsilon x}),\tau_{\theta_0}(\widetilde P_{\varepsilon x}))\right]\right|_{\varepsilon=0}dP_0(y) + \int h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))^\top\,dP_0(y)\cdot\operatorname{IF}(x;t^c_{\theta_0},P_0) + h_c(x,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0)) = 0,$$
where the first integral vanishes, as can be seen by differentiating the centering condition $\int h_c(y,\theta_0,A_{\theta_0}(\widetilde P_{\varepsilon x}),\tau_{\theta_0}(\widetilde P_{\varepsilon x}))\,dP_{\theta_0}(y)=0$ with respect to $\varepsilon$. Consequently,
$$\operatorname{IF}(x;t^c_{\theta_0},P_0) = -\left[\int h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))\,h_c(y,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0))^\top\,dP_0(y)\right]^{-1}\cdot h_c(x,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0)) = -h_c(x,\theta_0,A_{\theta_0}(P_0),\tau_{\theta_0}(P_0)). \tag{60}$$
By combining (58) with (60), we obtain (43). □
Remark 1.
The classical empirical likelihood estimator of the parameter $\theta_0$ of the moment condition model can be obtained as a particular case of the class of minimum empirical divergence estimators introduced by Broniatowski and Keziou [12]. Toma [18] proved that, when $P_0$ belongs to the model $\mathcal{M}^{(1)}$, the influence functions of the estimators from this class, and in particular the influence function of the EL estimator, are all of the form
$$\operatorname{IF}(x;T,P_0) = -\left[\left(\int\tfrac{\partial}{\partial\theta}g(y,\theta_0)\,dP_0(y)\right)^{\!\top}\left(\int g(y,\theta_0)\,g(y,\theta_0)^\top\,dP_0(y)\right)^{-1}\left(\int\tfrac{\partial}{\partial\theta}g(y,\theta_0)\,dP_0(y)\right)\right]^{-1}\cdot\left(\int\tfrac{\partial}{\partial\theta}g(y,\theta_0)\,dP_0(y)\right)^{\!\top}\left(\int g(y,\theta_0)\,g(y,\theta_0)^\top\,dP_0(y)\right)^{-1}g(x,\theta_0),$$
irrespective of the divergence used. This influence function also coincides with the influence function of the GMM estimator obtained by Ronchetti and Trojani [9] and is linearly related to the function $g(x,\theta)$ of the model. When the orthogonality function $g(x,\theta)$ is not bounded in $x$, the minimum empirical divergence estimators, and in particular the EL estimator of $\theta_0$, are not robust. For many moment condition models, the orthogonality functions are linear, and hence unbounded; therefore, these estimation methods are generally not robust. The same is true of other known estimators, such as the least squares estimators, the GMM estimators, and the exponential tilting estimator for moment condition models. In contrast, for the new estimator defined in the present paper, the influence function is linearly related to the function $g^c(x,\theta)$, which is bounded; therefore, this estimator can be seen as a robust version of the classical EL estimator.
An important feature of the robust version of the EL estimator is that its robustness can be controlled by the positive constant $c$ appearing in the Huber function. An additional advantage of the approach based on the Huber function is that the bound on the influence function of the estimator can be required to hold in a norm that is self-standardized with respect to the covariance matrix of the estimator; this norm measures the influence of the estimator relative to its variability, as expressed by its covariance matrix. Such an approach is also suitable for inducing stable testing procedures (see [9]), which could be useful in future studies on robust testing. The robust estimator proposed in the present paper has a self-standardized influence function bounded by the constant $c$ appearing in the Huber function. As in [9], the constant $c$ controls the degree of robustness of the estimator, and, in practice, one could take a value close to the lower bound in order to enforce a maximal amount of robustness. On the other hand, the density power divergences [25], combined with the minimum divergence approach, have proved useful for the construction of robust estimators in different contexts, in particular for parametric density estimation. Another approach to robust estimation in regression models is proposed in [26]. Such approaches could be adapted, in future research, to the context of moment condition models.

2.4. Consistency of the Estimators

In this subsection, we establish the consistency of the estimator $\widehat{t^c_\theta}$ of $t^c_\theta$, for any fixed $\theta\in\Theta$, and the consistency of the estimator $\widehat{\theta^c}$ of $\theta_0$. First, for any fixed $\theta\in\Theta$, we state the consistency of the estimators $\widehat A_\theta$ and $\widehat\tau_\theta$ defined by the system (11).

2.4.1. Consistency of the Estimators $\widehat A_\theta$ and $\widehat\tau_\theta$, for Fixed $\theta\in\Theta$

The estimators $\widehat A_\theta$ and $\widehat\tau_\theta$, of $A_\theta$ and $\tau_\theta$ defined by the theoretical system of Equation (7), are Z-estimators. We consider the following notations:
$$\Psi_1(\theta,A,\tau) := \int h_c(y,\theta,A,\tau)\,dP_\theta(y),$$
$$\Psi_2(x,\theta,A,\tau) := h_c(x,\theta,A,\tau)\,h_c(x,\theta,A,\tau)^\top - I_\ell,$$
and $\Psi(x,\theta,A,\tau) := \left(\Psi_1(\theta,A,\tau)^\top,\ \operatorname{vec}\!\left(\Psi_2(x,\theta,A,\tau)\right)^\top\right)^\top$, where "$\operatorname{vec}(\cdot)$" is the operator that transforms a matrix into a vector by stacking the columns of the matrix one under the other. Notice that $\Psi_1(\theta,A,\tau)$ is a constant function with respect to $x$. With these notations, for a given $\theta\in\Theta$, the Z-estimators $\widehat A_\theta$ and $\widehat\tau_\theta$ are solutions of the system (see (11))
$$\int\Psi(x,\theta,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)=0,$$
and their theoretical counterparts $A_\theta$ and $\tau_\theta$ are the solution of the system (see (7))
$$\int\Psi(x,\theta,A_\theta,\tau_\theta)\,dP_0(x)=0.$$
In the rest of the paper, we consider the matrix $A$ in its "vec" form, as defined above. This is necessary in order to apply some classical results, for example, the uniform weak law of large numbers (UWLLN) or results regarding Z-estimators. Therefore, the argument $A$ of the functions $h_c$, $n^c$, and $\Psi$ will in fact be $\operatorname{vec}A$. For simplicity, we write $A$ instead of $\operatorname{vec}A$. The same holds for $\widehat A_\theta$ and $A_\theta$.
Assumption 1.
(a) There exists a compact neighborhood $N_\theta$ of $(A_\theta,\tau_\theta)$ such that
$$\int\sup_{(A,\tau)\in N_\theta}\left\|\Psi(x,\theta,A,\tau)\right\|\,dP_0(x)<\infty;$$
(b) for any positive $\varepsilon$, the following condition holds
$$\inf_{(A,\tau)\in M_\theta}\left\|\int\Psi(x,\theta,A,\tau)\,dP_0(x)\right\|>0=\left\|\int\Psi(x,\theta,A_\theta,\tau_\theta)\,dP_0(x)\right\|,$$
where $M_\theta := \left\{(A,\tau)\ \text{s.t.}\ \|(A,\tau)-(A_\theta,\tau_\theta)\|\geq\varepsilon\right\}$.
Proposition 2.
For each $\theta\in\Theta$, under Assumption 1, $(\widehat A_\theta,\widehat\tau_\theta)$ converges in probability to $(A_\theta,\tau_\theta)$.
Proof. 
Since $(A,\tau)\mapsto\Psi(x,\theta,A,\tau)$ is continuous, by the UWLLN, Assumption 1(a) implies
$$\sup_{(A,\tau)\in N_\theta}\left\|\int\Psi(x,\theta,A,\tau)\,dP_n(x)-\int\Psi(x,\theta,A,\tau)\,dP_0(x)\right\|\to 0,$$
in probability. This result, together with Assumption 1(b), ensures the convergence in probability of the estimators $\widehat A_\theta$ and $\widehat\tau_\theta$ toward $A_\theta$ and $\tau_\theta$, respectively. The arguments are the same as those of Theorem 5.9, p. 46, in [27]. □

2.4.2. Consistency of the Estimator $\widehat{t^c_\theta}$ of $t^c_\theta$, for Fixed $\theta\in\Theta$

We state the consistency of the estimator $\widehat{t^c_\theta}$ under the following assumptions.
Assumption 2.
(a) $t^c_\theta := \arg\sup_{t\in\Lambda^c_\theta(P_0)}\int m^c(y,\theta,t)\,dP_0(y)$ exists, is unique, and is an interior point of $\Lambda^c_\theta(P_0)$;
(b) there exists a compact neighborhood $N_{t^c_\theta}\subset\Lambda^c_\theta(P_0)$ of $t^c_\theta$ such that $t^c_\theta\in\operatorname{int}(N_{t^c_\theta})$, and there exists a compact neighborhood $N_\theta$ of $(A_\theta,\tau_\theta)$ such that
$$\int\sup_{t\in N_{t^c_\theta},\,(A,\tau)\in N_\theta}\left|n^c(x,\theta,t,A,\tau)\right|\,dP_0(x)<\infty;$$
(c) there exist a compact neighborhood $N_{t^c_\theta}$ of $t^c_\theta$ and a sequence $B_n=O_P(1)$ such that, for all $t,t'\in N_{t^c_\theta}$, it holds that
$$\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)-\int n^c(x,\theta,t',\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)\right|\leq B_n\,\|t-t'\|.$$
Proposition 3.
Under Assumptions 1 and 2, we have
(1) $\widehat{t^c_\theta}$ converges in probability to $t^c_\theta$;
(2) $\widehat{KL_m}(\mathcal{M}^{c,n}_\theta,P_0)$ converges in probability to $KL_m(\mathcal{M}^c_\theta,P_0)$.
Proof. 
(1) Using Assumption 2(b) and the continuity of the function $n^c(x,\theta,t,A,\tau)$ with respect to $t$, $A$, and $\tau$, by the uniform weak law of large numbers (UWLLN), we get
$$\sup_{t\in N_{t^c_\theta},\,(A,\tau)\in N_\theta}\left|\int n^c(x,\theta,t,A,\tau)\,dP_n(x)-\int n^c(x,\theta,t,A,\tau)\,dP_0(x)\right|\to 0, \tag{69}$$
in probability. The following inequality holds:
$$\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right| \leq \sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)\right| + \sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right|. \tag{70}$$
The first term on the right-hand side of Inequality (70) tends to 0 in probability, on the basis of the result (69). The second term on the right-hand side of (70) also tends to 0 in probability. Indeed, using the convergence in probability $(\widehat A_\theta,\widehat\tau_\theta)\to(A_\theta,\tau_\theta)$, by Assumption 2(b), we get the pointwise convergence for each $t$. Then, according to Corollary 2.1 from [28], using the pointwise convergence together with Assumption 2(c), we obtain the uniform convergence
$$\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right| = o_P(1). \tag{71}$$
Consequently,
$$\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right|\to 0, \tag{72}$$
in probability. Using (72), the fact that $t^c_\theta$ is unique and belongs to $\operatorname{int}(N_{t^c_\theta})$, and the strict concavity of the function $t\mapsto\int n^c(y,\theta,t,A_\theta,\tau_\theta)\,dP_0(y)$, on the basis of Theorem 5.7 in [27], we conclude that any value
$$\underline{t} := \arg\sup_{t\in N_{t^c_\theta}}\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)$$
converges in probability to $t^c_\theta$. It remains to show that $\widehat{t^c_\theta}$ belongs to $\operatorname{int}(N_{t^c_\theta})$ with probability tending to one as $n\to\infty$, and consequently that it converges to $t^c_\theta$. Since, for $n$ sufficiently large, any $\underline{t}$ belongs to $\operatorname{int}(N_{t^c_\theta})$, the concavity of the criterion function $t\mapsto\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)$ ensures that no point $t$ in the complement of $\operatorname{int}(N_{t^c_\theta})$ can maximize $\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)$ over $t\in\mathbb{R}^\ell$; hence, $\widehat{t^c_\theta}$ belongs to $\operatorname{int}(N_{t^c_\theta})$.
(2) We have
$$\widehat{KL_m}(\mathcal{M}^{c,n}_\theta,P_0)-KL_m(\mathcal{M}^c_\theta,P_0) = \int n^c(x,\theta,\widehat{t^c_\theta},\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x).$$
Note that
$$\int n^c(x,\theta,t^c_\theta,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x) \leq \widehat{KL_m}(\mathcal{M}^{c,n}_\theta,P_0)-KL_m(\mathcal{M}^c_\theta,P_0) \leq \int n^c(x,\theta,\widehat{t^c_\theta},\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,\widehat{t^c_\theta},A_\theta,\tau_\theta)\,dP_0(x).$$
Both the right-hand side and the left-hand side in the above display tend to 0 in probability, using (72). Hence, $\widehat{KL_m}(\mathcal{M}^{c,n}_\theta,P_0)$ converges to $KL_m(\mathcal{M}^c_\theta,P_0)$ in probability. □

2.4.3. Consistency of the Estimator $\widehat{\theta^c}$

Assumption 3. (a) The function $(\theta,t,A,\tau)\mapsto n^c(X,\theta,t,A,\tau)$ is continuous, with probability 1;
(b) for each $\theta\in\Theta$, there exist a compact neighborhood $N_{t^c_\theta}$ of $t^c_\theta$ and a compact neighborhood $N_\theta$ of $(A_\theta,\tau_\theta)$ such that
$$\int\sup_{\theta\in\Theta}\sup_{t\in N_{t^c_\theta},\,(A,\tau)\in N_\theta}\left|n^c(x,\theta,t,A,\tau)\right|\,dP_0(x)<\infty;$$
(c) let $t_n^*(\theta) := \arg\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right|$. There exists a sequence $B_n=O_P(1)$ such that, for all $\theta,\theta'\in\Theta$, it holds that
$$\left|\int n^c(x,\theta,t_n^*(\theta),\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)-\int n^c(x,\theta',t_n^*(\theta'),\widehat A_{\theta'},\widehat\tau_{\theta'})\,dP_0(x)\right|\leq B_n\,\|\theta-\theta'\|;$$
(d) the function $\theta\mapsto\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x)$ is continuous on $\Theta$;
(e) $\theta_0 := \arg\inf_{\theta\in\Theta}\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x)$ exists, is unique, and is an interior point of $\Theta$.
Proposition 4.
Under Assumptions 1–3, we have
(1) $\left\|\widehat{t^c_\theta}-t^c_\theta\right\|\to 0$ in probability, uniformly with respect to $\theta\in\Theta$;
(2) $\left\|\widehat{\theta^c}-\theta_0\right\|\to 0$ in probability.
Proof. 
(1) By Assumption 3(a), $n^c(x,\theta,t,A,\tau)$ is continuous in $\theta,t,A,\tau$. Using also Assumption 3(b), by applying the UWLLN, we obtain the uniform convergence in probability
$$\sup_{(\theta,t,A,\tau)\in\mathcal{C}}\left|\int n^c(x,\theta,t,A,\tau)\,dP_n(x)-\int n^c(x,\theta,t,A,\tau)\,dP_0(x)\right|\to 0 \tag{76}$$
over the compact set $\mathcal{C} := \left\{(\theta,t,A,\tau)\ \text{s.t.}\ \theta\in\Theta,\ t\in N_{t^c_\theta},\ (A,\tau)\in N_\theta\right\}$. We will prove the uniform convergence in probability
$$\sup_{\theta\in\Theta}\left\|\widehat{t^c_\theta}-t^c_\theta\right\|\to 0.$$
Let $\eta>0$. We first show that $P_0\!\left(\sup_{\theta\in\Theta}\left\|\widetilde{t^c_\theta}-t^c_\theta\right\|\geq\eta\right)\to 0$ for any
$$\widetilde{t^c_\theta} := \arg\sup_{t\in N_{t^c_\theta}}\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x),$$
and we then show that $\widehat{t^c_\theta}$ belongs to $\operatorname{int}(N_{t^c_\theta})$ with probability tending to one as $n\to\infty$.
On the event $\left\{\sup_{\theta\in\Theta}\left\|\widetilde{t^c_\theta}-t^c_\theta\right\|\geq\eta\right\}$, since $\Theta$ is compact, by continuity, there exists $\overline\theta\in\Theta$ such that $\sup_{\theta\in\Theta}\left\|\widetilde{t^c_\theta}-t^c_\theta\right\| = \left\|\widetilde{t^c_{\overline\theta}}-t^c_{\overline\theta}\right\|\geq\eta$. Hence, there exists $\varepsilon>0$ such that
$$\int n^c(x,\overline\theta,t^c_{\overline\theta},A_{\overline\theta},\tau_{\overline\theta})\,dP_0(x)-\int n^c(x,\overline\theta,\widetilde{t^c_{\overline\theta}},A_{\overline\theta},\tau_{\overline\theta})\,dP_0(x)>\varepsilon. \tag{79}$$
Therefore,
$$P_0\!\left(\sup_{\theta\in\Theta}\left\|\widetilde{t^c_\theta}-t^c_\theta\right\|\geq\eta\right)\leq P_0\!\left(\int n^c(x,\overline\theta,t^c_{\overline\theta},A_{\overline\theta},\tau_{\overline\theta})\,dP_0(x)-\int n^c(x,\overline\theta,\widetilde{t^c_{\overline\theta}},A_{\overline\theta},\tau_{\overline\theta})\,dP_0(x)>\varepsilon\right). \tag{80}$$
On the other hand, using (71) and (72), we can write
$$\int n^c(x,\overline\theta,t^c_{\overline\theta},A_{\overline\theta},\tau_{\overline\theta})\,dP_0(x)-\int n^c(x,\overline\theta,\widetilde{t^c_{\overline\theta}},A_{\overline\theta},\tau_{\overline\theta})\,dP_0(x) \leq \int n^c(x,\overline\theta,t^c_{\overline\theta},\widehat A_{\overline\theta},\widehat\tau_{\overline\theta})\,dP_n(x)-\int n^c(x,\overline\theta,\widetilde{t^c_{\overline\theta}},A_{\overline\theta},\tau_{\overline\theta})\,dP_0(x)+o_P(1) \leq \int n^c(x,\overline\theta,\widetilde{t^c_{\overline\theta}},\widehat A_{\overline\theta},\widehat\tau_{\overline\theta})\,dP_n(x)-\int n^c(x,\overline\theta,\widetilde{t^c_{\overline\theta}},\widehat A_{\overline\theta},\widehat\tau_{\overline\theta})\,dP_0(x)+o_P(1) \leq \sup_{\mathcal{C}}\left|\int n^c(x,\theta,t,A,\tau)\,dP_n(x)-\int n^c(x,\theta,t,A,\tau)\,dP_0(x)\right|+o_P(1).$$
Using (76), (79), and (80), we deduce that $\sup_{\theta\in\Theta}\left\|\widetilde{t^c_\theta}-t^c_\theta\right\|\to 0$ in probability. In particular, for large $n$, $\widetilde{t^c_\theta}\in\operatorname{int}(N_{t^c_\theta})$, uniformly in $\theta$. Since $t\mapsto\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)$ is concave, the maximizer $\widehat{t^c_\theta}$ belongs to $\operatorname{int}(N_{t^c_\theta})$ for sufficiently large $n$. Then, $\sup_{\theta\in\Theta}\left\|\widehat{t^c_\theta}-t^c_\theta\right\|\to 0$ in probability.

(2) For large $n$, we can write
$$\sup_{\theta\in\Theta}\left|\int n^c(x,\theta,\widehat{t^c_\theta},\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x)\right| = \sup_{\theta\in\Theta}\left|\int n^c(x,\theta,\widetilde{t^c_\theta},\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x)\right| =: \sup_{\theta\in\Theta}|B|. \tag{81}$$
Note that
$$\int n^c(x,\theta,t^c_\theta,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x) \leq B \leq \int n^c(x,\theta,\widetilde{t^c_\theta},\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,\widetilde{t^c_\theta},A_\theta,\tau_\theta)\,dP_0(x). \tag{82}$$
In order to prove that $\sup_{\theta\in\Theta}|B|\to 0$ in probability, we first prove that
$$\sup_{\theta\in\Theta}\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right|\to 0, \tag{83}$$
in probability. Notice that
$$\sup_{\theta\in\Theta}\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right| \leq \sup_{\theta\in\Theta}\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)\right| + \sup_{\theta\in\Theta}\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right|.$$
The first term on the right-hand side of the above inequality tends to 0 in probability, using (76). Regarding the second term, we have the convergence (71), and combining this with Assumption 3(c), we obtain
$$\sup_{\theta\in\Theta}\sup_{t\in N_{t^c_\theta}}\left|\int n^c(x,\theta,t,\widehat A_\theta,\widehat\tau_\theta)\,dP_0(x)-\int n^c(x,\theta,t,A_\theta,\tau_\theta)\,dP_0(x)\right|\to 0,$$
in probability. Consequently, (83) holds. Then, (81) and (82) lead to
$$\sup_{\theta\in\Theta}\left|\int n^c(x,\theta,\widehat{t^c_\theta},\widehat A_\theta,\widehat\tau_\theta)\,dP_n(x)-\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x)\right|\to 0. \tag{85}$$
Assumptions 3(d) and 3(e) ensure that $\theta_0$ is well-separated, in the sense that, for all $\varepsilon>0$,
$$\inf_{\{\theta;\,\|\theta-\theta_0\|\geq\varepsilon\}}\int n^c(x,\theta,t^c_\theta,A_\theta,\tau_\theta)\,dP_0(x)>0=\int n^c(x,\theta_0,t^c_{\theta_0},A_{\theta_0},\tau_{\theta_0})\,dP_0(x). \tag{86}$$
Finally, (85) and (86) imply that $\widehat{\theta^c}\to\theta_0$ in probability, on the basis of Theorem 5.7, p. 45, in [27]. □

3. Simulation Results

In order to compare the performance of the proposed robust EL estimator (26) with that of the EL estimator (27) in the case of contaminated data, we consider the moment condition model presented in Example 1 in Section 1.
Let $X$ be a random variable with probability distribution $P_0=\chi^2_1$, the chi-square distribution with one degree of freedom. Then, the equation $\int_{\mathbb{R}} g(x,\theta)\,dP_0(x)=0$, with $g(x,\theta)=(x-\theta,\ x^2-\theta^2-2\theta)^\top$, has the unique solution $\theta=\theta_0=1$. This is a particular case of the model from Example 1, namely with $h(\theta)=\theta^2+2\theta$. Observe that, for this model, $g(x,\theta)$ is unbounded (in $x$).
We use i.i.d. data generated from a slight deviation of the model $P_0$, namely from the mixture
$$(1-\epsilon)\,\chi^2_1+\epsilon\,\chi^2_{10},$$
with $\epsilon=0.05$ and $\epsilon=0.10$, respectively. The considered sample sizes are $n=100$ and $n=500$. All the simulations are repeated 1000 times. The obtained estimates are compared through squared bias, variance, and mean square error, computed on the basis of the 1000 replications. We give the corresponding box-plots in Figure 1, Figure 2, Figure 3 and Figure 4, where the true value of the parameter $\theta_0=1$ is marked by a horizontal dashed line. For computing the proposed robust EL estimate (26), we use the truncated functions $g^c(x,\theta)$ and $g_n^c(x,\theta)$ with $c=2$. The algorithm for computing the estimate was obtained by adapting the one from [10] (Appendix A.1., p. 3196). Namely, at each iteration, the estimate of the parameter $\theta_0$ corresponding to the new orthogonality function in step (iii) is computed using the Uzawa algorithm for the saddle-point optimization in (26). The obtained results are presented in Table 1, Table 2, Table 3 and Table 4 and Figure 1, Figure 2, Figure 3 and Figure 4. All these results illustrate the fact that, in the case of contaminated data, the robust EL estimator outperforms the classical EL estimator.
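A compact sketch of this contamination experiment could look as follows (our own illustration; it reuses the hypothetical `huber` and `el_type_estimate` helpers sketched in Section 2 and, for brevity, truncates $g$ directly with the Huber function instead of performing the full centering and rescaling with $\widehat\tau_\theta$ and $\widehat A_\theta$):

```python
import numpy as np

rng = np.random.default_rng(12345)

def contaminated_sample(n, eps):
    """n i.i.d. draws from the mixture (1 - eps) chi2_1 + eps chi2_10."""
    mask = rng.random(n) < eps
    x = rng.chisquare(1, size=n)
    x[mask] = rng.chisquare(10, size=mask.sum())
    return x

# Orthogonality function of the simulation model, h(theta) = theta^2 + 2 theta:
g = lambda x, theta: np.array([x - theta, x**2 - theta**2 - 2.0 * theta])
# Simplified truncation with c = 2 (the paper's construction also centers
# and rescales with tau_hat and A_hat before truncating):
g_trunc = lambda x, theta: huber(g(x, theta), c=2.0)

n, eps, n_rep, theta0 = 100, 0.05, 1000, 1.0
est = np.empty(n_rep)
for r in range(n_rep):
    X = contaminated_sample(n, eps)
    est[r] = el_type_estimate(X, g_trunc, ell=2, bounds=(0.1, 5.0))

bias2 = (est.mean() - theta0) ** 2
var = est.var()
print(bias2, var, bias2 + var)   # squared bias, variance, mean square error
```

Replacing `g_trunc` by `g` in the loop reproduces the corresponding classical EL experiment.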

4. Conclusions

We proposed a robust version of the EL estimator for moment condition models. This estimator is defined through the minimization of an empirical version of the modified Kullback–Leibler divergence in dual form, using truncated orthogonality functions based on the multivariate Huber function. We proved the robustness of the new estimator by means of the influence function, as well as its consistency. The results of the Monte Carlo simulation study show that, in the case of contaminated data, the robust EL estimator outperforms the classical EL estimator.

Author Contributions

Conceptualization, A.K. and A.T.; methodology, A.K. and A.T.; investigation, A.K. and A.T.; writing the manuscript, A.K. and A.T. All authors have read and agreed to the final version of the manuscript.

Funding

This work was supported by a grant of the Romanian Ministry of Education and Research, CNCS—UEFISCDI, project number PN-III-P4-ID-PCE-2020-1112, within PNCDI III.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Hansen, L.P. Large sample properties of generalized method of moments estimators. Econometrica 1982, 50, 1029–1054.
2. Hansen, L.P.; Heaton, J.; Yaron, A. Finite-sample properties of some alternative generalized method of moments estimators. J. Bus. Econ. Stat. 1996, 14, 262–280.
3. Qin, J.; Lawless, J. Empirical likelihood and general estimating equations. Ann. Stat. 1994, 22, 300–325.
4. Imbens, G.W. One-step estimators for over-identified generalized method of moments models. Rev. Econ. Stud. 1997, 64, 359–383.
5. Owen, A. Empirical Likelihood; Chapman and Hall: New York, NY, USA, 2001.
6. Kitamura, Y.; Stutzer, M. An information-theoretic alternative to generalized method of moments estimation. Econometrica 1997, 65, 861–874.
7. Imbens, G.W. Generalized method of moments and empirical likelihood. J. Bus. Econ. Stat. 2002, 20, 493–506.
8. Newey, W.K.; Smith, R.J. Higher order properties of GMM and generalized empirical likelihood estimators. Econometrica 2004, 72, 219–255.
9. Ronchetti, E.; Trojani, F. Robust inference with GMM estimators. J. Econom. 2001, 101, 37–69.
10. Lô, S.N.; Ronchetti, E. Robust small sample accurate inference in moment condition models. Comput. Stat. Data Anal. 2012, 56, 3182–3197.
11. Felipe, A.; Martin, N.; Miranda, P.; Pardo, L. Testing with exponentially tilted empirical likelihood. Methodol. Comput. Appl. Probab. 2018, 20, 1–40.
12. Broniatowski, M.; Keziou, A. Divergences and duality for estimation and test under moment condition models. J. Stat. Plan. Inference 2012, 142, 2554–2573.
13. Broniatowski, M.; Keziou, A. Parametric estimation and tests through divergences and the duality technique. J. Multivar. Anal. 2009, 100, 16–36.
14. Toma, A.; Leoni-Aubin, S. Robust tests based on dual divergence estimators and saddlepoint approximations. J. Multivar. Anal. 2010, 101, 1143–1155.
15. Toma, A.; Broniatowski, M. Dual divergence estimators and tests: Robustness results. J. Multivar. Anal. 2011, 102, 20–36.
16. Toma, A. Model selection criteria using divergences. Entropy 2014, 16, 2686–2698.
17. Schennach, S.M. Point estimation with exponentially tilted empirical likelihood. Ann. Stat. 2007, 35, 634–672.
18. Toma, A. Robustness of dual divergence estimators for models satisfying linear constraints. C. R. Math. 2013, 351, 311–316.
19. Engle, R.F. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 1982, 50, 987–1007.
20. Bansal, R.; Hsieh, D.; Viswanathan, S. No arbitrage and arbitrage pricing: A new approach. J. Financ. 1993, 48, 1719–1747.
21. Hampel, F.R.; Ronchetti, E.M.; Rousseeuw, P.J.; Stahel, W.A. Robust Statistics: The Approach Based on Influence Functions; John Wiley & Sons: New York, NY, USA, 1986.
22. Rüschendorf, L. On the minimum discrimination information theorem. Stat. Decis. 1984, Suppl. 1, 263–283.
23. Cressie, N.; Read, T.R.C. Multinomial goodness-of-fit tests. J. R. Stat. Soc. Ser. B 1984, 46, 440–464.
24. Broniatowski, M.; Keziou, A. Minimization of ϕ-divergences on sets of signed measures. Stud. Sci. Math. Hung. 2006, 43, 403–442.
25. Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C. Robust and efficient estimation by minimizing a density power divergence. Biometrika 1998, 85, 549–559.
26. She, Y.; Owen, A.B. Outlier detection using nonconvex penalized regression. J. Am. Stat. Assoc. 2011, 106, 626–639.
27. van der Vaart, A.W. Asymptotic Statistics; Cambridge University Press: Cambridge, UK, 1998.
28. Newey, W.K. Uniform convergence in probability and stochastic equicontinuity. Econometrica 1991, 59, 1161–1167.
Figure 1. Robust EL versus EL, for n = 100 and ϵ = 0.05.
Figure 2. Robust EL versus EL, for n = 100 and ϵ = 0.10.
Figure 3. Robust EL versus EL, for n = 500 and ϵ = 0.05.
Figure 4. Robust EL versus EL, for n = 500 and ϵ = 0.10.
Table 1. Robust Empirical Likelihood (EL) versus EL, for n = 100 and ϵ = 0.05.

            Estimate   Squared Bias   Variance   Mean Square Error
Robust EL   1.0477     0.0023         0.0034     0.0056
EL          1.0861     0.0074         0.0208     0.0282

Table 2. Robust EL versus EL, for n = 100 and ϵ = 0.10.

            Estimate   Squared Bias   Variance   Mean Square Error
Robust EL   1.1281     0.0164         0.0116     0.0281
EL          1.2052     0.0421         0.0342     0.0763

Table 3. Robust EL versus EL, for n = 500 and ϵ = 0.05.

            Estimate   Squared Bias   Variance   Mean Square Error
Robust EL   1.0336     0.0011         0.0005     0.0016
EL          1.0750     0.0056         0.0039     0.0095

Table 4. Robust EL versus EL, for n = 500 and ϵ = 0.10.

            Estimate   Squared Bias   Variance   Mean Square Error
Robust EL   1.1033     0.0107         0.0019     0.0126
EL          1.1835     0.0337         0.0065     0.0402
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
