A Robust Version of the Empirical Likelihood Estimator

Abstract: In this paper, we introduce a robust version of the empirical likelihood estimator for semiparametric moment condition models. This estimator is obtained by minimizing the modified Kullback-Leibler divergence, in its dual form, using truncated orthogonality functions. We prove the robustness and the consistency of the new estimator. The performance of the robust empirical likelihood estimator is illustrated through examples based on Monte Carlo simulations.


Introduction
Let X_1, . . . , X_n be an i.i.d. sample with unknown p.m. P_0. We assume that the equation

∫ g(x, θ) dP_0(x) = 0,

where g := (g_1, . . . , g_ℓ)^⊤ is a specified measurable function of the observation x and of the parameter θ ∈ Θ ⊆ R^d, has a unique solution (in θ), which will be denoted θ_0. We consider the estimation problem of θ_0 from the data X_1, . . . , X_n. The traditional way of estimating the parameter θ_0 is the Generalized Method of Moments (GMM) [1]. The GMM estimators are consistent and asymptotically normal. Despite these desirable asymptotic properties, the finite sample performance of the GMM estimators is not satisfactory. Several alternative methods have been proposed in the literature; the Continuous Updating (CU) estimator [2], the Empirical Likelihood (EL) estimator [3][4][5], and the Exponential Tilting (ET) estimator [6] are three of the best-known examples. Imbens [7] showed that the EL and ET estimators are characterized by lower bias than GMM in nonlinear models. Newey and Smith [8] studied the theoretical properties of the EL, ET, and CU estimators, by including them in the Generalized Empirical Likelihood (GEL) family of estimators, and showed that all GEL estimators are characterized by lower asymptotic bias than GMM. The information and entropy econometric (IEE) techniques have been proposed in order to improve the finite sample performance of the GMM estimators and tests [4,6]. Ronchetti and Trojani [9] and Lo and Ronchetti [10] have proposed robust alternatives to the GMM estimators and tests, respectively to the IEE techniques, while also keeping the finite sample accuracy. Felipe et al. [11] proposed empirical divergence test statistics, based on the exponentially tilted empirical likelihood estimator, with good robustness properties. Broniatowski and Keziou [12] proposed a general approach for estimation and testing in moment condition models, which includes some of the above-mentioned methods.
This approach, based on minimizing divergences in their dual forms, allows the asymptotic study of the estimators (called minimum empirical divergence estimators) and of the associated test statistics, both under the model and under misspecification of the model. The approach based on divergences and duality was first considered in the case of parametric models, for example, in [13][14][15]. Applications of the minimum dual divergence estimators in model selection problems are considered in [16].
The EL paradigm enters as a special case of the general methodology from Broniatowski and Keziou [12], namely, when using the modified Kullback-Leibler divergence. Although the EL estimator is preferable to other estimators due to higher-order asymptotic properties, these properties are valid only in the case of correct specification of the moment conditions. On the other hand, when the support of the p.m. corresponding to the model and the orthogonality functions are not bounded, the EL estimator may cease to be root-n consistent under misspecification (Schennach [17]). It is a known fact that the EL estimator for moment condition models is not robust. This fact is also justified by the results from [18], where it is shown that the influence function of a minimum dual divergence estimator, and particularly that of the EL estimator, is linearly related to the orthogonality function g corresponding to the model. Thus, the influence function of the EL estimator is bounded if and only if the function g corresponding to the underlying model is bounded. Hence, the EL estimator is usually not robust, since the function g can be unbounded in the observations. For this reason, in practice, the classical EL estimator, as well as the minimum dual divergence estimators and the GMM estimators, is unstable even under small deviations from the assumed model.
As examples in this context, we mention models for which the orthogonality functions are unbounded [9]. The autoregressive models with heteroscedastic errors [19] can be written in the form of moment condition models, but the orthogonality functions defining the orthogonality conditions are unbounded. Moreover, the nonlinear empirical asset pricing models [20] can be written in the form of moment condition models and have natural orthogonality conditions (given by the asset pricing equations), defined by unbounded orthogonality functions. We also recall the following classical example, which is used in the last section of the paper, in the Monte Carlo simulation study.

Example 1 ([3] p. 302). Consider a random variable X with unbounded support (R or R_+, for instance). Let E(X) = θ, and assume that E(X²) = h(θ), with h(·) a known function. The aim is to estimate the parameter θ using an i.i.d. sample X_1, . . . , X_n of X. The information on the probability distribution P_0 of X can be expressed in the context of model (1), with d = 1 and ℓ = 2, by taking

g(x, θ) := (g_1(x, θ), g_2(x, θ))^⊤ = (x − θ, x² − h(θ))^⊤.

One can see that the orthogonality functions g_1(·, θ) and g_2(·, θ) are unbounded (with respect to x).
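As an illustration, the moment conditions of Example 1 can be fitted by a plain (non-robust) GMM-type criterion. The sketch below is our own, not the paper's code; it assumes the chi-square case h(θ) = θ² + 2θ used later in the simulations, and the names `moment_fn` and `gmm_objective` are hypothetical.

```python
# Minimal GMM-type fit for Example 1 (a sketch, assuming h(theta) = theta^2 + 2*theta,
# which holds for the chi-square family used in the simulation section).
import numpy as np
from scipy.optimize import minimize_scalar

def moment_fn(x, theta):
    """g(x, theta) = (x - theta, x^2 - h(theta)) stacked column-wise, shape (n, 2)."""
    return np.stack([x - theta, x**2 - (theta**2 + 2.0 * theta)], axis=1)

def gmm_objective(theta, x):
    gbar = moment_fn(x, theta).mean(axis=0)   # sample moment vector
    return float(gbar @ gbar)                 # identity weighting, for simplicity

rng = np.random.default_rng(0)
x = rng.chisquare(df=1, size=5000)            # true theta_0 = E(X) = 1
res = minimize_scalar(gmm_objective, bounds=(0.1, 5.0), args=(x,), method="bounded")
print(res.x)
```

With uncontaminated χ²₁ data the minimizer is close to θ_0 = 1; the contaminated case, where this plain fit breaks down, is revisited in the last section.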
For such models, the lack of robustness of the EL estimator, as well as the lack of robustness of other classical estimators, represents the motivation to study some robust alternatives.
In the present paper, we propose a locally robust version of the EL estimator for moment condition models. The estimator is locally robust in the sense that the statistical functional associated with it can be locally approximated by means of the influence function; the boundedness of the influence function then implies that, in a neighborhood of the model, the asymptotic bias of the estimator cannot become arbitrarily large (see [21]).
The new estimator is defined by minimizing an empirical version of the modified Kullback-Leibler divergence in dual form, using truncated orthogonality functions. This leads to a robust EL estimator. Moreover, we prove the consistency of this estimator. Finally, we present an example based on Monte Carlo simulations illustrating the performance of the robust EL estimator in the case of contaminated data.

Statistical Divergences
Let ϕ be a convex function from R onto [0, +∞], satisfying ϕ(1) = 0. Let P be some p.m. on the measurable space (R^m, B(R^m)). For any signed finite measure Q, on the same measurable space (R^m, B(R^m)), absolutely continuous with respect to P, the ϕ-divergence (sometimes we simply say divergence) between Q and P is defined by

D_ϕ(Q, P) := ∫ ϕ(dQ/dP (x)) dP(x),

where dQ/dP is the Radon-Nikodym derivative. When Q is not absolutely continuous with respect to P, we set D_ϕ(Q, P) = +∞. This definition extends the one given in [22] for divergences between p.m.'s. A known class of divergences between p.m.'s is the class of Cressie-Read divergences, introduced in [23] and defined by the functions

ϕ_γ(x) := (x^γ − γx + γ − 1) / (γ(γ − 1)), γ ∈ R \ {0, 1},

which may be finite or infinite. The Kullback-Leibler divergence (KL) is associated to ϕ_1(x) := x log x − x + 1, the modified Kullback-Leibler (KL_m) to ϕ_0(x) := − log x + x − 1, the χ² divergence to ϕ_2, the modified χ² divergence (χ²_m) to ϕ_{−1}, and the Hellinger (H) distance to ϕ_{1/2}. The ϕ-divergence between some set Ω of probability measures and a probability measure P is defined by

D_ϕ(Ω, P) := inf_{Q∈Ω} D_ϕ(Q, P).
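For discrete probability vectors, the Cressie-Read divergences above can be evaluated directly. A minimal sketch (our own helper `cressie_read`, assuming finite supports and strictly positive p):

```python
# Cressie-Read phi-divergences D(q, p) = sum_i p_i * phi_gamma(q_i / p_i)
# between two discrete p.m.'s, with the limiting cases gamma = 0 (KL_m) and
# gamma = 1 (KL) handled separately.
import numpy as np

def cressie_read(q, p, gamma):
    r = q / p
    if gamma == 0:      # modified Kullback-Leibler: phi_0(r) = -log r + r - 1
        phi = -np.log(r) + r - 1.0
    elif gamma == 1:    # Kullback-Leibler: phi_1(r) = r log r - r + 1
        phi = r * np.log(r) - r + 1.0
    else:
        phi = (r**gamma - gamma * r + gamma - 1.0) / (gamma * (gamma - 1.0))
    return float(np.sum(p * phi))

p = np.array([0.25, 0.25, 0.5])
q = np.array([0.2, 0.3, 0.5])
kl_m = cressie_read(q, p, 0)   # D_phi(q, p) >= 0, with equality iff q == p
```

For γ = 2 the formula reduces to half the familiar χ² statistic, ½ Σ (q_i − p_i)²/p_i, which is a quick sanity check on the implementation.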

Definition of the Estimator
We consider a reference identifiable model {P_θ; θ ∈ Θ} of probability measures such that, for each θ ∈ Θ, P_θ ∈ M_θ, which means that ∫_{R^m} g(x, θ) dP_θ(x) = 0, and we assume that θ is the unique solution of this equation. We assume that the p.m. P_0 of the data, corresponding to the true unknown value θ_0 of the parameter to be estimated, belongs to this reference model. The reference model will be associated with the truncated orthogonality function that will be used to define the robust version of the EL estimator of the parameter θ_0. We will use the notation ‖·‖ for the Euclidean norm. Similarly as in [9], using the reference model {P_θ; θ ∈ Θ}, we define the function g_c : R^m × Θ → R^ℓ,

g_c(x, θ) := H_c(A_θ [g(x, θ) − τ_θ]),

where H_c : R^ℓ → R^ℓ is the Huber function

H_c(y) := y · min{1, c/‖y‖},

and A_θ, τ_θ are, respectively, an ℓ × ℓ matrix and an ℓ-vector, defined as the solutions of the system of implicit equations

∫ h_c(x, θ, A, τ) dP_θ(x) = 0,
∫ h_c(x, θ, A, τ) h_c(x, θ, A, τ)^⊤ dP_θ(x) = I,

where h_c(x, θ, A, τ) := H_c(A [g(x, θ) − τ]), I is the ℓ × ℓ identity matrix, and c > 0 is a given positive constant. Therefore, we have ‖g_c(x, θ)‖ ≤ c, for all x and θ. We use the function h_c when needed to work with the dependence on the matrix A and on the vector τ; therefore, g_c(x, θ) = h_c(x, θ, A_θ, τ_θ). For given P_θ from the reference model, the triplet (θ, A_θ, τ_θ) is uniquely determined; see [9], p. 48.

Consider the estimating problem of the triplet (θ_0, A_{θ_0}, τ_{θ_0}) on the basis of a sample X_1, . . . , X_n ∼ P_0, P_0 ∈ M_{θ_0}. For each θ ∈ Θ, using the p.m. P_θ from the reference model, we define Â_θ and τ̂_θ, solutions of the system

∫ h_c(x, θ, A, τ) dP_θ(x) = 0,
∫ h_c(x, θ, A, τ) h_c(x, θ, A, τ)^⊤ dP_n(x) = I,

where P_n(·) := (1/n) Σ_{i=1}^n δ_{X_i}(·) is the empirical measure associated with the sample, with δ_x(·) being the Dirac measure at the point x, for any x. We denote

g_c^n(x, θ) := h_c(x, θ, Â_θ, τ̂_θ).

Note that g_c^n(x, θ) depends on both the data and the reference probability P_θ. We now consider the moment condition model associated to the function g_c^n(x, θ), namely,

M_θ^{c,n} := { Q : ∫ g_c^n(x, θ) dQ(x) = 0 }.
The p.m. P_0 belongs to the model M^{c,n} := ∪_{θ∈Θ} M_θ^{c,n}. In what follows, we consider the modified Kullback-Leibler divergence, which corresponds to the strictly convex function ϕ(x) := − log x + x − 1. A straightforward calculus shows that the convex conjugate of the convex function ϕ, denote it ψ, is given by

ψ(t) := − log(1 − t) for t < 1, and ψ(t) := +∞ for t ≥ 1.

For any p.m. P and any θ ∈ Θ, define

Λ_θ^{c,n}(P) := { t ∈ R^{1+ℓ} : ∫ |ψ(t^⊤ ḡ_c^n(x, θ))| dP(x) < ∞ },

where ḡ_c^n := (1_{R^m}, g_c^{n⊤})^⊤. We denote Λ_θ^{c,n} := Λ_θ^{c,n}(P_0) and Λ_θ^{c,n,n} := Λ_θ^{c,n}(P_n).
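The system of implicit equations defining the truncation pair (A_θ, τ_θ) can be solved numerically. The sketch below is our own, not the paper's algorithm: it replaces the expectations by Monte Carlo averages over a single large sample (so it does not distinguish the theoretical system from its empirical counterpart) and uses a fixed-point iteration in the spirit of [9]; `huber` and `truncation_pair` are hypothetical names.

```python
# Fixed-point computation of (A, tau) solving E[h_c] = 0 and E[h_c h_c'] = I,
# where h_c(x) = H_c(A (g(x) - tau)), for the Example 1 moments under chi^2_1.
import numpy as np

def huber(y, c):
    """Multivariate Huber function H_c(y) = y * min(1, c / ||y||), applied row-wise."""
    norms = np.linalg.norm(y, axis=1, keepdims=True)
    return y * np.minimum(1.0, c / np.maximum(norms, 1e-12))

def truncation_pair(g, c, n_iter=500):
    """Iterate: update tau from E[h_c] = 0, then A from A M A' = I via Cholesky."""
    n, l = g.shape
    A, tau = np.eye(l), g.mean(axis=0)
    for _ in range(n_iter):
        w = np.minimum(1.0, c / np.maximum(np.linalg.norm((g - tau) @ A.T, axis=1), 1e-12))
        tau = (w[:, None] * g).sum(axis=0) / w.sum()           # solves E[w (g - tau)] = 0
        M = (w[:, None] ** 2 * (g - tau)).T @ (g - tau) / n    # E[w^2 (g - tau)(g - tau)']
        A = np.linalg.inv(np.linalg.cholesky(M))               # then A M A' = I
    return A, tau

rng = np.random.default_rng(1)
x = rng.chisquare(df=1, size=20000)
theta = 1.0
g = np.column_stack([x - theta, x**2 - (theta**2 + 2 * theta)])  # Example 1 moments
A, tau = truncation_pair(g, c=2.0)                                # c = 2 > sqrt(l)
hc = huber((g - tau) @ A.T, c=2.0)
# At the fixed point: hc.mean(axis=0) ~ 0 and hc.T @ hc / n ~ I, and ||hc|| <= c.
```

The Cholesky step uses the fact that, if M = L L^⊤, then A = L^{-1} satisfies A M A^⊤ = I; the truncation bound ‖h_c‖ ≤ c holds by construction of H_c.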
Since g_c^n(x, θ) is bounded (with respect to x), on the basis of Theorem 1.1 in [24] and Proposition 4.2 in [12], the following dual representation of the KL_m-divergence holds:

KL_m(M_θ^{c,n}, P_0) = sup_{t∈Λ_θ^{c,n}} ∫ m_c^n(x, θ, t) dP_0(x),  (16)

where

m_c^n(x, θ, t) := t_0 − ψ(t^⊤ ḡ_c^n(x, θ)),

and the supremum in (16) is reached, provided that KL_m(M_θ^{c,n}, P_0) is finite. Furthermore, according to Proposition 4.2 from [12], for each θ ∈ Θ, the condition that the matrix

∫ ḡ_c^n(x, θ) ḡ_c^n(x, θ)^⊤ dP_0(x)

is nonsingular ensures that

t_θ^{c,n} := arg sup_{t∈Λ_θ^{c,n}} ∫ m_c^n(x, θ, t) dP_0(x),  (18)

defined as the solution of the optimization problem (16), is unique. Notice that the linear independence of the functions 1_{R^m}, g_{c,1}^n(·, θ), . . . , g_{c,ℓ}^n(·, θ) implies this nonsingularity condition whenever P_0 is not degenerate.
Moreover, using again Proposition 4.2 and Remark 4.4 from [12], for each θ ∈ Θ, one can show that the first component of the optimal solution t_θ^{c,n} in (18) equals zero. One can then omit the first component of the vector t in displays (15)-(18); they will therefore be replaced by

KL_m(M_θ^{c,n}, P_0) = sup_{t∈Λ_θ^{c,n}} ∫ m_c^n(x, θ, t) dP_0(x),

where

m_c^n(x, θ, t) := −ψ(t^⊤ g_c^n(x, θ)) = log(1 − t^⊤ g_c^n(x, θ))

and

t_θ^{c,n} := arg sup_{t∈Λ_θ^{c,n}} ∫ m_c^n(x, θ, t) dP_0(x),

the sets Λ_θ^{c,n} being redefined accordingly for vectors t ∈ R^ℓ. In view of the last relation, a natural estimator of t_θ^{c,n} is defined by

t̂_θ^{c,n} := arg sup_{t∈Λ_θ^{c,n,n}} ∫ m_c^n(x, θ, t) dP_n(x) = arg sup_{t∈Λ_θ^{c,n,n}} (1/n) Σ_{i=1}^n log(1 − t^⊤ g_c^n(X_i, θ)).

Then, a "dual" plug-in estimator of the modified Kullback-Leibler divergence, between M_θ^{c,n} and P_0, can be defined by

KL̂_m(M_θ^{c,n}, P_0) := sup_{t∈Λ_θ^{c,n,n}} (1/n) Σ_{i=1}^n log(1 − t^⊤ g_c^n(X_i, θ)),

where log(·) is the extended logarithm function, i.e., the function defined by log(u) = log(u) if u > 0, and log(u) = −∞ if u ≤ 0, for any u ∈ R. Finally, we define the following estimator of θ_0:

θ̂_c := arg inf_{θ∈Θ} KL̂_m(M_θ^{c,n}, P_0) = arg inf_{θ∈Θ} sup_{t∈Λ_θ^{c,n,n}} (1/n) Σ_{i=1}^n log(1 − t^⊤ g_c^n(X_i, θ)),  (26)

which can be seen as a "robust" version of the well-known EL estimator.
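For a fixed θ, the inner supremum above is a finite-dimensional concave problem: maximize (1/n) Σᵢ log(1 − t^⊤ gᵢ) over t, with the arguments of the logarithm kept positive. A minimal sketch of this inner step (our own code; the names `neg_dual` and `inner_dual` are hypothetical):

```python
# Inner dual optimization: t_hat maximizes the empirical criterion
# (1/n) sum_i log(1 - t' g_i); we minimize its negative, returning +inf
# outside the natural domain {t : 1 - t' g_i > 0 for all i}.
import numpy as np
from scipy.optimize import minimize

def neg_dual(t, gc):
    u = 1.0 - gc @ t
    if np.any(u <= 0.0):
        return np.inf
    return -float(np.mean(np.log(u)))

def inner_dual(gc):
    """Return (t_hat, estimated KL_m divergence) for an (n, l) array of moments gc."""
    res = minimize(neg_dual, np.zeros(gc.shape[1]), args=(gc,), method="Nelder-Mead")
    return res.x, -res.fun

rng = np.random.default_rng(2)
gc = rng.normal(size=(500, 2))
gc -= gc.mean(axis=0)          # exactly centered moments
t_hat, div = inner_dual(gc)
```

When the moments are exactly centered, t = 0 is the optimizer and the estimated divergence is 0, consistent with P_n itself satisfying the constraints in that case.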
Recall that the EL estimator can be written as (see, e.g., [5])

θ̂_EL := arg inf_{θ∈Θ} sup_t (1/n) Σ_{i=1}^n log(1 − t^⊤ g(X_i, θ)).  (27)

For establishing the asymptotic properties of the proposed estimators, we need the following additional notations. Consider the moment condition model associated with the truncated function

g_c(x, θ) := h_c(x, θ, A_θ, τ_θ),

where A_θ and τ_θ are the solution to the system (7). Note that g_c(x, θ) depends only on the reference model P_θ and not on the data. This model is defined by

M_θ^c := { Q : ∫ g_c(x, θ) dQ(x) = 0 }.

Let Λ_θ^c := Λ_θ^c(P_0) be defined analogously to Λ_θ^{c,n}, with g_c in place of g_c^n. Therefore, as above, we have the following dual representation for KL_m(M_θ^c, P_0):

KL_m(M_θ^c, P_0) = sup_{t∈Λ_θ^c} ∫ m_c(x, θ, t) dP_0(x),  (31)

where

m_c(x, θ, t) := log(1 − t^⊤ g_c(x, θ)),

and the supremum in (31) is reached, provided that KL_m(M_θ^c, P_0) is finite. Moreover, the supremum in (31) is unique under the assumption that the matrix ∫ ḡ_c(x, θ) ḡ_c(x, θ)^⊤ dP_0(x) is nonsingular, where ḡ_c := (1_{R^m}, g_c^⊤)^⊤; this condition is satisfied if the functions 1_{R^m}, g_{c,1}(·, θ), . . . , g_{c,ℓ}(·, θ) are linearly independent and P_0 is not degenerate. We denote then

t_θ^c := arg sup_{t∈Λ_θ^c} ∫ m_c(x, θ, t) dP_0(x).

Finally, we have

θ_0 = arg inf_{θ∈Θ} KL_m(M_θ^c, P_0).

We also use the function

n_c(x, θ, t, A, τ) := log(1 − t^⊤ h_c(x, θ, A, τ))

when needed to work with the dependence on the matrix A and on the vector τ. We have then, with the above notation, m_c(x, θ, t) = n_c(x, θ, t, A_θ, τ_θ), where A_θ and τ_θ are the solution of the system of Equation (7).

Robustness Property
In order to prove the robustness of the estimator θ̂_c, we use the following well-known tools from the theory of robust statistics; see, e.g., [21]. A functional T, defined on a set of probability measures and parameter space valued, is called a statistical functional associated with an estimator θ̂ of the parameter θ from the model P_θ if θ̂ = T(P_n). The influence function of T at P_θ is defined by

IF(x; T, P_θ) := lim_{ε↓0} [T(P_{εx}) − T(P_θ)] / ε,

where P_{εx} := (1 − ε) P_θ + ε δ_x. A natural robustness requirement on the statistical functional corresponding to the estimator is the boundedness of its influence function. The statistical functional corresponding to the estimator θ̂_c given by (26) is defined by

T_c(P) := arg inf_{θ∈Θ} sup_{t∈Λ_θ^c(P)} ∫ m_c(y, θ, t, P) dP(y),

where g_c(x, θ, P) := h_c(x, θ, A_θ(P), τ_θ(P)), with A_θ(P) and τ_θ(P) solutions of the system

∫ h_c(x, θ, A, τ) dP_θ(x) = 0,
∫ h_c(x, θ, A, τ) h_c(x, θ, A, τ)^⊤ dP(x) = I,

and m_c(x, θ, t, P) := −ψ(t^⊤ g_c(x, θ, P)) = log(1 − t^⊤ g_c(x, θ, P)).
Note that, for a given θ, the function g_c(x, θ, P), as well as m_c(x, θ, t, P), both depend on the p.m. P_θ. In addition, note that g_c(x, θ, P_θ) coincides with g_c(x, θ) defined in the preceding section. We denote

t_θ^c(P) := arg sup_{t∈Λ_θ^c(P)} ∫ m_c(y, θ, t, P) dP(y).

Remark 1.
The classical empirical likelihood estimator of the parameter θ_0 of the moment condition model can be obtained as a particular case of the class of minimum empirical divergence estimators introduced by Broniatowski and Keziou [12]. Toma [18] proved that, in the case when P_0 belongs to the model (1), the influence functions of the estimators from this class, so particularly the influence function of the EL estimator, are all of the form

IF(x; T, P_0) = −(G^⊤ Ω^{−1} G)^{−1} G^⊤ Ω^{−1} g(x, θ_0),

with G := E[∂g(X, θ_0)/∂θ^⊤] and Ω := E[g(X, θ_0) g(X, θ_0)^⊤], irrespective of the used divergence. This influence function also coincides with the influence function of the GMM estimator obtained by Ronchetti and Trojani [9] and is linearly related to the function g(x, θ) of the model. When the orthogonality function g(x, θ) is not bounded in x, the minimum empirical divergence estimators, and particularly the EL estimator of θ_0, are not robust. For many moment condition models, the orthogonality functions are linear, and hence unbounded; therefore, these estimation methods are generally not robust. This is also the case of other known estimators, such as the least squares estimators, the GMM estimators, and the exponential tilting estimator for moment condition models. In contrast, for the new estimator defined in the present paper, the influence function is linearly related to the function g_c(x, θ), which is bounded; therefore, this estimator can be seen as a robust version of the classical EL estimator.
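The influence function can also be approximated numerically from its definition, replacing the limit in ε by a difference quotient. The illustration below is generic (a location model, not the paper's estimator): the sample mean has influence linear in the contamination point, while a Huberized mean, computed by an IRLS helper `huber_loc` of our own, has bounded influence, mirroring the effect of the truncated orthogonality functions.

```python
# Difference-quotient approximation IF(x0) ~ [T((1-eps) P_n + eps delta_x0) - T(P_n)] / eps,
# applied to the mean functional (unbounded IF) and a Huberized mean (bounded IF).
import numpy as np

def huber_loc(x, w, c=1.345, iters=200):
    """Weighted Huber location estimator via iteratively reweighted least squares."""
    mu = np.average(x, weights=w)
    for _ in range(iters):
        r = np.abs(x - mu)
        u = np.minimum(1.0, c / np.maximum(r, 1e-12))   # Huber downweighting
        mu = np.average(x, weights=w * u)
    return mu

def influence(T, sample, x0, eps=1e-3):
    """Influence of functional T at contamination point x0, by a finite mixture."""
    n = len(sample)
    xs = np.append(sample, x0)
    w = np.append(np.full(n, (1.0 - eps) / n), eps)      # (1-eps) P_n + eps delta_x0
    return (T(xs, w) - T(sample, np.full(n, 1.0 / n))) / eps

mean_fn = lambda x, w: np.average(x, weights=w)
rng = np.random.default_rng(3)
sample = rng.normal(size=10000)
if_mean = influence(mean_fn, sample, x0=500.0)     # grows linearly with x0
if_hub_50 = influence(huber_loc, sample, x0=50.0)  # flattens out: bounded
if_hub_500 = influence(huber_loc, sample, x0=500.0)
```

Moving the contamination point from 50 to 500 multiplies the influence of the mean tenfold, while the Huberized estimate is essentially unchanged.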
An important feature of the robust version of the EL estimator is that its robustness can be controlled by the positive constant c appearing in the Huber function. In addition, an advantage of this approach based on the Huber function is that the bound on the influence function of the estimator can be required to hold in a norm that is self-standardized with respect to the covariance matrix of the estimator; this norm measures the influence of an observation relative to the variability of the estimator, expressed by its covariance matrix. Such an approach is also suitable for inducing stable testing procedures (see [9]), which could be useful in future studies regarding robust testing. The robust estimator proposed in the present paper has its self-standardized influence function bounded by the constant c appearing in the Huber function. In a similar manner as in [9], the constant c controls the degree of robustness of the estimator, and, in practice, we could take a value close to the lower bound √ℓ in order to enforce a maximum amount of robustness. On the other hand, the density power divergences [25], combined with the minimum divergence approach, have proved useful for the construction of robust estimators in different contexts, in particular for parameter estimation in density models. Other approaches for robust estimation in regression models are proposed in [26]. Such approaches could be considered in future research, to be adapted to the context of moment condition models.

Consistency of the Estimators
In this subsection, we establish the consistency of the estimator t̂_θ^c of t_θ^c, for any fixed θ ∈ Θ, and the consistency of the estimator θ̂_c of θ_0. First, for any fixed θ ∈ Θ, we state the consistency of the estimators Â_θ and τ̂_θ defined by the system (11).

Consistency of the Estimators Â_θ and τ̂_θ, for Fixed θ ∈ Θ
The estimators Â_θ and τ̂_θ, of A_θ and τ_θ defined by the theoretical system of Equation (7), are Z-estimators. We consider the following notations:

Ψ_1(θ, A, τ) := ∫ h_c(x, θ, A, τ) dP_θ(x),
Ψ_2(x, θ, A, τ) := h_c(x, θ, A, τ) h_c(x, θ, A, τ)^⊤ − I,

and

Ψ(x, θ, A, τ) := (Ψ_1(θ, A, τ)^⊤, vec(Ψ_2(x, θ, A, τ))^⊤)^⊤,

where "vec(·)" is the operator that transforms a matrix into a vector, by putting all the columns of the matrix one under the other. Notice that Ψ_1(θ, A, τ) is a constant function with respect to x. With these notations, for a given θ ∈ Θ, the Z-estimators Â_θ and τ̂_θ are solutions of the system (see (11))

∫ Ψ(x, θ, A, τ) dP_n(x) = 0,

and their theoretical counterparts are A_θ and τ_θ, solution of the system (see (7))

∫ Ψ(x, θ, A, τ) dP_θ(x) = 0.

In the rest of the paper, we consider the matrix A in its "vec" form, as defined above. This is necessary in order to apply some classical results, for example, the uniform weak law of large numbers (UWLLN) or results regarding Z-estimators. Therefore, the argument A of the functions h_c, n_c, and Ψ will in fact be vec A. For simplicity, we write A instead of vec A. The same is valid for Â_θ and A_θ.

Assumption 1. (a) There exists a compact neighborhood N_θ of (A_θ, τ_θ) such that

∫ sup_{(A,τ)∈N_θ} ‖Ψ(x, θ, A, τ)‖ dP_0(x) < ∞;

(b) for any positive ε, the following condition holds:

inf_{(A,τ): ‖(A,τ)−(A_θ,τ_θ)‖≥ε} ‖ ∫ Ψ(x, θ, A, τ) dP_0(x) ‖ > 0.
Proposition 1. Let θ ∈ Θ be fixed. Under Assumption 1, the estimators Â_θ and τ̂_θ converge in probability to A_θ and τ_θ, respectively.

Proof. Since (A, τ) → Ψ(x, θ, A, τ) is continuous, by the UWLLN, Assumption 1(a) implies

sup_{(A,τ)∈N_θ} ‖ ∫ Ψ(x, θ, A, τ) dP_n(x) − ∫ Ψ(x, θ, A, τ) dP_0(x) ‖ → 0

in probability. This result, together with Assumption 1(b), ensures the convergence in probability of the estimators Â_θ and τ̂_θ toward A_θ and τ_θ, respectively. The arguments are the same as those from [27], Theorem 5.9, p. 46.

Consistency of the Estimator t̂_θ^c of t_θ^c, for Fixed θ ∈ Θ

We state the consistency of the estimator t̂_θ^c under the following assumptions.
Assumption 2. For fixed θ ∈ Θ: (a) t_θ^c := arg sup_{t∈Λ_θ^c} ∫ m_c(y, θ, t) dP_0(y) exists, is unique, and is an interior point of Λ_θ^c; (b) there exists a compact neighborhood N_θ of (A_θ, τ_θ) such that

∫ sup_{(A,τ)∈N_θ, t∈N_{t_θ^c}} |n_c(y, θ, t, A, τ)| dP_0(y) < ∞;

(c) there exists a compact neighborhood N_{t_θ^c} of t_θ^c, and there exists a sequence B_n = O_P(1), such that, for all t, t' ∈ N_{t_θ^c},

| ∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_n(y) − ∫ n_c(y, θ, t', Â_θ, τ̂_θ) dP_n(y) | ≤ B_n ‖t − t'‖

holds in probability.

Proposition 2. Let θ ∈ Θ be fixed. Under Assumptions 1 and 2, t̂_θ^c converges in probability to t_θ^c.

Proof. By the UWLLN, Assumption 2(b) implies

sup_{t∈N_{t_θ^c}} | ∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_n(y) − ∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_0(y) | → 0

in probability. The following inequality holds:

sup_{t∈N_{t_θ^c}} | ∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_n(y) − ∫ m_c(y, θ, t) dP_0(y) |
≤ sup_{t∈N_{t_θ^c}} | ∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_n(y) − ∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_0(y) |
+ sup_{t∈N_{t_θ^c}} | ∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_0(y) − ∫ n_c(y, θ, t, A_θ, τ_θ) dP_0(y) |.

The first term on the right-hand side tends to 0 in probability, on the basis of the preceding display. The second term on the right-hand side also tends to 0 in probability. Indeed, using the convergence in probability of (Â_θ, τ̂_θ) toward (A_θ, τ_θ), by Assumption 2(b), we get the pointwise convergence

∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_0(y) → ∫ n_c(y, θ, t, A_θ, τ_θ) dP_0(y),

for each t. Then, according to Corollary 2.1 from [28], using the pointwise convergence together with Assumption 2(c), we obtain the uniform convergence

sup_{t∈N_{t_θ^c}} | ∫ n_c(y, θ, t, Â_θ, τ̂_θ) dP_n(y) − ∫ m_c(y, θ, t) dP_0(y) | → 0

in probability. Using this uniform convergence, the fact that t_θ^c is unique and belongs to int(N_{t_θ^c}), and the strict concavity of the function t → ∫ n_c(y, θ, t, A_θ, τ_θ) dP_0(y), on the basis of Theorem 5.7 in [27], we conclude that t̂_θ^c → t_θ^c in probability; moreover, since the empirical criterion is concave, the maximizer t̂_θ^c belongs to int(N_{t_θ^c}) for sufficiently large n.

For the consistency of θ̂_c, the following conditions, which reinforce Assumption 2 uniformly in θ, are needed.

Assumption 3. (a)-(c) Assumptions 2(a)-(c) hold uniformly in θ ∈ Θ; (d) the function θ → ∫ n_c(x, θ, t_θ^c, A_θ, τ_θ) dP_0(x) is continuous on Θ; (e) θ_0 := arg inf_{θ∈Θ} KL_m(M_θ^c, P_0) is unique.

Proposition 3. Under Assumptions 1-3, we have: (1) sup_{θ∈Θ} ‖t̂_θ^c − t_θ^c‖ → 0 in probability; (2) θ̂_c → θ_0 in probability.

Proof. (1) Repeating the arguments of the preceding proof uniformly in θ, under Assumptions 3(a)-(c), the maximizer t̂_θ^c belongs to int(N_{t_θ^c}) for sufficiently large n, and sup_{θ∈Θ} ‖t̂_θ^c − t_θ^c‖ → 0 in probability.

(2) For large n, we can write

KL̂_m(M_θ^{c,n}, P_0) − ∫ m_c(x, θ, t_θ^c) dP_0(x) = A + B,

where A is the term accounting for the replacement of t_θ^c by t̂_θ^c, and B is the term accounting for the replacement of (A_θ, τ_θ, P_0) by (Â_θ, τ̂_θ, P_n). Note that sup_{θ∈Θ} |A| → 0 in probability, by part (1) together with Assumption 3. In order to prove that sup_{θ∈Θ} |B| → 0 in probability, we use again the triangle inequality: the first resulting term tends to 0 in probability by the UWLLN, uniformly in θ, while, for the second resulting term, we use the uniform convergence obtained in the proof of Proposition 2, combined with Assumption 3(c). Consequently,

sup_{θ∈Θ} | KL̂_m(M_θ^{c,n}, P_0) − KL_m(M_θ^c, P_0) | → 0

in probability. Assumptions 3(d) and (e) ensure that θ_0 is well-separated, in the sense that, for all ε > 0,

inf_{θ: ‖θ−θ_0‖≥ε} KL_m(M_θ^c, P_0) > KL_m(M_{θ_0}^c, P_0).

Finally, the uniform convergence of the criterion, together with the well-separation of θ_0, implies that θ̂_c → θ_0 in probability, on the basis of Theorem 5.7, p. 45, from [27].

Simulation Results
In order to compare the performance of the proposed robust EL estimator (26) with that of the EL estimator (27) in the case of contaminated data, we consider the moment condition model presented in Example 1 in Section 1.
We use i.i.d. data generated from a slight deviation of the model P_0, namely from the mixture (1 − ε) χ²_1 + ε χ²_10, with ε = 0.05, respectively ε = 0.10. The considered sample sizes are n = 100 and n = 500. All the simulations are repeated 1000 times. The obtained estimates are compared through bias, variance, and mean square error, computed on the basis of the 1000 replications. We give the corresponding box-plots in Figures 1-4, where the true value of the parameter θ_0 = 1 is represented by a horizontal dashed line. For computing the proposed robust EL estimate (26), we use the truncated functions g_c(x, θ) and g_c^n(x, θ) with c = 2. The algorithm for computing the estimate was obtained by adapting the one from [10] (Appendix A.1., p. 3196); namely, for each iteration, the estimate of the parameter θ_0, corresponding to the new orthogonality function in step iii, is computed using the Uzawa algorithm for the saddle-point optimum in (26). The obtained results are presented in Tables 1-4 and Figures 1-4. All these results illustrate the fact that, in the case of contaminated data, the robust EL estimator outperforms the classical EL estimator.
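The qualitative effect of the contamination can be reproduced with a much-simplified stand-in (our own code, not the paper's algorithm): instead of the full saddle-point estimator (26), we compare a plain GMM-type fit of Example 1 with a fit based on Huber-truncated moments, using a fixed model-based scaling in place of (A_θ, τ_θ) and c = 2. This captures the truncation idea only, not the exact robust EL estimator.

```python
# Monte Carlo sketch: plain vs. Huber-truncated moment fit under 10% contamination
# of chi^2_1 data by chi^2_10, for the Example 1 moments with h(theta) = theta^2 + 2*theta.
import numpy as np
from scipy.optimize import minimize_scalar

def g(x, theta):
    return np.stack([x - theta, x**2 - (theta**2 + 2.0 * theta)], axis=1)

SCALE = np.array([np.sqrt(2.0), np.sqrt(96.0)])  # sd of g under chi^2_1 at theta = 1

def obj_plain(theta, x):
    m = g(x, theta).mean(axis=0) / SCALE
    return float(m @ m)

def obj_robust(theta, x, c=2.0):
    gs = g(x, theta) / SCALE
    w = np.minimum(1.0, c / np.maximum(np.linalg.norm(gs, axis=1), 1e-12))
    m = (w[:, None] * gs).mean(axis=0)           # Huber-truncated moments
    return float(m @ m)

def fit(obj, x):
    return minimize_scalar(obj, bounds=(0.1, 5.0), args=(x,), method="bounded").x

rng = np.random.default_rng(4)
est_plain, est_robust = [], []
for _ in range(200):
    n = 100
    contam = rng.random(n) < 0.10                # 10% contamination by chi^2_10
    x = np.where(contam, rng.chisquare(10, n), rng.chisquare(1, n))
    est_plain.append(fit(obj_plain, x))
    est_robust.append(fit(obj_robust, x))

mse = lambda e: float(np.mean((np.array(e) - 1.0) ** 2))
```

Under this contamination, the plain fit is pulled strongly upward by the inflated second moment, while the truncated fit stays much closer to θ_0 = 1, in line with the comparison reported in Tables 1-4.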

Conclusions
We proposed a robust version of the EL estimator for moment condition models. This estimator is defined through the minimization of an empirical version of the modified Kullback-Leibler divergence in dual form, using truncated orthogonality functions based on the multivariate Huber function. We proved the robustness, by means of the influence function, as well as the consistency of the new estimator. The results of the Monte Carlo simulation study show that, in the case of contaminated data, the robust EL estimator outperforms the classical EL estimator.