Robust Z-Estimators for Semiparametric Moment Condition Models

In the present paper, we introduce a class of robust Z-estimators for moment condition models. These new estimators can be seen as robust alternatives to the minimum empirical divergence estimators. Using the multidimensional Huber function, we first define robust estimators of the element that realizes the supremum in the dual form of the divergence. A linear relationship between the influence function of a minimum empirical divergence estimator and the influence function of the estimator of the element that realizes the supremum in the dual form of the divergence led to the idea of defining new Z-estimators for the parameter of the model by using robust estimators in the dual form of the divergence. The asymptotic properties of the proposed estimators are proven, including consistency and asymptotic normality. The influence functions of the estimators are then derived, and their robustness is demonstrated.


Introduction
A moment condition model is a family M^1_Θ of probability measures, all defined on the same measurable space (R^m, B(R^m)), such that

∫ g(x, θ) dQ(x) = 0, for all Q ∈ M^1_Θ. (1)

The parameter θ belongs to Θ ⊂ R^d; the function g := (g_1, . . . , g_l)^⊤ is defined on R^m × Θ, each of the g_i's being real-valued, l ≥ d, and the functions g_1, . . . , g_l and the constant function 1 are supposed to be linearly independent. Denote by M^1 the set of all probability measures on (R^m, B(R^m)) and, for each θ ∈ Θ, by M^1_θ the subset of probability measures satisfying the constraints at θ:

M^1_θ := { Q ∈ M^1 : ∫ g(x, θ) dQ(x) = 0 },

so that the model can be written as M^1_Θ = ∪_{θ∈Θ} M^1_θ. Let X_1, . . . , X_n be an i.i.d. sample of the random vector X with unknown probability distribution P_0. We consider the problem of the estimation of the parameter θ_0 for which the constraints of the model are satisfied:

∫ g(x, θ_0) dP_0(x) = 0. (4)
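To fix ideas, the moment condition above can be checked numerically in a toy example. The setup below (a normal P_0 and the estimating function g(x, θ) = x − θ, so that l = d = 1) is our own illustration, not taken from the paper:

```python
import numpy as np

# Toy moment condition model: g(x, theta) = x - theta, with l = d = 1.
# Under P_0 = N(theta_0, 1), the estimating function is unbiased at theta_0:
# E_{P_0}[g(X, theta_0)] = 0, and only there.
def g(x, theta):
    return x - theta

rng = np.random.default_rng(42)
theta0 = 2.0
sample = rng.normal(loc=theta0, scale=1.0, size=100_000)

# Monte Carlo check: the empirical moment vanishes at theta_0
# and is bounded away from zero elsewhere.
moment_at_theta0 = g(sample, theta0).mean()
moment_elsewhere = g(sample, theta0 + 0.5).mean()
```

With 100,000 draws, the Monte Carlo error of the empirical moment is of order 1/√n ≈ 0.003, so the first quantity is close to 0 while the second is close to −0.5.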
We suppose that θ_0 is the unique solution of Equation (4). Thus, we assume that information about θ_0 and P_0 is available in the form of l ≥ d functionally independent unbiased estimating functions, and we use this information to estimate θ_0. Among the best-known estimation methods for moment condition models, we mention the generalized method of moments (GMM) [1], the continuous updating (CU) estimator [2], the empirical likelihood (EL) estimator [3,4], the exponential tilting (ET) estimator [5], and the generalized empirical likelihood (GEL) estimators [6]. Although the EL estimator is superior to the other estimators in terms of higher-order asymptotic properties, these properties hold only under the correct specification of the moment conditions. The exponentially tilted empirical likelihood (ETEL) estimator, proposed in [7], has the same higher-order properties as the EL estimator under correct specification, while maintaining the usual asymptotic properties, such as consistency and asymptotic normality, under misspecification. The so-called information and entropy econometric techniques have been proposed to improve the finite-sample performance of the GMM estimators and tests (see, e.g., [4,5]). Some recent methods for the estimation and testing of moment condition models are based on divergences. Divergences between probability measures are widely used in statistics and data science in order to perform inference in models of various kinds, parametric or semiparametric. Statistical methods based on divergence minimization extend the likelihood paradigm and often have the advantage of providing a trade-off between efficiency and robustness [8-11]. A general methodology for the estimation and testing of moment condition models was developed in [12].
This approach is based on minimizing divergences in their dual form and allows the asymptotic study of the estimators, called minimum empirical divergence estimators, and of the associated test statistics, both under the model and under misspecification of the model. The approach based on minimizing dual forms of divergences was initially used in the case of parametric models, the results being published in a series of articles [13-16]. The broad class of minimum empirical divergence estimators contains, in particular, the EL estimator, the CU estimator, as well as the ET estimator mentioned above. Using the influence function as the robustness measure, it has been shown that the minimum empirical divergence estimators are not robust, because the corresponding influence functions are generally not bounded [17]. On the other hand, the minimum empirical divergence estimators all have the same first-order efficiency, and moreover, the EL estimator, which belongs to this class, is superior in terms of higher-order efficiency. Therefore, proposing robust versions of the minimum empirical divergence estimators would bring a trade-off between robustness and efficiency. These aspects motivated the study in the present paper.
Some robust estimation methods for moment condition models have been proposed in the literature, for example in [18-22]. In the present paper, we introduce a class of robust Z-estimators for moment condition models, which can be seen as robust alternatives to the minimum empirical divergence estimators. Using the multidimensional Huber function, we first define robust estimators of the element that realizes the supremum in the dual form of the divergence. A linear relationship between the influence function of a minimum empirical divergence estimator and the influence function of the estimator of this element led to the idea of defining new Z-estimators for the parameter of the model by plugging robust estimators into the dual form of the divergence. The asymptotic properties of the proposed estimators are proven, including consistency and asymptotic normality. The influence functions of the estimators are then derived, and their robustness is demonstrated.
The paper is organized as follows. In Section 2, we briefly recall the context and the definitions of the minimum empirical divergence estimators, these being necessary for defining the new estimators. In Section 3, the new Z-estimators for moment condition models are defined, their asymptotic properties (consistency and asymptotic normality) are proven, and their influence functions are derived, demonstrating their robustness. The proofs of the theoretical results are deferred to Appendix A.

Statistical Divergences
Let ϕ be a convex function defined on R and [0, ∞]-valued, such that ϕ(1) = 0, and let P ∈ M^1 be some probability measure. For any signed finite measure Q defined on the same measurable space (R^m, B(R^m)), absolutely continuous (a.c.) with respect to P, the ϕ divergence between Q and P is defined by

D_ϕ(Q, P) := ∫ ϕ( (dQ/dP)(x) ) dP(x).

When Q is not a.c. with respect to P, we set D_ϕ(Q, P) := ∞. This extension, for the case when Q is not absolutely continuous with respect to P, was considered in order to have a unique definition of divergences, appropriate for both cases-that of continuous probability laws and that of discrete probability laws. This definition extends that of divergences between probability measures [23], and the necessity of working with signed finite measures will be explained in Section 2.2.
Largely used in information theory, the Kullback-Leibler divergence is associated with the real convex function ϕ(x) := x log x − x + 1 and is given by

KL(Q, P) := ∫ log(dQ/dP) dQ.

The modified Kullback-Leibler divergence is associated with the convex function ϕ(x) := − log x + x − 1 and is given by

KL_m(Q, P) := − ∫ log(dQ/dP) dP.

Other divergences, largely used in inferential statistics, are the χ² and the modified χ² divergences, namely

χ²(Q, P) := (1/2) ∫ (dQ/dP − 1)² dP,   χ²_m(Q, P) := (1/2) ∫ (dQ/dP − 1)² / (dQ/dP) dP,

these being associated with the convex functions ϕ(x) := (1/2)(x − 1)² and ϕ(x) := (1/2)(x − 1)²/x, respectively. The Hellinger distance and the L_1 distance are also ϕ divergences. They are associated with the convex functions ϕ(x) := 2(√x − 1)² and ϕ(x) := |x − 1|, respectively.
All the preceding examples, except the L_1 distance, belong to the class of power divergences introduced by Cressie and Read [24] and defined by the convex functions

ϕ_γ(x) := (x^γ − γx + γ − 1) / (γ(γ − 1)),

for γ ∈ R \ {0, 1}, together with ϕ_0(x) := − log x + x − 1 and ϕ_1(x) := x log x − x + 1. The Kullback-Leibler divergence is associated with ϕ_1, the modified Kullback-Leibler divergence with ϕ_0, the χ² divergence with ϕ_2, the modified χ² divergence with ϕ_{−1}, and the Hellinger distance with ϕ_{1/2}. When ϕ_γ is not defined on (−∞, 0), or when ϕ_γ is defined but not convex there, the definition of the corresponding power divergence Q ↦ D_{ϕ_γ}(Q, P) can be extended to the whole set of signed finite measures by replacing ϕ_γ with a suitable convex extension. The ϕ divergence between some set Ω of signed finite measures and a probability measure P is defined by

D_ϕ(Ω, P) := inf_{Q∈Ω} D_ϕ(Q, P).
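For discrete measures with Q a.c. with respect to P, the power divergences reduce to finite sums, which makes the family easy to evaluate numerically. The following sketch (our own helper names, not from the paper) implements ϕ_γ and D_ϕγ for probability vectors:

```python
import numpy as np

def phi_gamma(x, gamma):
    """Cressie-Read convex function phi_gamma generating the power divergences.

    gamma = 1 -> Kullback-Leibler, gamma = 0 -> modified KL,
    gamma = 2 -> chi^2, gamma = -1 -> modified chi^2, gamma = 1/2 -> Hellinger.
    """
    x = np.asarray(x, dtype=float)
    if gamma == 0:
        return -np.log(x) + x - 1.0
    if gamma == 1:
        return x * np.log(x) - x + 1.0
    return (x**gamma - gamma * x + gamma - 1.0) / (gamma * (gamma - 1.0))

def divergence(q, p, gamma):
    """D_phi(Q, P) for discrete Q << P with positive masses:
    sum_i p_i * phi_gamma(q_i / p_i)."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(p * phi_gamma(q / p, gamma)))
```

For example, with p = (0.5, 0.5) and q = (0.8, 0.2), the χ² divergence (γ = 2) equals (1/2)Σ(q_i − p_i)²/p_i = 0.18, which the code reproduces.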

Minimum Empirical Divergence Estimators
Let X_1, . . . , X_n be an i.i.d. sample of the random vector X with the probability distribution P_0. The "plug-in" estimator of the ϕ divergence D_ϕ(M^1_θ, P_0) between the set M^1_θ and the probability measure P_0 is defined by replacing P_0 with the empirical measure associated with the sample. More precisely,

D̂_ϕ(M^1_θ, P_0) := D_ϕ(M^1_θ, P_n), (8)

where P_n := (1/n) Σ_{i=1}^n δ_{X_i} is the empirical measure associated with the sample, δ_x being the Dirac measure putting all its mass at x. If the projection of the measure P_n on M^1_θ exists, it is a law a.c. with respect to P_n. Then, it is natural to consider the set

M^{(n)}_θ := { Q ∈ M^1 : Q a.c. w.r.t. P_n, ∫ g(x, θ) dQ(x) = 0 },

and then, the plug-in estimator (8) can be written as

D̂_ϕ(M^1_θ, P_0) = inf_{Q ∈ M^{(n)}_θ} D_ϕ(Q, P_n). (10)

The infimum in the above expression (10) may be achieved at a point situated on the frontier of the set M^{(n)}_θ, a case in which the Lagrange method for characterizing the infimum and computing D̂_ϕ(M^1_θ, P_0) cannot be applied. In order to avoid this difficulty, Broniatowski and Keziou [12,25] proposed to work on sets of signed finite measures and defined

M_θ := { Q ∈ M : ∫ dQ(x) = 1, ∫ g(x, θ) dQ(x) = 0 },

where M denotes the set of all signed finite measures on the measurable space (R^m, B(R^m)). They showed that, if the projection Q*_1 of P_n on M^1_θ is an interior point of M^1_θ and the projection Q* of P_n on M_θ is an interior point of M_θ, then the two approaches for defining minimum divergence estimators, based on signed finite measures and on probability measures, respectively, coincide. On the other hand, in the case when Q*_1 is a frontier point of M^1_θ, the estimator of the parameter θ_0 defined in the context of signed finite measures still converges to θ_0. These aspects justify the substitution of M^1_θ by M_θ. In the following, we briefly recall the definitions of the estimators for moment condition models proposed in [12] in the context of signed finite measure sets.
Denote by g̅ the function defined on R^m × Θ and R^{l+1}-valued:

g̅(x, θ) := (1, g_1(x, θ), . . . , g_l(x, θ))^⊤.

Given a ϕ divergence, when the function ϕ is strictly convex on its domain, denote by

ϕ*(u) := sup_{x∈R} { ux − ϕ(x) }

the convex conjugate of the function ϕ. For a given probability measure P ∈ M^1 and a fixed θ ∈ Θ, define

m(x, θ, t) := t_0 − ϕ*( t^⊤ g̅(x, θ) ), t = (t_0, t_1, . . . , t_l)^⊤,

and

Λ_θ(P) := { t ∈ R^{l+1} : ∫ |m(x, θ, t)| dP(x) < ∞ }.

We also use the notations Λ_θ for Λ_θ(P_0) and Λ^{(n)}_θ for Λ_θ(P_n). Supposing that P_0 admits a projection Q*_θ on M_θ with the same support as P_0 and that the function ϕ is strictly convex on its domain, the ϕ divergence D_ϕ(M_θ, P_0) admits the dual representation

D_ϕ(M_θ, P_0) = sup_{t∈Λ_θ} ∫ m(x, θ, t) dP_0(x). (15)

The supremum in (15) is unique and is reached at a point that we denote by t_θ = t_θ(P_0):

t_θ := arg sup_{t∈Λ_θ} ∫ m(x, θ, t) dP_0(x). (16)

Then, D_ϕ(M_θ, P_0), t_θ, D_ϕ(M, P_0), and θ_0 can be estimated respectively by

D̂_ϕ(M_θ, P_0) := sup_{t∈Λ^{(n)}_θ} ∫ m(x, θ, t) dP_n(x),   t̂_θ := arg sup_{t∈Λ^{(n)}_θ} ∫ m(x, θ, t) dP_n(x), (18)

D̂_ϕ(M, P_0) := inf_{θ∈Θ} sup_{t∈Λ^{(n)}_θ} ∫ m(x, θ, t) dP_n(x),   θ̂_ϕ := arg inf_{θ∈Θ} sup_{t∈Λ^{(n)}_θ} ∫ m(x, θ, t) dP_n(x). (20)

The estimators defined in (20) are called minimum empirical divergence estimators. We refer to [12] for the complete study of the existence and of the asymptotic properties of the above estimators. The influence functions of these estimators and the corresponding robustness properties were studied in [17]. According to those results, for θ ∈ Θ fixed, the influence function of the estimator t̂_θ is given by

IF(x; t_θ, P_0) = M_θ^{-1} (∂/∂t) m(x, θ, t_θ),   where M_θ := − ∫ (∂²/∂t∂t^⊤) m(y, θ, t_θ) dP_0(y),

with the particular case θ = θ_0 obtained by substituting θ_0 above. On the other hand, the influence function IF(x; T_ϕ, P_0) of the estimator θ̂_ϕ is a linear transformation of (∂/∂t) m(x, θ_0, t_θ_0). Since the function x ↦ g(x, θ) is usually not bounded, for example, when we have linear constraints, the influence function IF(x; T_ϕ, P_0) is not bounded; therefore, the minimum empirical divergence estimators θ̂_ϕ defined in (20) are generally not robust. Through the calculations, it can be seen that there is a connection between the influence functions IF(x; t_θ_0, P_0) and IF(x; T_ϕ, P_0), namely a relation of the form

IF(x; T_ϕ, P_0) = C_{θ_0} IF(x; t_θ_0, P_0), (25)

with a nonrandom matrix C_{θ_0} depending only on θ_0 and P_0. Since IF(x; T_ϕ, P_0) is linearly related to IF(x; t_θ_0, P_0), using a robust estimator of t_θ = t_θ(P_0) in the original duality Formula (15) would lead to a new robust estimator of θ_0. This is the idea at the basis of our proposal in this paper for constructing new robust estimators for moment condition models.
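As a minimal numerical illustration of the dual representation, consider the modified Kullback-Leibler divergence (the empirical likelihood case), whose convex conjugate is ϕ*(u) = −log(1 − u) for u < 1, together with the toy moment condition g(x, θ) = x − θ. The sketch below (our own function names and optimizer choice, under these assumptions) maximizes the empirical dual criterion over t; when θ equals the sample mean, the constraints hold under P_n itself, so the supremum is attained at t = 0 with criterion value 0:

```python
import numpy as np
from scipy.optimize import minimize

# Modified Kullback-Leibler: phi(x) = -log x + x - 1, with convex
# conjugate phi*(u) = -log(1 - u) for u < 1.
def phi_star(u):
    return -np.log(1.0 - u)

def dual_criterion(t, x, theta):
    """Empirical dual criterion: mean over the sample of
    m(x, theta, t) = t0 - phi*(t0 + t1 * g(x, theta)), with g(x, theta) = x - theta."""
    t0, t1 = t
    u = t0 + t1 * (x - theta)
    if np.any(u >= 1.0):          # outside the domain of phi*
        return -np.inf
    return t0 - np.mean(phi_star(u))

def estimate_t(x, theta):
    """Hypothetical sketch: maximize the (concave) dual criterion over t = (t0, t1)."""
    res = minimize(lambda t: -dual_criterion(t, x, theta),
                   x0=np.zeros(2), method="Nelder-Mead")
    return res.x, -res.fun

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=200)
t_hat, crit = estimate_t(x, theta=x.mean())
```

The criterion is concave in t (a negated convex conjugate composed with an affine map), so the stationary point t = 0 is the global maximum at θ = x̄; the full estimator would then minimize this supremum over θ.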

Definitions of New Estimators
In this section, we define robust versions of the estimators t̂_θ from (18) and robust versions of the minimum empirical divergence estimators θ̂_ϕ from (20). First, we define robust estimators of t_θ by using a truncated version of the function x ↦ (∂/∂t) m(x, θ, t), and then, we insert such a robust estimator into the estimating equation corresponding to the minimum empirical divergence estimator. The truncated function is based on the multidimensional Huber function and contains a shift vector τ_θ and a scale matrix A_θ for calibration, so that t_θ, which realizes the supremum in the duality formula, is also the solution of a new equation based on the truncated function.
For simplicity, for fixed θ ∈ Θ, we also use the notation m_θ(x, t) := m(x, θ, t). With this notation, t_θ = t_θ(P_0) defined in (16) is the unique solution of the equation

∫ (∂/∂t) m_θ(y, t) dP_0(y) = 0. (26)

Consider, together with Equation (26), the system

∫ H_c( A [ (∂/∂t) m_θ(y, t) − τ ] ) dP_0(y) = 0,

∫ H_c( A [ (∂/∂t) m_θ(y, t) − τ ] ) H_c( A [ (∂/∂t) m_θ(y, t) − τ ] )^⊤ dP_0(y) = I_{l+1},

where H_c(y) := y · min(1, c/‖y‖) is the multidimensional Huber function, with c > 0, I_{l+1} the identity matrix, A an (l+1) × (l+1) matrix, and τ ∈ R^{l+1}. For fixed θ, this system admits a unique solution (t, A, τ) = (t_θ(P_0), A_θ(P_0), τ_θ(P_0)) (according to [18], p. 17). The multidimensional Huber function is useful for defining robust estimators; it transforms each point outside the hypersphere of radius c into the nearest point of that hypersphere and leaves the points inside unchanged (see [26], p. 239, [27]). By applying the multidimensional Huber function to the function y ↦ (∂/∂t) m_θ(y, t), together with the scale matrix A_θ and the shift vector τ_θ, a modification is produced where the norm exceeds the bound c, while the original t_θ remains the solution of the equation based on the new truncated function. For parametric models, the multidimensional Huber function was also used in other contexts, for example to define optimal B_s-robust estimators or optimal B_i-robust estimators (see [26], p. 244).
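The geometric action of H_c described above (projection onto the sphere of radius c for points outside it, identity inside) is easy to verify numerically; the helper name below is ours:

```python
import numpy as np

def huber_multidim(y, c):
    """Multidimensional Huber function H_c(y) = y * min(1, c / ||y||).

    Points inside the closed ball of radius c are left unchanged; points
    outside are projected onto the sphere of radius c, i.e., mapped to the
    nearest point of the ball.
    """
    y = np.asarray(y, dtype=float)
    norm = np.linalg.norm(y)
    if norm <= c:          # includes y = 0, avoiding division by zero
        return y
    return y * (c / norm)
```

The output is therefore bounded in norm by c whatever the input, which is the property used later to bound the influence functions.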
The above arguments can be used for each probability measure P from the moment condition model. This context allows defining the truncated version of the function y ↦ (∂/∂t) m_θ(y, t), which we denote by ψ_θ(y, t), such that the original t_θ(P_0), the solution of Equation (26), is also the solution of the equation ∫ ψ_θ(y, t_θ(P_0)) dP_0(y) = 0.
For θ fixed and P a probability measure, the equation ∫ (∂/∂t) m_θ(y, t) dP(y) = 0 has a unique solution t = t_θ(P) ∈ Λ_θ(P), assuring the supremum in the dual form of the divergence D_ϕ(M_θ, P) (see [12]). For each t, we define A_θ(t) and τ_θ(t) as the solutions of the system:

∫ H_c( A [ (∂/∂t) m_θ(y, t) − τ ] ) dP(y) = 0, (31)

∫ H_c( A [ (∂/∂t) m_θ(y, t) − τ ] ) H_c( A [ (∂/∂t) m_θ(y, t) − τ ] )^⊤ dP(y) = I_{l+1}. (32)

We define a new estimator t̂^c_θ of t_θ = t_θ(P_0) as the Z-estimator corresponding to the ψ-function

ψ_θ(y, t) := H_c( A_θ(t) [ (∂/∂t) m_θ(y, t) − τ_θ(t) ] ); (33)

more precisely, t̂^c_θ is defined by

∫ ψ_θ(y, t̂^c_θ) dP_n(y) = 0, (34)

the theoretical counterpart of this estimating equation being ∫ ψ_θ(y, t_θ(P_0)) dP_0(y) = 0.
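The Z-estimator template above (truncate the estimating function with the Huber function, then solve the empirical estimating equation) can be illustrated in its simplest one-dimensional instance: the Huber location M-estimator. This toy example is our own illustration and deliberately omits the implicitly defined calibration quantities A_θ(t) and τ_θ(t) of the paper:

```python
import numpy as np
from scipy.optimize import brentq

def huber_scalar(y, c):
    """Scalar Huber function H_c(y) = y * min(1, c/|y|)."""
    return y * min(1.0, c / abs(y)) if y != 0 else 0.0

def z_estimate_location(sample, c=1.5):
    """Toy Z-estimator: solve (1/n) sum_i H_c(X_i - t) = 0 in t.

    This is the classical Huber location M-estimator, used here only to
    illustrate the truncated-estimating-equation template: the empirical
    psi-equation is solved by bracketing a root between the sample extremes.
    """
    def psi_bar(t):
        return np.mean([huber_scalar(xi - t, c) for xi in sample])
    lo, hi = min(sample), max(sample)   # psi_bar(lo) >= 0 >= psi_bar(hi)
    return brentq(psi_bar, lo, hi)

rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=1.0, size=200)
est_clean = z_estimate_location(data)
est_contam = z_estimate_location(np.append(data, 1000.0))  # one gross outlier
```

Because the truncated estimating function is bounded by c, a single gross outlier moves the estimate by at most roughly c/n, while the sample mean is shifted by about 1000/n.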
For a given probability measure P, the statistical functional t^c_θ(P) associated with the estimator t̂^c_θ, whenever it exists, is defined by

∫ ψ_θ(y, t^c_θ(P)) dP(y) = 0, (36)

with t^c_θ(P_n) = t̂^c_θ by construction.

Remark 1.
We notice a similarity between the Z-estimator defined in (34) and the classical optimal B_s-robust estimator for parametric models from [26]. In the case of parametric models, the M-estimator corresponding to the ψ-function (33), but defined for the classical score function instead of the function (∂/∂t) m_θ(x, t) (including in the system (31) and (32) defining A_θ(t) and τ_θ(t)), is the classical optimal B_s-robust estimator (f_t(x) denotes the density corresponding to a parametric model indexed by the parameter t). The classical optimal B_s-robust estimator for parametric models has the optimality property of minimizing a measure of the asymptotic mean-squared error among all Fisher-consistent estimators with a self-standardized sensitivity smaller than the positive constant c.
In the following, for a given divergence, using the estimators t̂^c_θ of t_θ(P_0), we construct new estimators of the parameter θ_0 of the model. In Section 3.3, we prove that all the estimators t̂^c_θ are robust, and this property is transferred to the new estimators that we define for the parameter θ_0.
Using (37), namely t^c(θ, P_0) = t(θ, P_0), where we denote t^c(θ, P) := t^c_θ(P) and t(θ, P) := t_θ(P), we obtain that θ = θ_0 is in fact the solution of the equation

∫ (∂/∂θ) m(y, θ, t^c(θ, P_0)) dP_0(y) = 0. (41)

Then, we define a new estimator θ̂^c_ϕ of θ_0 as the plug-in estimator solving the equation

∫ (∂/∂θ) m(y, θ̂^c_ϕ, t^c(θ̂^c_ϕ, P_n)) dP_n(y) = 0. (45)

For a probability measure P, the statistical functional T^c corresponding to the estimator θ̂^c_ϕ, whenever it exists, is defined by

∫ (∂/∂θ) m(y, T^c(P), t^c(T^c(P), P)) dP(y) = 0. (46)
Since θ_0 is the unique solution of Equation (41) and, according to (48), T^c(P_0) would be another solution of the same equation, we deduce (47), i.e., T^c(P_0) = θ_0. From (34) and (45), we have

∫ ψ_{θ̂^c_ϕ}(y, t^c(θ̂^c_ϕ, P_n)) dP_n(y) = 0,

∫ (∂/∂θ) m(y, θ̂^c_ϕ, t^c(θ̂^c_ϕ, P_n)) dP_n(y) = 0,

and then, the pair (θ̂^c_ϕ, t̂^c_{θ̂^c_ϕ}) can be viewed as a Z-estimator solution of the above system. Denoting

Ψ(y, θ, t) := ( ψ_θ(y, t)^⊤, ( (∂/∂θ) m(y, θ, t) )^⊤ )^⊤,

the Z-estimators θ̂^c_ϕ, t̂^c_{θ̂^c_ϕ} are the solutions of the system

∫ Ψ(y, θ, t) dP_n(y) = 0,

and the theoretical counterpart is given by

∫ Ψ(y, θ_0, t_θ_0) dP_0(y) = 0.

Asymptotic Properties
In this section, we establish the consistency and the asymptotic distributions of the estimators θ̂^c_ϕ and t̂^c_{θ̂^c_ϕ}. In order to prove the consistency of the estimators, we adopt results from the general theory of Z-estimators, as presented, for example, in [28]. Then, using the consistency of the estimators, as well as supplementary conditions, we prove that the asymptotic distributions of the estimators are multivariate normal.

Assumption 1.
(a) There exist compact neighbourhoods V_θ_0 of θ_0 and V_t_θ_0 of t_θ_0 such that

∫ sup_{θ∈V_θ_0, t∈V_t_θ_0} ‖Ψ(y, θ, t)‖ dP_0(y) < ∞.

(b) For any positive ε, the following well-separation condition holds:

inf_{(θ,t) : ‖(θ,t) − (θ_0,t_θ_0)‖ ≥ ε} ‖ ∫ Ψ(y, θ, t) dP_0(y) ‖ > 0.

Proposition 1. Suppose that Assumption 1 holds.
(a) Both estimators θ̂^c_ϕ and t̂^c_{θ̂^c_ϕ} converge in probability to θ_0 and t_θ_0, respectively.

Proposition 2.
Let P_0 belong to the model, and suppose that Assumption 2 holds. Then, both √n(θ̂^c_ϕ − θ_0) and √n(t̂^c_{θ̂^c_ϕ} − t_θ_0) converge in distribution to centered multivariate normal laws.

The condition of Type (a) from Assumption 1 is usually imposed in order to apply the uniform law of large numbers. For many choices of divergence (for example, those from the Cressie-Read family), the function Ψ is continuous in (θ, t), and consequently, this condition is verified. The second condition from Assumption 1 is imposed for the uniqueness of (θ_0, t_θ_0) as a solution of the equation ∫ Ψ(y, θ, t) dP_0(y) = 0 and is verified, for example, whenever Ψ is continuous and the parameter space is compact ([28], p. 46). Furthermore, the conditions of Type (b)-(d), included in Assumption 2, are often imposed in order to apply the law of large numbers or the central limit theorem and can be verified for the functions appearing in the definitions of the estimators proposed in the present paper.

Influence Functions and Robustness
In this section, we derive the influence functions of the estimators t̂^c_θ and θ̂^c_ϕ and prove their B-robustness. The corresponding statistical functionals are defined by (36) and (46), respectively.
Recall that a map T, defined on a set of probability measures and parameter-space-valued, is a statistical functional corresponding to an estimator θ̂ of the parameter θ_0 of the model if θ̂ = T(P_n), P_n being the empirical measure corresponding to the sample. The influence function of T at P_0 is defined by

IF(x; T, P_0) := lim_{ε↓0} [ T(P_εx) − T(P_0) ] / ε,

where P_εx := (1 − ε) P_0 + ε δ_x, δ_x being the Dirac measure. An unbounded influence function implies an unbounded asymptotic bias of a statistic under single-point contamination of the model. Therefore, a natural robustness requirement on a statistical functional is the boundedness of its influence function. Whenever the influence function is bounded with respect to x, the corresponding estimator is called B-robust [26].
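The definition above can be evaluated numerically for simple functionals. The sketch below (our own illustration) computes the influence function of the mean functional T(P) = E_P[X] via the contaminated mixture; the result is x − E_P[X], which grows without bound in x and is exactly the kind of unboundedness that makes estimators built on unbounded g non-robust:

```python
import numpy as np

def influence_mean(x, sample, eps=1e-6):
    """Numerical influence function of the mean functional T(P) = E_P[X],
    taken at the empirical measure of `sample`, using the contaminated
    mixture P_eps = (1 - eps) * P + eps * delta_x from the definition."""
    t_p = np.mean(sample)
    # For the mean, T(P_eps) is available in closed form: T is linear in P.
    t_eps = (1.0 - eps) * t_p + eps * x
    return (t_eps - t_p) / eps

rng = np.random.default_rng(7)
sample = rng.normal(loc=0.0, scale=1.0, size=500)

# The influence function equals x - mean(sample): linear, hence unbounded in x.
if_small = influence_mean(1.0, sample)
if_large = influence_mean(1000.0, sample)
```

By contrast, a functional defined through a Huber-truncated ψ-function has an influence function bounded by a multiple of the truncation constant c.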

Proposition 3.
For fixed θ, the influence function of the functional t^c_θ is given by

IF(x; t^c_θ, P_0) = − [ ∫ (∂/∂t) ψ_θ(y, t_θ(P_0)) dP_0(y) ]^{-1} ψ_θ(x, t_θ(P_0)). (55)

Proposition 4. The influence function of the functional T^c is given by a linear transformation of ψ_θ_0(x, t_θ_0(P_0)), with a nonrandom matrix factor depending only on θ_0 and P_0.

On the basis of Propositions 3 and 4, since x ↦ ψ_θ(x, t_θ(P_0)) is bounded, all the estimators θ̂^c_ϕ are B-robust.

Conclusions
We introduced a class of robust Z-estimators for moment condition models. These new estimators can be seen as robust alternatives to the minimum empirical divergence estimators. By using truncated functions based on the multidimensional Huber function, we defined robust estimators of the element that realizes the supremum in the dual form of the divergence, as well as new robust estimators for the parameter of the model. The asymptotic properties were proven, including the consistency and the limit laws. The influence functions of all the proposed estimators are bounded; therefore, these estimators are B-robust. The truncated function that we used to define the new robust Z-estimators contains implicitly defined functions, for which analytic forms are not available. The implementation of the estimation method will be addressed in future research. The idea of using the multidimensional Huber function, together with a scale matrix and a shift vector, to create a bounded version of the function corresponding to the estimating equation for the parameter of interest could be considered in other contexts as well and would lead to new robust Z-estimators. As one of the Referees suggested, other bounded functions could also be used to define new robust Z-estimators for moment condition models. For example, the Tukey biweight function, composed with a norm so that it can be applied to vector-valued functions, could be considered; again, the original parameter of interest should remain the solution of the estimating equation based on the new bounded function. Analysing such ideas in future studies would provide new robust versions of minimum empirical divergence estimators, or robust Z-estimators in other contexts.

Appendix A
By the law of large numbers,

( ∫ (∂/∂t) ψ(y, θ_0, t_θ_0) dP_n(y), ∫ (∂/∂θ) ψ(y, θ_0, t_θ_0) dP_n(y) )
= ( ∫ (∂/∂t) ψ(y, θ_0, t_θ_0) dP_0(y), ∫ (∂/∂θ) ψ(y, θ_0, t_θ_0) dP_0(y) ) + o_P(1). (A27)

For each θ, the influence function (55) is bounded with respect to x; therefore, the estimators t̂^c_θ are B-robust.