1. Introduction
Asymptotic bias corrections are pursued to make estimators closer to the true values. There are several ways of achieving this goal, including analytical corrections, jackknife and bootstrap methods (see, e.g., Quenouille (1956) [1], Hall (1992) [2], Shao and Tu (1995) [3], MacKinnon and Smith (1998) [4], Andrews (2002) [5], Hahn and Newey (2004) [6], Bun and Carree (2005) [7], Bao and Ullah (2007) [8,9], Bao (2013) [10] and Yang (2015) [11]). This variety of bias correction methods raises the question of whether one method is preferable to the others, at least on asymptotic efficiency grounds (e.g., see Hahn et al. (2004) [12]). For maximum likelihood (ML) estimation, they show that the method of bias correction does not affect the higher order efficiency of any estimator that is first-order efficient in parametric or semiparametric models. The ML estimator belongs to the class of M-estimators, and this paper extends their intuition to a general class of M-estimators.
Specifically, this paper considers an alternative bias correction for the M-estimator, which is achieved by correcting the moment equations in the spirit of Firth (1993) [13]. In particular, we compare the stochastic expansions of the analytically-bias-corrected estimator (which is referred to as the one-step bias correction) and the alternative estimator and find that the third-order stochastic expansions of these two estimators are identical. This is a stronger result than comparing higher order variances, since it implies that these two estimators not only have the same higher order variances, but also agree on further properties of their stochastic expansions.
We do not consider other bias correction methods, such as bootstrap and jackknife methods, in this paper.
In the literature (see Hahn and Newey (2004) [6] and Fernandez-Val (2004) [14] for nonlinear panel data models), it has been noted that removing the bias directly from the moment equations has the attractive feature that it does not use pre-estimated parameters that are not bias corrected, though this alternative approach requires more intensive computation. Because the analytically-bias-corrected estimator is a two-step estimator, for which an initial estimator needs to be plugged in, while the bias-corrected moment equations estimator is a one-step estimator that does not need an initial estimator, the higher order asymptotic equivalence of these two estimators is not obvious. This paper, however, shows that at least up to the third-order stochastic expansion, there is no benefit of using the bias correction of the moment equations over the simple one-step bias correction in the context of M-estimators. This finding suggests that the comparison between the one-step bias correction and the method of correcting the moment equations should be based on stochastic expansions of order higher than the third.
Examples of M-estimation include maximum likelihood estimation (MLE), least squares and instrumental variable (IV) estimation. Many other useful estimators can also fit into the M-estimation framework with the appropriate definition of the moment equations. These include some cases of the generalized method of moments (GMM; see examples in Rilstone et al. (1996) [15]) and two-step estimators (Newey (1984) [16]). We note that the generalized empirical likelihood (GEL) can also fit into this framework. This suggests that Firth (1993)'s [13] approach of correcting the moment equations can be an alternative to Newey and Smith (2004)'s [17] approach for obtaining the higher order bias and variance terms of the GEL.
Our paper is organized as follows. In Section 2, we derive the higher order stochastic expansion of the M-estimator and consider the one-step bias correction. Section 3 introduces the bias-corrected moment equations estimator and derives its higher order stochastic expansion. Section 4 discusses the higher order efficiency properties of several analytically-bias-corrected estimators. We conclude in Section 5. Primitive conditions for the validity of the higher order stochastic expansions and mathematical details are gathered in Appendix A and Appendix B.
2. Higher Order Expansion for the M-Estimator
Consider a moment condition:

$$\mathbb{E}[q_i(\beta_0)] = 0, \qquad (1)$$

where $q_i(\beta) \equiv q(z_i, \beta)$ is a known $k \times 1$ vector-valued function of the data $z_i$ and a parameter vector $\beta \in \Theta \subset \mathbb{R}^k$, and $z_i$ includes both endogenous and exogenous variables. The M-estimator $\hat{\beta}$ is obtained by solving:

$$\frac{1}{n}\sum_{i=1}^{n} q_i(\hat{\beta}) = 0. \qquad (2)$$

Examples for this class of estimators include MLE, least squares and IV estimation. In the MLE, $q_i(\beta)$ is the single observation score function. For the linear or nonlinear regression model $y_i = g(x_i, \beta_0) + \varepsilon_i$, we set $z_i = (y_i, x_i)$ and $q_i(\beta) = (y_i - g(x_i, \beta))\,\partial g(x_i, \beta)/\partial\beta$ for a known function $g$. In the linear IV model, we have $g(x_i, \beta) = x_i'\beta$ and $q_i(\beta) = w_i(y_i - x_i'\beta)$ for some instruments $w_i$ with $\mathbb{E}[w_i \varepsilon_i] = 0$. Two-step estimators such as two-stage least squares, feasible generalized least squares (GLS) and Heckman (1979) [18]'s two-step estimator also fit into this framework (see Newey (1984) [16]). Rilstone et al. (1996) [15] provide some special cases of GMM estimators that can be put into the M-estimation framework, but the examples are not restricted to those. Partly motivated by this wide applicability, we study the stochastic expansion and the bias correction of the M-estimator.
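As a concrete illustration of this framework, the following sketch solves the sample moment condition numerically by Newton's method with a numerical derivative, using the linear IV moment $q_i(\beta) = w_i(y_i - x_i\beta)$ with a scalar parameter. The function names and simulated data are our own illustration, not from the paper.

```python
import numpy as np

def m_estimate(q, beta0, data, tol=1e-10, max_iter=100):
    """Solve the sample moment condition (1/n) sum_i q(z_i, beta) = 0
    by Newton's method; the Jacobian is approximated numerically."""
    beta = float(beta0)
    for _ in range(max_iter):
        qbar = np.mean([q(z, beta) for z in data])
        h = 1e-6
        dq = (np.mean([q(z, beta + h) for z in data]) - qbar) / h
        step = qbar / dq
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Example: linear IV moment q_i(beta) = w_i * (y_i - x_i * beta), scalar beta
rng = np.random.default_rng(0)
n = 500
w = rng.normal(size=n)                 # instrument
x = w + 0.5 * rng.normal(size=n)       # endogenous regressor, correlated with w
eps = rng.normal(size=n)
y = 2.0 * x + eps                      # true beta = 2
data = list(zip(w, x, y))
q_iv = lambda z, b: z[0] * (z[2] - z[1] * b)

beta_hat = m_estimate(q_iv, beta0=0.0, data=data)
beta_closed = np.sum(w * y) / np.sum(w * x)   # closed-form IV solution
```

Because the moment is linear in $\beta$, Newton's method recovers the closed-form IV estimator in one step; for nonlinear moments, the same solver iterates to convergence.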
We obtain the higher order stochastic expansion of the M-estimator using the iterative approach of Rilstone et al. (1996) [15] up to a certain order. This approach is analytically convenient and straightforward to implement, since the estimators are expressed as functions of sums of random variables. The Edgeworth expansion, whose validity was derived in Bhattacharya and Ghosh (1978) [19], can be considered as an alternative, but the stochastic expansion approach is much simpler. Moreover, the main purpose of this paper is to compare several estimators based on the higher order variance. Noting that rankings based on the higher order variances in a third-order stochastic expansion are equivalent to rankings based on the variances of an Edgeworth expansion, as shown in Pfanzagl and Wefelmeyer (1978) [20] and Ghosh et al. (1980) [21] and as discussed in Rothenberg (1984) [22], it suffices to use the simple stochastic expansions for our purposes.
Here, we borrow Rilstone et al. (1996) [15]'s notation. We denote the matrix of $v$-th order partial derivatives of a matrix $A(\beta)$ as $\nabla^v A(\beta)$. Specifically, if $A(\beta)$ is a $k \times 1$ vector function, $\nabla A(\beta)$ is the usual Jacobian whose $l$-th row contains the partial derivatives of the $l$-th element of $A(\beta)$. $\nabla^2 A(\beta)$ (a $k \times k^2$ matrix) is defined recursively, such that the $j$-th element of the $l$-th row of $\nabla^2 A(\beta)$ is the $1 \times k$ vector $\nabla \nabla_{lj} A(\beta)$, where $\nabla_{lj} A(\beta)$ is the $l$-th row and the $j$-th element of $\nabla A(\beta)$. We use ⊗ to denote a usual Kronecker product. Using this Kronecker product, we can express higher order derivatives applied to vectors compactly, as in $\nabla^2 A(\beta)(b \otimes c)$ for conformable vectors $b$ and $c$. Finally, we use a matrix norm $\|A\| = [\mathrm{tr}(A'A)]^{1/2}$ for a matrix $A$.
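This recursive derivative notation can be checked numerically. The sketch below builds $\nabla A$ and $\nabla^2 A$ by finite differences for a hypothetical two-dimensional function $A(\beta) = (\beta_1^2\beta_2,\ \beta_1 + \beta_2^3)'$ (an example of our own, not from the paper) and confirms that $\nabla^2 A$ is $k \times k^2$, with the $(l, j)$ block equal to the gradient of the $(l, j)$ entry of $\nabla A$.

```python
import numpy as np

def A(beta):
    b1, b2 = beta
    return np.array([b1**2 * b2, b1 + b2**3])

def jacobian(f, beta, h=1e-6):
    """Gradient matrix: the l-th row holds the partials of the l-th element of f."""
    k = len(beta)
    f0 = f(beta)
    J = np.zeros((len(f0), k))
    for j in range(k):
        e = np.zeros(k); e[j] = h
        J[:, j] = (f(beta + e) - f0) / h
    return J

def second_derivative_matrix(f, beta, h=1e-5):
    """A k x k^2 matrix: the (l, j) block of the l-th row is the gradient of
    the (l, j) entry of the Jacobian (the recursive convention above)."""
    k = len(beta)
    rows = []
    for l in range(len(f(beta))):
        blocks = []
        for j in range(k):
            entry = lambda b: jacobian(f, b, h)[l, j]   # (l, j) entry of the Jacobian
            grad = np.array([(entry(beta + h * np.eye(k)[m]) - entry(beta)) / h
                             for m in range(k)])
            blocks.append(grad)
        rows.append(np.concatenate(blocks))
    return np.array(rows)

beta = np.array([1.0, 2.0])
J = jacobian(A, beta)                    # 2 x 2 Jacobian
D2 = second_derivative_matrix(A, beta)   # 2 x 4 second-derivative matrix
```

At $\beta = (1, 2)'$, the analytic values are $\nabla A = [[4, 1], [1, 12]]$ and $\nabla^2 A = [[4, 2, 2, 0], [0, 0, 0, 12]]$, which the finite-difference construction reproduces up to discretization error.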
We first derive the higher order stochastic expansion of the M-estimator and consider the one-step bias correction here. In the next section, we introduce the bias-corrected moment equations estimator and derive its higher order stochastic expansion. Then, we compare these two approaches.
Before we derive the second-order expansion of the M-estimator to obtain the second-order bias analytically, we introduce simplifying notation for $q_i$ and its derivatives evaluated at $\beta_0$, and for their expectations, following Rilstone et al. (1996) [15].
Lemma 1. (Rilstone et al. (1996) [15]) Suppose $z_1, \dots, z_n$ are i.i.d.; $\beta_0$ is in the interior of Θ and is the only parameter value satisfying (1); and the M-estimator defined in (2) is consistent. Further suppose the regularity conditions (i)–(viii) of Rilstone et al. (1996) [15]: $q_i(\beta)$ is κ-times continuously differentiable in a neighborhood of $\beta_0$ with probability one, with integrable and locally bounded derivatives ((i)–(iib)); moment bounds hold for $q_i$ and its derivatives ((iii)–(iv)); $Q$ exists and is nonsingular ((v)); and boundedness conditions ((vi)–(viii)) hold. Then, we have $\hat{\beta} - \beta_0 = a_{-1/2} + a_{-1} + O_p(n^{-3/2})$, where $a_{-s/2} = O_p(n^{-s/2})$.

This lemma and the following Lemma 2 are available in Rilstone et al. (1996) [15], but we reproduce them since some of their results are useful for deriving our new results. From Lemma 1, the higher order bias of $\hat{\beta}$ is obtained by taking the expectation of the $O_p(n^{-1})$ term $a_{-1}$. Writing this expectation as $B(\beta_0)/n$, it is not difficult to see that the second-order bias of $\hat{\beta}$ is of order $O(n^{-1})$. In this regard, we will write the second-order bias of $\hat{\beta}$ as $B(\beta_0)/n$.
Lemma 2. (Rilstone et al. (1996) [15]) Suppose (1) holds and $z_1, \dots, z_n$ are i.i.d. Then, $\mathbb{E}[a_{-1}] = B(\beta_0)/n$, where $B(\beta_0)$ depends only on the moments of $q_i$ and of its first two derivatives evaluated at $\beta_0$. Thus, we can eliminate the second-order bias of the M-estimator $\hat{\beta}$ by subtracting a consistent estimator of the bias.
Now, let $\hat{\beta}_{bc}$ denote the bias-corrected estimator of this sort, defined by:

$$\hat{\beta}_{bc} = \hat{\beta} - \hat{B}(\hat{\beta})/n, \qquad (3)$$

where the function $\hat{B}(\beta)$, a consistent estimator of $B(\beta)$, is constructed as the sample analogue of $B(\beta)$, i.e., by replacing the population moments in $B(\beta)$ with the corresponding sample averages evaluated at $\beta$. In particular, we can replace $\hat{\beta}$ in $\hat{B}(\hat{\beta})$ with any $\sqrt{n}$-consistent estimator of $\beta_0$. In this sense, $\hat{\beta}_{bc}$ is a two-step estimator.
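To make the two-step construction concrete, here is a small Monte Carlo sketch using the exponential-rate MLE, a model for which the second-order bias happens to be known in closed form ($\mathbb{E}[\hat{\lambda}] = n\lambda/(n-1)$, so $B(\lambda) = \lambda$ up to higher order terms). The model, names and bias formula are our own illustration, not the paper's general expression for $B$.

```python
import numpy as np

# One-step bias correction: beta_bc = beta_hat - B_hat(beta_hat)/n.
# Exponential-rate MLE: lambda_hat = 1/xbar, with E[lambda_hat] = n*lambda/(n-1),
# so the O(1/n) bias is approximately lambda/n, i.e., B(lambda) = lambda.
rng = np.random.default_rng(1)
lam_true = 3.0
n = 20
reps = 20000

mle = np.empty(reps)
for r in range(reps):
    x = rng.exponential(scale=1.0 / lam_true, size=n)
    mle[r] = 1.0 / x.mean()            # MLE of the exponential rate

B_hat = mle                             # plug-in bias estimate: B(lambda_hat) = lambda_hat
bc = mle - B_hat / n                    # one-step corrected estimator

bias_mle = mle.mean() - lam_true        # noticeably positive for small n
bias_bc = bc.mean() - lam_true          # close to zero after correction
```

In this model the correction $\hat{\lambda}(1 - 1/n) = (n-1)/(n\bar{x})$ happens to remove the bias exactly, since $\mathbb{E}[\hat{\lambda}](1 - 1/n) = \lambda$; in general the one-step correction removes only the $O(1/n)$ term.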
To characterize the higher order efficiency based on the higher order variance of the bias-corrected estimators, we need to expand the M-estimator to the third order. For the ease of notation, we use some additional simplifying terms for the third derivatives of $q_i$ and their moments. We obtain:
Lemma 3. Suppose $z_1, \dots, z_n$ are i.i.d., $\beta_0$ is in the interior of Θ, $\beta_0$ is the only parameter value satisfying (1) and the M-estimator that solves (2) is consistent. Further suppose the regularity conditions (i)–(ix), which strengthen those of Lemma 1: $q_i(\beta)$ is κ-times continuously differentiable in a neighborhood of $\beta_0$ with probability one, with integrable and locally bounded derivatives ((i)–(iib)); moment bounds hold for $q_i$ and its derivatives up to the third order ((iii)); $Q$ is nonsingular ((iv)); and boundedness conditions ((v)–(ix)) hold. Then, we have $\hat{\beta} - \beta_0 = a_{-1/2} + a_{-1} + a_{-3/2} + O_p(n^{-2})$, where $a_{-s/2} = O_p(n^{-s/2})$. Note that the conditions in Lemma 3 are all standard regularity conditions.
In the following section, we propose an alternative one-step estimator that eliminates the second-order bias by adjusting the moment equations, inspired by Firth (1993) [13].
3. Bias-Corrected Moment Equation
Here, we consider an alternative higher order bias reduced estimator that solves bias-corrected moment equations. This idea was proposed in Firth (1993) [13] for the ML with a fixed number of parameters and exploited in Hahn and Newey (2004) [6] and Fernandez-Val (2004) [14] for nonlinear panel data models with individual specific effects. We refer to this estimator as Firth's estimator.
To be precise, consider:

$$\frac{1}{n}\sum_{i=1}^{n} q_i(\beta) + \rho(\beta) = 0$$

for a known function $\rho(\beta)$ that is given by:

$$\rho(\beta) = \mathbb{E}[\nabla q_i(\beta)]\,\frac{B(\beta)}{n}. \qquad (5)$$

This correction term $\rho(\beta)$ is obtained following Firth (1993) [13] and using the bias term for the M-estimator. In the ML context, Firth (1993) [13] shows that by adjusting the score function (he refers to this as a modified score function) with the correction term defined by the product of the Fisher information matrix and the bias term, one can obtain a bias-corrected ML estimator. The term $\rho(\beta)$ has the same interpretation in the ML, since $\mathbb{E}[\nabla q_i(\beta)]$ is the expected Hessian matrix, and hence, $-\mathbb{E}[\nabla q_i(\beta)]$ is the Fisher information in the ML. Therefore, (5) is a generalization of Firth (1993) [13]'s approach to the M-estimation. In general, $\rho(\beta)$ contains population terms, and hence, to implement this alternative estimator, we need to estimate the function $\rho(\beta)$. We use a sample analogue of (5) as:

$$\hat{\rho}(\beta) = \left(\frac{1}{n}\sum_{i=1}^{n} \nabla q_i(\beta)\right)\frac{\hat{B}(\beta)}{n}. \qquad (6)$$
Now, we estimate $\beta_0$ by the estimator $\tilde{\beta}$ solving:

$$\frac{1}{n}\sum_{i=1}^{n} q_i(\tilde{\beta}) + \hat{\rho}(\tilde{\beta}) = 0, \qquad (7)$$

and claim that the solution of this modified moment condition eliminates the second-order bias present in the estimator $\hat{\beta}$ that solves the original moment condition (2).
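The sketch below solves a corrected moment equation of this kind by Newton's method, using the exponential-rate model, where the correction takes the closed form $-1/(n\lambda)$: the per-observation score is $1/\lambda - x_i$, the expected Hessian is $-1/\lambda^2$, and the $O(1/n)$ bias of the MLE is approximately $\lambda/n$. This model-specific expression, and all names, are our own illustration, not the paper's general formula.

```python
import numpy as np

def solve_corrected(qbar, rho, beta0, tol=1e-12, max_iter=200):
    """Newton's method on the corrected moment equation qbar(beta) + rho(beta) = 0,
    with a numerical derivative (a sketch of solving the modified equation)."""
    beta = float(beta0)
    g = lambda b: qbar(b) + rho(b)
    h = 1e-7
    for _ in range(max_iter):
        step = g(beta) * h / (g(beta + h) - g(beta))
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Exponential-rate example: qbar(lam) = 1/lam - xbar, and the correction term
# works out to rho(lam) = -1/(n*lam) for this model.
rng = np.random.default_rng(2)
n = 25
x = rng.exponential(scale=1.0 / 3.0, size=n)
xbar = x.mean()

qbar = lambda lam: 1.0 / lam - xbar
rho = lambda lam: -1.0 / (n * lam)

lam_tilde = solve_corrected(qbar, rho, beta0=1.0 / xbar)
lam_closed = (n - 1) / (n * xbar)   # closed-form solution of the corrected equation
```

Here the corrected moment equation has the closed-form solution $(n-1)/(n\bar{x})$, which the Newton solver recovers; note that no preliminary bias-corrected estimate is plugged in, in line with the one-step character of this approach.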
Assumption 1. (i) $z_1, \dots, z_n$ are i.i.d.; (ii) $q_i(\beta)$ is κ-times continuously differentiable in a neighborhood of $\beta_0$; (iii) moment bounds hold for $q_i(\beta)$ and its derivatives; (iv) Θ is compact; (v) $\beta_0$ is in the interior of Θ and is the only parameter value satisfying (1); (vi) dominance conditions hold for the derivatives of $q_i(\beta)$ in the neighborhood of $\beta_0$.

Assumption 2. For $\beta$ in a neighborhood of $\beta_0$, $\mathbb{E}[\nabla q_i(\beta)]$ is nonsingular.

Alternatively, we can assume the following instead of Assumption 1.

Assumption 3. (i) $z_1, \dots, z_n$ are i.i.d.; (ii) $q_i(\beta)$ satisfies a Lipschitz condition in θ with an integrable Lipschitz function in a neighborhood of $\beta_0$; (iii) moment bounds hold for $q_i(\beta)$ and its derivatives; (iv) Θ is bounded; (v) $\beta_0$ is in the interior of Θ and is the only parameter value satisfying (1).

Under Assumptions 1 and 2 or Assumptions 3 and 2, the following three conditions are satisfied (see Lemma A.9 in Appendix A).

Condition 1. The sampling errors of $\hat{\rho}$ and its first derivative at $\beta_0$ vanish fast enough not to affect the expansion.

Condition 2. $\hat{\rho}(\beta)$ converges to $\rho(\beta)$ uniformly in the neighborhood of $\beta_0$.

Condition 3. The derivatives of $\hat{\rho}(\beta)$ converge to those of $\rho(\beta)$ uniformly in the neighborhood of $\beta_0$.
Note that these three conditions are required to control for the estimation error in $\hat{\rho}(\beta)$ in the stochastic expansions. Now, we are ready to present one of our main results.
Proposition 1. Suppose $\tilde{\beta}$ solves (7), where $\hat{\rho}(\beta)$ is given by (6), and that $\tilde{\beta}$ is a consistent estimator of $\beta_0$. Further suppose that Conditions 1–3 and Conditions (i)–(viii) in Lemma 1 are satisfied. Then, we have $\tilde{\beta} - \beta_0 = a_{-1/2} + \tilde{a}_{-1} + O_p(n^{-3/2})$, where $\mathbb{E}[\tilde{a}_{-1}] = 0$, and hence, the second-order bias of $\tilde{\beta}$ is $0$.

This concludes that we can eliminate the second-order bias by adjusting the moment equations as in (7), and it is a proper alternative to the analytic bias correction of (3).
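As a numerical sanity check on this claim, the following simulation contrasts the one-step corrected estimator and the corrected-moment-equation estimator in a simple exponential-rate model (an illustration of our own, not from the paper). In this particular model the two corrections coincide exactly, which is stronger than, but consistent with, the higher order equivalence discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true, n, reps = 2.0, 30, 5000

one_step = np.empty(reps)
corrected_eq = np.empty(reps)
for r in range(reps):
    x = rng.exponential(scale=1.0 / lam_true, size=n)
    mle = 1.0 / x.mean()                        # MLE of the exponential rate
    one_step[r] = mle * (1.0 - 1.0 / n)         # mle - B_hat(mle)/n with B(lam) = lam
    corrected_eq[r] = (n - 1) / (n * x.mean())  # solves the corrected moment equation

max_gap = np.max(np.abs(one_step - corrected_eq))   # zero up to rounding here
bias_corrected = one_step.mean() - lam_true         # close to zero
```

In richer models the two estimators differ in finite samples, and the paper's result says their stochastic expansions nevertheless agree up to the third order.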