1. Introduction
Asymptotic bias corrections are pursued to make estimators closer to the true values. There are several ways of achieving this goal, including analytical corrections, jackknife and bootstrap methods (see, e.g., Quenouille (1956) [1], Hall (1992) [2], Shao and Tu (1995) [3], MacKinnon and Smith (1998) [4], Andrews (2002) [5], Hahn and Newey (2004) [6], Bun and Carree (2005) [7], Bao and Ullah (2007) [8,9], Bao (2013) [10] and Yang (2015) [11]). This variety of bias correction methods raises the question of whether one method is preferable to the others, at least on asymptotic efficiency grounds (e.g., see Hahn et al. (2004) [12]). For maximum likelihood (ML) estimation, they show that the method of bias correction does not affect the higher order efficiency of any estimator that is first-order efficient in parametric or semiparametric models. The ML estimator belongs to the class of M-estimators, and this paper extends their intuition to a general class of M-estimators.
Specifically, this paper considers an alternative bias correction for the M-estimator, which is achieved by correcting the moment equations in the spirit of Firth (1993) [13]. In particular, we compare the stochastic expansions of the analytically-bias-corrected estimator (which is referred to as the one-step bias correction) and the alternative estimator and find that the third-order stochastic expansions of these two estimators are identical. This is a stronger result than comparing higher order variances, since it implies that these two estimators not only have the same higher order variances, but also agree on further properties of their stochastic expansions.
We do not consider other bias correction methods, such as bootstrap and jackknife methods, in this paper.
In the literature (see Hahn and Newey (2004) [6] and Fernandez-Val (2004) [14] for nonlinear panel data models), it has been noted that removing the bias directly from the moment equations has the attractive feature that it does not use pre-estimated parameters that are not bias corrected, though this alternative approach requires more intensive computation. Because the analytically-bias-corrected estimator is a two-step estimator, for which an initial estimator needs to be plugged in, while the bias-corrected moment equations estimator is a one-step estimator that does not need an initial estimator, the higher order asymptotic equivalence of these two estimators is not obvious. This paper, however, shows that at least up to the third-order stochastic expansion, there is no benefit of using the bias correction of the moment equations over the simple one-step bias correction in the context of M-estimators. This finding suggests that the comparison between the one-step bias correction and the method of correcting the moment equations should be based on stochastic expansions of order higher than the third.
Examples of M-estimation include maximum likelihood estimation (MLE), least squares and instrumental variable (IV) estimation. Many other useful estimators can also fit into the M-estimation framework with the appropriate definition of the moment equations. These include some cases of the generalized method of moments (GMM; see examples in Rilstone et al. (1996) [15]) and two-step estimators (Newey (1984) [16]). We note that the generalized empirical likelihood (GEL) can also fit into this framework. This suggests that Firth (1993)'s [13] approach of correcting the moment equations can be an alternative to Newey and Smith (2004)'s [17] approach for obtaining the higher order bias and variance terms of the GEL.
Our paper is organized as follows. In Section 2, we derive the higher order stochastic expansion of the M-estimator and consider the one-step bias correction. Section 3 introduces the bias-corrected moment equations estimator and derives its higher order stochastic expansion. Section 4 discusses the higher order efficiency properties of several analytically-bias-corrected estimators. We conclude in Section 5. Primitive conditions for the validity of the higher order stochastic expansions and mathematical details are gathered in Appendix A and Appendix B.
2. Higher Order Expansion for the M-Estimator
Consider a moment condition:

$$\mathbb{E}[q_i(\beta_0)] = 0, \qquad (1)$$

where $q_i(\beta) \equiv q(z_i, \beta)$ is a known $k \times 1$ vector-valued function of the data $z_i$ and a parameter vector $\beta \in \Theta \subset \mathbb{R}^k$, and $z_i$ includes both endogenous and exogenous variables. The M-estimator $\hat{\beta}$ is obtained by solving:

$$\frac{1}{n}\sum_{i=1}^{n} q_i(\hat{\beta}) = 0. \qquad (2)$$

Examples for this class of estimators include MLE, least squares and IV estimation. In the MLE, $q_i(\beta)$ is the single observation score function. For the linear or nonlinear regression model $y_i = g(x_i, \beta_0) + \varepsilon_i$, we set $z_i = (y_i, x_i)$ and $q_i(\beta) = (y_i - g(x_i, \beta))\,\partial g(x_i, \beta)/\partial\beta$ for a known function $g$. In the linear IV model, we have $g(x_i, \beta) = x_i'\beta$ and $q_i(\beta) = w_i(y_i - x_i'\beta)$ for some instruments $w_i$ with $\mathbb{E}[w_i \varepsilon_i] = 0$. Two-step estimators such as two-stage least squares, feasible generalized least squares (GLS) and Heckman (1979) [18]'s two-step estimator also fit into this framework (see Newey (1984) [16]). Rilstone et al. (1996) [15] provide some special cases of GMM estimators that can be put into the M-estimation framework, but the examples are not restricted to those. Partly motivated by this wide applicability, we study the stochastic expansion and the bias correction of the M-estimator.
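As a concrete illustration of this framework, the following sketch solves the sample moment condition numerically by Newton's method with a numerical derivative, using the linear IV moment $q_i(\beta) = w_i(y_i - x_i\beta)$ with a scalar parameter. The function names and simulated data are our own illustration, not from the paper.

```python
import numpy as np

def m_estimate(q, beta0, data, tol=1e-10, max_iter=100):
    """Solve the sample moment condition (1/n) sum_i q(z_i, beta) = 0
    by Newton's method; the Jacobian is approximated numerically."""
    beta = float(beta0)
    for _ in range(max_iter):
        qbar = np.mean([q(z, beta) for z in data])
        h = 1e-6
        dq = (np.mean([q(z, beta + h) for z in data]) - qbar) / h
        step = qbar / dq
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Example: linear IV moment q_i(beta) = w_i * (y_i - x_i * beta), scalar beta
rng = np.random.default_rng(0)
n = 500
w = rng.normal(size=n)                 # instrument
x = w + 0.5 * rng.normal(size=n)       # endogenous regressor, correlated with w
eps = rng.normal(size=n)
y = 2.0 * x + eps                      # true beta = 2
data = list(zip(w, x, y))
q_iv = lambda z, b: z[0] * (z[2] - z[1] * b)

beta_hat = m_estimate(q_iv, beta0=0.0, data=data)
beta_closed = np.sum(w * y) / np.sum(w * x)   # closed-form IV solution
```

Because the moment is linear in $\beta$, Newton's method recovers the closed-form IV estimator in one step; for nonlinear moments, the same solver iterates to convergence.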
We obtain the higher order stochastic expansion of the M-estimator using the iterative approach of Rilstone et al. (1996) [15] up to a certain order. This approach is analytically convenient and straightforward to implement, since the estimators are expressed as functions of sums of random variables. The Edgeworth expansion, whose validity was derived in Bhattacharya and Ghosh (1978) [19], can be considered as an alternative, but the stochastic expansion approach is much simpler. Moreover, the main purpose of this paper is to compare several estimators based on the higher order variance. Noting that rankings based on the higher order variances in a third-order stochastic expansion are equivalent to rankings based on the variances of an Edgeworth expansion, as shown in Pfanzagl and Wefelmeyer (1978) [20] and Ghosh et al. (1980) [21] and as discussed in Rothenberg (1984) [22], it suffices to use the simple stochastic expansions for our purposes.
Here, we borrow Rilstone et al. (1996) [15]'s notation. We denote the matrix of $v$-th order partial derivatives of a matrix $A(\beta)$ as $\nabla^v A(\beta)$. Specifically, if $A(\beta)$ is a $k \times 1$ vector function, $\nabla A(\beta)$ is the usual Jacobian whose $l$-th row contains the partial derivatives of the $l$-th element of $A(\beta)$. $\nabla^2 A(\beta)$ (a $k \times k^2$ matrix) is defined recursively, such that the $j$-th element of the $l$-th row of $\nabla^2 A(\beta)$ is the $1 \times k$ vector $\nabla \nabla_{lj} A(\beta)$, where $\nabla_{lj} A(\beta)$ is the $l$-th row and the $j$-th element of $\nabla A(\beta)$. We use ⊗ to denote a usual Kronecker product. Using this Kronecker product, we can express higher order derivatives applied to vectors compactly, as in $\nabla^2 A(\beta)(b \otimes c)$ for conformable vectors $b$ and $c$. Finally, we use a matrix norm $\|A\| = [\mathrm{tr}(A'A)]^{1/2}$ for a matrix $A$.
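This recursive derivative notation can be checked numerically. The sketch below builds $\nabla A$ and $\nabla^2 A$ by finite differences for a hypothetical two-dimensional function $A(\beta) = (\beta_1^2\beta_2,\ \beta_1 + \beta_2^3)'$ (an example of our own, not from the paper) and confirms that $\nabla^2 A$ is $k \times k^2$, with the $(l, j)$ block equal to the gradient of the $(l, j)$ entry of $\nabla A$.

```python
import numpy as np

def A(beta):
    b1, b2 = beta
    return np.array([b1**2 * b2, b1 + b2**3])

def jacobian(f, beta, h=1e-6):
    """Gradient matrix: the l-th row holds the partials of the l-th element of f."""
    k = len(beta)
    f0 = f(beta)
    J = np.zeros((len(f0), k))
    for j in range(k):
        e = np.zeros(k); e[j] = h
        J[:, j] = (f(beta + e) - f0) / h
    return J

def second_derivative_matrix(f, beta, h=1e-5):
    """A k x k^2 matrix: the (l, j) block of the l-th row is the gradient of
    the (l, j) entry of the Jacobian (the recursive convention above)."""
    k = len(beta)
    rows = []
    for l in range(len(f(beta))):
        blocks = []
        for j in range(k):
            entry = lambda b: jacobian(f, b, h)[l, j]   # (l, j) entry of the Jacobian
            grad = np.array([(entry(beta + h * np.eye(k)[m]) - entry(beta)) / h
                             for m in range(k)])
            blocks.append(grad)
        rows.append(np.concatenate(blocks))
    return np.array(rows)

beta = np.array([1.0, 2.0])
J = jacobian(A, beta)                    # 2 x 2 Jacobian
D2 = second_derivative_matrix(A, beta)   # 2 x 4 second-derivative matrix
```

At $\beta = (1, 2)'$, the analytic values are $\nabla A = [[4, 1], [1, 12]]$ and $\nabla^2 A = [[4, 2, 2, 0], [0, 0, 0, 12]]$, which the finite-difference construction reproduces up to discretization error.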
We first derive the higher order stochastic expansion of the M-estimator and consider the one-step bias correction here. In the next section, we introduce the bias-corrected moment equations estimator and derive its higher order stochastic expansion. Then, we compare these two approaches.
Before we derive the second-order expansion of the M-estimator to obtain the second-order bias analytically, we introduce simplifying notation for $q_i$ and its derivatives evaluated at $\beta_0$, and for their expectations, following Rilstone et al. (1996) [15].
Lemma 1. (Rilstone et al. (1996) [15]) Suppose $z_1, \dots, z_n$ are i.i.d.; $\beta_0$ is in the interior of Θ and is the only parameter value satisfying (1); and the M-estimator defined in (2) is consistent. Further suppose the regularity conditions (i)–(viii) of Rilstone et al. (1996) [15]: $q_i(\beta)$ is κ-times continuously differentiable in a neighborhood of $\beta_0$ with probability one, with integrable and locally bounded derivatives ((i)–(iib)); moment bounds hold for $q_i$ and its derivatives ((iii)–(iv)); $Q$ exists and is nonsingular ((v)); and boundedness conditions ((vi)–(viii)) hold. Then, we have $\hat{\beta} - \beta_0 = a_{-1/2} + a_{-1} + O_p(n^{-3/2})$, where $a_{-s/2} = O_p(n^{-s/2})$.

This lemma and the following Lemma 2 are available in Rilstone et al. (1996) [15], but we reproduce them since some of their results are useful for deriving our new results. From Lemma 1, the higher order bias of $\hat{\beta}$ is obtained by taking the expectation of the $O_p(n^{-1})$ term $a_{-1}$. Writing this expectation as $B(\beta_0)/n$, it is not difficult to see that the second-order bias of $\hat{\beta}$ is of order $O(n^{-1})$. In this regard, we will write the second-order bias of $\hat{\beta}$ as $B(\beta_0)/n$.
Lemma 2. (Rilstone et al. (1996) [15]) Suppose (1) holds and $z_1, \dots, z_n$ are i.i.d. Then, $\mathbb{E}[a_{-1}] = B(\beta_0)/n$, where $B(\beta_0)$ depends only on the moments of $q_i$ and of its first two derivatives evaluated at $\beta_0$. Thus, we can eliminate the second-order bias of the M-estimator $\hat{\beta}$ by subtracting a consistent estimator of the bias.
Now, let $\hat{\beta}_{bc}$ denote the bias-corrected estimator of this sort, defined by:

$$\hat{\beta}_{bc} = \hat{\beta} - \hat{B}(\hat{\beta})/n, \qquad (3)$$

where the function $\hat{B}(\beta)$, a consistent estimator of $B(\beta)$, is constructed as the sample analogue of $B(\beta)$, i.e., by replacing the population moments in $B(\beta)$ with the corresponding sample averages evaluated at $\beta$. In particular, we can replace $\hat{\beta}$ in $\hat{B}(\hat{\beta})$ with any $\sqrt{n}$-consistent estimator of $\beta_0$. In this sense, $\hat{\beta}_{bc}$ is a two-step estimator.
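To make the two-step construction concrete, here is a small Monte Carlo sketch using the exponential-rate MLE, a model for which the second-order bias happens to be known in closed form ($\mathbb{E}[\hat{\lambda}] = n\lambda/(n-1)$, so $B(\lambda) = \lambda$ up to higher order terms). The model, names and bias formula are our own illustration, not the paper's general expression for $B$.

```python
import numpy as np

# One-step bias correction: beta_bc = beta_hat - B_hat(beta_hat)/n.
# Exponential-rate MLE: lambda_hat = 1/xbar, with E[lambda_hat] = n*lambda/(n-1),
# so the O(1/n) bias is approximately lambda/n, i.e., B(lambda) = lambda.
rng = np.random.default_rng(1)
lam_true = 3.0
n = 20
reps = 20000

mle = np.empty(reps)
for r in range(reps):
    x = rng.exponential(scale=1.0 / lam_true, size=n)
    mle[r] = 1.0 / x.mean()            # MLE of the exponential rate

B_hat = mle                             # plug-in bias estimate: B(lambda_hat) = lambda_hat
bc = mle - B_hat / n                    # one-step corrected estimator

bias_mle = mle.mean() - lam_true        # noticeably positive for small n
bias_bc = bc.mean() - lam_true          # close to zero after correction
```

In this model the correction $\hat{\lambda}(1 - 1/n) = (n-1)/(n\bar{x})$ happens to remove the bias exactly, since $\mathbb{E}[\hat{\lambda}](1 - 1/n) = \lambda$; in general the one-step correction removes only the $O(1/n)$ term.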
To characterize the higher order efficiency based on the higher order variance of the bias-corrected estimators, we need to expand the M-estimator to the third order. For the ease of notation, we use some additional simplifying terms for the third derivatives of $q_i$ and their moments. We obtain:
Lemma 3. Suppose $z_1, \dots, z_n$ are i.i.d., $\beta_0$ is in the interior of Θ, $\beta_0$ is the only parameter value satisfying (1) and the M-estimator that solves (2) is consistent. Further suppose the regularity conditions (i)–(ix), which strengthen those of Lemma 1: $q_i(\beta)$ is κ-times continuously differentiable in a neighborhood of $\beta_0$ with probability one, with integrable and locally bounded derivatives ((i)–(iib)); moment bounds hold for $q_i$ and its derivatives up to the third order ((iii)); $Q$ is nonsingular ((iv)); and boundedness conditions ((v)–(ix)) hold. Then, we have $\hat{\beta} - \beta_0 = a_{-1/2} + a_{-1} + a_{-3/2} + O_p(n^{-2})$, where $a_{-s/2} = O_p(n^{-s/2})$. Note that the conditions in Lemma 3 are all standard regularity conditions.
In the following section, we propose an alternative one-step estimator that eliminates the second-order bias by adjusting the moment equations, inspired by Firth (1993) [13].
3. Bias-Corrected Moment Equation
Here, we consider an alternative higher order bias reduced estimator that solves bias-corrected moment equations. This idea was proposed in Firth (1993) [13] for the ML with a fixed number of parameters and exploited in Hahn and Newey (2004) [6] and Fernandez-Val (2004) [14] for nonlinear panel data models with individual specific effects. We refer to this estimator as Firth's estimator.
To be precise, consider:

$$\frac{1}{n}\sum_{i=1}^{n} q_i(\beta) + \rho(\beta) = 0$$

for a known function $\rho(\beta)$ that is given by:

$$\rho(\beta) = \mathbb{E}[\nabla q_i(\beta)]\,\frac{B(\beta)}{n}. \qquad (5)$$

This correction term $\rho(\beta)$ is obtained following Firth (1993) [13] and using the bias term for the M-estimator. In the ML context, Firth (1993) [13] shows that by adjusting the score function (he refers to this as a modified score function) with the correction term defined by the product of the Fisher information matrix and the bias term, one can obtain a bias-corrected ML estimator. The term $\rho(\beta)$ has the same interpretation in the ML, since $\mathbb{E}[\nabla q_i(\beta)]$ is the expected Hessian matrix, and hence, $-\mathbb{E}[\nabla q_i(\beta)]$ is the Fisher information in the ML. Therefore, (5) is a generalization of Firth (1993) [13]'s approach to the M-estimation. In general, $\rho(\beta)$ contains population terms, and hence, to implement this alternative estimator, we need to estimate the function $\rho(\beta)$. We use a sample analogue of (5) as:

$$\hat{\rho}(\beta) = \left(\frac{1}{n}\sum_{i=1}^{n} \nabla q_i(\beta)\right)\frac{\hat{B}(\beta)}{n}. \qquad (6)$$
Now, we estimate $\beta_0$ by the estimator $\tilde{\beta}$ solving:

$$\frac{1}{n}\sum_{i=1}^{n} q_i(\tilde{\beta}) + \hat{\rho}(\tilde{\beta}) = 0, \qquad (7)$$

and claim that the solution of this modified moment condition eliminates the second-order bias present in the estimator $\hat{\beta}$ that solves the original moment condition (2).
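The sketch below solves a corrected moment equation of this kind by Newton's method, using the exponential-rate model, where the correction takes the closed form $-1/(n\lambda)$: the per-observation score is $1/\lambda - x_i$, the expected Hessian is $-1/\lambda^2$, and the $O(1/n)$ bias of the MLE is approximately $\lambda/n$. This model-specific expression, and all names, are our own illustration, not the paper's general formula.

```python
import numpy as np

def solve_corrected(qbar, rho, beta0, tol=1e-12, max_iter=200):
    """Newton's method on the corrected moment equation qbar(beta) + rho(beta) = 0,
    with a numerical derivative (a sketch of solving the modified equation)."""
    beta = float(beta0)
    g = lambda b: qbar(b) + rho(b)
    h = 1e-7
    for _ in range(max_iter):
        step = g(beta) * h / (g(beta + h) - g(beta))
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Exponential-rate example: qbar(lam) = 1/lam - xbar, and the correction term
# works out to rho(lam) = -1/(n*lam) for this model.
rng = np.random.default_rng(2)
n = 25
x = rng.exponential(scale=1.0 / 3.0, size=n)
xbar = x.mean()

qbar = lambda lam: 1.0 / lam - xbar
rho = lambda lam: -1.0 / (n * lam)

lam_tilde = solve_corrected(qbar, rho, beta0=1.0 / xbar)
lam_closed = (n - 1) / (n * xbar)   # closed-form solution of the corrected equation
```

Here the corrected moment equation has the closed-form solution $(n-1)/(n\bar{x})$, which the Newton solver recovers; note that no preliminary bias-corrected estimate is plugged in, in line with the one-step character of this approach.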
Assumption 1. (i) $z_1, \dots, z_n$ are i.i.d.; (ii) $q_i(\beta)$ is κ-times continuously differentiable in a neighborhood of $\beta_0$; (iii) moment bounds hold for $q_i(\beta)$ and its derivatives; (iv) Θ is compact; (v) $\beta_0$ is in the interior of Θ and is the only parameter value satisfying (1); (vi) dominance conditions hold for the derivatives of $q_i(\beta)$ in the neighborhood of $\beta_0$.

Assumption 2. For $\beta$ in a neighborhood of $\beta_0$, $\mathbb{E}[\nabla q_i(\beta)]$ is nonsingular.

Alternatively, we can assume the following instead of Assumption 1.

Assumption 3. (i) $z_1, \dots, z_n$ are i.i.d.; (ii) $q_i(\beta)$ satisfies a Lipschitz condition in θ with an integrable Lipschitz function in a neighborhood of $\beta_0$; (iii) moment bounds hold for $q_i(\beta)$ and its derivatives; (iv) Θ is bounded; (v) $\beta_0$ is in the interior of Θ and is the only parameter value satisfying (1).

Under Assumptions 1 and 2 or Assumptions 3 and 2, the following three conditions are satisfied (see Lemma A.9 in Appendix A).

Condition 1. The sampling errors of $\hat{\rho}$ and its first derivative at $\beta_0$ vanish fast enough not to affect the expansion.

Condition 2. $\hat{\rho}(\beta)$ converges to $\rho(\beta)$ uniformly in the neighborhood of $\beta_0$.

Condition 3. The derivatives of $\hat{\rho}(\beta)$ converge to those of $\rho(\beta)$ uniformly in the neighborhood of $\beta_0$.
Note that these three conditions are required to control for the estimation error in $\hat{\rho}(\beta)$ in the stochastic expansions. Now, we are ready to present one of our main results.
Proposition 1. Suppose $\tilde{\beta}$ solves (7), where $\hat{\rho}(\beta)$ is given by (6), and that $\tilde{\beta}$ is a consistent estimator of $\beta_0$. Further suppose that Conditions 1–3 and Conditions (i)–(viii) in Lemma 1 are satisfied. Then, we have $\tilde{\beta} - \beta_0 = a_{-1/2} + \tilde{a}_{-1} + O_p(n^{-3/2})$, where $\mathbb{E}[\tilde{a}_{-1}] = 0$, and hence, the second-order bias of $\tilde{\beta}$ is $0$.

This concludes that we can eliminate the second-order bias by adjusting the moment equations as in (7), and it is a proper alternative to the analytic bias correction of (3).
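As a numerical sanity check on this claim, the following simulation contrasts the one-step corrected estimator and the corrected-moment-equation estimator in a simple exponential-rate model (an illustration of our own, not from the paper). In this particular model the two corrections coincide exactly, which is stronger than, but consistent with, the higher order equivalence discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
lam_true, n, reps = 2.0, 30, 5000

one_step = np.empty(reps)
corrected_eq = np.empty(reps)
for r in range(reps):
    x = rng.exponential(scale=1.0 / lam_true, size=n)
    mle = 1.0 / x.mean()                        # MLE of the exponential rate
    one_step[r] = mle * (1.0 - 1.0 / n)         # mle - B_hat(mle)/n with B(lam) = lam
    corrected_eq[r] = (n - 1) / (n * x.mean())  # solves the corrected moment equation

max_gap = np.max(np.abs(one_step - corrected_eq))   # zero up to rounding here
bias_corrected = one_step.mean() - lam_true         # close to zero
```

In richer models the two estimators differ in finite samples, and the paper's result says their stochastic expansions nevertheless agree up to the third order.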