1. Introduction
Recently, sparse regression models have received considerable attention in business, economics, genetics, and various other fields. In these models, the number of possible regressors can be large; however, only a relatively small number of these regressors are relevant.
Penalization is an alternative to classical subset selection. One drawback of subset selection is its lack of stability due to its discrete nature: variables are either retained in or dropped from a model. As a result, a small perturbation in a sample may cause a drastic change in the post-selection results (Breiman 1996). Penalization addresses this issue by achieving variable selection and estimation simultaneously through a continuous process.
Several penalization methods have been advocated for linear regression models. Examples include the bridge penalty (Frank and Friedman 1993), LASSO (Tibshirani 1996), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001), and the elastic net penalty (Zou and Hastie 2005). However, penalized least squares methods are not applicable when endogeneity exists (Fan and Liao 2014). In that case, parameters of interest are often identified by moment restrictions using instrumental variables.
This study investigates the asymptotic properties of a penalized empirical likelihood (PEL) estimator for moment restriction models when the number of parameters and/or the number of moment restrictions increases with the sample size. We extend the EL estimator of Qin and Lawless (1994) by employing the SCAD penalty, so that estimation and variable selection are achieved simultaneously.
Some penalized estimators for moment restriction models have been proposed in the econometric literature. Caner (2009) and Shi (2016b) considered the GMM estimator with a LASSO-type penalty. Caner and Zhang (2014) proposed the adaptive elastic net GMM estimator. Fan and Liao (2014) proposed the penalized focused GMM estimator. Leng and Tang (2012) and Chang et al. (2015) studied the asymptotic properties of the PEL estimator for independent and weakly dependent observations, respectively. Tang et al. (2018) considered a penalized exponential tilting estimator.
This paper shows that the SCAD-penalized EL estimator is $\sqrt{n/p_n}$-consistent, where $p_n$ is the number of parameters. Leng and Tang (2012) showed that the non-penalized EL estimator is $\sqrt{n/p_n}$-consistent under an assumption that prevents $r_n$, the number of moment restrictions, from growing faster than $p_n$. Thus, essentially, they only proved $\sqrt{n/r_n}$-consistency.
Chang et al. (2015) proved $\sqrt{n/p_n}$-consistency of the non-penalized EL estimator without imposing this assumption, but they only obtained $\sqrt{n/r_n}$-consistency for the PEL estimator. We prove $\sqrt{n/p_n}$-consistency of the PEL estimator under a reasonable condition on the regularization parameter of the penalty function. This result is important because it implies $\sqrt{n}$-consistency of the estimator when $p_n$ is fixed and only $r_n$ increases with the sample size, which agrees with previous results in the EL literature such as Donald et al. (2003). In contrast, $\sqrt{n/r_n}$-consistency implies that only a slow rate of convergence can be achieved even when $p_n$ is finite and fixed.
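The practical gap between the two rates can be seen with simple arithmetic. Consider a model with p parameters and r moment restrictions. The sketch below, which uses only the Python standard library, multiplies the error scales sqrt(p/n) and sqrt(r/n) by sqrt(n). Here p is held fixed at 5 and r grows like n^(1/3); both are arbitrary choices for illustration, not conditions taken from this paper. Under these choices the first scaled error stays at sqrt(5), reflecting root-n consistency, while the second keeps growing.

```python
import math


def error_scales(n, p, r):
    """Error scales implied by sqrt(n/p)- and sqrt(n/r)-consistency, respectively."""
    return math.sqrt(p / n), math.sqrt(r / n)


p = 5  # number of parameters, held fixed (illustrative choice)
rows = []
for n in (10**3, 10**4, 10**5, 10**6):
    r = round(n ** (1 / 3))  # hypothetical growth path for the number of moments
    fast, slow = error_scales(n, p, r)
    # Scaling by sqrt(n): the first column stays at sqrt(p), the second grows like sqrt(r).
    rows.append((n, r, math.sqrt(n) * fast, math.sqrt(n) * slow))
    print(f"n={n:>8}  r={r:>4}  sqrt(n)*sqrt(p/n)={rows[-1][2]:.3f}  sqrt(n)*sqrt(r/n)={rows[-1][3]:.3f}")
```

Holding p fixed isolates the effect of adding moment restrictions, which is exactly the case in which the two rates diverge.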
This paper also shows that the PEL estimator satisfies the oracle property in the sense of Fan and Peng (2004) when the truth is sparse. That is, if the true parameter vector has some zero components, then these components are estimated as zero with probability approaching one, and the nonzero components are estimated as efficiently as if the zero components were known a priori. Although Leng and Tang (2012) and Chang et al. (2015) also discussed the oracle property of the PEL estimator, they obtained their results under high-level assumptions. As far as we know, this paper is the first to specify sufficient conditions for both
-consistency and the oracle property of the PEL estimator.
Recently, Chang et al. (2018) proposed an alternative PEL estimator that regularizes both the parameters and the Lagrange multipliers. Their estimator allows $p_n$ and $r_n$ to increase at an exponential rate, while our PEL estimator allows only a polynomial rate. Their method is useful when the truth is actually sparse. In contrast, our estimator is valid even when the truth is not sparse, because $\sqrt{n/p_n}$-consistency can be established without imposing sparsity.
This paper is organized as follows. We first show $\sqrt{n/p_n}$-consistency of the SCAD-penalized EL estimator and compare our assumptions with those of Leng and Tang (2012) and Chang et al. (2015). Then, we obtain the asymptotic distribution. Our proofs are new in the EL literature. All proofs are given in Appendix A.
2. PEL Estimator and Asymptotic Results
Let $X_1, \dots, X_n$ be a random sample from an unknown distribution. This study considers the moment restriction model
$$E[g(X_i, \theta_0)] = 0,$$
where $\theta_0 \in \Theta \subset \mathbb{R}^{p_n}$ is a $p_n$-dimensional true parameter and $g(\cdot, \cdot)$ is an $r_n$-dimensional moment function. For instance, the model includes the linear instrumental variable model
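$$g(X_i, \theta) = Z_i (Y_i - W_i'\theta),$$
where $Z_i$ is an $r_n \times 1$ vector of instrumental variables and $W_i$ is a $p_n \times 1$ vector of explanatory variables. We consider the case where $r_n \ge p_n$. The subscript $n$ indicates that $p_n$ and $r_n$ may increase with the sample size.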
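As a concrete illustration of the linear instrumental variable case, the following sketch assumes the moment function takes the form g(X_i, θ) = Z_i(Y_i − W_i'θ) and evaluates it on simulated data in which one regressor is endogenous. The data-generating process, the variable names, and the use of NumPy are illustrative assumptions, not part of the paper.

```python
import numpy as np


def linear_iv_moments(theta, y, W, Z):
    """Return the n x r matrix whose i-th row is g(X_i, theta) = Z_i * (y_i - W_i' theta).

    y: (n,) outcomes, W: (n, p) regressors, Z: (n, r) instruments, theta: (p,).
    """
    resid = y - W @ theta        # (n,) structural residuals
    return Z * resid[:, None]    # scale each instrument row by its residual


# Hypothetical simulated design with r = 5 instruments and p = 2 regressors.
rng = np.random.default_rng(0)
n, p, r = 500, 2, 5
Z = rng.normal(size=(n, r))
v = rng.normal(size=(n, p))
u = 0.5 * v[:, 0] + rng.normal(size=n)  # u is correlated with W[:, 0]: endogeneity
W = Z[:, :p] + v
theta0 = np.array([1.0, 0.0])           # second coefficient is zero (sparse truth)
y = W @ theta0 + u

G = linear_iv_moments(theta0, y, W, Z)
print(G.shape)                          # (500, 5)
print(np.abs(G.mean(axis=0)).max())     # small: sample analogue of E[g(X_i, theta0)] = 0
```

Because the instruments are independent of u, the sample mean of the rows is close to zero at the true parameter even though least squares on W would be biased.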
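The PEL estimator for $\theta_0$ is
$$\hat\theta = \arg\min_{\theta \in \Theta} \Big\{ \max_{\lambda \in \hat\Lambda_n(\theta)} \sum_{i=1}^{n} \log\big(1 + \lambda' g(X_i, \theta)\big) + n \sum_{j=1}^{p_n} P_{\tau_n}(|\theta_j|) \Big\},$$
where $\hat\Lambda_n(\theta) = \{\lambda \in \mathbb{R}^{r_n} : \lambda' g(X_i, \theta) > -1,\ i = 1, \dots, n\}$ and $P_{\tau_n}(\cdot)$ is a penalty function with a regularization parameter $\tau_n$. Thus, the estimator is the same as that of Leng and Tang (2012).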
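The criterion has two layers, which the following sketch makes concrete for a given θ. The inner layer maximizes sum_i log(1 + λ'g(X_i, θ)) over the Lagrange multiplier λ. The outer layer adds n times the sum of penalties on |θ_j|, in the form used by Leng and Tang (2012). This is an illustration under our own assumptions, not the authors' code. The simulated linear IV design, the function names, the damped Newton rule with step halving, and the placeholder L1 penalty are all hypothetical choices. The outer minimization over θ is not shown.

```python
import numpy as np


def el_inner(G, tol=1e-10, max_iter=100):
    """Maximize sum_i log(1 + lam' g_i) over lam by damped Newton steps.

    G is the n x r matrix whose rows are g(X_i, theta). Returns (lam_hat, value).
    Assumes 0 lies inside the convex hull of the rows of G; otherwise the value diverges.
    """
    lam = np.zeros(G.shape[1])
    value = 0.0                                    # criterion at lam = 0
    for _ in range(max_iter):
        denom = 1.0 + G @ lam
        grad = G.T @ (1.0 / denom)                 # sum_i g_i / (1 + lam' g_i)
        if np.linalg.norm(grad) < tol:
            break
        neg_hess = (G / denom[:, None] ** 2).T @ G  # sum_i g_i g_i' / (1 + lam' g_i)^2
        step = np.linalg.solve(neg_hess, grad)     # Newton direction for a concave problem
        t = 1.0
        while True:                                # halve until feasible and non-decreasing
            cand = lam + t * step
            d = 1.0 + G @ cand
            if np.all(d > 0) and np.sum(np.log(d)) >= value:
                break
            t /= 2.0
            if t < 1e-12:
                return lam, value
        lam, value = cand, float(np.sum(np.log(d)))
    return lam, value


def pel_criterion(theta, moments, penalty, tau):
    """max_lam sum_i log(1 + lam' g_i) + n * sum_j P_tau(|theta_j|)."""
    G = moments(theta)
    _, el_value = el_inner(G)
    return el_value + G.shape[0] * sum(penalty(abs(t), tau) for t in theta)


# Hypothetical linear IV data with g(X_i, theta) = Z_i (y_i - W_i' theta).
rng = np.random.default_rng(1)
n, p, r = 400, 2, 4
Z = rng.normal(size=(n, r))
v = rng.normal(size=(n, p))
u = 0.5 * v[:, 0] + rng.normal(size=n)
W = Z[:, :p] + v
theta0 = np.array([1.0, 0.0])
y = W @ theta0 + u


def moments(th):
    return Z * (y - W @ th)[:, None]


def l1(t, tau):
    return tau * t  # placeholder penalty; a SCAD function could be passed instead


print(pel_criterion(theta0, moments, l1, tau=0.0))               # half of -2 log EL ratio at theta0
print(pel_criterion(np.array([0.5, 0.5]), moments, l1, tau=0.0))  # much larger away from theta0
```

Setting `tau=0.0` recovers the non-penalized EL criterion, which is the case the text treats as a special case of the PEL estimator.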
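For concreteness, we employ the SCAD penalty of Fan and Li (2001), whose derivative is, for $t \ge 0$,
$$P'_{\tau_n}(t) = \tau_n \Big\{ I(t \le \tau_n) + \frac{(a\tau_n - t)_+}{(a-1)\tau_n} I(t > \tau_n) \Big\}$$
for some $a > 2$. Similar asymptotic results are also obtained by using a different penalty function, such as the minimax concave penalty of Zhang (2010).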
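The SCAD penalty is commonly written through its derivative. Integrating that derivative from zero gives a closed form with three pieces: linear near zero, quadratic in the middle, and constant beyond aτ. The following minimal sketch codes both the closed form and the derivative. The default a = 3.7 is the value recommended by Fan and Li (2001); the function names are hypothetical.

```python
def scad_penalty(t, tau, a=3.7):
    """SCAD penalty P_tau(|t|) of Fan and Li (2001), in closed form."""
    t = abs(t)
    if t <= tau:
        return tau * t  # LASSO-like near zero
    if t <= a * tau:
        # quadratic piece joining the linear and constant parts
        return -(t * t - 2.0 * a * tau * t + tau * tau) / (2.0 * (a - 1.0))
    return (a + 1.0) * tau * tau / 2.0  # constant: large coefficients are not shrunk


def scad_derivative(t, tau, a=3.7):
    """P'_tau(t) = tau * {I(t <= tau) + (a*tau - t)_+ / ((a - 1) * tau) * I(t > tau)}, t >= 0."""
    t = abs(t)
    if t <= tau:
        return tau
    return max(a * tau - t, 0.0) / (a - 1.0)


for t in (0.5, 1.0, 2.0, 3.7, 10.0):
    print(t, scad_penalty(t, tau=1.0), scad_derivative(t, tau=1.0))
```

The derivative vanishes beyond aτ, so large coefficients incur no extra penalty at the margin. This is what lets SCAD avoid the bias that LASSO imposes on large effects.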
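The true model may be sparse; that is, some elements of $\theta_0$ may be zero. Let $s_n$ be the number of nonzero elements in $\theta_0$. Without loss of generality, we can write $\theta_0 = (\theta_{01}', \theta_{02}')'$, where all elements of $\theta_{01} \in \mathbb{R}^{s_n}$ are nonzero and $\theta_{02} = 0$. For now, the sparsity assumption is not crucial; it is possible that $s_n = p_n$.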
Let and . Also, let and . We define and . Moreover, we use and to denote and , respectively, where is a subset in , such that . Let and denote the minimum and maximum eigenvalues of a matrix A. Also, let denote the Euclidean (Frobenius) norm.
We impose the following conditions for $\sqrt{n/p_n}$-consistency.
Assumption 1. (i) The true parameter vector is the unique minimizer of and belongs to the interior of ; (ii) There are positive functions and such that for any where ; (iii) . Assumption 2. (i) for some ; (ii) .
Assumption 3. (i) There exists C such that in a neighborhood of ; (ii) There exists C such that ; (iii) There exists C such that in a neighborhood of .
Assumption 4. (i) The moment function is twice continuously differentiable in for all y in a neighborhood of ; (ii) There exists C such that in a neighborhood of with probability approaching one.
Assumption 5. .
Assumption 1 is similar to condition 2.1 of Chang et al. (2015). Assumption 1 (iii) is an extension of the uniform convergence condition. If we restrict the parameter space such that
is compact and
, then Assumption 1 (iii) is satisfied with
. Assumption 1 is used to show that $\hat\theta$ is consistent for $\theta_0$. Any condition that guarantees consistency of the estimator can replace Assumption 1.
Assumptions 2 (i) and (ii) are similar to Assumptions 2 and 4 in Leng and Tang (2012). However, we do not impose their restriction on the growth of $r_n$ relative to $p_n$. Thus, $r_n$ can grow faster than $p_n$, and we can allow the case where $p_n$ is fixed and only $r_n$ increases with the sample size.
Assumption 4 states that the objective function of the EL estimator is strictly convex in $\theta$ in a neighborhood of $\theta_0$. When $p_n$ and $r_n$ are fixed, this condition is satisfied under fairly weak conditions. We can also relax the condition so that with a positive sequence such that . In that case, we obtain a different convergence rate of the estimator. Under certain conditions, we have .
Assumption 5 is similar to condition (B2) in Huang and Xie (2007), who obtained the convergence rate of the SCAD-penalized least squares estimator. Assumption 5 states that the minimum of the nonzero elements in $\theta_0$ may converge to 0, but the convergence rate must be sufficiently slow. If the nonzero elements are too small compared to
, then the PEL estimator cannot distinguish between zero and nonzero elements. Following Huang and Xie (2007), we prove $\sqrt{n/p_n}$-consistency of the PEL estimator in two steps. We first prove
under Assumptions 1–4 and
(see Lemma A3 in the
Appendix A). Then, we improve the convergence rate by using Assumption 5. Notice that if we assume
, then
$\sqrt{n/p_n}$-consistency of the PEL estimator is obtained immediately from Lemma A3. However, as we will see later, this condition contradicts Assumption 6 (i), which is a key condition for the oracle property. Assumption 5 is imposed so that
-consistency and the oracle property are satisfied simultaneously.
Theorem 1. Suppose that Assumptions 1–5 hold. Then, we have $\|\hat\theta - \theta_0\| = O_p(\sqrt{p_n/n})$.
The sparsity assumption is not necessary for this theorem; the same result is obtained even if all elements of $\theta_0$ are nonzero. Moreover, because Assumption 5 does not exclude a zero regularization parameter, the theorem also applies to the non-penalized EL estimator, whose $\sqrt{n/p_n}$-consistency has been established by Chang et al. (2015). As we will see in the next theorem, if the truth is sparse, then we obtain $\sqrt{n/s_n}$-consistency of the PEL estimator under certain additional assumptions.
Our convergence rate of the PEL estimator is better than that of Chang et al. (2015). Roughly speaking, the different convergence rates stem from different equalities. The asymptotic analyses of Leng and Tang (2012) and Chang et al. (2015) are based on the moment equality
, which implies
.
Leng and Tang (2012) obtained $\sqrt{n/p_n}$-consistency of the non-penalized EL estimator by assuming
and hence
. On the other hand, our asymptotic analysis is based on the first-order condition
, which implies
. Therefore, our proof is not a straightforward extension of those of Leng and Tang (2012) and Chang et al. (2015).
To obtain a convergence rate along the lines of the proofs of Leng and Tang (2012) and Chang et al. (2015), we need a rather strong condition on the regularization parameter. For instance, Chang et al. (2015) assumed that
to prove
-consistency, where
M is the block length, which is equal to unity when the observations are independent. The condition of
Chang et al. (2015) corresponds to the condition that
in our case. As stated before, although this condition simplifies the proof of
$\sqrt{n/p_n}$-consistency, it causes a problem for the oracle property of the estimator.
Next, we show sparsity and asymptotic normality of the PEL estimator. Let $\hat\theta_1$ and $\hat\theta_2$ be the corresponding estimators of $\theta_{01}$ and $\theta_{02}$, respectively. Furthermore, let . We define and .
We impose additional conditions.
Assumption 6. (i) ; (ii)
Assumption 7. There exists such that and for all and in a neighborhood of .
Assumption 8. There exists C such that .
Assumption 6 (i) is a key condition for sparsity of the PEL estimator. It requires that the regularization parameter not be too small, so that the zero elements in $\theta_0$ are estimated as zero. The same condition is also employed by Leng and Tang (2012).
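Why the regularization parameter must not be too small is easiest to see in the simplest penalized problem. The sketch below is not the PEL estimator. It swaps in a single coefficient with an orthonormal least-squares design, for which Fan and Li (2001) give the SCAD solution in closed form. Inputs with |z| ≤ tau are set exactly to zero and inputs beyond a·tau are returned unshrunk. A tau that is too small would therefore let noise-level estimates of zero coefficients survive. The function name and the default a = 3.7 are assumptions for illustration.

```python
def scad_threshold(z, tau, a=3.7):
    """Minimizer over b of 0.5 * (z - b)**2 + P_tau(|b|) for the SCAD penalty.

    Univariate (orthonormal-design) thresholding rule of Fan and Li (2001).
    """
    az = abs(z)
    sign = 1.0 if z >= 0 else -1.0
    if az <= 2.0 * tau:
        return sign * max(az - tau, 0.0)  # soft thresholding: |z| <= tau gives exactly 0
    if az <= a * tau:
        return ((a - 1.0) * z - sign * a * tau) / (a - 2.0)  # linear transition region
    return z  # large inputs are left unshrunk (no bias)


tau = 1.0
for z in (0.3, 0.9, 1.5, 3.0, 5.0):
    print(z, scad_threshold(z, tau))
```

The rule combines a hard zero region, which delivers sparsity, with an identity region for large signals. This pairing is the mechanism behind the oracle behavior discussed in the text.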
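Theorem 2. Suppose that Assumptions 1–8 hold. Let $A_n$ be an $l \times s_n$ matrix such that $A_n A_n' \to G$, where $G$ is an $l \times l$ matrix with fixed $l$. Then, the PEL estimator satisfies the following:
1. Sparsity: $\hat\theta_2 = 0$ with probability approaching one.
2. $\sqrt{n/s_n}$-consistency: $\|\hat\theta_1 - \theta_{01}\| = O_p(\sqrt{s_n/n})$.
3. Asymptotic normality: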
The selection of the matrix depends on the parameter of interest. For instance, suppose that the parameter of interest is the first element of . Let and denote first elements of and , respectively. Then, we choose and obtain , where is the limit of the first diagonal element of .
Although a detailed proof is given in Appendix A, we sketch the proof of asymptotic normality here. If
were known, then
can be estimated by
which is a penalized maximum likelihood estimator using a least favorable submodel of the moment restriction model (see Sueishi 2016, for instance). Because
is the penalized maximum likelihood estimator, its distribution can be obtained in a manner similar to Fan and Peng (2004). We derive the asymptotic distribution of
by showing that
is asymptotically equivalent to
.
By modifying the proof of Theorem 2, we can easily obtain the asymptotic distribution of the non-penalized EL estimator. Because this distribution has already been derived by Leng and Tang (2012), we omit the derivation. The efficiency of the PEL estimator for $\theta_{01}$ is the same as that of the non-penalized EL estimator for which it is known a priori that $\theta_{02} = 0$. Thus, our estimator satisfies the oracle property in the sense of Fan and Peng (2004).
Theorem 2 is similar to Theorem 3 of Leng and Tang (2012). However, they proved sparsity by assuming that the PEL estimator is $\sqrt{n/p_n}$-consistent. They did not explicitly state the conditions under which the non-penalized and penalized EL estimators have the same convergence rate.
Chang et al. (2015) showed a result similar to Theorem 2 for weakly dependent observations. They obtained
-consistency and sparsity under two separate
rate conditions. Specifically, they assumed (i)
for
-consistency and (ii)
for sparsity. If condition (ii) is satisfied, however, condition (i) requires that
, which is clearly impossible. This is problematic because their proof of sparsity requires
-consistency of the estimator. We relaxed condition (i) and obtained sufficient conditions under which both
-consistency and sparsity are satisfied.