1. Introduction
Recently, sparse regression models have received considerable attention in business, economics, genetics, and various other fields. In these models, the number of candidate regressors can be large, but only a relatively small number of them are relevant.
Penalization is an alternative to classical subset selection. One drawback of subset selection is its lack of stability, which stems from its discrete nature: each variable is either retained in or dropped from the model. As a result, a small perturbation of the sample may cause a drastic change in the post-selection results (Breiman 1996). Penalization addresses this issue by performing variable selection and estimation simultaneously through a continuous process.
Several penalization methods have been advocated for linear regression models. Examples include the bridge penalty (Frank and Friedman 1993), LASSO (Tibshirani 1996), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001), and the elastic net penalty (Zou and Hastie 2005). However, penalized least squares methods are not applicable when endogeneity exists (Fan and Liao 2014). Under endogeneity, the parameters of interest are often identified by moment restrictions based on instrumental variables.
This study investigates the asymptotic properties of a penalized empirical likelihood (PEL) estimator for moment restriction models when the number of parameters and/or the number of moment restrictions increases with the sample size. We extend the empirical likelihood (EL) estimator of Qin and Lawless (1994) by employing the SCAD penalty, so that estimation and variable selection are achieved simultaneously.
Some penalized estimators for moment restriction models have been proposed in the econometric literature. Caner (2009) and Shi (2016b) considered the generalized method of moments (GMM) estimator with a LASSO-type penalty. Caner and Zhang (2014) proposed the adaptive elastic net GMM estimator. Fan and Liao (2014) proposed the penalized focused GMM estimator. Leng and Tang (2012) and Chang et al. (2015) studied the asymptotic properties of the PEL estimator for independent and weakly dependent observations, respectively. Tang et al. (2018) considered a penalized exponential tilting estimator.
This paper shows that the SCAD-penalized EL estimator is 
-consistent, where 
 is the number of parameters. 
Leng and Tang (2012) showed that the non-penalized EL estimator is 
-consistent under the assumption that 
, where 
 is the number of moment restrictions. Thus, essentially, they only proved 
-consistency. 
Chang et al. (2015) proved 
-consistency of the non-penalized EL estimator without imposing 
, but they only obtained 
-consistency for the PEL estimator. We prove 
-consistency of the PEL estimator under a reasonable condition on the regularization parameter of the penalty function. Our result is important because it implies 
-consistency of the estimator when 
 is fixed and only 
 increases with the sample size. This is consistent with previous results in the EL literature, such as Donald et al. (2003). In contrast, 
-consistency implies that only a slow rate of convergence can be achieved even when 
 is finite and fixed.
This paper also shows that the PEL estimator satisfies the oracle property in the sense of Fan and Peng (2004) when the truth is sparse. That is, if the true parameter vector has some zero components, then they are estimated as zero with probability approaching one, and the nonzero components are estimated as well as they would be if the zero components were known a priori. Although Leng and Tang (2012) and Chang et al. (2015) also discussed the oracle property of the PEL estimator, they obtained their results under high-level assumptions. As far as we know, this paper is the first to specify sufficient conditions for both 
-consistency and the oracle property of the PEL estimator.
Recently, Chang et al. (2018) proposed an alternative PEL estimator that regularizes both parameters and Lagrange multipliers. Their estimator allows the case where 
 and 
 increase at an exponential rate, while our PEL estimator allows a polynomial rate only. Their method is useful when the truth is actually sparse. In contrast, our estimator is valid even when the truth is not sparse because 
-consistency can be established without imposing sparsity.
This paper is organized as follows. We first show 
-consistency of the SCAD-penalized EL estimator and compare our assumptions with those of Leng and Tang (2012) and Chang et al. (2015). Then, we obtain the asymptotic distribution. Our proofs are new in the EL literature. All proofs are given in Appendix A.
2. PEL Estimator and Asymptotic Results
Let 
 be a random sample from an unknown distribution on 
. This study considers the moment restriction model
      
      where 
 is a 
-dimensional true parameter and
      
      is an 
-dimensional moment function. For instance, the model includes the linear instrumental variable model
      
      where 
 is an 
 vector of instrumental variables and 
 is a 
 vector of explanatory variables. We consider the case where 
. The subscript indicates that 
, 
, and 
 may increase with the sample size.
The PEL estimator for 
 is
      
      where 
 and 
 is a penalty function with a regularization parameter 
. Thus, the estimator is the same as that of Leng and Tang (2012).
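As a minimal numerical sketch of how such a criterion can be evaluated, consider the linear instrumental variable model. The sketch assumes that the criterion is the profile EL log-likelihood ratio, obtained by maximizing the sum of log(1 + lam' g_i) over the Lagrange multiplier lam, plus n times the summed penalty, which is our reading of the formulation in Qin and Lawless (1994) and Leng and Tang (2012). The function names, the simulated design, and the damped Newton solver are illustrative assumptions, not the computational method of any cited paper.

```python
import numpy as np


def iv_moments(theta, y, X, Z):
    """Stack g(z_i, theta) = z_i * (y_i - x_i' theta) into an (n, r) array."""
    resid = y - X @ theta
    return Z * resid[:, None]


def el_inner_max(G, max_iter=100, tol=1e-10):
    """Maximize sum_i log(1 + lam' g_i) over lam by damped Newton steps.

    Assumes the weighted moment matrix has full column rank (so r <= n).
    Returns the multiplier and the attained criterion value.
    """
    lam = np.zeros(G.shape[1])

    def value(l):
        w = 1.0 + G @ l
        # Outside the domain of the logarithm the criterion is treated as -inf.
        return np.sum(np.log(w)) if np.all(w > 1e-10) else -np.inf

    current = value(lam)  # equals 0 at lam = 0
    for _ in range(max_iter):
        w = 1.0 + G @ lam
        Gw = G / w[:, None]
        grad = Gw.sum(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        # The Hessian is -Gw'Gw, so the ascent direction solves Gw'Gw s = grad.
        step = np.linalg.solve(Gw.T @ Gw, grad)
        t = 1.0
        while t > 1e-12 and value(lam + t * step) <= current:
            t *= 0.5  # halve the step until the criterion strictly increases
        if t <= 1e-12:
            break
        lam = lam + t * step
        current = value(lam)
    return lam, current


def pel_objective(theta, y, X, Z, penalty):
    """Profile EL criterion plus n times the summed penalty on |theta_j|."""
    _, el_value = el_inner_max(iv_moments(theta, y, X, Z))
    return el_value + len(y) * np.sum(penalty(np.abs(theta)))


def simulate_linear_iv(n=200, seed=0):
    """Hypothetical design: 3 instruments, 2 endogenous regressors, sparse truth."""
    rng = np.random.default_rng(seed)
    theta0 = np.array([1.0, 0.0])
    Z = rng.standard_normal((n, 3))
    eps = rng.standard_normal(n)
    Pi = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    # The regressors load on eps, so least squares would be inconsistent.
    X = Z @ Pi + 0.5 * eps[:, None] + rng.standard_normal((n, 2))
    y = X @ theta0 + eps
    return y, X, Z, theta0
```

Here `penalty` is any function applied elementwise to the absolute values of the parameters. Minimizing `pel_objective` over `theta` would give a penalized estimate, but because the penalty is not differentiable at zero, a generic smooth optimizer is not guaranteed to return exact zeros.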
For concreteness, we employ the SCAD penalty of Fan and Li (2001):
      for some 
. Similar asymptotic results can also be obtained with a different penalty function, such as the minimax concave penalty of Zhang (2010).
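For reference, the SCAD penalty can be coded directly from its standard closed form in Fan and Li (2001). In the sketch below, `lam` stands for the regularization parameter and `a` for the shape constant, which must exceed 2. Both names are our own notation, and the default `a = 3.7` is the value suggested by Fan and Li (2001), not a choice made in this paper.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty evaluated elementwise at |theta| (requires a > 2, lam > 0)."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                          # |theta| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |theta| <= a*lam
    const = (a + 1) * lam**2 / 2                              # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta|."""
    t = np.abs(np.asarray(theta, dtype=float))
    tail = np.maximum(a * lam - t, 0.0) / (a - 1)  # decays linearly to zero at a*lam
    return np.where(t <= lam, lam, tail)
```

The derivative equals `lam` near the origin, as with an L1 penalty, and is exactly zero beyond `a * lam`. Sufficiently large coefficients are therefore not shrunk at all, which is the feature behind the oracle property.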
The true model may be sparse, that is, some elements of  may be zero. Let  be the number of nonzero elements in . Without loss of generality, we can write  with  and . For now, the sparsity assumption is not crucial. It is possible that .
Let  and . Also, let  and . We define  and . Moreover, we use  and  to denote  and , respectively, where  is a subset in , such that . Let  and  denote the minimum and maximum eigenvalues of a matrix A. Also, let  denote the Euclidean (Frobenius) norm.
We impose the following conditions for -consistency.
Assumption 1. (i) The true parameter vector  is the unique minimizer of  and belongs to the interior of ; (ii) There are positive functions  and  such that for any where ; (iii) .
 Assumption 2. (i)  for some ; (ii) .
 Assumption 3. (i) There exists C such that  in a neighborhood of ; (ii) There exists C such that ; (iii) There exists C such that  in a neighborhood of .
 Assumption 4. (i) The moment function  is twice continuously differentiable in  for all y in a neighborhood of ; (ii) There exists C such that  in a neighborhood of  with probability approaching one.
 Assumption 5. .
Assumption 1 is similar to condition 2.1 of Chang et al. (2015). Assumption 1 (iii) is an extension of uniform convergence. If we restrict the parameter space such that 
 is compact and 
, then Assumption 1 (iii) is satisfied with 
. Assumption 1 is used to show that 
. Any condition that guarantees consistency of the estimator can replace Assumption 1.
Assumptions 2 (i) and (ii) are similar to Assumptions 2 and 4 in Leng and Tang (2012). However, we do not assume that 
. Thus, 
 can grow faster than 
. We can allow the case where 
 is fixed and only 
 increases with the sample size.
Assumption 4 states that the objective function of the EL estimator is strictly convex in  in a neighborhood of . When  and  are fixed, this condition is satisfied under fairly weak conditions. We can also relax the condition so that  with a positive sequence  such that . In that case, we obtain a different convergence rate of the estimator. Under certain conditions, we have .
Assumption 5 is similar to condition (B2) in Huang and Xie (2007), who obtained the convergence rate of the SCAD-penalized least squares estimator. Assumption 5 states that the minimum of the nonzero elements in 
 may converge to 0, but the convergence rate must be sufficiently slow. If nonzero elements are too small compared to 
, then the PEL estimator cannot distinguish between zero and nonzero elements. Following Huang and Xie (2007), we prove 
-consistency of the PEL estimator in two steps. We first prove 
 under Assumptions 1–4 and 
 (see Lemma A3 in Appendix A). Then, we improve the convergence rate by using Assumption 5. Notice that if we assume 
, then 
-consistency of the PEL estimator is obtained immediately from Lemma A3. However, as we will see later, this condition contradicts Assumption 6 (i), which is a key condition for the oracle property. Assumption 5 is imposed so that 
-consistency and the oracle property are satisfied simultaneously.
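The tension between detecting small signals and leaving large ones unpenalized can be seen in a much simpler setting than the PEL problem. For SCAD-penalized least squares with an orthonormal design, Fan and Li (2001) give a closed-form thresholding rule. The sketch below implements that rule only as an illustration. It is not the PEL estimator, and the names `z`, `lam`, and `a` are our own notation.

```python
import numpy as np


def scad_threshold(z, lam, a=3.7):
    """SCAD thresholding rule for an orthonormal design (requires a > 2).

    z is the unpenalized least squares coefficient; the return value is the
    SCAD-penalized coefficient.
    """
    z = np.asarray(z, dtype=float)
    absz = np.abs(z)
    soft = np.sign(z) * np.maximum(absz - lam, 0.0)         # |z| <= 2*lam
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)    # 2*lam < |z| <= a*lam
    return np.where(absz <= 2 * lam, soft, np.where(absz <= a * lam, mid, z))
```

With `lam = 1`, a coefficient of 0.5 is set exactly to zero, whereas a coefficient of 5 is returned unchanged. A nonzero coefficient whose magnitude is below `lam` is therefore indistinguishable from zero, and one well above `a * lam` is estimated as if no penalty were present.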
Theorem 1. Suppose that Assumptions 1–5 hold. Then, we have .
 The sparsity assumption is not necessary for this theorem. The same result is obtained even if all elements in 
 are nonzero. Moreover, because Assumption 5 does not exclude 
, the theorem also applies to the non-penalized EL estimator, whose 
-consistency has been established by Chang et al. (2015). As we will see in the next theorem, if the truth is sparse, then we obtain 
-consistency of the PEL estimator under certain additional assumptions.
Our convergence rate for the PEL estimator is faster than that of Chang et al. (2015). Roughly speaking, the different convergence rates stem from different equalities. The asymptotic analyses of Leng and Tang (2012) and Chang et al. (2015) are based on the moment equality 
, which implies 
. 
Leng and Tang (2012) obtained 
-consistency of the non-penalized EL estimator by assuming 
 and hence 
. On the other hand, our asymptotic analysis is based on the first-order condition 
, which implies 
. Therefore, our proof is not a straightforward extension of those of Leng and Tang (2012) and Chang et al. (2015).
To obtain a convergence rate along the lines of the proofs of Leng and Tang (2012) and Chang et al. (2015), we need a rather strong condition on the regularization parameter. For instance, Chang et al. (2015) assumed that 
 to prove 
-consistency, where 
M is the block length, which is equal to unity when the observations are independent. The condition of Chang et al. (2015) corresponds to the condition that 
-consistency, it causes a problem for the oracle property of the estimator.
Next, we show sparsity and asymptotic normality of the PEL estimator. Let  and  be the corresponding estimators of  and , respectively. Furthermore, let . We define  and .
We impose additional conditions.
Assumption 6. (i) ; (ii) 
 Assumption 7. There exists  such that  and  for all  and  in a neighborhood of .
 Assumption 8. There exists C such that .
 Assumption 6 (i) is a key condition for sparsity of the PEL estimator. It requires that the regularization parameter is not too small so that zero elements in 
 are estimated as zero. The same condition is also employed by Leng and Tang (2012).
Theorem 2. Suppose that Assumptions 1–8 hold. Let  be an  matrix such that , where G is an  matrix with fixed l. Then, the PEL estimator satisfies the following:
1. Sparsity:  with probability approaching one.
2. -consistency: .
3. Asymptotic normality:  
 The selection of the matrix  depends on the parameter of interest. For instance, suppose that the parameter of interest is the first element of . Let  and  denote first elements of  and , respectively. Then, we choose  and obtain , where  is the limit of the first diagonal element of .
Although a detailed proof is given in Appendix A, we sketch the proof of asymptotic normality here. If 
 were known, then 
 could be estimated by
      
      which is a penalized maximum likelihood estimator using a least favorable submodel of the moment restriction model (see, for instance, Sueishi 2016). Because 
 is the penalized maximum likelihood estimator, its distribution can be obtained in a manner similar to Fan and Peng (2004). We derive the asymptotic distribution of 
 by showing that 
 is asymptotically equivalent to 
.
By modifying the proof of Theorem 2, we can easily obtain the asymptotic distribution of the non-penalized EL estimator. Because this distribution has already been derived by Leng and Tang (2012), we omit the derivation. We see that the efficiency of the PEL estimator for 
 is the same as that of the non-penalized EL estimator for which it is known a priori that 
. Thus, our estimator satisfies the oracle property in the sense of Fan and Peng (2004).
Theorem 2 is similar to Theorem 3 of Leng and Tang (2012). However, they proved sparsity by assuming that the PEL estimator is 
-consistent. They did not explicitly state the conditions under which the non-penalized and penalized EL estimators have the same convergence rate.
Chang et al. (2015) showed a result similar to Theorem 2 for weakly dependent observations. They obtained 
-consistency and sparsity under two separate 
 rate conditions. Specifically, they assumed (i) 
 for 
-consistency and (ii) 
 for sparsity. If condition (ii) is satisfied, however, condition (i) requires that 
, which is clearly impossible. This is problematic because their proof of sparsity requires 
-consistency of the estimator. We relax condition (i) and obtain sufficient conditions under which both 
-consistency and sparsity are satisfied.