Article

On the Convergence Rate of the SCAD-Penalized Empirical Likelihood Estimator

1
Melbourne Business School, University of Melbourne, 200 Leicester Street, Carlton, Victoria 3053, Australia
2
Graduate School of Economics, Kobe University, 2-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
*
Author to whom correspondence should be addressed.
Econometrics 2019, 7(1), 15; https://doi.org/10.3390/econometrics7010015
Submission received: 18 October 2018 / Revised: 18 March 2019 / Accepted: 18 March 2019 / Published: 20 March 2019

Abstract

This paper investigates the asymptotic properties of a penalized empirical likelihood estimator for moment restriction models when the number of parameters ($p_n$) and/or the number of moment restrictions increases with the sample size. Our main result is that the SCAD-penalized empirical likelihood estimator is $\sqrt{n/p_n}$-consistent under a reasonable condition on the regularization parameter. This consistency rate is better than the existing ones. The paper also provides sufficient conditions under which $\sqrt{n/p_n}$-consistency and an oracle property are satisfied simultaneously. As far as we know, this paper is the first to specify sufficient conditions for both $\sqrt{n/p_n}$-consistency and the oracle property of the penalized empirical likelihood estimator.

1. Introduction

Recently, sparse regression models have received considerable attention in business, economics, genetics, and various other fields. In these models, the number of possible regressors can be potentially large; however, only a relatively small number of these regressors are relevant.
Penalization is an alternative to classical subset selection. One drawback of subset selection is its lack of stability, which stems from its discrete nature: variables are either retained in or dropped from a model. As a result, a small perturbation in a sample may cause a drastic change in the post-selection results (Breiman 1996). Penalization addresses this issue by achieving variable selection and estimation simultaneously through a continuous process.
Several penalization methods have been advocated for linear regression models. Examples include the bridge penalty (Frank and Friedman 1993), LASSO (Tibshirani 1996), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001), and the elastic net penalty (Zou and Hastie 2005). However, penalized least squares methods are not applicable when endogeneity exists (Fan and Liao 2014). When endogeneity exists, the parameters of interest are often identified by moment restrictions, using instrumental variables.
This study investigates the asymptotic properties of a penalized empirical likelihood (PEL) estimator for moment restriction models, when the number of parameters and/or the number of moment restrictions increases with the sample size. We extend the EL estimator of Qin and Lawless (1994) by employing the SCAD penalty, so that we can achieve estimation and variable selection simultaneously.
Some penalized estimators for moment restriction models have been proposed in the econometric literature. Caner (2009) and Shi (2016b) considered the GMM estimator with a LASSO-type penalty. Caner and Zhang (2014) proposed the adaptive elastic net GMM estimator. Fan and Liao (2014) proposed the penalized focused GMM estimator. Leng and Tang (2012) and Chang et al. (2015) studied the asymptotic properties of the PEL estimator for independent and weakly dependent observations, respectively. Tang et al. (2018) considered a penalized exponential tilting estimator.
This paper shows that the SCAD-penalized EL estimator is $\sqrt{n/p_n}$-consistent, where $p_n$ is the number of parameters. Leng and Tang (2012) showed that the non-penalized EL estimator is $\sqrt{n/p_n}$-consistent under the assumption that $p_n/r_n \to c \in (0,1)$, where $r_n$ is the number of moment restrictions. Thus, essentially, they only proved $\sqrt{n/r_n}$-consistency. Chang et al. (2015) proved $\sqrt{n/p_n}$-consistency of the non-penalized EL estimator without imposing $p_n/r_n \to c \in (0,1)$, but they only obtained $\sqrt{n/r_n}$-consistency for the PEL estimator. We prove $\sqrt{n/p_n}$-consistency of the PEL estimator under a reasonable condition on the regularization parameter of the penalty function. Our result is important because it implies $\sqrt{n}$-consistency of the estimator when $p_n$ is fixed and only $r_n$ increases with the sample size. This is consistent with previous results in the EL literature such as Donald et al. (2003). In contrast, $\sqrt{n/r_n}$-consistency implies that only a slow rate of convergence can be achieved even when $p_n$ is finite and fixed.
This paper also shows that the PEL estimator satisfies the oracle property in the sense of Fan and Peng (2004) when the truth is sparse. That is, if the true parameter vector has some zero components, then they are estimated as zeros with probability approaching one, and the other nonzero components are estimated as well as they would be if the zero components were known a priori. Although Leng and Tang (2012) and Chang et al. (2015) also discussed the oracle property of the PEL estimator, they obtained their results under high-level assumptions. As far as we know, this paper is the first to specify sufficient conditions for both $\sqrt{n/p_n}$-consistency and the oracle property of the PEL estimator.
Recently, Chang et al. (2018) proposed an alternative PEL estimator that regularizes both parameters and Lagrange multipliers. Their estimator allows the case where $r_n$ and $p_n$ increase at an exponential rate, while our PEL estimator allows a polynomial rate only. Their method is useful when the truth is actually sparse. In contrast, our estimator is valid even when the truth is not sparse because $\sqrt{n/p_n}$-consistency can be established without imposing sparsity.
There is also a large literature on instrument (moment) selection that addresses the problem of selecting/constructing optimal instruments when a large number of instruments are available (e.g., Donald and Newey 2001; Bai and Ng 2009; Kuersteiner and Okui 2010; Belloni et al. 2012; Caner and Fan 2015; Cheng and Liao 2015; Shi 2016a). In contrast to these papers, here we focus on variable selection in a structural model.
This paper is organized as follows. We first show $\sqrt{n/p_n}$-consistency of the SCAD-penalized EL estimator and compare our assumptions with those of Leng and Tang (2012) and Chang et al. (2015). Then, we obtain the asymptotic distribution. Our proofs are new in the EL literature. All proofs are given in Appendix A.

2. PEL Estimator and Asymptotic Results

Let $\{y_1, \dots, y_n\}$ be a random sample from an unknown distribution on $\mathbb{R}^{d_n}$. This study considers the moment restriction model
$$E[m(y_i, \theta_0)] = 0,$$
where $\theta_0 = (\theta_{10}, \dots, \theta_{p_n 0})' \in \Theta_n$ is a $p_n$-dimensional true parameter and
$$m(y, \theta) = (m_1(y, \theta), \dots, m_{r_n}(y, \theta))'$$
is an $r_n$-dimensional moment function. For instance, the model includes the linear instrumental variable model
$$E[z_i (y_i - x_i' \theta_0)] = 0,$$
where $z_i$ is an $r_n \times 1$ vector of instrumental variables and $x_i$ is a $p_n \times 1$ vector of explanatory variables. We consider the case where $r_n \geq p_n$. The subscript $n$ indicates that $d_n$, $p_n$, and $r_n$ may increase with the sample size.
The PEL estimator for $\theta_0$ is
$$\hat\theta_n = \arg\min_{\theta \in \Theta_n} \left[ \max_{\lambda \in \hat\Lambda_n(\theta)} \frac{1}{n} \sum_{i=1}^{n} \log(1 - \lambda' m(y_i, \theta)) + \sum_{j=1}^{p_n} p_{\kappa_n}(\theta_j) \right],$$
where $\hat\Lambda_n(\theta) = \{\lambda \in \mathbb{R}^{r_n} : \lambda' m(y_i, \theta) < 1,\ i = 1, \dots, n\}$ and $p_\kappa(\cdot)$ is a penalty function with a regularization parameter $\kappa$. Thus, the estimator is the same as that of Leng and Tang (2012).
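As a rough illustration of the inner maximization over $\lambda$ in the definition above, the following Python sketch computes $\hat\lambda(\theta)$ for the linear instrumental variable model by a damped Newton iteration. It is a minimal sketch, not the authors' implementation: the function names `iv_moments` and `el_inner_max`, the stopping rule, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def iv_moments(theta, y, X, Z):
    """Rows are m(y_i, theta) = z_i * (y_i - x_i' theta) for the linear IV model."""
    return Z * (y - X @ theta)[:, None]

def el_inner_max(m, max_iter=50, tol=1e-10):
    """Maximise mean(log(1 - lambda' m_i)) over lambda by damped Newton steps.

    Hypothetical helper: the objective is concave in lambda, so Newton steps
    with backtracking (to keep 1 - lambda' m_i > 0) are a natural choice.
    """
    n, r = m.shape
    lam = np.zeros(r)
    obj = 0.0                                  # objective value at lambda = 0
    for _ in range(max_iter):
        w = 1.0 - m @ lam                      # must stay strictly positive
        mw = m / w[:, None]
        grad = -mw.mean(axis=0)
        hess = -(mw.T @ mw) / n                # negative definite
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while True:                            # backtrack: feasible and not worse
            cand = lam - t * step
            w_cand = 1.0 - m @ cand
            if np.all(w_cand > 0):
                new_obj = np.log(w_cand).mean()
                if new_obj >= obj:
                    break
            t *= 0.5
            if t < 1e-12:
                return lam, obj
        lam, obj_prev, obj = cand, obj, new_obj
        if obj - obj_prev < tol:
            break
    return lam, obj

# Simulated example (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n, theta0 = 200, np.array([1.0, -0.5])
Z = rng.normal(size=(n, 3))
e = rng.normal(size=n)
X = Z[:, :2] + 0.5 * e[:, None] + 0.5 * rng.normal(size=(n, 2))  # endogenous regressors
y = X @ theta0 + e
lam_hat, q_hat = el_inner_max(iv_moments(theta0, y, X, Z))
```

The returned value `q_hat` plays the role of the profiled criterion at a given $\theta$; the outer minimization over $\theta$ with the penalty term is not shown here.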
For concreteness, we employ the SCAD penalty of Fan and Li (2001):
$$p_\kappa(u) = \begin{cases} \kappa |u| & |u| \leq \kappa, \\ -(u^2 - 2a\kappa|u| + \kappa^2)/[2(a-1)] & \kappa < |u| \leq a\kappa, \\ (a+1)\kappa^2/2 & |u| > a\kappa \end{cases}$$
for some $a > 2$. Similar asymptotic results are also obtained with a different penalty function, such as the minimax concave penalty of Zhang (2010).
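The piecewise definition of the SCAD penalty translates directly into code. The sketch below is illustrative only: the function names are hypothetical, and the default $a = 3.7$ is the value commonly used following Fan and Li (2001), which the text above does not prescribe.

```python
import numpy as np

def scad_penalty(u, kappa, a=3.7):
    """SCAD penalty p_kappa(u), evaluated elementwise (requires a > 2)."""
    u = np.abs(np.asarray(u, dtype=float))
    linear = kappa * u                                               # |u| <= kappa
    quadratic = -(u**2 - 2 * a * kappa * u + kappa**2) / (2 * (a - 1))  # kappa < |u| <= a*kappa
    constant = (a + 1) * kappa**2 / 2                                # |u| > a*kappa
    return np.where(u <= kappa, linear,
                    np.where(u <= a * kappa, quadratic, constant))

def scad_derivative(u, kappa, a=3.7):
    """Derivative of the penalty with respect to |u|, for u != 0."""
    u = np.abs(np.asarray(u, dtype=float))
    return np.where(u <= kappa, kappa,
                    np.maximum(a * kappa - u, 0.0) / (a - 1))
```

The derivative equals $\kappa$ near zero, which is what produces sparsity, and vanishes for $|u| > a\kappa$, so large coefficients are not shrunk.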
The true model may be sparse, that is, some elements of $\theta_0$ may be zero. Let $q_n$ be the number of nonzero elements in $\theta_0$. Without loss of generality, we can write $\theta_0 = (\theta_{10}', \theta_{20}')' = (\theta_{10}', 0')'$ with $\theta_1 = (\theta_1, \dots, \theta_{q_n})' \in \mathbb{R}^{q_n}$ and $\theta_2 = (\theta_{q_n+1}, \dots, \theta_{p_n})' \in \mathbb{R}^{p_n - q_n}$. For now, the sparsity assumption is not crucial. It is possible that $q_n = p_n$.
Let $m_i(\theta) = m(y_i, \theta)$ and $M_i(\theta) = \partial m_i(\theta)/\partial\theta'$. Also, let $m_i = m_i(\theta_0)$ and $M_i = M_i(\theta_0)$. We define $Q_n(\theta, \lambda) = E[\log(1 - \lambda' m_i(\theta))]$ and $\hat Q_n(\theta, \lambda) = n^{-1} \sum_{i=1}^{n} \log(1 - \lambda' m_i(\theta))$. Moreover, we use $\lambda(\theta)$ and $\hat\lambda(\theta)$ to denote $\arg\max_{\lambda \in \Lambda_n(\theta)} Q_n(\theta, \lambda)$ and $\arg\max_{\lambda \in \hat\Lambda_n(\theta)} \hat Q_n(\theta, \lambda)$, respectively, where $\Lambda_n(\theta)$ is a subset of $\mathbb{R}^{r_n}$ such that $0 \in \mathrm{int}(\Lambda_n(\theta))$. Let $\lambda_{\min}(A)$ and $\lambda_{\max}(A)$ denote the minimum and maximum eigenvalues of a matrix $A$. Also, let $\|\cdot\|$ denote the Euclidean (Frobenius) norm.
We impose the following conditions for $\sqrt{n/p_n}$-consistency.
Assumption 1.
(i) The true parameter vector $\theta_0$ is the unique minimizer of $Q_n(\theta, \lambda(\theta))$ and belongs to the interior of $\Theta_n$; (ii) There are positive functions $\Delta_1(r, p)$ and $\Delta_2(\epsilon)$ such that for any $\epsilon > 0$
$$\inf_{\{\theta \in \Theta_n : \|\theta - \theta_0\| > \epsilon\}} Q_n(\theta, \lambda(\theta)) \geq \Delta_1(r_n, p_n) \Delta_2(\epsilon) > 0,$$
where $\liminf_{n \to \infty} \Delta_1(r_n, p_n) > 0$; (iii) $\sup_{\theta \in \Theta_n} |\hat Q_n(\theta, \lambda(\theta)) - Q_n(\theta, \lambda(\theta))| = o_p(\Delta_1(r_n, p_n))$.
Assumption 2.
(i) $E[\sup_{\theta \in \Theta_n} (\|m_i(\theta)\| r_n^{-1/2})^{\alpha}] < \infty$ for some $\alpha > 4$; (ii) $\lim_{n \to \infty} r_n^4/n = 0$.
Assumption 3.
(i) There exists $C$ such that $0 < 1/C \leq \lambda_{\min}(E[m_i(\theta) m_i(\theta)']) \leq \lambda_{\max}(E[m_i(\theta) m_i(\theta)']) < C < \infty$ in a neighborhood of $\theta_0$; (ii) There exists $C$ such that $\lambda_{\max}(E[M_i] E[M_i]') < C < \infty$; (iii) There exists $C$ such that $\lambda_{\max}(E[M_i(\theta) M_i(\theta)']) < C < \infty$ in a neighborhood of $\theta_0$.
Assumption 4.
(i) The moment function $m(y, \theta)$ is twice continuously differentiable in $\theta$ for all $y$ in a neighborhood of $\theta_0$; (ii) There exists $C$ such that $\lambda_{\min}\left( \dfrac{d^2 \hat Q_n(\theta, \hat\lambda(\theta))}{d\theta\, d\theta'} \right) \geq C > 0$ in a neighborhood of $\theta_0$ with probability approaching one.
Assumption 5.
$\lim_{n \to \infty} \sqrt{q_n}\, \kappa_n / \min_{1 \leq j \leq q_n} |\theta_{j0}| = 0$.
Assumption 1 is similar to condition 2.1 of Chang et al. (2015). Assumption 1 (iii) is an extension of uniform convergence. If we restrict the parameter space such that $\Theta_n$ is compact and $E[\sup_{\theta \in \Theta_n} \log(1 - \lambda(\theta)' m_i(\theta))] < \infty$, then Assumption 1 (iii) is satisfied with $\Delta_1(r, p) = 1$. Assumption 1 is used to show that $\|\hat\theta_n - \theta_0\| = o_p(1)$. Any condition that guarantees consistency of the estimator can replace Assumption 1.
Assumptions 2 (i) and (ii) are similar to Assumptions 2 and 4 in Leng and Tang (2012). However, we do not assume that $p_n/r_n \to c \in (0,1)$. Thus, $r_n$ can grow faster than $p_n$. We can allow the case where $p_n$ is fixed and only $r_n$ increases with the sample size.
Assumption 4 states that the objective function of the EL estimator is strictly convex in $\theta$ in a neighborhood of $\theta_0$. When $r_n$ and $p_n$ are fixed, this condition is satisfied under fairly weak conditions. We can also relax the condition so that $\lambda_{\min}\left( \dfrac{d^2 \hat Q_n(\theta, \hat\lambda(\theta))}{d\theta\, d\theta'} \right) \geq \rho_n$ with a positive sequence $\rho_n$ such that $\rho_n \to 0$. In that case, we obtain a different convergence rate of the estimator. Under certain conditions, we have $\|\hat\theta_n - \theta_0\| = O_p(\sqrt{p_n/n}/\rho_n)$.
Assumption 5 is similar to condition (B2) in Huang and Xie (2007), who obtained the convergence rate of the SCAD-penalized least squares estimator. Assumption 5 states that the smallest nonzero element of $\theta_0$ (in absolute value) may converge to 0, but the convergence rate must be sufficiently slow. If the nonzero elements are too small compared to $\kappa_n$, then the PEL estimator cannot distinguish between zero and nonzero elements. Following Huang and Xie (2007), we prove $\sqrt{n/p_n}$-consistency of the PEL estimator in two steps. We first prove $\|\hat\theta_n - \theta_0\| = O_p(\sqrt{p_n/n} + \sqrt{q_n}\kappa_n)$ under Assumptions 1–4 and $q_n \kappa_n^2 \to 0$ (see Lemma A3 in Appendix A). Then, we improve the convergence rate by using Assumption 5. Notice that if we assume $\sqrt{q_n}\kappa_n = O(\sqrt{p_n/n})$, then $\sqrt{n/p_n}$-consistency of the PEL estimator is obtained immediately from Lemma A3. However, as we will see later, this condition contradicts Assumption 6 (i), which is a key condition for the oracle property. Assumption 5 is imposed so that $\sqrt{n/p_n}$-consistency and the oracle property are satisfied simultaneously.
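To see how the rate conditions interact, the following worked example may help. It is only a sketch under hypothetical rates: the choices $p_n = n^{1/6}$, $\kappa_n = n^{-1/4}$, a fixed $q_n = q$, and nonzero coefficients bounded away from zero by a constant $c$ are assumptions made for illustration and do not appear in the paper. Assumption 6 (i) is taken in the form $\sqrt{n/p_n}\,\kappa_n \to \infty$.

```latex
% Hypothetical rates: q_n = q fixed, min_j |theta_{j0}| >= c > 0,
% p_n = n^{1/6}, kappa_n = n^{-1/4}.
\begin{align*}
\text{Assumption 5:} \quad
  & \frac{\sqrt{q_n}\,\kappa_n}{\min_{1 \le j \le q_n} |\theta_{j0}|}
    \le \frac{\sqrt{q}}{c}\, n^{-1/4} \to 0, \\
\text{Assumption 6 (i):} \quad
  & \sqrt{n/p_n}\,\kappa_n = n^{5/12}\, n^{-1/4} = n^{1/6} \to \infty, \\
\text{shortcut condition:} \quad
  & \frac{\sqrt{q_n}\,\kappa_n}{\sqrt{p_n/n}} = \sqrt{q}\, n^{1/6} \to \infty,
    \quad\text{so } \sqrt{q_n}\,\kappa_n \neq O\big(\sqrt{p_n/n}\big).
\end{align*}
```

Under these rates, Assumptions 5 and 6 (i) hold together, whereas the shortcut condition $\sqrt{q_n}\kappa_n = O(\sqrt{p_n/n})$ fails, which is why the two-step argument is needed.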
Theorem 1.
Suppose that Assumptions 1–5 hold. Then, we have $\|\hat\theta_n - \theta_0\| = O_p(\sqrt{p_n/n})$.
The sparsity assumption is not necessary for this theorem. The same result is obtained even if all elements in $\theta_0$ are nonzero. Moreover, because Assumption 5 does not exclude $\kappa_n = 0$, the theorem also applies to the non-penalized EL estimator, whose $\sqrt{n/p_n}$-consistency has been established by Chang et al. (2015). As we will see in the next theorem, if the truth is sparse, then we obtain $\sqrt{n/q_n}$-consistency of the PEL estimator under certain additional assumptions.
Our convergence rate of the PEL estimator is better than that of Chang et al. (2015). Roughly speaking, different convergence rates are based on different equalities. The asymptotic analyses of Leng and Tang (2012) and Chang et al. (2015) are based on the moment equality $E[m_i] = 0$, which implies $\|n^{-1} \sum_{i=1}^{n} m_i\| = O_p(\sqrt{r_n/n})$. Leng and Tang (2012) obtained $\sqrt{n/p_n}$-consistency of the non-penalized EL estimator by assuming $r_n = O(p_n)$ and hence $\|n^{-1} \sum_{i=1}^{n} m_i\| = O_p(\sqrt{p_n/n})$. On the other hand, our asymptotic analysis is based on the first-order condition $E\left[ \dfrac{d \log(1 - \lambda(\theta_0)' m_i(\theta_0))}{d\theta} \right] = 0$, which implies $\left\| \dfrac{d \hat Q_n(\theta_0, \hat\lambda(\theta_0))}{d\theta} \right\| = O_p(\sqrt{p_n/n})$. Therefore, our proof is not a straightforward extension of those of Leng and Tang (2012) and Chang et al. (2015).
To obtain a convergence rate in line with the proofs of Leng and Tang (2012) and Chang et al. (2015), we need a rather strong condition on the regularization parameter. For instance, Chang et al. (2015) assumed that $q_n \kappa_n r_n^{-1} n M^{-1} = O(1)$ to prove $\sqrt{n/r_n}$-consistency, where $M$ is the block length, which is equal to unity when the observations are independent. The condition of Chang et al. (2015) corresponds to the condition that $q_n \kappa_n = o(p_n/n)$ in our case. As stated before, although this condition simplifies the proof of $\sqrt{n/p_n}$-consistency, it causes a problem for the oracle property of the estimator.
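The contrast between the two rates can be made concrete with a short calculation. The sketch below is illustrative: it uses the expression for the derivative given in the proof of Theorem 1 in Appendix A, writes $\Omega = E[m_i m_i']$ as shorthand (a symbol introduced here for convenience), and assumes independent observations together with Assumption 3 (i) and (ii).

```latex
% Omega := E[m_i m_i'] is shorthand introduced for this sketch.
\begin{align*}
E\left\| \frac{1}{n}\sum_{i=1}^{n} E[M_i]'\,\Omega^{-1} m_i \right\|^2
  &= \frac{1}{n}\,\mathrm{tr}\left( E[M_i]'\,\Omega^{-1} E[M_i] \right)
   \le \frac{p_n}{n}\,\frac{\lambda_{\max}(E[M_i]' E[M_i])}{\lambda_{\min}(\Omega)}
   \le \frac{C^2 p_n}{n}, \\
E\left\| \frac{1}{n}\sum_{i=1}^{n} m_i \right\|^2
  &= \frac{1}{n}\,\mathrm{tr}(\Omega)
   \le \frac{C\, r_n}{n}.
\end{align*}
```

The first matrix is $p_n \times p_n$ and the second is $r_n \times r_n$, so Markov's inequality gives the $O_p(\sqrt{p_n/n})$ and $O_p(\sqrt{r_n/n})$ rates, respectively.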
Next, we show sparsity and asymptotic normality of the PEL estimator. Let $\hat\theta_{1n}$ and $\hat\theta_{2n}$ be the corresponding estimators of $\theta_{10}$ and $\theta_{20}$, respectively. Furthermore, let $M_{1i} = \partial m_i(\theta_{10}, 0)/\partial\theta_1'$. We define $V_n = (E[M_i]' E[m_i m_i']^{-1} E[M_i])^{-1}$ and $V_{1n} = (E[M_{1i}]' E[m_i m_i']^{-1} E[M_{1i}])^{-1}$.
We impose additional conditions.
Assumption 6.
(i) $\lim_{n \to \infty} \sqrt{n/p_n}\, \kappa_n = \infty$; (ii) $\lim_{n \to \infty} r_n p_n^{3/2}/\sqrt{n} = 0$.
Assumption 7.
There exists $B_{jkl}(y)$ such that $|\partial^2 m_l(y, \theta)/\partial\theta_j \partial\theta_k| \leq B_{jkl}(y)$ and $E[B_{jkl}^2(y_i)] < \infty$ for all $j, k = 1, \dots, p_n$ and $l = 1, \dots, r_n$ in a neighborhood of $\theta_0$.
Assumption 8.
There exists $C$ such that $0 < 1/C \leq \lambda_{\min}(V_n) \leq \lambda_{\max}(V_n) \leq C < \infty$.
Assumption 6 (i) is a key condition for sparsity of the PEL estimator. It requires that the regularization parameter not be too small, so that zero elements in $\theta_0$ are estimated as zero. The same condition is also employed by Leng and Tang (2012).
Theorem 2.
Suppose that Assumptions 1–8 hold. Let $B_n$ be an $l \times q_n$ matrix such that $B_n B_n' \to G$, where $G$ is an $l \times l$ matrix with fixed $l$. Then, the PEL estimator satisfies the following:
1. Sparsity: $\hat\theta_{2n} = 0$ with probability approaching one.
2. $\sqrt{n/q_n}$-consistency: $\|\hat\theta_{1n} - \theta_{10}\| = O_p(\sqrt{q_n/n})$.
3. Asymptotic normality: $\sqrt{n}\, B_n V_{1n}^{-1/2} (\hat\theta_{1n} - \theta_{10}) \to_d N(0, G)$.
The selection of the matrix $B_n$ depends on the parameter of interest. For instance, suppose that the parameter of interest is the first element of $\theta_{10}$. Let $\hat\theta_{1n,1}$ and $\theta_{10,1}$ denote the first elements of $\hat\theta_{1n}$ and $\theta_{10}$, respectively. Then, we choose $B_n = (1, 0, \dots, 0)$ and obtain $\sqrt{n} (\hat\theta_{1n,1} - \theta_{10,1}) \to_d N(0, v_{11})$, where $v_{11}$ is the limit of the first diagonal element of $V_{1n}$.
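In practice, $V_{1n}$ would have to be replaced by an estimate. The Python sketch below shows one plausible plug-in version for the linear instrumental variable model, where $M_i = -z_i x_i'$. It is an illustration under stated assumptions and not a procedure from the paper: the function name `plug_in_variance`, the simulated data, and the use of two-stage least squares on the selected regressors as a stand-in for the PEL estimate are all hypothetical.

```python
import numpy as np

def plug_in_variance(theta_hat, y, X, Z, active):
    """Sample analogue of V_1n = (E[M_1i]' E[m_i m_i']^{-1} E[M_1i])^{-1}, linear IV."""
    n = len(y)
    resid = y - X @ theta_hat
    m = Z * resid[:, None]                      # n x r matrix of moments m_i
    omega = m.T @ m / n                         # estimate of E[m_i m_i']
    M1 = -(Z.T @ X[:, active]) / n              # estimate of E[M_1i], r x q
    return np.linalg.inv(M1.T @ np.linalg.solve(omega, M1))

# Simulated example; the third coefficient is zero and treated as excluded.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 4))
u = rng.normal(size=n)
X = Z[:, :3] + 0.5 * u[:, None] + 0.5 * rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + u

active = [0, 1]
Xa = X[:, active]
Xa_fit = Z @ np.linalg.lstsq(Z, Xa, rcond=None)[0]       # first-stage fitted values
theta_hat = np.zeros(3)
theta_hat[active] = np.linalg.solve(Xa_fit.T @ Xa, Xa_fit.T @ y)  # 2SLS stand-in

V1 = plug_in_variance(theta_hat, y, X, Z, active)
se_first = np.sqrt(V1[0, 0] / n)                # approximate standard error, first coefficient
```

With $B_n = (1, 0, \dots, 0)$, the quantity `V1[0, 0]` is the sample counterpart of $v_{11}$ in the text.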
Although a detailed proof is given in Appendix A, we give a sketch of the proof for asymptotic normality here. If $\lambda(\theta)$ were known, then $\theta_0$ could be estimated by
$$\tilde\theta_n = \arg\min_{\theta \in \Theta_n} \left[ \frac{1}{n} \sum_{i=1}^{n} \log(1 - \lambda(\theta)' m_i(\theta)) + \sum_{j=1}^{p_n} p_{\kappa_n}(\theta_j) \right],$$
which is a penalized maximum likelihood estimator using a least favorable submodel of the moment restriction model (see Sueishi 2016, for instance). Because $\tilde\theta_n$ is the penalized maximum likelihood estimator, its distribution can be obtained in a manner similar to Fan and Peng (2004). We derive the asymptotic distribution of $\hat\theta_n$ by showing that $\hat\theta_n$ is asymptotically equivalent to $\tilde\theta_n$.
By modifying the proof of Theorem 2, we can easily obtain the asymptotic distribution of the non-penalized EL estimator. Because the asymptotic distribution of the non-penalized EL estimator has already been derived by Leng and Tang (2012), we omit the derivation. We see that the efficiency of the PEL estimator for $\theta_{10}$ is the same as that of the non-penalized EL estimator for which it is known a priori that $\theta_{20} = 0$. Thus, our estimator satisfies the oracle property in the sense of Fan and Peng (2004).
Theorem 2 is similar to Theorem 3 of Leng and Tang (2012). However, they proved sparsity by assuming that the PEL estimator is $\sqrt{n/p_n}$-consistent. They did not state explicitly the conditions under which the non-penalized and penalized EL estimators have the same convergence rate.
Chang et al. (2015) showed a similar result to Theorem 2 for weakly dependent observations. They obtained $\sqrt{n/r_n}$-consistency and sparsity under two separate rate conditions on $\kappa_n$. Specifically, they assume: (i) $q_n \kappa_n r_n^{-1} n M^{-1} = O(1)$ for $\sqrt{n/r_n}$-consistency and (ii) $\kappa_n \sqrt{n/r_n}\, M^{-1} \to \infty$ for sparsity. If condition (ii) is satisfied, however, condition (i) requires that $q_n \sqrt{n/r_n} \to 0$, which is clearly impossible. This causes a problem because their proof of sparsity requires $\sqrt{n/r_n}$-consistency of the estimator. We relaxed condition (i) and obtained sufficient conditions under which both $\sqrt{n/p_n}$-consistency and sparsity are satisfied.

3. Conclusions

We investigated the asymptotic properties of the PEL estimator when the number of parameters and/or the number of moment restrictions increases with the sample size. In particular, we showed that the PEL estimator is $\sqrt{n/p_n}$-consistent under a reasonable condition on the regularization parameter. Although we cannot compare our results directly to those of Chang et al. (2015) because they allow weakly dependent observations, our convergence rate improves on the existing ones. In terms of convergence rate, our result is even better than those of Tang et al. (2018) and Chang et al. (2018), because their convergence rates depend also on the number of moment restrictions.
A crucial issue with the PEL estimation concerns selecting the size of the regularization parameter. The asymptotic theory does not tell us how to select the regularization parameter in practice. Although some selection methods are considered by Leng and Tang (2012), Shi (2016b), and Ando and Sueishi (2019), this is still an underdeveloped area of research.

Author Contributions

Both authors contributed equally to this work.

Funding

This research was supported by JSPS KAKENHI Grant Number 15K03396.

Acknowledgments

The authors would like to thank anonymous reviewers for their comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Throughout the Appendix, C denotes a generic positive constant which may vary according to context. The qualifier “with probability approaching one” is abbreviated as w.p.a.1. We define
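$$\begin{aligned}
H_{11}(\theta, \lambda) &= E\left[ \frac{\partial^2 \log(1 - \lambda' m_i(\theta))}{\partial\theta\, \partial\theta'} \right] = -E\left[ \frac{\frac{\partial}{\partial\theta'} (M_i(\theta)' \lambda)}{1 - \lambda' m_i(\theta)} \right] - E\left[ \frac{M_i(\theta)' \lambda \lambda' M_i(\theta)}{(1 - \lambda' m_i(\theta))^2} \right], \\
H_{12}(\theta, \lambda) &= E\left[ \frac{\partial^2 \log(1 - \lambda' m_i(\theta))}{\partial\theta\, \partial\lambda'} \right] = -E\left[ \frac{M_i(\theta)'}{1 - \lambda' m_i(\theta)} \right] - E\left[ \frac{M_i(\theta)' \lambda\, m_i(\theta)'}{(1 - \lambda' m_i(\theta))^2} \right], \\
H_{22}(\theta, \lambda) &= E\left[ \frac{\partial^2 \log(1 - \lambda' m_i(\theta))}{\partial\lambda\, \partial\lambda'} \right] = -E\left[ \frac{m_i(\theta) m_i(\theta)'}{(1 - \lambda' m_i(\theta))^2} \right].
\end{aligned}$$
We use $\hat H_{ij}(\theta, \lambda)$ to denote the sample analog of $H_{ij}(\theta, \lambda)$. Moreover, we define $\hat Q_n(\theta) = \hat Q_n(\theta, \hat\lambda(\theta))$ and $Q_n(\theta) = Q_n(\theta, \lambda(\theta))$.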
We prepare some lemmas to prove Theorems 1 and 2.
Lemma A1.
Suppose that Assumptions 1, 2 and 3 (i) hold. Then, we have $\|\hat\theta_n - \theta_0\| = o_p(1)$ if $q_n \kappa_n^2 \to 0$.
Proof of Lemma A1.
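Let $\xi$ satisfy $1/\alpha + 1/8 \leq \xi < 3/8$ and let $\bar\Lambda_n = \{\lambda \in \mathbb{R}^{r_n} : \|\lambda\| \leq n^{-\xi}\}$. Then, by Assumption 2, we have
$$\max_{1 \leq i \leq n} \sup_{\theta \in \Theta_n} |\lambda' m_i(\theta)| \leq n^{-\xi} \max_{1 \leq i \leq n} \sup_{\theta \in \Theta_n} \|m_i(\theta)\| = o_p(n^{-\xi + 1/\alpha} r_n^{1/2}) = o_p(1)$$
for all $\lambda \in \bar\Lambda_n$. Let $\tilde\lambda = \arg\max_{\lambda \in \bar\Lambda_n} \hat Q_n(\theta_0, \lambda)$. Because Assumptions 2 (ii) and 3 (i) imply $\lambda_{\min}(n^{-1} \sum_{i=1}^{n} m_i m_i') > C$ w.p.a.1, by expanding $\log(1 - x)$ around $x = 0$, we have
$$0 \leq \hat Q_n(\theta_0, \tilde\lambda) = -\tilde\lambda' \bar m_n - \frac{1}{2} \tilde\lambda' \left[ \frac{1}{n} \sum_{i=1}^{n} \frac{m_i m_i'}{(1 - \dot\lambda' m_i)^2} \right] \tilde\lambda \leq \|\tilde\lambda\| \|\bar m_n\| - C \|\tilde\lambda\|^2, \tag{A1}$$
where $\bar m_n = n^{-1} \sum_{i=1}^{n} m_i$ and $\dot\lambda$ lies between 0 and $\tilde\lambda$. Therefore, we obtain $\|\tilde\lambda\| = O_p(\|\bar m_n\|) = O_p(\sqrt{r_n/n}) = o_p(n^{-3/8})$ by Assumption 2 (ii), and hence $\tilde\lambda \in \mathrm{int}(\bar\Lambda_n)$. Because $\bar\Lambda_n \subset \hat\Lambda_n(\theta_0)$, the concavity of $\hat Q_n(\theta_0, \lambda)$ implies $\tilde\lambda = \hat\lambda(\theta_0)$. Moreover, we obtain
$$\hat Q_n(\hat\theta_n, \lambda(\hat\theta_n)) \leq \hat Q_n(\hat\theta_n) \leq \hat Q_n(\theta_0) + \sum_{j=1}^{p_n} p_{\kappa_n}(\theta_{j0}) = o_p(1). \tag{A2}$$
Now, suppose that $\hat\theta_n$ is not consistent. Then, there exists a subsequence $\{n_k\}$ such that $\|\hat\theta_{n_k} - \theta_0\| > \epsilon$ for some $\epsilon > 0$ almost surely. By Assumption 1 (iii) and Equation (A2), we have $Q_{n_k}(\hat\theta_{n_k}) = o_p(\Delta_1(r_{n_k}, p_{n_k})) + o_p(1)$. In contrast, Assumption 1 (ii) implies $Q_{n_k}(\hat\theta_{n_k}) > \Delta_1(r_{n_k}, p_{n_k}) \Delta_2(\epsilon)$. Because $\liminf_{n \to \infty} \Delta_1(r_n, p_n) > 0$, this is a contradiction. Therefore, we have $\|\hat\theta_n - \theta_0\| = o_p(1)$. □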
Lemma A2.
Suppose that Assumptions 1–3 hold. Then, we have
$$\left\| \frac{d \hat Q_n(\theta_0)}{d\theta} - \frac{d \hat Q_n(\theta_0, \lambda(\theta_0))}{d\theta} \right\| = o_p\left( \frac{1}{\sqrt{n}} \right).$$
Proof of Lemma A2.
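Let $H_{ij}(\theta) = H_{ij}(\theta, \lambda(\theta))$ and $\hat H_{ij}(\theta) = \hat H_{ij}(\theta, \hat\lambda(\theta))$ for $i, j = 1, 2$. Also, let $H_{ij} = H_{ij}(\theta_0)$ and $\hat H_{ij} = \hat H_{ij}(\theta_0)$. Because $\lambda(\theta_0) = 0$, we have
$$\|\hat H_{12} - H_{12}\| \leq \left\| \frac{1}{n} \sum_{i=1}^{n} \frac{M_i' \hat\lambda(\theta_0) m_i'}{(1 - \hat\lambda(\theta_0)' m_i)^2} \right\| + \left\| \frac{1}{n} \sum_{i=1}^{n} \frac{M_i'}{1 - \hat\lambda(\theta_0)' m_i} - E[M_i]' \right\|.$$
From the proof of Lemma A1, we see that $\|\hat\lambda(\theta_0)\| = O_p(\sqrt{r_n/n})$. In addition, it follows from Assumptions 2 (ii) and 3 (iii) that $\lambda_{\max}(n^{-1} \sum_{i=1}^{n} M_i M_i') < C$ w.p.a.1. Because $n^{-1} \sum_{i=1}^{n} \|m_i\|^2 = O_p(r_n)$ by Assumption 2 (i), we have
$$\left\| \frac{1}{n} \sum_{i=1}^{n} \frac{M_i' \hat\lambda(\theta_0) m_i'}{(1 - \hat\lambda(\theta_0)' m_i)^2} \right\| \leq C \sqrt{\hat\lambda(\theta_0)' \left[ \frac{1}{n} \sum_{i=1}^{n} M_i M_i' \right] \hat\lambda(\theta_0)}\; \sqrt{\frac{1}{n} \sum_{i=1}^{n} \|m_i\|^2} = O_p\left( \frac{r_n}{\sqrt{n}} \right).$$
Furthermore, because $|\hat\lambda(\theta_0)' m_i| = o_p(1)$ for all $i$, we have $(1 - \hat\lambda(\theta_0)' m_i)^{-1} = 1 + \hat\lambda(\theta_0)' m_i + o_p(|\hat\lambda(\theta_0)' m_i|)$. Hence, we have
$$\left\| \frac{1}{n} \sum_{i=1}^{n} \frac{M_i'}{1 - \hat\lambda(\theta_0)' m_i} - E[M_i]' \right\| \leq \left\| \frac{1}{n} \sum_{i=1}^{n} M_i - E[M_i] \right\| + C \left\| \frac{1}{n} \sum_{i=1}^{n} (\hat\lambda(\theta_0)' m_i) M_i \right\| = O_p\left( \frac{r_n}{\sqrt{n}} \right),$$
which implies $\|\hat H_{12} - H_{12}\| = O_p(r_n/\sqrt{n})$. Similarly, we have
$$\begin{aligned}
\|\hat H_{22} - H_{22}\| &\leq \left\| \frac{1}{n} \sum_{i=1}^{n} m_i m_i' - E[m_i m_i'] \right\| + C \left\| \frac{1}{n} \sum_{i=1}^{n} (\hat\lambda(\theta_0)' m_i) m_i m_i' \right\| \\
&\leq \left\| \frac{1}{n} \sum_{i=1}^{n} m_i m_i' - E[m_i m_i'] \right\| + C \sqrt{\hat\lambda(\theta_0)' \left[ \frac{1}{n} \sum_{i=1}^{n} m_i m_i' \right] \hat\lambda(\theta_0)}\; \sqrt{\frac{1}{n} \sum_{i=1}^{n} \|m_i\|^4} = O_p\left( \frac{r_n^{3/2}}{\sqrt{n}} \right).
\end{aligned} \tag{A3}$$
By the Taylor expansion,
$$\frac{d \hat Q_n(\theta_0)}{d\theta} - \frac{d \hat Q_n(\theta_0, \lambda(\theta_0))}{d\theta} = \left. \frac{d}{d\theta} \frac{\partial \hat Q_n(\theta, \dot\lambda(\theta))}{\partial\lambda'} \right|_{\theta = \theta_0} \hat\lambda(\theta_0) + \left( \frac{\partial \hat\lambda(\theta_0)}{\partial\theta'} - \frac{\partial \lambda(\theta_0)}{\partial\theta'} \right)' \frac{\partial \hat Q_n(\theta_0, \dot\lambda(\theta_0))}{\partial\lambda},$$
where $\dot\lambda(\theta)$ lies between $\hat\lambda(\theta)$ and $\lambda(\theta)$. By applying the implicit function theorem to the first-order conditions, we obtain
$$\frac{\partial \hat\lambda(\theta_0)}{\partial\theta'} = -\hat H_{22}^{-1} \hat H_{21} \quad \text{and} \quad \frac{\partial \lambda(\theta_0)}{\partial\theta'} = -H_{22}^{-1} H_{21}.$$
Here we have $1/C \leq \lambda_{\min}(-\hat H_{22}) \leq \lambda_{\max}(-\hat H_{22}) < C$ w.p.a.1 by Assumptions 2 (ii) and 3 (i) and Equation (A3). Thus, by Assumption 3 (ii), we have
$$\left\| \frac{\partial \hat\lambda(\theta_0)}{\partial\theta'} - \frac{\partial \lambda(\theta_0)}{\partial\theta'} \right\| \leq \left\| \hat H_{22}^{-1} (\hat H_{21} - H_{21}) \right\| + \left\| (\hat H_{22}^{-1} - H_{22}^{-1}) H_{21} \right\| = O_p\left( \frac{r_n^{3/2}}{\sqrt{n}} \right).$$
Moreover, some calculation yields
$$\begin{aligned}
\left\| \left. \frac{d}{d\theta} \frac{\partial \hat Q_n(\theta, \dot\lambda(\theta))}{\partial\lambda'} \right|_{\theta = \theta_0} \right\|
&= \left\| \hat H_{12}(\theta_0, \dot\lambda(\theta_0)) + \left( \frac{\partial \dot\lambda(\theta_0)}{\partial\theta'} \right)' \hat H_{22}(\theta_0, \dot\lambda(\theta_0)) \right\| \\
&\leq \left\| \hat H_{12}(\theta_0, \dot\lambda(\theta_0)) - H_{12} \right\| + \left\| \left( \frac{\partial \dot\lambda(\theta_0)}{\partial\theta'} - \frac{\partial \lambda(\theta_0)}{\partial\theta'} \right)' \hat H_{22}(\theta_0, \dot\lambda(\theta_0)) \right\| \\
&\quad + \left\| H_{12} H_{22}^{-1} \left( \hat H_{22}(\theta_0, \dot\lambda(\theta_0)) - H_{22} \right) \right\| \\
&= O_p\left( \frac{r_n^{3/2}}{\sqrt{n}} \right).
\end{aligned}$$
Combining these results, we obtain
$$\left\| \frac{d \hat Q_n(\theta_0)}{d\theta} - \frac{d \hat Q_n(\theta_0, \lambda(\theta_0))}{d\theta} \right\| = O_p\left( \frac{r_n^2}{n} \right),$$
which implies the desired result by Assumption 2 (ii). □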
Lemma A3.
Suppose that Assumptions 1–4 hold. Then, we have $\|\hat\theta_n - \theta_0\| = O_p(\sqrt{p_n/n} + \sqrt{q_n}\kappa_n)$.
Proof of Lemma A3.
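We denote $\nabla^2 \hat Q_n(\theta) = d^2 \hat Q_n(\theta)/d\theta\, d\theta'$. By Assumption 4, $\nabla^2 \hat Q_n(\theta)$ is positive definite in a neighborhood of $\theta_0$ w.p.a.1. By the definition of the PEL estimator, we have
$$\hat Q_n(\theta_0) + \sum_{j=1}^{p_n} p_{\kappa_n}(\theta_{j0}) \geq \hat Q_n(\hat\theta_n). \tag{A4}$$
Because $p_{\kappa_n}(\theta_{j0}) \leq (a+1)\kappa_n^2/2$ for $j = 1, \dots, q_n$ and $p_{\kappa_n}(\theta_{j0}) = 0$ for $j = q_n + 1, \dots, p_n$, expanding Equation (A4) yields
$$\begin{aligned}
0 &\geq 2 \frac{d \hat Q_n(\theta_0)}{d\theta'} (\hat\theta_n - \theta_0) + (\hat\theta_n - \theta_0)' \nabla^2 \hat Q_n(\dot\theta_n) (\hat\theta_n - \theta_0) - (a+1) q_n \kappa_n^2 \\
&= \left\| \nabla^2 \hat Q_n^{1/2}(\dot\theta_n) (\hat\theta_n - \theta_0) + \nabla^2 \hat Q_n^{-1/2}(\dot\theta_n) \frac{d \hat Q_n(\theta_0)}{d\theta} \right\|^2 - \frac{d \hat Q_n(\theta_0)}{d\theta'} \nabla^2 \hat Q_n^{-1}(\dot\theta_n) \frac{d \hat Q_n(\theta_0)}{d\theta} - (a+1) q_n \kappa_n^2
\end{aligned}$$
for some $\dot\theta_n$ located between $\hat\theta_n$ and $\theta_0$. Therefore, by Loève's $C_2$-inequality, we obtain
$$\begin{aligned}
\left\| \nabla^2 \hat Q_n^{1/2}(\dot\theta_n) (\hat\theta_n - \theta_0) \right\|^2
&\leq 2 \left\| \nabla^2 \hat Q_n^{1/2}(\dot\theta_n) (\hat\theta_n - \theta_0) + \nabla^2 \hat Q_n^{-1/2}(\dot\theta_n) \frac{d \hat Q_n(\theta_0)}{d\theta} \right\|^2 + 2 \frac{d \hat Q_n(\theta_0)}{d\theta'} \nabla^2 \hat Q_n^{-1}(\dot\theta_n) \frac{d \hat Q_n(\theta_0)}{d\theta} \\
&\leq 4 \frac{d \hat Q_n(\theta_0)}{d\theta'} \nabla^2 \hat Q_n^{-1}(\dot\theta_n) \frac{d \hat Q_n(\theta_0)}{d\theta} + 2 (a+1) q_n \kappa_n^2.
\end{aligned}$$
By Lemma A2, we obtain $\left\| \dfrac{d \hat Q_n(\theta_0)}{d\theta} \right\| = O_p(\sqrt{p_n/n})$, and hence
$$C \|\hat\theta_n - \theta_0\|^2 \leq \left\| \nabla^2 \hat Q_n^{1/2}(\dot\theta_n) (\hat\theta_n - \theta_0) \right\|^2 = O_p\left( \frac{p_n}{n} + q_n \kappa_n^2 \right)$$
by Assumption 4 (ii). □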
Proof of Theorem 1.
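If $\sqrt{q_n}\kappa_n = O(\sqrt{p_n/n})$, then we trivially have $\|\hat\theta_n - \theta_0\| = O_p(\sqrt{p_n/n})$ by Lemma A3. Thus, we only consider the case where $\sqrt{q_n}\kappa_n / \sqrt{p_n/n} \to \infty$.
By Lemma A3, we have
$$\|\hat\theta_n - \theta_0\| = O_p(u_n) \quad \text{with} \quad u_n = \sqrt{\frac{p_n}{n}} + \sqrt{q_n}\kappa_n.$$
Furthermore, for any $M$ and for any $\theta$ such that $\|\theta - \theta_0\| \leq 2^M u_n$, we have
$$\min_{1 \leq j \leq q_n} |\theta_j| \geq \min_{1 \leq j \leq q_n} |\theta_{j0}| - 2^M u_n.$$
By Assumption 5, we have $u_n / \min_{1 \leq j \leq q_n} |\theta_{j0}| < 2^{-M-1}$ for sufficiently large $n$, and hence
$$\min_{1 \leq j \leq q_n} |\theta_j| \geq \frac{1}{2} \min_{1 \leq j \leq q_n} |\theta_{j0}|.$$
This implies that $\min_{1 \leq j \leq q_n} |\theta_j| > a\kappa_n$ for sufficiently large $n$.
Let $\{h_n\}$ be a positive sequence that converges to 0 as $n \to \infty$. Following Huang and Xie (2007), we decompose $\Theta_n \setminus \{0\}$ into shells $S_{n,k} = \{\theta : 2^{k-1} h_n \leq \|\theta - \theta_0\| \leq 2^k h_n\}$ for $k = 1, 2, \dots$. For $\theta \in S_{n,k}$ such that $2^k h_n \leq 2^M u_n$, we obtain
$$\hat Q_n(\theta) - \hat Q_n(\theta_0) = \frac{d \hat Q_n(\theta_0)}{d\theta'} (\theta - \theta_0) + \frac{1}{2} (\theta - \theta_0)' \nabla^2 \hat Q_n(\dot\theta_n) (\theta - \theta_0)$$
and
$$\frac{1}{2} (\theta - \theta_0)' \nabla^2 \hat Q_n(\dot\theta_n) (\theta - \theta_0) \geq 2^{2k-3} C h_n^2 \tag{A5}$$
w.p.a.1. Let $E_n$ be the event such that Equation (A5) is satisfied, so that $P(E_n^c) = o(1)$. Because Lemma A2 implies that the difference between $\dfrac{d \hat Q_n(\theta_0)}{d\theta}$ and $\dfrac{d \hat Q_n(\theta_0, \lambda(\theta_0))}{d\theta}$ is asymptotically negligible, we have
$$\begin{aligned}
P\left( \|\hat\theta_n - \theta_0\| > 2^L h_n \right)
&\leq P\left( \|\hat\theta_n - \theta_0\| > 2^M u_n \right) + P\left( \{2^L h_n < \|\hat\theta_n - \theta_0\| \leq 2^M u_n\} \cap E_n \right) + P(E_n^c) \\
&= o(1) + \sum_k P\left( \{\hat\theta_n \in S_{n,k}\} \cap E_n \right) \\
&\leq o(1) + \sum_k P\left( \left\{ \inf_{\theta \in S_{n,k}} \left[ \hat Q_n(\theta) + \sum_{j=1}^{p_n} p_{\kappa_n}(\theta_j) \right] \leq \hat Q_n(\theta_0) + \sum_{j=1}^{p_n} p_{\kappa_n}(\theta_{j0}) \right\} \cap E_n \right) \\
&\leq o(1) + \sum_k P\left( \sup_{\theta \in S_{n,k}} \left| \frac{d \hat Q_n(\theta_0, \lambda(\theta_0))}{d\theta'} (\theta - \theta_0) \right| \geq 2^{2k-3} C h_n^2 \right),
\end{aligned}$$
where $\sum_k$ stands for $\sum_{k : k > L,\ 2^k h_n \leq 2^M u_n}$. Moreover, some calculation yields that
$$\frac{d \hat Q_n(\theta_0, \lambda(\theta_0))}{d\theta} = \frac{1}{n} \sum_{i=1}^{n} E[M_i]' E[m_i m_i']^{-1} m_i.$$
Thus, it follows from the Markov and Cauchy-Schwarz inequalities that
$$\begin{aligned}
\sum_k P\left( \sup_{\theta \in S_{n,k}} \left| \frac{d \hat Q_n(\theta_0, \lambda(\theta_0))}{d\theta'} (\theta - \theta_0) \right| \geq 2^{2k-3} C h_n^2 \right)
&\leq C \sum_k \frac{E\left[ \sup_{\theta \in S_{n,k}} \left| \frac{d \hat Q_n(\theta_0, \lambda(\theta_0))}{d\theta'} (\theta - \theta_0) \right| \right]}{2^{2k-3} h_n^2} \\
&\leq C \sum_{k : k > L} \frac{2^k h_n \left( \mathrm{tr}\{E[M_i]' E[m_i m_i']^{-1} E[M_i]\}/n \right)^{1/2}}{2^{2k-3} h_n^2} \\
&\leq C \sum_{k : k > L} \frac{\sqrt{p_n/n}}{2^{k-3} h_n}.
\end{aligned}$$
Notice that $\sum_k$ is changed to $\sum_{k : k > L}$ in the second inequality. By choosing $h_n = \sqrt{p_n/n}$, we obtain the desired result. □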
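The final step of the proof above can be spelled out as follows. This is a sketch of the standard argument, added for readability; it only uses the last bound in the proof with $h_n = \sqrt{p_n/n}$, and the symbol $\eta$ is introduced here for illustration.

```latex
% With h_n = sqrt(p_n/n), the last bound becomes a geometric series in k.
\begin{align*}
C \sum_{k : k > L} \frac{\sqrt{p_n/n}}{2^{k-3} h_n}
  = C \sum_{k > L} 2^{3-k}
  = 8C \cdot 2^{-L}.
\end{align*}
% Hence, for any eta > 0, choosing L with 8C 2^{-L} < eta gives
% limsup_n P( ||theta_hat_n - theta_0|| > 2^L sqrt(p_n/n) ) <= eta,
% which is the definition of ||theta_hat_n - theta_0|| = O_p(sqrt(p_n/n)).
```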
Lemma A4.
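Suppose that Assumptions 2, 3, 4 (i) and 7 hold. Then, for any $\theta$ such that $\|\theta - \theta_0\| = O_p(\sqrt{p_n/n})$, we have
$$\left\| \nabla^2 \hat Q_n(\theta) - \nabla^2 Q_n(\theta_0) \right\| = O_p\left( \frac{r_n^{3/2}}{\sqrt{n}} \right) + O_p\left( \frac{r_n p_n}{\sqrt{n}} \right).$$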
Proof of Lemma A4.
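Let $\theta$ satisfy $\|\theta - \theta_0\| = O_p(\sqrt{p_n/n})$. By a simple calculation, we obtain
$$\nabla^2 \hat Q_n(\theta) = \hat H_{11}(\theta) - \hat H_{12}(\theta) \hat H_{22}^{-1}(\theta) \hat H_{21}(\theta)$$
and
$$\nabla^2 Q_n(\theta_0) = H_{11} - H_{12} H_{22}^{-1} H_{21} = E[M_i]' E[m_i m_i']^{-1} E[M_i].$$
Thus, it is sufficient to show that
$$\left\| \hat H_{11}(\theta) \right\| + \left\| -\hat H_{12}(\theta) \hat H_{22}(\theta)^{-1} \hat H_{21}(\theta) - E[M_i]' E[m_i m_i']^{-1} E[M_i] \right\| = O_p\left( \frac{r_n^{3/2}}{\sqrt{n}} \right) + O_p\left( \frac{r_n p_n}{\sqrt{n}} \right).$$
By using a similar argument as in Equation (A1), we have $\|\hat\lambda(\theta)\| = O_p(\sqrt{r_n/n})$. Also, the $(j, k)$ element of $\frac{\partial}{\partial\theta'} (M_i(\theta)' \hat\lambda(\theta))$ is given by $\sum_{l=1}^{r_n} \partial^2 m_l(y_i, \theta)/\partial\theta_j \partial\theta_k\, \hat\lambda_l(\theta)$ and
$$\left| \frac{1}{n} \sum_{i=1}^{n} \sum_{l=1}^{r_n} \frac{\partial^2 m_l(y_i, \theta)}{\partial\theta_j \partial\theta_k} \hat\lambda_l(\theta) \right| \leq \frac{1}{n} \sum_{i=1}^{n} \sqrt{\sum_{l=1}^{r_n} B_{jkl}^2(y_i)}\; \|\hat\lambda(\theta)\| = O_p\left( \frac{r_n}{\sqrt{n}} \right)$$
by Assumption 7. Therefore, we have
$$\left\| \hat H_{11}(\theta) \right\| \leq C \left\| \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial\theta'} (M_i(\theta)' \hat\lambda(\theta)) \right\| + C \left\| \frac{1}{n} \sum_{i=1}^{n} M_i(\theta)' \hat\lambda(\theta) \hat\lambda(\theta)' M_i(\theta) \right\| = O_p\left( \frac{r_n p_n}{\sqrt{n}} \right).$$
Moreover, by doing similar calculations as in the proof of Lemma A2, we obtain
$$\begin{aligned}
\left\| -\hat H_{12}(\theta) - E[M_i]' \right\|
&\leq \left\| \frac{1}{n} \sum_{i=1}^{n} M_i(\theta) - \frac{1}{n} \sum_{i=1}^{n} M_i \right\| + O_p\left( \frac{r_n}{\sqrt{n}} \right) \\
&\leq \frac{1}{n} \sum_{i=1}^{n} \sqrt{\sum_{j=1}^{p_n} \left\| \frac{\partial M_i(\dot\theta)}{\partial\theta_j} \right\|^2}\; \|\theta - \theta_0\| + O_p\left( \frac{r_n}{\sqrt{n}} \right) = O_p\left( \frac{r_n^{1/2} p_n^{3/2}}{\sqrt{n}} \right) + O_p\left( \frac{r_n}{\sqrt{n}} \right)
\end{aligned}$$
and
$$\begin{aligned}
\left\| -\hat H_{22}(\theta) - E[m_i m_i'] \right\|
&\leq \left\| \frac{1}{n} \sum_{i=1}^{n} m_i(\theta) m_i(\theta)' - \frac{1}{n} \sum_{i=1}^{n} m_i m_i' \right\| + O_p\left( \frac{r_n^{3/2}}{\sqrt{n}} \right) \\
&\leq 2 \left\| \frac{1}{n} \sum_{i=1}^{n} m_i' M_i(\dot\theta) (\theta - \theta_0) \right\| + (\theta - \theta_0)' \left[ \frac{1}{n} \sum_{i=1}^{n} M_i(\dot\theta)' M_i(\dot\theta) \right] (\theta - \theta_0) + O_p\left( \frac{r_n^{3/2}}{\sqrt{n}} \right) \\
&= O_p\left( \frac{r_n^{3/2}}{\sqrt{n}} \right)
\end{aligned}$$
for some $\dot\theta$ that is located between $\theta$ and $\theta_0$. Hence, we obtain the result. □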
Proof of Theorem 2.
We first prove sparsity. Theorem 1 and Assumption 6 (i) imply that θ ^ n θ 0 κ n w.p.a.1. Thus, it is sufficient to show that w.p.a.1,
d Q ^ n ( θ 0 + v ) d θ j + p κ n ( v j ) > 0 ( 0 < v j < κ n ) d Q ^ n ( θ 0 + v ) d θ j + p κ n ( v j ) < 0 ( κ n < v j < 0 )
for any v = ( v 1 , , v p n ) such that v = O ( p n / n ) and for j = q n + 1 , , p n . Because p κ n ( u ) = κ n sgn ( u ) for | u | κ n , we have
d Q ^ n ( θ 0 + v ) d θ j + p κ n ( v j ) = d Q ^ n ( θ 0 ) d θ j + d 2 Q ^ n ( θ 0 + v ˙ ) d θ j d θ v + κ n sgn ( v j ) I 1 + I 2 + I 3
for j = q n + 1 , , p n and for some v ˙ such that v ˙ = O p ( p n / n ) . By Lemma A2, we have | I 1 | = O p ( p n / n ) . Moreover, by Assumption 8 and Lemma A4, we have
d 2 Q ^ n ( θ 0 + v ˙ ) d θ j d θ = O p ( 1 ) ,
and thus | I 2 | = O p ( p n / n ) . Therefore, I 1 and I 2 are asymptotically dominated by I 3 . The sign of d Q ^ n ( θ 0 + v ) / d θ j + p κ n ( v j ) is determined by the sign of v j .
Next, we show asymptotic normality. Let Q ^ 1 n ( θ 1 ) = Q ^ n ( θ 1 , 0 ) . Lemma A3 and Assumption 5 imply that min 1 j q n | θ ^ j | > a κ n w.p.a.1. Moreover, we have P ( θ ^ 2 n = 0 ) 1 . Thus, expanding the first-order condition for θ ^ 1 n yields
0 = d Q ^ 1 n ( θ 10 ) d θ 1 + d 2 Q ^ 1 n ( θ ˙ 1 n ) d θ 1 d θ 1 ( θ ^ 1 n θ 10 )
for some θ ˙ 1 n that is located between θ ^ 1 n and θ 10 . Combining this with Lemmas A2 and A4 and Assumptions 2 (ii) and 6 (ii), we have
V 1 n 1 ( θ ^ 1 n θ 10 ) = d Q ^ n ( θ 0 , λ ( θ 0 ) ) d θ 1 + o p 1 n ,
which immediately implies that θ ^ 1 n θ 10 = O p ( q n / n ) . Moreover, because tr ( B n V 1 n B n ) < C tr ( B n B n ) < C by the assumption of Theorem 2 and Assumption 8, we have
n B n V 1 n 1 / 2 ( θ ^ 1 n θ 10 ) = n B n V 1 n 1 / 2 d Q ^ n ( θ 0 , λ ( θ 0 ) ) d θ 1 + o p ( B n V 1 n 1 / 2 ) = i = 1 n z n i + o p ( 1 ) ,
where
$$z_{ni} = -\frac{1}{\sqrt{n}}\, B_n V_{1n}^{1/2}\, E[M_{1i}]'\, E[m_i m_i']^{-1} m_i.$$
Here, by Assumptions 2 (i) and 8, we have
$$E\|z_{ni}\|^4 = \frac{1}{n^2}\, E\!\left[\left( m_i' E[m_i m_i']^{-1} E[M_{1i}]\, V_{1n}^{1/2} B_n' B_n V_{1n}^{1/2}\, E[M_{1i}]'\, E[m_i m_i']^{-1} m_i \right)^2\right] \le \frac{C}{n^2}\, E\!\left[(m_i' m_i)^2\right] = O\!\left(\frac{r_n^2}{n^2}\right).$$
Furthermore, because $B_n B_n' \to G$, we have $\sum_{i=1}^{n} E[z_{ni} z_{ni}'] \to G$ and
$$P(\|z_{ni}\| > \epsilon) \le \frac{E[z_{ni}' z_{ni}]}{\epsilon^2} = O\!\left(\frac{1}{n}\right).$$
Therefore, we obtain
$$\sum_{i=1}^{n} E\!\left[\|z_{ni}\|^2\, 1\{\|z_{ni}\| > \epsilon\}\right] \le n \left(E\|z_{ni}\|^4\right)^{1/2} P(\|z_{ni}\| > \epsilon)^{1/2} = o(1),$$
and thus $\sum_{i=1}^{n} z_{ni} \to_d N(0, G)$ by the Lindeberg-Feller central limit theorem. □
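The covariance identity used in the last step, namely that the sum of E[z_ni z_ni'] equals B_n B_n', can be checked with a few lines of linear algebra. The sketch below is illustrative only and rests on an assumption not restated in this appendix: that V_1n is the usual efficiency-bound matrix, (E[M_1i]' E[m_i m_i']^{-1} E[M_1i])^{-1}. The dimensions and the randomly generated matrices `M`, `S`, and `B` are hypothetical stand-ins for E[M_1i], E[m_i m_i'], and B_n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: r moments, q nonzero parameters, k rows in B_n.
r, q, k = 6, 3, 2

M = rng.normal(size=(r, q))        # stand-in for E[M_1i], full column rank
A0 = rng.normal(size=(r, r))
S = A0 @ A0.T + r * np.eye(r)      # stand-in for E[m_i m_i'], positive definite
B = rng.normal(size=(k, q))        # stand-in for B_n

S_inv = np.linalg.inv(S)

# Assumed definition: V = (M' S^{-1} M)^{-1}.
V = np.linalg.inv(M.T @ S_inv @ M)

# Symmetric square root of V via its eigendecomposition.
eigvals, eigvecs = np.linalg.eigh(V)
V_half = eigvecs @ np.diag(np.sqrt(eigvals)) @ eigvecs.T

# z_ni = A m_i / sqrt(n), so sum_i E[z_ni z_ni'] = A S A'.
A = -B @ V_half @ M.T @ S_inv
sum_cov = A @ S @ A.T

print("max abs deviation from B B':", np.max(np.abs(sum_cov - B @ B.T)))
```

Under the assumed definition of V_1n the deviation is zero up to floating-point error, which is why the limit of the summed covariances is G whenever B_n B_n' converges to G.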

References

  1. Ando, Tomohiro, and Naoya Sueishi. 2019. Regularization parameter selection for penalized empirical likelihood estimator. Economics Letters 178: 1–4.
  2. Bai, Jushan, and Serena Ng. 2009. Selecting instrumental variables in a data rich environment. Journal of Time Series Econometrics 1: 4.
  3. Belloni, Alexandre, Daniel Chen, Victor Chernozhukov, and Christian Hansen. 2012. Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80: 2369–429.
  4. Breiman, Leo. 1996. Heuristics of instability and stabilization in model selection. Annals of Statistics 24: 2350–83.
  5. Caner, Mehmet, and Qingliang Fan. 2015. Hybrid generalized empirical likelihood estimators: Instrument selection with adaptive lasso. Journal of Econometrics 187: 256–74.
  6. Caner, Mehmet, and Hao Helen Zhang. 2014. Adaptive elastic net for generalized methods of moments. Journal of Business & Economic Statistics 32: 30–47.
  7. Caner, Mehmet. 2009. Lasso-type GMM estimator. Econometric Theory 25: 270–90.
  8. Chang, Jinyuan, Song Xi Chen, and Xiaohong Chen. 2015. High dimensional generalized empirical likelihood for moment restrictions with dependent data. Journal of Econometrics 185: 283–304.
  9. Chang, Jinyuan, Cheng Yong Tang, and Tong Tong Wu. 2018. A new scope of penalized empirical likelihood with high-dimensional estimating equations. Annals of Statistics 46: 3185–216.
  10. Cheng, Xu, and Zhipeng Liao. 2015. Select the valid and relevant moments: An information-based lasso for GMM with many moments. Journal of Econometrics 186: 443–64.
  11. Donald, Stephen G., and Whitney K. Newey. 2001. Choosing the number of instruments. Econometrica 69: 1161–91.
  12. Donald, Stephen G., Guido W. Imbens, and Whitney K. Newey. 2003. Empirical likelihood estimation and consistent tests with conditional moment restrictions. Journal of Econometrics 117: 55–93.
  13. Fan, Jianqing, and Yuan Liao. 2014. Endogeneity in high dimensions. Annals of Statistics 42: 872–917.
  14. Fan, Jianqing, and Runze Li. 2001. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96: 1348–60.
  15. Fan, Jianqing, and Heng Peng. 2004. Nonconcave penalized likelihood with a diverging number of parameters. Annals of Statistics 32: 928–61.
  16. Frank, Ildiko E., and Jerome H. Friedman. 1993. A statistical view of some chemometrics regression tools. Technometrics 35: 109–35.
  17. Huang, Jian, and Huiliang Xie. 2007. Asymptotic oracle properties of SCAD-penalized least squares estimators. IMS Lecture Notes–Monograph Series 55: 149–66.
  18. Kuersteiner, Guido, and Ryo Okui. 2010. Constructing optimal instruments by first-stage prediction averaging. Econometrica 78: 697–718.
  19. Leng, Chenlei, and Cheng Yong Tang. 2012. Penalized empirical likelihood and growing dimensional general estimating equations. Biometrika 99: 703–16.
  20. Qin, Jin, and Jerry Lawless. 1994. Empirical likelihood and general estimating equations. Annals of Statistics 22: 300–25.
  21. Shi, Zhentao. 2016a. Econometric estimation with high-dimensional moment equalities. Journal of Econometrics 195: 104–19.
  22. Shi, Zhentao. 2016b. Estimation of sparse structural parameter with many endogenous variables. Econometric Reviews 35: 1582–608.
  23. Sueishi, Naoya. 2016. A simple derivation of the efficiency bound for conditional moment restriction models. Economics Letters 138: 57–59.
  24. Tang, Niansheng, Xiaodong Yan, and Puying Zhao. 2018. Exponentially tilted likelihood inference on growing dimensional unconditional moment models. Journal of Econometrics 202: 57–74.
  25. Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B 58: 267–88.
  26. Zhang, Cun-Hui. 2010. Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38: 894–942.
  27. Zou, Hui, and Trevor Hastie. 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67: 301–20.

Share and Cite

MDPI and ACS Style

Ando, T.; Sueishi, N. On the Convergence Rate of the SCAD-Penalized Empirical Likelihood Estimator. Econometrics 2019, 7, 15. https://doi.org/10.3390/econometrics7010015

AMA Style

Ando T, Sueishi N. On the Convergence Rate of the SCAD-Penalized Empirical Likelihood Estimator. Econometrics. 2019; 7(1):15. https://doi.org/10.3390/econometrics7010015

Chicago/Turabian Style

Ando, Tomohiro, and Naoya Sueishi. 2019. "On the Convergence Rate of the SCAD-Penalized Empirical Likelihood Estimator" Econometrics 7, no. 1: 15. https://doi.org/10.3390/econometrics7010015

APA Style

Ando, T., & Sueishi, N. (2019). On the Convergence Rate of the SCAD-Penalized Empirical Likelihood Estimator. Econometrics, 7(1), 15. https://doi.org/10.3390/econometrics7010015

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop