Abstract
This paper develops model selection and averaging methods for moment restriction models. We first propose a focused information criterion based on the generalized empirical likelihood estimator. We address the issue of selecting an optimal model, rather than a correct model, for estimating a specific parameter of interest. Then, this study investigates a generalized empirical likelihood-based model averaging estimator that minimizes the asymptotic mean squared error. A simulation study suggests that our averaging estimator can be a useful alternative to existing post-selection estimators.
1. Introduction
This paper develops model selection and averaging methods for moment restriction models. We first propose a focused information criterion (FIC) based on the generalized empirical likelihood (GEL) estimator [1,2], which nests the empirical likelihood (EL) [3,4] and exponential tilting (ET) [5,6] estimators as special cases. Motivated by Claeskens and Hjort [7], we address the issue of selecting an optimal model for estimating a specific parameter of interest, rather than identifying a correct model or selecting a model with good global fit. Then, as an extension of FIC, this study presents a GEL-based frequentist model averaging (FMA) estimator that is designed to minimize the mean squared error (MSE) of the estimator.
Traditional model selection methods, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), select a single model regardless of the specific goal of inference [8,9]. AIC selects a model that is close to the true data generating process (DGP) in terms of Kullback-Leibler discrepancy, while BIC selects the model with the highest posterior probability. However, a model with good global fit is not necessarily a good model for estimating a specific parameter. For instance, Hansen [10] considers the problem of deciding the order of autoregressive models. His simulation study demonstrates that the AIC-selected model does not necessarily produce a good estimate of the impulse response. This result reveals that the best model generally differs for different intended uses of the model.
In their seminal work, Claeskens and Hjort [7] established an FIC that is designed to select the optimal model depending on its intended use. Their goal is to select the model that attains the minimum MSE of the maximum likelihood estimator for the parameter of interest, which they call the focus parameter. The FIC is constructed from an asymptotic estimate of the MSE.
Since then, an FIC has been derived for several models. Claeskens, Croux and Kerckhoven [11] proposed an FIC for logistic regressions. Hjort and Claeskens [12] proposed an FIC for the Cox hazard regression model. Zhang and Liang [13] developed an FIC for the generalized additive partial linear model. Models studied in those papers are likelihood-based. However, econometric models are often specified via moment restrictions rather than parametric density functions. This paper indicates that the idea of Claeskens and Hjort [7] is applicable to moment restriction models. Our FIC is constructed using an asymptotic estimate of the MSE of the GEL estimator.
Model selection for moment restriction models is still underdeveloped. Andrews and Lu [14] proposed selection criteria based on the J-statistic of the generalized method of moments (GMM) estimator [15]. Hong, Preston and Shum [16] extended the results of Andrews and Lu to GEL estimation. Sueishi [17] developed information criteria similar to the AIC. The goal of Andrews and Lu [14] and Hong, Preston and Shum [16] is to identify the correct model, whereas Sueishi [17] selects the best approximating model in terms of Cressie-Read discrepancy. Although these criteria are useful in many applications, they do not address the issue of selecting the model that best serves its intended purpose.
Model averaging is an alternative to model selection. Inference after model selection is typically conducted as if the selected model is the true DGP. However, this ignores uncertainty introduced by model selection. Rather than conditioning on the single selected model, the averaging technique uses all candidate models to incorporate model selection uncertainty. Although Bayesian methods are predominant in the literature [18], there is also a growing FMA literature for likelihood-based models [19,20,21]. See also Yang [22], Leung and Barron [23] and Goldenshluger [24] for related issues.
In the FMA literature, it is often of particular interest to obtain an optimal averaging estimator in terms of a certain loss [25,26,27,28]. This study investigates a GEL-based averaging method that minimizes the asymptotic mean squared error in a framework similar to that of Hjort and Claeskens [21]. A simulation study indicates that our averaging estimator outperforms existing post-model-selection estimators.
Although this study investigates GEL-based methods, its results readily apply to the two-step GMM estimator as well, because they rely only on first-order asymptotic theory. However, the two-step GMM estimator often suffers from a large bias that cannot be captured by first-order asymptotics, even if the model is correctly specified. Because the FIC addresses a trade-off between misspecification bias and estimation variance, the GEL estimator is better suited to our framework.
We now review related work. DiTraglia [29] proposes an instrument selection criterion for GMM that is based on the concept of the FIC. Our approach resembles DiTraglia’s, but his interest is instrument selection, whereas ours is model selection: DiTraglia intentionally uses an invalid large set of instruments to improve efficiency, while we intentionally use a wrong small model to improve efficiency. Liu [30] proposes an averaging estimator for the linear regression model by using a local asymptotic framework. Although Liu considers exogenous regressors, we allow endogenous regressors. Martins and Gabriel [31] consider GMM-based model averaging estimators under a framework different from ours.
The remainder of the paper is organized as follows. Section 2 describes our local misspecification framework. Section 3 derives the FIC. Section 4 discusses the FMA estimator. Section 5 provides a simple example to which our methods are applicable. Section 6 presents the results of a Monte Carlo study. Section 7 concludes.
2. Local Misspecification Framework
We first introduce our setup. The basic construct follows Claeskens and Hjort [7]. There is a smallest and a largest model in our set of candidate models. The smallest, which we call the reduced model, has a $p$-dimensional unknown parameter vector, $\theta$. The largest, or the full model, has an additional $q$-dimensional unknown parameter vector, $\gamma$. The full model is assumed to be correctly specified and nests the reduced model; i.e., the reduced model corresponds to the special case of the full model in which $\gamma = \gamma_0$ for some known $\gamma_0$. Typically, $\gamma_0$ is a vector of zeros: $\gamma_0 = 0$. An example is given in Section 5.
There are up to $2^q$ submodels, all of which have $\theta$ as the common parameter vector. A submodel treats some elements of $\gamma$ as unknown parameters and is indexed by a subset, $S$, of $\{1, \dots, q\}$. The model, $S$, contains $p + |S|$ parameters, $(\theta, \gamma_S)$, such that $\gamma_j = \gamma_{0,j}$ for $j \notin S$. Thus, the reduced and full models correspond to $S = \emptyset$ and $S = \{1, \dots, q\}$, respectively. We use “red” and “full” to denote the reduced and full models, respectively.
The focus parameter, $\mu$, which is the parameter of interest, is a function of $\theta$ and $\gamma$: $\mu = \mu(\theta, \gamma)$. It could be merely an element of $\theta$. Prior knowledge or economic theories suggest that $\theta$ should be estimated, but we are unsure which elements of $\gamma$ should be treated as unknown parameters. Estimating a larger model usually implies a smaller modeling bias and a larger estimation variance. However, if the reduced model is globally misspecified in the sense that the violation of the moment restriction does not disappear even in the limit, then the misspecification bias asymptotically dominates the variance of the GEL estimator. Thus, we cannot make a reasonable comparison of bias and variance in the asymptotic framework.
A local misspecification framework is introduced to take the bias-variance trade-off into account. Let $y_1, \dots, y_n$ be i.i.d. random vectors from an unknown density, $f_n$, which depends on the sample size, $n$.¹ The functional form of $f_n$ is not specified. The full model is defined via the following moment restriction:

$$E_{f_n}\left[g(y, \theta_0, \gamma_n)\right] = 0 \quad (1)$$

where $g$ is a known vector-valued function up to the parameters. For each $n$, the true parameter values of $\theta$ and $\gamma$ are $\theta_0$ and $\gamma_n = \gamma_0 + \delta/\sqrt{n}$, respectively. Note that $\gamma_0$ is known, but $\theta_0$ and $\delta$ are unknown. We assume that the dimension of $g$ exceeds $p + q$; i.e., the model is over-identified.
The moment function of the reduced model is $g(y, \theta, \gamma_0)$. The reduced model is misspecified in the sense that there is no value, $\theta$, such that $E_{f_n}[g(y, \theta, \gamma_0)] = 0$ for any fixed $n$. However, if the moment function is differentiable with respect to $\gamma$, then (1) implies that the reduced model satisfies:

$$E_{f_n}\left[g(y, \theta_0, \gamma_0)\right] = -E_{f_n}\left[\frac{\partial g(y, \theta_0, \bar\gamma)}{\partial \gamma'}\right]\frac{\delta}{\sqrt{n}}$$

for some vector, $\bar\gamma$, between $\gamma_0$ and $\gamma_n$. Thus, even though the moment restriction is invalid at $\gamma = \gamma_0$, the violation disappears in the limit. A similar relationship also holds for the other submodels. As the next section reveals, under this framework, the squared bias and variance of the GEL estimator are both of the order, $O(n^{-1})$. Hence, the trade-off between bias and variance can be considered. If $\delta$ is sufficiently small, it might be better to set $\gamma = \gamma_0$ rather than estimate $\gamma$.
In general, the dimension of the moment function can differ among submodels. For instance, consider a linear instrumental variable model. The model (structural form) can be estimated as long as the number of instruments exceeds or equals the number of unknown parameters. Thus, it is possible to use only a subset of instruments to estimate a submodel. For ease of exposition, however, we consider only the case where the dimension of the moment function is fixed for all submodels.
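To make the drifting-parameter device concrete, the following minimal sketch generates data whose coefficient on the doubtful variables shrinks at the $1/\sqrt{n}$ rate, $\gamma_n = \gamma_0 + \delta/\sqrt{n}$. The linear IV design and all names here are illustrative assumptions, not the paper's simulation setup.

```python
import numpy as np

def simulate_local(n, theta0, gamma0, delta, seed=0):
    """Draw one sample from a DGP that drifts toward the reduced model:
    gamma_n = gamma0 + delta / sqrt(n). Hypothetical linear IV design."""
    rng = np.random.default_rng(seed)
    gamma_n = gamma0 + delta / np.sqrt(n)
    q = len(gamma0)
    z = rng.standard_normal((n, q))         # exogenous regressors
    u = rng.standard_normal(n)              # shared error -> endogeneity
    x = z @ np.ones(q) + u                  # endogenous regressor
    eps = 0.5 * u + rng.standard_normal(n)  # correlated with x through u
    y = theta0 * x + z @ gamma_n + eps
    return y, x, z
```

As $n$ grows, the generated data look more and more like data from the reduced model ($\gamma = \gamma_0$), which is exactly what keeps the bias and variance of the same order.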
3. Focused Information Criterion
To construct an FIC, we first derive the asymptotic distribution of the GEL estimator under the local misspecification framework. Newey [32] and Hall [33] obtained a similar result in the case of GMM estimation to analyze the local power properties of specification tests.
A model, $S$, contains $p + |S|$ unknown parameters. The moment function of the model is denoted as $g_S(y, \theta, \gamma_S)$, where the values of $\gamma_j$ are set to be their null values, $\gamma_{0,j}$, for $j \in S^c$, and $S^c$ is the complementary set of $S$.
Let $\rho$ be a concave function on its domain, $\mathcal{V}$, which is an open interval containing zero. We normalize $\rho$, so that $\rho_1 = \rho_2 = -1$, where $\rho_j = \partial^j \rho(v)/\partial v^j|_{v=0}$. The GEL estimator of $(\theta, \gamma_S)$ is obtained by solving the saddle-point problem:

$$(\hat\theta_S, \hat\gamma_S) = \arg\min_{(\theta, \gamma_S) \in \Theta \times \Gamma_S}\ \sup_{\lambda \in \hat\Lambda_S(\theta, \gamma_S)}\ \frac{1}{n}\sum_{i=1}^n \rho\left(\lambda' g_S(y_i, \theta, \gamma_S)\right)$$

where $\Theta \times \Gamma_S$ is the parameter space of $(\theta, \gamma_S)$ and $\hat\Lambda_S(\theta, \gamma_S) = \{\lambda : \lambda' g_S(y_i, \theta, \gamma_S) \in \mathcal{V},\ i = 1, \dots, n\}$ is the set of feasible values of $\lambda$. The EL and ET estimators are special cases with $\rho(v) = \log(1 - v)$ and $\rho(v) = -\exp(v)$, respectively. Although $\hat\theta_S$ has $p$ elements for any $S$, we adopt the subscript, $S$, to emphasize that the value of the estimator depends on $S$.
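For concreteness, the sketch below profiles out $\lambda$ numerically for the EL case, $\rho(v) = \log(1 - v)$, and then minimizes over the model parameters. It is a minimal illustration of the saddle-point problem, not a production implementation; the moment function `g_func` (returning an $n \times m$ array) and the starting values are assumptions supplied by the user.

```python
import numpy as np
from scipy.optimize import minimize

def el_profile(beta, g_func, data):
    """Inner part of the EL saddle point: sup over lambda of the sample
    mean of rho(lambda' g_i), with rho(v) = log(1 - v)."""
    g = g_func(data, beta)                  # n x m matrix of moments
    n, m = g.shape

    def neg_obj(lam):
        v = g @ lam
        if np.any(v >= 1.0):                # enforce the domain 1 - v > 0
            return 1e10                     # penalty outside the domain
        return -np.mean(np.log(1.0 - v))

    res = minimize(neg_obj, np.zeros(m), method="Nelder-Mead")
    return -res.fun                         # value of the inner sup

def el_estimate(g_func, data, beta_init):
    """Outer minimization over the model parameters."""
    res = minimize(lambda b: el_profile(b, g_func, data), beta_init,
                   method="Nelder-Mead")
    return res.x
```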
Let $G_S = E[\partial g_S(y, \theta_0, \gamma_{0,S})/\partial(\theta', \gamma_S')]$ and $\Omega_S = E[g_S(y, \theta_0, \gamma_{0,S})\, g_S(y, \theta_0, \gamma_{0,S})']$, where $E$ denotes the expectation with respect to the limit of the sequence of densities, $f_n$. Furthermore, let:

$$\Sigma_S = \left(G_S'\,\Omega_S^{-1} G_S\right)^{-1}$$

which is assumed to exist. For the full model, we denote:

$$G = (G_\theta, G_\gamma) = E\left[\frac{\partial g(y, \theta_0, \gamma_0)}{\partial(\theta', \gamma')}\right], \qquad \Omega = E\left[g(y, \theta_0, \gamma_0)\, g(y, \theta_0, \gamma_0)'\right], \qquad \Sigma = \left(G'\,\Omega^{-1} G\right)^{-1}$$

Then, we can write $G_S = G\,\mathrm{diag}(I_p, \pi_S')$ and $\Omega_S = \Omega$, where $\pi_S$ is the projection matrix of size, $|S| \times q$, that maps $\gamma$ to the subvector, $\gamma_S$: $\pi_S \gamma = \gamma_S$.
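The projection matrix $\pi_S$ is easy to build explicitly. A small sketch (0-indexed sets are an implementation convenience, not the paper's notation):

```python
import numpy as np

def projection_matrix(S, q):
    """Projection matrix pi_S of size |S| x q with pi_S @ gamma == gamma[S]."""
    S = sorted(S)
    pi = np.zeros((len(S), q))
    for row, j in enumerate(S):
        pi[row, j] = 1.0
    return pi

# Example: q = 4 and S = {0, 2} pick the 1st and 3rd elements of gamma.
pi_S = projection_matrix({0, 2}, 4)
gamma = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(pi_S @ gamma, [1.0, 3.0])
```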
Let $\beta_S = (\theta', \gamma_S')'$ and $\beta_{0,S} = (\theta_0', \gamma_{0,S}')'$. Furthermore, let $\hat\beta_S = (\hat\theta_S', \hat\gamma_S')'$. To obtain the asymptotic distribution of the GEL estimator, we impose the following conditions:
Assumption 3.1
- 1. $\Theta$, $\Gamma$ and $\Lambda$ are compact.
- 2. $g(y, \theta, \gamma)$ is continuous in $\theta$ and $\gamma$ for almost every $y$.
- 3. $\sup_{(\theta, \gamma) \in \Theta \times \Gamma}\left\| n^{-1}\sum_{i=1}^n g(y_i, \theta, \gamma) - E[g(y, \theta, \gamma)] \right\| \overset{p}{\to} 0$ under the sequence of $f_n$.
- 4. $E_{f_n}[g(y, \theta, \gamma)] \to E[g(y, \theta, \gamma)]$ as $n \to \infty$ for all $\theta$ and $\gamma$.
- 5. $E[g(y, \theta, \gamma)\, g(y, \theta, \gamma)']$ is nonsingular for all $\theta$ and $\gamma$.
- 6. For each $S$, $(\theta_0, \gamma_{0,S})$ is the unique solution to $E[g_S(y, \theta, \gamma_S)] = 0$, and $\lambda = 0$ is the unique maximizer of $E[\rho(\lambda' g_S(y, \theta_0, \gamma_{0,S}))]$.
- 7. $\rho(v)$ is twice continuously differentiable in a neighborhood of zero.
- 8. $G$ and $G_S$ are of full column rank.
- 9. $E\left[\sup_{(\theta, \gamma) \in \Theta \times \Gamma}\|g(y, \theta, \gamma)\|^\alpha\right] < \infty$ for some $\alpha > 2$.
- 10. $g(y, \theta, \gamma)$ is continuously differentiable in $\theta$ and $\gamma$ in a neighborhood, $\mathcal{N}$, of $(\theta_0, \gamma_0)$.
- 11. $\sup_{(\theta, \gamma) \in \mathcal{N}}\left\| n^{-1}\sum_{i=1}^n \partial g(y_i, \theta, \gamma)/\partial(\theta', \gamma') - E[\partial g(y, \theta, \gamma)/\partial(\theta', \gamma')] \right\| \overset{p}{\to} 0$ and $n^{-1}\sum_{i=1}^n g(y_i, \theta_0, \gamma_n)\, g(y_i, \theta_0, \gamma_n)' \overset{p}{\to} \Omega$ under the sequence of $f_n$.
- 12. $E_{f_n}[\partial g(y, \theta_0, \gamma_n)/\partial(\theta', \gamma')] \to G$ and $\sqrt{n}\, E_{f_n}[g(y, \theta_0, \gamma_0)] \to -G_\gamma \delta$ as $n \to \infty$.
- 13. $\mathrm{Var}_{f_n}\left(g(y, \theta_0, \gamma_n)\right) \to \Omega$ as $n \to \infty$.
These conditions are rather high-level and strong. Some of them can be replaced with more primitive and weaker conditions [34].
We obtain the following lemma.
Lemma 3.1 Suppose Assumption 3.1 holds. Then, under the sequence of $f_n$, we have:

$$\sqrt{n}\left(\hat\beta_S - \beta_{0,S}\right) \overset{d}{\to} N\left(\Sigma_S G_S'\,\Omega^{-1} G_\gamma\,\delta,\ \Sigma_S\right)$$
The proof is given in the Appendix.
If the model, $S$, is correctly specified (i.e., $\delta = 0$), then the limiting distribution of the GEL estimator is $N(0, \Sigma_S)$. Therefore, as usual, local misspecification affects only the mean of the limiting distribution.
Next, we obtain the asymptotic distribution of the GEL estimator of the focus parameter. Additional notation is introduced. Let $Q$ and $Q_S$ be the lower right $q \times q$ and $|S| \times |S|$ block matrices of $\Sigma$ and $\Sigma_S$, respectively. Let $Q_S^0 = \pi_S' Q_S \pi_S$. We assume that $\mu$ is differentiable with respect to $\theta$ and $\gamma$. Let:

$$\tau_0^2 = \left(\frac{\partial\mu}{\partial\theta}\right)' J_{00}^{-1}\,\frac{\partial\mu}{\partial\theta}, \qquad \omega = J_{10}\, J_{00}^{-1}\,\frac{\partial\mu}{\partial\theta} - \frac{\partial\mu}{\partial\gamma}$$

where $J_{00}$ and $J_{10}$ are the upper left $p \times p$ and lower left $q \times p$ blocks of $\Sigma^{-1} = G'\Omega^{-1}G$, and the partial derivatives are evaluated at $(\theta_0, \gamma_0)$. The true focus parameter is denoted as $\mu_{\text{true}} = \mu(\theta_0, \gamma_n)$. Moreover, the GEL estimator of $\mu$ for the model, $S$, is denoted as $\hat\mu_S = \mu(\hat\theta_S, \hat\gamma_S, \gamma_{0,S^c})$. Lemma 3.1 and the delta method imply the following theorem:
Theorem 3.1 Suppose Assumption 3.1 holds. Then, under the sequence of $f_n$, we have:

$$D_n = \sqrt{n}\left(\hat\gamma_{\text{full}} - \gamma_0\right) \overset{d}{\to} D \sim N(\delta, Q)$$

and:

$$\sqrt{n}\left(\hat\mu_S - \mu_{\text{true}}\right) \overset{d}{\to} \Lambda_S \equiv \Lambda_0 + \omega'\left(\delta - Q_S^0 Q^{-1} D\right)$$

where $\Lambda_0 \sim N(0, \tau_0^2)$ is independent of $D$.
The proof is almost the same as that of Lemma 3.3 in Hjort and Claeskens [21], so it is omitted.
Because $Q_{\text{red}}^0 = 0$ and $Q_{\text{full}}^0 = Q$, as the special cases of the theorem, we have:

$$\sqrt{n}\left(\hat\mu_{\text{red}} - \mu_{\text{true}}\right) \overset{d}{\to} N\left(\omega'\delta,\ \tau_0^2\right), \qquad \sqrt{n}\left(\hat\mu_{\text{full}} - \mu_{\text{true}}\right) \overset{d}{\to} N\left(0,\ \tau_0^2 + \omega' Q\,\omega\right)$$

Therefore, in terms of the asymptotic MSE, the reduced model is better than the full model if $(\omega'\delta)^2 \leq \omega' Q\,\omega$, which is the case when the deviation of the reduced model from the true DGP is small.
More generally, Theorem 3.1 implies that the MSE of the limiting distribution of $\sqrt{n}(\hat\mu_S - \mu_{\text{true}})$ is:

$$\text{MSE}_S = \omega'\left(I_q - Q_S^0 Q^{-1}\right)\delta\delta'\left(I_q - Q_S^0 Q^{-1}\right)'\omega + \tau_0^2 + \omega' Q_S^0\,\omega \quad (2)$$
The idea behind FIC is to estimate (2) for each model and select the model that attains the minimum estimated MSE.
All components in (2) except $\delta\delta'$ can be estimated easily by using their sample analogs. However, a consistent estimator for $\delta$ is unavailable, because $D_n = \sqrt{n}(\hat\gamma_{\text{full}} - \gamma_0)$ converges in distribution to a normal random variable. This difficulty is inevitable, as long as we utilize the local misspecification framework. Because the mean of $DD'$ is $\delta\delta' + Q$, following Claeskens and Hjort [7], we use $D_nD_n' - \hat{Q}$ to estimate $\delta\delta'$. Then, the sample counterpart of (2) is:

$$\hat\omega'\left(I_q - \hat{Q}_S^0\hat{Q}^{-1}\right)D_nD_n'\left(I_q - \hat{Q}_S^0\hat{Q}^{-1}\right)'\hat\omega + 2\,\hat\omega'\hat{Q}_S^0\hat\omega + \hat\tau_0^2 - \hat\omega'\hat{Q}\hat\omega$$

which is an asymptotically unbiased estimator for (2). Because the last two terms do not depend on the model, we can ignore them for the purpose of model selection. Let $\hat{G}_S = \hat{Q}_S^0\hat{Q}^{-1}$ and $\hat\psi_S = \hat\omega'(I_q - \hat{G}_S)D_n$. Then, our FIC for the model, $S$, is:

$$\text{FIC}_S = \hat\psi_S^2 + 2\,\hat\omega'\hat{Q}_S^0\hat\omega \quad (3)$$

where $\hat\omega$, $\hat{Q}$ and $\hat{Q}_S^0$ are the sample analogs of $\omega$, $Q$ and $Q_S^0$. The bigger the model is, the smaller the first term and the larger the second term in (3). Since $\omega$ depends on $\mu$, the FIC can be used to select an appropriate submodel, depending on the parameter of interest.
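A sketch of the FIC computation under the notation above. The inputs $\hat\omega$, $D_n$ and $\hat{Q}$ are assumed to have been estimated beforehand, and the code uses the block-inverse identity $\hat{Q}_S = (\pi_S \hat{Q}^{-1} \pi_S')^{-1}$, which is available here because the moment function (and hence $\Omega$) is common across submodels.

```python
import numpy as np

def fic(omega, D, Q, pi_S):
    """FIC_S = (omega'(I - G_S) D)^2 + 2 omega' Q_S^0 omega.

    omega: (q,) estimated focus gradient combination
    D:     (q,) D_n = sqrt(n) * (gamma_full_hat - gamma0)
    Q:     (q, q) estimated lower-right block of Sigma
    pi_S:  (|S|, q) projection matrix of the submodel
    """
    q = len(D)
    Q_inv = np.linalg.inv(Q)
    if pi_S.shape[0] == 0:                           # reduced model: Q_S^0 = 0
        QS0 = np.zeros((q, q))
    else:
        Q_S = np.linalg.inv(pi_S @ Q_inv @ pi_S.T)   # (pi_S Q^{-1} pi_S')^{-1}
        QS0 = pi_S.T @ Q_S @ pi_S
    G_S = QS0 @ Q_inv
    bias2 = float(omega @ (np.eye(q) - G_S) @ D) ** 2
    return bias2 + 2.0 * float(omega @ QS0 @ omega)
```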
Although we consider only the case where $\mu$ is a scalar, our FIC is also applicable to a vector-valued focus parameter by viewing each element of the vector as a different scalar-valued focus parameter. Different models might be used to estimate different elements of the vector.
We conclude this section with a remark on the estimation of the squared bias. Because we estimate $\delta\delta'$ by $D_nD_n' - \hat{Q}$, the estimate can be negative definite in finite samples, which means that the squared bias term can be negative. To avoid such cases, as suggested by Claeskens and Hjort [35], we can also use a bias-corrected FIC, which replaces the estimated squared bias with zero on the event of negligible bias, i.e., when the bias estimate is dominated by its sampling variability. See Section 6.4 of Claeskens and Hjort [35] for details.
4. Model Averaging
This section extends the result of Section 3 to the averaging problem. In the FMA literature, it is often of particular interest to obtain an optimal averaging estimator in terms of a certain loss. We consider the possibility of obtaining the best averaging weights, i.e., those that minimize the asymptotic MSE in the local misspecification framework. A similar analysis is presented by Liu [30] in the case of linear regression.
Let $\mathcal{S}$ be the set of all candidate models. We consider an averaging estimator for the focus parameter of the form:

$$\hat\mu = \sum_{S \in \mathcal{S}} w(S)\, \hat\mu_S$$

where the weights, $w(S) \geq 0$, add up to unity. Note that a post-selection estimator of $\mu$ can also be written in this form. Let $\hat{S}$ be the FIC-selected model. Then the post-selection estimator using FIC is:

$$\hat\mu_{\text{FIC}} = \sum_{S \in \mathcal{S}} \mathbf{1}\{\hat{S} = S\}\, \hat\mu_S$$

where $\mathbf{1}\{\cdot\}$ is the indicator function. Thus, the post-selection estimator is a special case of the averaging estimator.
If the weights are not random, then it is straightforward from Theorem 3.1 that:

$$\sqrt{n}\left(\hat\mu - \mu_{\text{true}}\right) \overset{d}{\to} \Lambda_0 + \omega'\left(\delta - \sum_{S \in \mathcal{S}} w(S)\, Q_S^0 Q^{-1} D\right)$$

where $D \sim N(\delta, Q)$ is independent of $\Lambda_0$. Therefore, the asymptotic mean and variance of the averaging estimator are given by:

$$\omega'\left(I_q - \sum_{S \in \mathcal{S}} w(S)\, Q_S^0 Q^{-1}\right)\delta \qquad \text{and} \qquad \tau_0^2 + \omega'\left(\sum_{S \in \mathcal{S}} w(S)\, Q_S^0\right) Q^{-1}\left(\sum_{S \in \mathcal{S}} w(S)\, Q_S^0\right)\omega$$

Thus, there is a set of weights that minimizes the asymptotic MSE of $\hat\mu$.
Suppose there are $M$ candidate models: $S_1, \dots, S_M$. Let $w = (w_1, \dots, w_M)'$ be a vector of averaging weights, which is in the unit simplex in $\mathbb{R}^M$:

$$\mathcal{H}_M = \left\{ w \in [0, 1]^M : \sum_{i=1}^M w_i = 1 \right\}$$

Ignoring $\tau_0^2$, which does not depend on the model, the optimal weight vector, $w^*$, that minimizes the asymptotic MSE is:

$$w^* = \arg\min_{w \in \mathcal{H}_M} w' A\, w$$

where $A$ is an $M \times M$ matrix, whose $(i, j)$ element is given by:

$$A_{ij} = \omega'\left(I_q - Q_{S_i}^0 Q^{-1}\right)\delta\delta'\left(I_q - Q_{S_j}^0 Q^{-1}\right)'\omega + \omega'\, Q_{S_i}^0 Q^{-1} Q_{S_j}^0\, \omega$$

If we replace $A$ with its appropriate estimate, $\hat{A}$, we obtain a feasible estimator:

$$\hat{w} = \arg\min_{w \in \mathcal{H}_M} w' \hat{A}\, w \quad (4)$$

For instance, if we estimate $\delta\delta'$ by $D_nD_n' - \hat{Q}$, then:

$$\hat{A}_{ij} = \hat\omega'\left(I_q - \hat{Q}_{S_i}^0\hat{Q}^{-1}\right)\left(D_nD_n' - \hat{Q}\right)\left(I_q - \hat{Q}_{S_j}^0\hat{Q}^{-1}\right)'\hat\omega + \hat\omega'\,\hat{Q}_{S_i}^0\hat{Q}^{-1}\hat{Q}_{S_j}^0\,\hat\omega$$
Although there is no closed-form solution for (4), it can be solved numerically by a standard quadratic programming algorithm.
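A sketch of the weight computation using a general-purpose solver under the simplex constraint; any quadratic programming routine would serve equally well.

```python
import numpy as np
from scipy.optimize import minimize

def averaging_weights(A_hat):
    """Solve w = argmin_{w in unit simplex} w' A_hat w."""
    M = A_hat.shape[0]
    A_sym = 0.5 * (A_hat + A_hat.T)          # symmetrize for stability
    cons = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
    bounds = [(0.0, 1.0)] * M                # nonnegative weights
    w0 = np.full(M, 1.0 / M)                 # start from equal weights
    res = minimize(lambda w: w @ A_sym @ w, w0, bounds=bounds,
                   constraints=cons, method="SLSQP")
    return res.x
```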
Unfortunately, $\hat{w}$ cannot be a consistent estimator for $w^*$, because there is no consistent estimator for $A$. Suppose that $\hat{A}_{ij} \overset{d}{\to} \tilde{A}_{ij}$ for a random matrix, $\tilde{A}$, and for all $i$ and $j$. Then, we have:

$$\hat{w} \overset{d}{\to} \tilde{w} = \arg\min_{w \in \mathcal{H}_M} w' \tilde{A}\, w$$

Thus, $\hat{w}$ is random, even in the limit.
Let $w_i^*$ and $\tilde{w}_i$ be the $i$-th element of $w^*$ and $\tilde{w}$, respectively. Furthermore, let $\hat\mu_{\hat w}$ denote the averaging estimator using $\hat{w}$. Because $\hat{w}$ and $(\hat\mu_{S_1}, \dots, \hat\mu_{S_M})$ are both determined through $D_n$, $\hat{w}$ and $(\hat\mu_{S_1}, \dots, \hat\mu_{S_M})$ converge jointly to $\tilde{w}$ and $(\Lambda_{S_1}, \dots, \Lambda_{S_M})$. Therefore, the limiting distribution of $\hat\mu_{\hat w}$ is given by:

$$\sqrt{n}\left(\hat\mu_{\hat w} - \mu_{\text{true}}\right) \overset{d}{\to} \Lambda_0 + \omega'\left(\delta - \sum_{i=1}^M \tilde{w}_i\, Q_{S_i}^0 Q^{-1} D\right) \quad (5)$$
Because the weights are random, the limiting distribution is no longer normal. Thus, (5) is not readily applicable for inference. However, as suggested by Hjort and Claeskens [21], (5) implies that:

$$\sqrt{n}\left(\hat\mu_{\hat w} - \mu_{\text{true}}\right) - \hat\omega'\left(I_q - \sum_{i=1}^M \hat{w}_i\, \hat{Q}_{S_i}^0\hat{Q}^{-1}\right)D_n \overset{d}{\to} N\left(0,\ \kappa^2\right), \qquad \kappa^2 = \tau_0^2 + \omega' Q\,\omega$$

where $\hat\kappa$ is a consistent estimator for $\kappa$. This result can be used to construct a confidence interval for $\mu_{\text{true}}$.
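A sketch of the resulting interval, assuming the normal approximation above: the argument `correction` is the estimated random-bias term $\hat\omega'(I_q - \sum_i \hat{w}_i \hat{Q}_{S_i}^0\hat{Q}^{-1})D_n$, computed from the quantities defined earlier.

```python
import numpy as np
from scipy.stats import norm

def averaging_ci(mu_hat, correction, kappa_hat, n, alpha=0.05):
    """Normal-based confidence interval for the averaging estimator."""
    z = norm.ppf(1.0 - alpha / 2.0)
    center = mu_hat - correction / np.sqrt(n)   # bias-corrected center
    half = z * kappa_hat / np.sqrt(n)
    return center - half, center + half
```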
5. Example
This section gives a simple example to which our methods are applicable. One of the most popular models described by moment restrictions is the linear instrumental variable model. The full model we consider here is:

$$y = x'\theta + z'\gamma + \epsilon$$

where $x$ and $z$ are $p$- and $q$-dimensional vectors of explanatory variables. Some elements of $x$ are potentially correlated with $\epsilon$. The vector of instruments, $w$, is such that $E[w\epsilon] = 0$, and it may contain elements of $x$ and $z$. Economic theory suggests that $x$ should be included in the model, but we are unsure which components of $z$ should be included. Thus, the reduced model corresponds to the case that $\gamma = 0$.
In this model, $g$ is given by:

$$g(y, x, z, w, \theta, \gamma) = w\left(y - x'\theta - z'\gamma\right)$$

Let $\hat\epsilon_i$ be the residual from the full model: $\hat\epsilon_i = y_i - x_i'\hat\theta_{\text{full}} - z_i'\hat\gamma_{\text{full}}$. Then, for instance, $\Omega$ can be estimated by:

$$\hat\Omega = \frac{1}{n}\sum_{i=1}^n \hat\epsilon_i^2\, w_i w_i'$$
Other components of the asymptotic MSE can be estimated in a similar manner. It is also possible to replace the empirical probability, $1/n$, with the GEL-induced probability.
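In code, the sample analogs for the linear IV example take one line each. A sketch with assumed array shapes (rows index observations; `X`, `Z`, `W` are the stacked $x_i'$, $z_i'$, $w_i'$):

```python
import numpy as np

def estimate_G_Omega(y, X, Z, W, theta_hat, gamma_hat):
    """Sample analogs of G and Omega for g = w * (y - x'theta - z'gamma)."""
    n = len(y)
    resid = y - X @ theta_hat - Z @ gamma_hat       # full-model residuals
    Omega_hat = (W * resid[:, None] ** 2).T @ W / n  # (1/n) sum e_i^2 w_i w_i'
    XZ = np.hstack([X, Z])
    G_hat = -W.T @ XZ / n                            # E[dg/d(theta',gamma')]
    return G_hat, Omega_hat
```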
If the focus parameter is the $k$-th element of $\theta$, then we have:

$$\frac{\partial\mu}{\partial\theta} = e_k, \qquad \frac{\partial\mu}{\partial\gamma} = 0$$

where $e_k$ is the $k$-th unit vector, which has one in the $k$-th element and zeros elsewhere. On the other hand, if the focus parameter is $\mu = x'\theta + z'\gamma$ for a fixed covariate value, $(x, z)$, then:

$$\frac{\partial\mu}{\partial\theta} = x, \qquad \frac{\partial\mu}{\partial\gamma} = z$$
To obtain a good estimate of $\mu$ over a range of covariate values, rather than at a single covariate value, we can utilize the idea of Claeskens and Hjort [36], who address minimizing an averaged risk over the range of covariates rather than the pointwise risk.
6. Monte Carlo Study
We now investigate the performance of post-selection and averaging estimators by a simple Monte Carlo study. Our EL-based methods are compared with the EL-based selection methods of Hong, Preston and Shum [16]. The following post-selection and averaging estimators are considered: (i) AIC-like model selection, (ii) BIC-like model selection, (iii) FIC model selection and (iv) an averaging estimator, whose weights are given by (4). The AIC- and BIC-like criteria are proposed by Hong, Preston and Shum [16] and are given by:

$$\text{AIC}_S = J_S - 2\left(\ell - p - |S|\right), \qquad \text{BIC}_S = J_S - (\log n)\left(\ell - p - |S|\right)$$

where $\ell$ is the dimension of the moment function and $J_S$ is the GEL counterpart of the J-statistic:

$$J_S = 2\sum_{i=1}^n \left[\rho\left(\hat\lambda_S' g_S(y_i, \hat\theta_S, \hat\gamma_S)\right) - \rho(0)\right] \quad (6)$$

We use (6) to estimate $J_S$.
We consider the linear instrumental variable model. The DGP is specified by the following equations:

$$y_i = \theta_0 x_i + z_i'\gamma_n + \epsilon_i, \qquad x_i = w_i'\pi + u_i$$

where $\gamma_n = \delta/\sqrt{n}$ and $\pi \neq 0$ for some vector, $\delta$. Exogenous variables, $z_i$, are normally distributed with mean zero and variance one, and the correlation between $u_i$ and $\epsilon_i$ is nonzero, so that $x_i$ is endogenous. The vector of instruments, $w_i$, is fixed to be the same for all candidate models. The error term, $u_i$, is independent of $w_i$ and is generated from a standard normal distribution. Thus, the moment restriction for the full model is:

$$E\left[w_i\left(y_i - \theta_0 x_i - z_i'\gamma_n\right)\right] = 0$$
Table 1. Estimation results; DGP, data generating process; AIC, Akaike information criterion; BIC, Bayesian information criterion; FIC, focused information criterion.

| Model | | DGP (1) | DGP (2) | DGP (3) | DGP (4) |
| --- | --- | --- | --- | --- | --- |
| Full | Bias | -0.104 | -0.109 | -0.089 | -0.076 |
| | Std | 0.544 | 0.533 | 0.509 | 0.489 |
| | RMSE | 0.554 | 0.544 | 0.516 | 0.495 |
| Reduced | Bias | -0.279 | -0.057 | -0.148 | -0.048 |
| | Std | 0.780 | 0.473 | 0.955 | 0.448 |
| | RMSE | 0.828 | 0.477 | 0.965 | 0.450 |
| AIC | Bias | -0.113 | -0.099 | -0.101 | -0.079 |
| | Std | 0.559 | 0.557 | 0.497 | 0.509 |
| | RMSE | 0.570 | 0.566 | 0.507 | 0.515 |
| BIC | Bias | -0.136 | -0.088 | -0.104 | -0.073 |
| | Std | 0.689 | 0.552 | 0.499 | 0.502 |
| | RMSE | 0.702 | 0.559 | 0.510 | 0.507 |
| FIC | Bias | -0.139 | -0.095 | -0.112 | -0.076 |
| | Std | 0.530 | 0.509 | 0.464 | 0.452 |
| | RMSE | 0.548 | 0.517 | 0.477 | 0.458 |
| Averaging | Bias | -0.139 | -0.092 | -0.107 | -0.074 |
| | Std | 0.511 | 0.476 | 0.455 | 0.444 |
| | RMSE | 0.529 | 0.484 | 0.468 | 0.450 |
The focus parameter is $\theta_0$, the coefficient of the endogenous regressor, $x_i$. In many applications, it is often the case that the only parameter of interest in the linear model is the coefficient of the endogenous regressor. Exogenous regressors are included simply to avoid omitted variable bias. Thus, if the bias is small, it may be better to exclude some regressors to reduce the variance. In this simulation, we include the constant term and the endogenous regressor, $x_i$, in all candidate models, but some elements of $z_i$ may be excluded. That is, some elements of $\gamma$ are set to zero. Therefore, there are $2^q$ submodels in total; a sketch of the enumeration is given below.
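Enumerating the $2^q$ candidate models is a one-liner over subsets. A sketch (0-indexed sets are an implementation convenience):

```python
from itertools import chain, combinations

def all_submodels(q):
    """All 2^q subsets S of {0, ..., q-1}; each subset lists the elements
    of gamma treated as free parameters."""
    idx = range(q)
    return list(chain.from_iterable(combinations(idx, r) for r in range(q + 1)))

# Example: q = 3 gives 8 candidate models, from the reduced () to the full (0, 1, 2).
print(all_submodels(3))
```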
To evaluate the performance of the post-selection and averaging estimators, we calculate the bias, standard deviation and root MSE (RMSE) of each estimator over 1,000 repetitions. For reference, we also report the results of the full and reduced models. The sample size is held fixed.² We consider four DGPs, (1)–(4), which differ in the value of $\delta$. The DGPs (1) and (3) are favorable for the full model, while (2) and (4) are favorable for the reduced model. The results are summarized in Table 1.
Table 1 indicates that there are certain cases where we should avoid using the full model, even if it is the correct model. The performance of the full model is poorer than that of the FIC-selected model for all DGPs. As the theory suggests, the efficiency gain of FIC over the full model is large when $\delta$ is small. The averaging estimator outperforms all post-selection estimators; it is even better than FIC. Consistent with findings in the literature, averaging is a useful method to reduce the risk of the estimator.
7. Conclusions
This paper studied GEL-based model selection and averaging methods that are designed to obtain an efficient estimator for the parameter of interest. We modified the local misspecification framework of Claeskens and Hjort [7], so that an FIC can be obtained for moment restriction models. Then, we proposed an averaging estimator by extending the idea of the FIC.
In the simulation study, we considered the model selection/averaging problem for the linear instrumental variable model. Although some methods have been advocated for selecting/averaging instruments in the literature, there are few studies on the model selection/averaging problem. The results of the simulation suggest that our averaging estimator can be a useful alternative to existing post-selection estimators.
Acknowledgments
The author thanks Ryo Okui for his comments and suggestions. The author also thanks three referees, seminar participants at the University of Tokyo and participants of a summer workshop on economic theory at Otaru University of Commerce for their comments. The author acknowledges financial support from the Japan Society for the Promotion of Science under KAKENHI 23730215.
References
- R.J. Smith. “Alternative Semi-Parametric Likelihood Approaches to Generalised Method of Moments Estimation.” Econ. J. 107 (1997): 503–519.
- W.K. Newey, and R.J. Smith. “Higher Order Properties of GMM and Generalized Empirical Likelihood Estimators.” Econometrica 72 (2004): 219–255.
- A.B. Owen. “Empirical Likelihood Ratio Confidence Intervals for a Single Functional.” Biometrika 75 (1988): 237–249.
- J. Qin, and J. Lawless. “Empirical Likelihood and General Estimating Equations.” Ann. Stat. 22 (1994): 300–325.
- Y. Kitamura, and M. Stutzer. “An Information-Theoretic Alternative to Generalized Method of Moments Estimation.” Econometrica 65 (1997): 861–874.
- G.W. Imbens, R.H. Spady, and P. Johnson. “Information Theoretic Approaches to Inference in Moment Condition Models.” Econometrica 66 (1998): 333–357.
- G. Claeskens, and N.L. Hjort. “The Focused Information Criterion.” J. Am. Stat. Assoc. 98 (2003): 900–916.
- H. Akaike. “Information Theory and an Extension of the Maximum Likelihood Principle.” In Second International Symposium on Information Theory. Edited by B.N. Petrov and F. Csáki. Akadémiai Kiadó, 1973, pp. 267–281.
- G. Schwarz. “Estimating the Dimension of a Model.” Ann. Stat. 6 (1978): 461–464.
- B.E. Hansen. “Challenges for Econometric Model Selection.” Economet. Theor. 21 (2005): 60–68.
- G. Claeskens, C. Croux, and J.V. Kerckhoven. “Variable Selection for Logistic Regression Using a Prediction-Focused Information Criterion.” Biometrics 62 (2006): 972–979.
- N.L. Hjort, and G. Claeskens. “Focused Information Criteria and Model Averaging for the Cox Hazard Regression Model.” J. Am. Stat. Assoc. 101 (2006): 1449–1464.
- X. Zhang, and H. Liang. “Focused Information Criterion and Model Averaging for Generalized Additive Partial Linear Models.” Ann. Stat. 39 (2011): 174–200.
- D.W. Andrews, and B. Lu. “Consistent Model and Moment Selection Procedures for GMM Estimation with Application to Dynamic Panel Data Models.” J. Econometrics 101 (2001): 123–164.
- L.P. Hansen. “Large Sample Properties of Generalized Method of Moments Estimators.” Econometrica 50 (1982): 1029–1054.
- H. Hong, B. Preston, and M. Shum. “Generalized Empirical Likelihood-Based Model Selection Criteria for Moment Condition Models.” Economet. Theor. 19 (2003): 923–943.
- N. Sueishi. “Information Criteria for Moment Restriction Models.” Unpublished Manuscript. Kyoto University, 2013.
- J.A. Hoeting, D. Madigan, A.E. Raftery, and C.T. Volinsky. “Bayesian Model Averaging: A Tutorial.” Stat. Sci. 14 (1999): 382–417.
- S.T. Buckland, K.P. Burnham, and N.H. Augustin. “Model Selection: An Integral Part of Inference.” Biometrics 53 (1997): 603–618.
- K.P. Burnham, and D.R. Anderson. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer, 2002.
- N.L. Hjort, and G. Claeskens. “Frequentist Model Average Estimators.” J. Am. Stat. Assoc. 98 (2003): 879–899.
- Y. Yang. “Adaptive Regression by Mixing.” J. Am. Stat. Assoc. 96 (2001): 574–588.
- G. Leung, and A.R. Barron. “Information Theory and Mixing Least-Squares Regressions.” IEEE T. Inform. Theory 52 (2006): 3396–3410.
- A. Goldenshluger. “A Universal Procedure for Aggregating Estimators.” Ann. Stat. 37 (2009): 542–568.
- B.E. Hansen. “Least Squares Model Averaging.” Econometrica 75 (2007): 1175–1189.
- A.T.K. Wan, X. Zhang, and G. Zou. “Least Squares Model Averaging by Mallows Criterion.” J. Econometrics 156 (2010): 277–283.
- B.E. Hansen, and J.S. Racine. “Jackknife Model Averaging.” J. Econometrics 167 (2012): 38–46.
- Q. Liu, and R. Okui. “Heteroskedasticity-Robust Cp Model Averaging.” Economet. J., 2013. Forthcoming.
- F.J. DiTraglia. “Using Invalid Instruments on Purpose: Focused Moment Selection and Averaging for GMM.” Unpublished Manuscript. University of Pennsylvania, 2012.
- C.A. Liu. “A Plug-In Averaging Estimator for Regressions with Heteroskedastic Errors.” Unpublished Manuscript. National University of Singapore, 2012.
- L.F. Martins, and V.J. Gabriel. “Linear Instrumental Variables Model Averaging Estimation.” Comput. Stat. Data An., 2013. Forthcoming.
- W.K. Newey. “Generalized Method of Moments Specification Testing.” J. Econometrics 29 (1985): 229–256.
- A.R. Hall. “Hypothesis Testing in Models Estimated by Generalized Method of Moments.” In Generalized Method of Moments Estimation. Edited by L. Mátyás. Cambridge University Press, 1999, pp. 75–101.
- P.M. Parente, and R.J. Smith. “GEL Methods for Nonsmooth Moment Indicators.” Economet. Theor. 27 (2011): 74–113.
- G. Claeskens, and N.L. Hjort. Model Selection and Model Averaging. Cambridge University Press, 2008.
- G. Claeskens, and N.L. Hjort. “Minimizing Average Risk in Regression Models.” Economet. Theor. 24 (2008): 493–527.
A. Appendix
This appendix provides a proof for Lemma 3.1. In this appendix, the symbols, $\overset{p}{\to}$ and $\overset{d}{\to}$, denote convergence in probability and in distribution with respect to the local sequence, $f_n$.
Let $\beta = \beta_S$ and $\beta_0 = \beta_{0,S}$; we suppress the subscript, $S$, on $\beta$ for brevity. We define:

$$\hat{P}_n(\beta, \lambda) = \frac{1}{n}\sum_{i=1}^n \rho\left(\lambda' g_S(y_i, \beta)\right), \qquad P(\beta, \lambda) = E\left[\rho\left(\lambda' g_S(y, \beta)\right)\right], \qquad \lambda(\beta) = \arg\max_{\lambda} P(\beta, \lambda)$$

Condition 5 implies that $\lambda(\beta)$ is continuous with respect to $\beta$. Moreover:

$$\left.\frac{\partial P(\beta_0, \lambda)}{\partial\lambda}\right|_{\lambda = 0} = -E\left[g_S(y, \beta_0)\right] = 0$$

Thus, by concavity of $\rho$, $\lambda(\beta_0) = 0$.
Let $\hat\lambda(\beta) = \arg\max_{\lambda \in \hat\Lambda_S(\beta)} \hat{P}_n(\beta, \lambda)$. Then, by construction:

$$\hat{P}_n\left(\hat\beta_S, \hat\lambda(\hat\beta_S)\right) \leq \hat{P}_n\left(\beta_0, \hat\lambda(\beta_0)\right) \quad (7)$$

Also, let $B(\epsilon)$ be an open ball of radius, $\epsilon$, around $\beta_0$. Then, Condition 6 and the saddle-point property imply that:

$$P\left(\beta, \lambda(\beta)\right) > \rho(0)$$

for $\beta \notin B(\epsilon)$ and any $\epsilon > 0$. Conditions 1–4 imply:

$$\hat{P}_n(\beta, \lambda) - P(\beta, \lambda) \overset{p}{\to} 0$$

uniformly over $\beta$ and $\lambda$. Thus, for any $\epsilon > 0$, there exists $\eta > 0$, such that:

$$\Pr{}_n\left(\inf_{\beta \notin B(\epsilon)} \hat{P}_n\left(\beta, \hat\lambda(\beta)\right) > \rho(0) + \eta\right) \to 1$$

where $\Pr_n$ is the probability under $f_n$. Conditions 1–4 also imply $\hat{P}_n(\beta_0, \hat\lambda(\beta_0)) \overset{p}{\to} \rho(0)$. Therefore, we obtain:

$$\Pr{}_n\left(\hat{P}_n\left(\beta_0, \hat\lambda(\beta_0)\right) < \rho(0) + \eta\right) \to 1 \quad (8)$$

Combining (7) and (8), we have $\hat\beta_S \overset{p}{\to} \beta_0$ and $\hat\lambda_S \equiv \hat\lambda(\hat\beta_S) \overset{p}{\to} 0$. Moreover, we have:

$$\frac{1}{n}\sum_{i=1}^n g_S(y_i, \beta) \overset{p}{\to} E\left[g_S(y, \beta)\right]$$

uniformly over $\beta$. Thus, $\hat\lambda_S = O_p(n^{-1/2})$.
Next, we derive the asymptotic distribution. The first-order conditions for $(\hat\beta_S, \hat\lambda_S)$ are:

$$0 = \frac{1}{n}\sum_{i=1}^n \rho_1\left(\hat\lambda_S' g_{S,i}(\hat\beta_S)\right) g_{S,i}(\hat\beta_S), \qquad 0 = \frac{1}{n}\sum_{i=1}^n \rho_1\left(\hat\lambda_S' g_{S,i}(\hat\beta_S)\right)\left(\frac{\partial g_{S,i}(\hat\beta_S)}{\partial\beta'}\right)'\hat\lambda_S$$

where $g_{S,i}(\beta) = g_S(y_i, \beta)$ and $\rho_1(v) = \partial\rho(v)/\partial v$. By Condition 11 and consistency of the estimator, expanding the first-order conditions around $(\beta_0, 0)$, we obtain:

$$\sqrt{n}\begin{pmatrix} \hat\lambda_S \\ \hat\beta_S - \beta_0 \end{pmatrix} = -\begin{pmatrix} \Omega & G_S \\ G_S' & 0 \end{pmatrix}^{-1}\begin{pmatrix} n^{-1/2}\sum_{i=1}^n g_{S,i}(\beta_0) \\ 0 \end{pmatrix} + o_p(1)$$

Let $t_{n,i} = n^{-1/2} a'\left(g_{S,i}(\beta_0) - E_{f_n}[g_{S,i}(\beta_0)]\right)$, where $a$ is any vector, such that $a'a = 1$. Then, we have:

$$\sum_{i=1}^n E_{f_n}\left[t_{n,i}^2\right] \to a'\Omega\, a$$

and:

$$\sum_{i=1}^n E_{f_n}\left[t_{n,i}^2\, \mathbf{1}\{|t_{n,i}| > \epsilon\}\right] \to 0$$

by Condition 9. Thus, by the Lindeberg-Feller Theorem and Condition 13:

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \left(g_{S,i}(\beta_0) - E_{f_n}\left[g_{S,i}(\beta_0)\right]\right) \overset{d}{\to} N(0, \Omega)$$

Furthermore, by Condition 12, we have:

$$\sqrt{n}\, E_{f_n}\left[g_S(y, \beta_0)\right] \to -G_\gamma\,\delta$$

Therefore, by the Cramér-Wold device, we obtain:

$$\frac{1}{\sqrt{n}}\sum_{i=1}^n g_{S,i}(\beta_0) \overset{d}{\to} N\left(-G_\gamma\,\delta,\ \Omega\right)$$

which implies the desired result.
- 1. Although $\{y_i\}$ forms a triangular array, we suppress the additional subscript, $n$, on $y$ for notational simplicity.
- 2. Simulations were also conducted for different sample sizes. The results are not reported here, because the difference among candidate models is so small for large $n$ that the RMSEs are almost identical for all models.
© 2013 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).